Monday 19 January
Keynote Session
Date: Monday 19 January
Time: 9:30 AM - 11:30 AM
Session Chairs: Bernice E. Rogowitz, IBM Thomas J. Watson Research Ctr.; Thrasyvoulos N. Pappas, Northwestern Univ.
Towards a true spherical camera
Paper 7240-61
Time: 9:30 AM - 10:10 AM
Author(s): Guru Krishnan, Shree K. Nayar, Columbia Univ. (United States)
Behavioral and neural correlates of visual preference decision
Paper 7240-62
Time: 10:10 AM - 10:50 AM
Author(s): Shinsuke Shimojo, California Institute of Technology (United States)
Perceptual experiments on the Web
Paper 7240-63
Time: 10:50 AM - 11:30 AM
Author(s): Ken Nakayama, Harvard Univ. (United States)
Lunch Break 11:30 AM - 1:00 PM
Session 2: Social Software, Internet Experiments, and New Paradigms for the Web
Date: Monday 19 January
Time: 1:00 PM - 2:30 PM
Session Chair: Jeffrey B. Mulligan, NASA Ames Research Ctr.
Thousands of on-line observers is just the beginning
(Invited Paper)
Paper 7240-64
Time: 1:00 PM - 1:30 PM
Author(s): Nathan Moroney, Hewlett-Packard Labs. (United States)
Presentation of calibrated images over the Web
Paper 7240-82
Time: 1:30 PM - 1:50 PM
Author(s): Jeffrey B. Mulligan, NASA Ames Research Ctr. (United States)
Tagging, micro-tagging, and tag editing: using the wisdom of the crowds to improve metadata on shared content
Paper 7240-65
Time: 1:50 PM - 2:10 PM
Author(s): Mercan Topkara, Bernice E. Rogowitz, IBM Thomas J. Watson Research Ctr. (United States)
Social tagging is an emerging methodology that allows individual users to assign semantic keywords to content on the web. Popular web services allow the community of users to search for content based on these user-defined tags. Tags are typically attached to a whole entity such as a web page (e.g., del.icio.us), a video (e.g., YouTube), a product description (e.g., Amazon) or a photograph (e.g., Flickr). However, finding specific information within a whole entity can be a difficult, time-intensive process. This is especially true for content such as video, where the information sought may be a small segment within a very long presentation. Moreover, the tags provided by a community of users may be incorrect, conflicting, or incomplete when used as search terms. In this paper we introduce a system that allows users to create "micro-tags," that is, semantic markers that are attached to subsets of information. These micro-tags give the tagger the ability to direct attention to specific subsets within a larger and more complex entity, and the set of micro-tags provides a more nuanced description of the full content. Also, when these micro-tags are used as search terms, there is no need to do a serial search of the content, since micro-tags draw attention to the semantic content of interest. This system also provides a mechanism that allows users in the community to edit and delete each other's tags, using the "wisdom of the crowds" to refine and improve tag quality. We will also report on empirical studies that demonstrate the value of micro-tagging and tag editing and will describe various applications.
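To make the idea concrete, a micro-tag can be modeled as a keyword anchored to a sub-range of a larger content item, together with an edit history for community refinement. The following Python sketch is purely illustrative; the class and field names are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class MicroTag:
    """Hypothetical sketch of a semantic keyword attached to a subset of content."""
    keyword: str          # user-supplied semantic label
    content_id: str       # the video, page, or image being tagged
    start: float          # start of the tagged segment (e.g., seconds into a video)
    end: float            # end of the tagged segment
    author: str           # who created the tag
    history: list = field(default_factory=list)  # community edits ("wisdom of the crowds")

    def edit(self, editor: str, new_keyword: str) -> None:
        """Let another community member refine the tag, keeping an audit trail."""
        self.history.append((editor, self.keyword))
        self.keyword = new_keyword

# A search for "goal" can now jump straight to seconds 132-140 of the video
tag = MicroTag("goal", "match_video_17", 132.0, 140.0, "alice")
tag.edit("bob", "winning goal")
```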
Internet experiments: methods, guidelines, metadata
(Invited Paper)
Paper 7240-88
Time: 2:10 PM - 2:30 PM
Author(s): Ulf-Dietrich Reips, Univ. Zürich (Switzerland)
Methods for Internet-based research are currently one of the hot areas in methodology. In the fourteen years since the first Internet experiment was created, the field has seen a massive increase in the number of studies conducted on the Internet, marking a grass-roots change in how psychological research is done (for examples of Internet experiments see the web experiment list at http://genpsylab-wexlist.unizh.ch/). What do we now know about the new method? Several Internet research methods and guidelines will be presented, such as non-obvious file naming, the seriousness check technique, the multiple site entry technique, the high hurdle technique, and methods for dropout analysis. Results from a ten-year Internet experiment and metadata from Web services for Internet experimentation will be described.
Session 3: Multimodal Interactive Environments
Date: Monday 19 January
Time: 2:30 PM - 5:50 PM
Session Chair: Huib de Ridder, Technische Univ. Delft (Netherlands)
Ecological optics of natural materials and light fields
(Invited Paper)
Paper 7240-66
Time: 2:30 PM - 3:00 PM
Author(s): Sylvia Pont, Technische Univ. Delft (Netherlands)
The appearance of objects in scenes is determined by their shape, material properties and by the light field, and, conversely, the appearance of those objects provides us with cues about the shape, material properties and light field. The latter so-called inverse problem is underdetermined and therefore suffers from interesting ambiguities. As a consequence, interactions in the perception of shape, material, and luminous environment are bound to occur. Textures of illuminated rough materials depend strongly on the illumination and viewing directions. Luminance histogram-based measures such as the average luminance, its variance, shadow and highlight modes, and the contrast provide robust estimates with regard to the surface structure and the light field. Human observers' performance agrees well with predictions on the basis of such measures. If we also take into account the spatial structure of the texture it is possible to estimate the illumination orientation locally. Image analysis on the basis of second order statistics and human observers' estimates correspond well and are both subject to the bas-relief and the convex-concave ambiguities. The systematic robust illuminance flow patterns of local illumination orientation estimates on rough 3D objects are an important entity for shape from shading and for light field estimates. Human observers are able to match and discriminate simple light field properties (e.g. average illumination direction and diffuseness) of objects and scenes, but they make systematic errors, which depend on material properties, object shapes and position in the scene. Moreover, our results show that perception of material and illumination are basically confounded.
Stereoscopic displays in medical domains: a review of perception and performance effects
Paper 7240-84
Time: 3:00 PM - 3:20 PM
Author(s): Maurice van Beurden, Wijnand A. Ijsselsteijn, Technische Univ. Eindhoven (Netherlands); Gert van Hoey, Barco N.V. (Belgium); Harry Hatzakis, Biotronics3D (United Kingdom)
Roughness in sound and vision
Paper 7240-67
Time: 3:20 PM - 3:40 PM
Author(s): Rene van Egmond, Paul Lemmens, Technische Univ. Delft (Netherlands); Thrasyvoulos N. Pappas, Northwestern Univ. (United States); Huib de Ridder, Technische Univ. Delft (Netherlands)
Coffee Break 3:40 PM - 4:10 PM
Sign language perception research for improving automatic sign and gesture recognition
Paper 7240-33
Time: 4:10 PM - 4:30 PM
Author(s): Gineke A. ten Holt, Huib de Ridder, Andrea J. Koenderink-van Doorn, Marcel J. T. Reinders, Emile A. Hendriks, Technische Univ. Delft (Netherlands)
This paper describes a number of sign perception experiments and how they are combined to gain more insight into sign language perception. It also discusses the practical application of these insights in the field of automatic sign language recognition (ASLR). Results include the fact that not all phases within a sign are equally informative. Certain phases could be discarded, which is beneficial for ASLR both in terms of computational load and because there is less variation to handle. Another important insight is that there is no clear, objective definition of the allowed variation within a sign: signers differ in their strictness. Also, the criteria of ASLR algorithms for the correctness of a sign differ from human criteria, even when an algorithm achieves good recognition.
Quantifying the effect of disruptions to temporal coherence on the intelligibility of compressed American Sign Language video
Paper 7240-32
Time: 4:30 PM - 4:50 PM
Author(s): Frank M. Ciaramello, Sheila S. Hemami, Cornell Univ. (United States)
Real-time, two-way transmission of American Sign Language (ASL) video over cellular networks has the potential to significantly benefit members of the Deaf community. Unfortunately, at the rates provided by current networks, ASL videos encoded using techniques designed to maximize fidelity yield sign language sequences that are unintelligible. These low bandwidth constraints are often met by increasing the quantization step size or by reducing the frame rate. Understanding how these reductions in fidelity affect an observer's comprehension of the conversation is an essential component both to objectively evaluating coded sign language video and to developing appropriate compression algorithms. As an extension of the authors' previous work on spatial distortions, this paper quantifies the effect of temporal artifacts on sign language intelligibility. These artifacts can be the result of either frame rate reductions or motion-compensation residuals that distract the observer. A subjective study was performed in which fluent ASL participants rated the intelligibility of sequences encoded at 5 different frame rates and 3 different levels of spatial quality. The subjective data is used to parameterize an objective intelligibility measure which is highly correlated with subjective ratings across all frame rates. This measure is incorporated into an H.264 rate-distortion optimization algorithm and achieves up to a 50% reduction in bitrate without impacting intelligibility.
Virtual microscopy: merging of computer mediated communication and intuitive interfacing
Paper 7240-69
Time: 5:10 PM - 5:30 PM
Author(s): Huib de Ridder, Technische Univ. Delft (Netherlands); Johanna G. de Ridder-Sluiter, Dutch Child Oncology Group (Netherlands); Philip H. Kluin, Univ. Medical Ctr. Groningen (Netherlands); Henri H.C.M. Christiaans, Technische Univ. Delft (Netherlands)
Ubiquitous computing (or Ambient Intelligence) is an upcoming technology that is usually associated with futuristic smart environments in which information is available anytime, anywhere and with which humans can interact in a natural, multimodal way. However spectacular the corresponding scenarios may be, it is equally challenging to consider how this technology can enhance existing situations. This is illustrated by a case study from the Dutch medical field: central quality reviewing for pathology in child oncology. The main goal of the review is to assess the quality of the diagnosis based on patient material. The sharing of knowledge in live social interaction during such a meeting is an important advantage. At the same time, there is the disadvantage that the experts from the seven Dutch academic hospitals have to travel to the review meeting, and the logistics required to collect and bring patient material and data to the meeting are cumbersome and time-consuming. This paper focuses on how this time-consuming, inefficient way of reviewing can be replaced by a virtual collaboration system that merges technology supporting Computer Mediated Collaboration with intuitive interfacing. This requires insight into the preferred way of communication and collaboration as well as knowledge about the preferred interaction style with a virtual shared workspace.
A model of memory for incidental learning
Paper 7240-50
Time: 5:30 PM - 5:50 PM
Author(s): Roger A. Browse, Lisa Y. Drewell, Queen's Univ. (Canada)
This paper describes a radial-basis memory system that is used to model the performance of human participants in a task of learning to traverse mazes in a virtual environment. The memory model is a multiple-trace system, in which each event is stored as a separate memory trace. In the modeling of the maze traversal task, the events that are stored as memories are the perceptions and decisions taken at the intersections of the maze. As the virtual agent traverses the maze, it makes decisions based upon all of its memories, but those that match best to the current perceptual situation, and which were successful in the past, have the greatest influence. As the agent carries out repeated attempts to traverse the same maze, memories of successful decisions accumulate, and performance gradually improves. The system uses only three free parameters, which include the variance of the underlying Gaussian used as the radial-basis function. It is demonstrated that adjustments of these parameters can easily result in exact modeling of the average human performance in the same task, and that variation of the parameters matches the variation in human performance. We conclude that human memory interaction that does not involve conscious control, as in learning navigation routes, may be much more primitive and more simply explained than has been previously thought.
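A minimal sketch of such a multiple-trace, radial-basis decision rule is given below; the names and the single Gaussian width are assumptions for illustration, not the authors' three-parameter model.

```python
import numpy as np

def rbf_decision(memories, outcomes, percept, sigma):
    """Weight every stored trace by its Gaussian similarity to the current
    percept and by its past success, then sum the weighted evidence.

    memories : (n, d) array, one perceptual trace per past maze decision
    outcomes : (n,) array, +1 for decisions that led to success, -1 otherwise
    percept  : (d,) current perceptual situation at an intersection
    sigma    : width of the underlying Gaussian radial-basis function
    """
    d2 = np.sum((memories - percept) ** 2, axis=1)
    similarity = np.exp(-d2 / (2.0 * sigma ** 2))  # radial-basis activation
    return np.sum(similarity * outcomes)           # > 0 favors repeating the choice

# Toy usage: one successful and one unsuccessful trace
memories = np.array([[0.0, 1.0], [1.0, 0.0]])
outcomes = np.array([+1.0, -1.0])
print(rbf_decision(memories, outcomes, np.array([0.1, 0.9]), sigma=0.5))
```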
Human Vision and Electronic Imaging Banquet
Date: Monday 19 January
Time: 7:30 PM - 10:00 PM
The Perception of Pictures
Banquet Speaker: Martin S. Banks, Univ. of California, Berkeley
Tuesday 20 January
Session 4: Haptics
Date: Tuesday 20 January
Time: 10:30 AM - 12:40 PM
Session Chairs: Bernice E. Rogowitz, IBM Thomas J. Watson Research Ctr.; Thrasyvoulos N. Pappas, Northwestern Univ.
The interaction of vision and haptics during the perception of 3D shape
(Proceedings only)
Paper 7240-70
Time: 10:30 AM - 11:00 AM
Author(s): Flip Phillips, Eric Egan, Skidmore College (United States)
Haptics cuing
(Invited Paper)
Paper 7240-71
Time: 11:00 AM - 11:30 AM
Author(s): Hong Z. Tan, Purdue Univ. (United States)
Psychophysical evaluation of a variable friction tactile interface
(Invited Paper)
Paper 7240-73
Time: 11:30 AM - 12:00 PM
Author(s): Evren Samur, J. Edward Colgate, Michael A. Peshkin, Northwestern Univ. (United States)
Perceptual dimensions for a dynamic tactile display
Paper 7240-74
Time: 12:00 PM - 12:20 PM
Author(s): Vivien Tartter, City College/CUNY (United States); Thrasyvoulos N. Pappas, Northwestern Univ. (United States)
Haptics disambiguates vision in the perception of pictorial relief
Paper 7240-75
Time: 12:20 PM - 12:40 PM
Author(s): Maarten W. A. Wijntjes, Technische Univ. Delft (Netherlands); Robert Volcic, Westfaelische Wilhelms-Univ. (Germany); Jan J. Koenderink, Sylvia C. Pont, Technische Univ. Delft (Netherlands); Astrid M. L. Kappers, Univ. Utrecht (Netherlands)
Lunch/Exhibition Break 12:40 PM - 2:00 PM
Session 5: High Dynamic Range
Date: Tuesday 20 January
Time: 2:00 PM - 5:30 PM
Session Chair: John J. McCann, McCann Imaging
Dynamic range of visual activities of space
Paper 7240-76
Time: 2:00 PM - 2:20 PM
Author(s): Albert J. Ahumada, Jr., Mary K. Kaiser, Jeffrey B. Mulligan, NASA Ames Research Ctr. (United States)
Adaptive display of high-dynamic range images
Paper 7240-11
Time: 2:20 PM - 2:40 PM
Author(s):
Exploring eye movements for tone mapped images
Paper 7240-42
Time: 2:40 PM - 3:00 PM
Author(s): Marina Bloj, Glen Harding, Univ. of Bradford (United Kingdom); Alan Chalmers, Univ. of Warwick (United Kingdom)
In the real world we can find large intensity ranges: the ratio from the brightest to the darkest part of a scene can be of the order of 10 000 to 1. Since most of our electronic displays have a limited range of around 100 to 1, the last 20 years have seen much work on developing algorithms (or tone mappers) that compress the actual dynamic range of an image to that available on the display device. An increasing amount of research has also been done to try to evaluate the 'best' tone mapper. There is evidence that the spatial and chronological path of fixations made by observers when viewing an image (i.e. the scanpath) is repeated to some extent when the same image is presented to the observer again. In this paper we are the first to investigate the potential of using eye movement recordings, particularly scanpaths, as a discriminatory tool. We propose that if a tone-mapped image gives rise to scanpaths that differ from those obtained when viewing the original image, this may indicate a poor quality tone mapper, since it elicits eye movements that differ from those observed when viewing the original image.
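The abstract does not specify a scanpath comparison measure; one common choice in the eye-movement literature is a string-edit distance over grid-quantized fixation sequences, sketched here under that assumption.

```python
def scanpath_string(fixations, grid=5, width=1.0, height=1.0):
    """Quantize (x, y) fixations onto a grid and encode each cell as a letter."""
    s = ""
    for x, y in fixations:
        col = min(int(x / width * grid), grid - 1)
        row = min(int(y / height * grid), grid - 1)
        s += chr(ord("A") + row * grid + col)
    return s

def edit_distance(a, b):
    """Levenshtein distance between two fixation strings."""
    m, n = len(a), len(b)
    d = [[i + j if i * j == 0 else 0 for j in range(n + 1)] for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[m][n]

original = scanpath_string([(0.1, 0.1), (0.5, 0.5), (0.9, 0.2)])
tonemapped = scanpath_string([(0.1, 0.1), (0.6, 0.5), (0.4, 0.8)])
print(edit_distance(original, tonemapped))  # larger distance -> less similar scanpaths
```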
SS-SSIM and MS-SSIM for digital cinema applications
Paper 7240-13
Time: 3:00 PM - 3:20 PM
Author(s): Fitri N. Rahayu, Ulrich Reiter, Norwegian Univ. of Science and Technology (Norway); Touradj Ebrahimi, Ecole Polytechnique Fédérale de Lausanne (Switzerland); Andrew Perkis, Peter Svensson, Norwegian Univ. of Science and Technology (Norway)
One of the key issues for a successful roll-out of Digital Cinema in the market is assurance of the quality of service it offers. The most practical way of measuring this quality is to use an objective metric. Currently, the most widely used objective metrics are peak signal-to-noise ratio (PSNR) and mean squared error (MSE). However, it is known that these metrics do not correlate well with how humans perceive visual quality. Single-Scale SSIM (SS-SSIM) and Multi-Scale SSIM (MS-SSIM), objective metrics introduced by Wang and Bovik, have shown good correlation with perceived quality. Therefore, SS-SSIM and MS-SSIM have strong potential to become perceptual metrics for measuring perceived quality in Digital Cinema applications. Our goal is to design SS-SSIM and MS-SSIM metrics with input parameters that take into account the Digital Cinema source material characteristics (resolution, dynamic range, frame rate / motion, content) and viewing conditions. These metrics are then used to measure the perceived quality of high quality digital imagery. To validate the results, they are compared with PSNR and with a subjective assessment carried out by human observers in a DCI-specified movie theater environment.
Measuring perceptual contrast in a multi-level framework
Paper 7240-5
Time: 3:20 PM - 3:40 PM
Author(s): Gabriele Simone, Marius Pedersen, Jon Yngve Hardeberg, Gjøvik Univ. College (Norway); Alessandro Rizzi, Univ. degli Studi di Milano (Italy)
In this paper, we propose and discuss new approaches for measuring perceptual contrast in digital images. We improve previous algorithms by using different local measures of contrast and a parameterized way to recombine local contrast maps and color channels. We propose the idea of recombining the local contrast maps and the channels using particular measures taken from the image itself as weighting parameters. Exhaustive tests and results are presented and discussed; in particular, we assess the performance of each algorithm in relation to the contrast perceived by observers. Current results clearly show a considerable improvement in the correlation between contrast measures and observers' perceived contrast when the variance of each of the three color channels is used as the weighting parameter for the local contrast maps.
Coffee Break 3:40 PM - 4:10 PM
A perceptual evaluation of 3D unsharp masking
Paper 7240-34
Time: 4:10 PM - 4:30 PM
Author(s): Matthias B. Ihrke, Max-Planck-Institut für Dynamik und Selbstorganisation (Germany) and Bernstein Ctr. for Computational Neuroscience (Germany); Tobias Ritschel, Kaleigh Smith, Thorsten Grosch, Karol Myszkowski, Hans-Peter Seidel, Max-Planck-Institut für Informatik (Germany)
Enhancing the contrast of images and scenes in a way that aids their understanding and interpretation as well as increases their visual appeal is a worthwhile and challenging task. In our study, we evaluated an algorithm recently proposed by Ritschel et al. (2008) that provides a general technique for enhancing the perceived contrast in synthesized scenes and is based on a perceptual effect, the Cornsweet illusion. Participants were asked to adjust the strength of the enhancement until (i) the enhancement was just visible, (ii) the enhancement was objectionable and (iii) the enhancement was optimal, in a direct-comparison task. The experiment featured four different scenes and a highly standardized experimental setup. We found that all participants preferred enhanced images over the originals and that artifacts appeared only for comparatively large values. The crucial parameter was the strength of the enhancement rather than the gradient size. Furthermore, the results indicate a general pattern over all scenes for selecting adequate values of the enhancement strength. A value twice the visibility threshold was typically rated as preferred, while an enhancement of four times this threshold was mostly perceived as objectionable. We conclude that our results confirm the hypothesis that 3D unsharp masking increases the perceived contrast of a scene in a way that is generally perceived as attractive and coherent.
Objective evaluation of tone mapping operator parameters
Paper 7240-41
Time: 4:30 PM - 4:50 PM
Author(s): Tunc O. Aydin, Karol Myszkowski, Hans-Peter Seidel, Max-Planck-Institut für Informatik (Germany)
Current methods of tone mapping operator evaluation rely on various rating and ranking experiments that require the involvement of test subjects. In these experiments, a large amount of experimental data on a relatively large set of images, gathered from multiple subjects, is needed to make correct predictions. Each subject typically performs a lengthy evaluation depending on the number of images and tone mapping operators being tested, resulting in prohibitively expensive studies. To cope with the already large number of trials, all current methods we know of have been limited to a single parameter configuration for each tone mapping operator. But in most cases the resulting tone mapped image is strongly affected by the operator's parameters, and a single configuration is not representative. We present an objective approach to tone mapping operator evaluation that relies on a perceptual image distortion metric instead of subjective experiments. The main advantage of our method is that the evaluation process is significantly more efficient because we rule out human involvement, allowing us to examine the effect of multiple tone mapping operator parameters.
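The evaluation loop the abstract implies can be sketched as a parameter sweep scored by a distortion measure. Both the toy logarithmic tone mapper and the mean-squared-error stand-in below are assumptions; the paper uses a perceptual image distortion metric.

```python
import numpy as np

def log_tmo(hdr, key):
    """Toy global tone mapper: logarithmic compression with one parameter."""
    return np.log1p(key * hdr) / np.log1p(key * hdr.max())

def distortion(reference, mapped):
    """Stand-in distortion measure; the paper uses a perceptual metric instead."""
    return np.mean((reference / reference.max() - mapped) ** 2)

hdr = np.random.rand(64, 64) ** 4 * 1e4   # synthetic high-dynamic-range image
scores = {key: distortion(hdr, log_tmo(hdr, key)) for key in (0.01, 0.1, 1.0, 10.0)}
best = min(scores, key=scores.get)        # parameter with the lowest predicted distortion
print(best, scores[best])
```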
Influence of surround luminance upon perceived blackness
Paper 7240-6
Time: 4:50 PM - 5:10 PM
Author(s): Tetsuya Eda, Yoshiki Koike, Sakurako Matsushima, Koichi Ozaki, Miyoshi Ayama, Utsunomiya Univ. (Japan)
An appreciation of shadows and darkness is one of the traditional Japanese senses of beauty. In reproducing scenes and objects in digital images without losing artistic quality, the role of blackness and the ways of expressing blackness are so important that they should be studied from the viewpoints of art, color perception, and imaging technology. In this study, we investigated how the luminance ratio of the surround field (Ls) to that of the central field (Lc) influences the perceived blackness of the central field, both in a simple concentric-circle configuration (Experiment 1) and in digital images of masterpieces (Experiment 2). Results of Experiment 1 showed that the perceived blackness of the central field becomes more blackish and deeper as the contrast between Lc and Ls increases. Results of Experiment 2 showed that the perceived blackness of a black area surrounded by a relatively bright area in artistic images is stronger than the perceived blackness given by the same luminance contrast between center and surround in a concentric circular configuration.
Preservation of edges: the mechanism for improvements in HDR imaging
Paper 7240-39
Time: 5:10 PM - 5:30 PM
Author(s): John J. McCann, McCann Imaging (United States); Alessandro Rizzi, Univ. degli Studi di Milano (Italy)
There are a number of modern myths about High Dynamic Range (HDR) imaging. There have been claims that multiple-exposure techniques can accurately record scene luminances over a dynamic range of more than a million to one. There are assertions that human appearance tracks the same range. The most common myth is that HDR imaging accurately records and reproduces actual scene radiances. Regardless, there is no doubt that HDR imaging is superior to conventional imaging. We need to understand the basis of HDR image quality improvements. This paper shows that multiple exposure techniques can preserve spatial information, although they cannot record accurate scene luminances. Synthesizing HDR renditions from relative spatial records accounts for improved images.
When are HDR Images better than Conventional Images?
Date: Tuesday 20 January
Time: 6:00 PM - 7:00 PM
Panel Moderators: John J. McCann, McCann Imaging; Albert J. Ahumada, Jr., NASA Ames Research Ctr.; Marina Bloj, Univ. of Bradford (United Kingdom); James O. Larimer, NASA Ames Research Ctr.; Karol Myszkowski, Max-Planck-Institut für Informatik (Germany); Alessandro Rizzi, Univ. degli Studi di Milano (Italy); Sabine E. Süsstrunk, Ecole Polytechnique Fédérale de Lausanne (Switzerland)
Interactive Paper and Symposium Demonstration Session
Date: Tuesday 20 January
Time: 6:00 PM - 8:30 PM
Model validation of channel zapping quality
Paper 7240-31
Author(s): Robert E. Kooij, TNO TPD (Netherlands) and Technische Univ. Delft (Netherlands); Floris Nicolai, Technische Univ. Delft (Netherlands); Kanal Ahmed, TNO TPD (Netherlands); Kjell E. Brunnström, Acreo AB (Sweden)
In an earlier paper we showed that the perceived quality of channel zapping is related to the perceived quality of the download time in web browsing, as suggested by ITU-T Rec. G.1030. We showed this by performing a subjective test that resulted in an excellent fit with a 0.99 correlation. This was what we call a 'lean forward' experiment, and it gave the rule-of-thumb result that the zapping time must be less than 0.43 s to be good (> 3.5 on the MOS scale). To validate the model we have done new subjective experiments. These experiments included 'lean backward' zapping, i.e. sitting on a sofa with a remote control. The subjects are more forgiving in this case and the requirement could be relaxed to 0.67 s. We also conducted subjective experiments in which the zapping times vary. We found that the MOS rating decreases if zapping delay times vary. In our experiments we assumed uniformly distributed delays, where the variance cannot be larger than the mean delay. We found that in order to obtain a MOS rating of at least 3.5, the maximum allowed variance, and thus also the maximum allowed mean zapping delay, is 0.46 s.
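The modeling step can be sketched as fitting a logarithmic delay-to-MOS relation, in the spirit of ITU-T Rec. G.1030, and inverting it at MOS 3.5. The data points below are invented for illustration; the paper's subjective data are not reproduced here.

```python
import numpy as np

# Hypothetical (delay in seconds, MOS) pairs standing in for subjective data
delays = np.array([0.1, 0.2, 0.5, 1.0, 2.0, 4.0])
mos = np.array([4.6, 4.2, 3.4, 2.9, 2.3, 1.8])

# Fit the logarithmic relation MOS = a + b * ln(delay)
b, a = np.polyfit(np.log(delays), mos, 1)

# Invert the model: the longest zapping delay still rated "good" (MOS >= 3.5)
max_delay = np.exp((3.5 - a) / b)
print(f"MOS = {a:.2f} + {b:.2f} ln(t); delay for MOS 3.5: {max_delay:.2f} s")
```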
Application of a visual model to the design of an ultra-high definition up-scaler
Paper 7240-54
Author(s): Jon M. Speigle, Dean S. Messing, Scott J. Daly, Sharp Labs. of America, Inc. (United States)
Hyperbolic modeling for metaphorical processing and visual computations
Paper 7240-79
Author(s): Hawley K. Rising III, Sony Electronics Inc. (United States)
Visual harmony and image statistics: an empirical investigation
Paper 7240-59
Author(s): Elena A. Fedorovskaya, Wei Hao, Carman Neustaedter, Eastman Kodak Co. (United States)
Visual harmony is an aspect of aesthetic appeal which can be defined as the pleasing or congruent arrangement of image parts producing internal calm or tranquility. It is thought to be related to balance and equilibrium in a composition, determined by the choice of elements, such as shapes, lines, textures, values and colors, and their arrangement. In a series of experiments, we collected subjective judgments of image harmony and other overall perceived attributes for images representative of typical consumer photography. The ratings were used to reveal underlying statistical properties and semantic factors that can explain people's perception of harmony. We also analyzed a much larger database of images to determine patterns of color combinations and spatial configuration, with the probabilities of their occurrences, to hypothesize whether characteristics representing typical living environments affect individual harmony responses. The results demonstrate the potential of this approach for understanding visual harmony and aesthetic appeal.
Sketch recognition robust to the sketch order
Facilitation of listening comprehension by visual information under noisy listening condition
Paper 7240-9
Author(s): Chiho Kashimada, Kazuki Ogita, Hiroshi Hasegawa, Kazuo Kamata, Miyoshi Ayama, Utsunomiya Univ. (Japan)
This paper investigates the influence of asynchrony between the visual and auditory information of talking-head movies under conditions that make it difficult to correctly recognize the speech content. In the experiment, talking-head movies of a female announcer uttering sentences consisting of five words were used as the auditory-visual stimuli. We presented the stimuli under thirteen conditions of delay time between the auditory and visual stimuli: 0, ±1, ±2, ±4, ±8, ±16, ±32 frames (1 frame = 1/30 s; +: visual preceding; −: audio preceding) and a condition of audio only (no visual stimulation). We evaluated the word recognition rates under S/N conditions of −10 dB and −15 dB when the utterance speed was 120 or 150 ms/mora. Comparing the recognition rates between the audio-only condition and each delay-time condition showed that the visual stimulus helped increase the recognition rate within ±4 frames. On the other hand, when the delay time was large (over ±8 frames), the visual stimulus did not help improve the recognition rate. Moreover, we evaluated the word recognition rates depending on the sentence structure (word order). We found that recognition rates were extremely high for the first word and for the subject word of a sentence, and that they were high when the word order followed that commonly used for news reading in Japanese (when / where / who / what / how).
Quantifying the image-sticking phenomenon for checkerboard stimuli: contrast, spatial frequency, edge effect, and noise interference
Paper 7240-89
Author(s): Jung-Chih Su, Industrial Technology Research Institute (Taiwan)
Wednesday 21 January
Session 6: Video Perception and Quality
Date: Wednesday 21 January
Time: 9:30 AM - 12:00 PM
Session Chair: Sheila S. Hemami, Cornell Univ.
HVS-based quantization steps for validation of digital cinema extended bitrates
Paper 7240-27
Time: 9:30 AM - 9:50 AM
Author(s): Chaker M. Larabi, Univ. de Poitiers (France); Pascal Pellegrin, Univ. Catholique de Louvain (Belgium); Olivier Tulet, Univ. de Poitiers (France); Pedro Correa, Univ. Catholique de Louvain (Belgium); Ghislain Anciaux, Univ. de Poitiers (France); Parvatha Elangovan, Benoît Macq, Univ. Catholique de Louvain (Belgium)
In Digital Cinema, the video compression must be as transparent as possible to provide the best image quality to the audience. The goal of compression is to simplify the transport, storage, distribution and projection of films. Equipment needs to be developed for all of these tasks, so it is mandatory to reduce its complexity by imposing limitations in the specifications.
Statistics of natural image sequences: temporal motion smoothness by local phase correlations
Paper 7240-37
Time: 9:50 AM - 10:10 AM
Author(s): Zhou Wang, Univ. of Waterloo (Canada); Qiang Li, The Univ. of Texas at Arlington (United States)
Statistical modeling of natural image sequences is of fundamental importance both to the understanding of biological visual systems and to the development of Bayesian approaches for solving a wide variety of machine vision and image processing problems. Previous methods are based on measuring spatiotemporal power spectra and on optimizing the best linear filters to achieve independent or sparse representations of the time-varying image signals. Here we propose a different approach, in which we investigate the temporal variations of local phase structures in the complex wavelet transform domain. We observe that natural image sequences exhibit a strong prior of temporal motion smoothness, by which the local phases of wavelet coefficients can be well predicted from their temporal neighbors. We also observe that this statistical regularity becomes stronger as the magnitude of the complex wavelet coefficients increases. We study how this statistical regularity is disrupted by "unnatural" distortions in image sequences, which include line jittering, frame jittering, frame dropping, noise contamination, and blurring. The temporal motion smoothness prior as well as the proposed measurement approach may be applied to many real world problems. We demonstrate its potential for reduced-reference video quality assessment.
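As a toy illustration of the phase-prediction idea, the sketch below uses the analytic-signal phase of a 1D pattern as a stand-in for the complex wavelet coefficients of the paper (that substitution, and the linear extrapolation, are assumptions for brevity).

```python
import numpy as np
from scipy.signal import hilbert

def local_phase(signal_1d):
    """Local phase via the analytic signal, standing in for wavelet phase."""
    return np.angle(hilbert(signal_1d))

# Three consecutive frames of a smoothly translating 1D pattern
x = np.linspace(0, 8 * np.pi, 256)
frames = [np.sin(x - 0.3 * t) for t in range(3)]
phases = [local_phase(f) for f in frames]

# Temporal motion smoothness: predict the current phase by linear
# extrapolation from the two previous frames, then measure the error.
predicted = 2 * phases[1] - phases[0]
error = np.angle(np.exp(1j * (phases[2] - predicted)))  # wrapped phase error
print(np.mean(np.abs(error)))  # small for smooth motion; grows under jitter or dropping
```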
Motion-based perceptual quality assessment of video
Paper 7240-56
Time: 10:10 AM - 10:30 AM
Author(s): Kalpana Seshadrinathan, Alan C. Bovik, The Univ. of Texas at Austin (United States)
Monitoring the quality of videos plays a critical role in maintaining quality of service in networked video applications and there is interest in automatic methods to evaluate the perceptual quality of digital video. Existing algorithms to measure video quality focus on capturing spatial degradations in the video signal, and are inadequate at capturing temporal degradations in videos. However, motion plays an important role in human perception of video. Additionally, several commonly occurring distortions in video such as motion compensation mismatch, jitter and ghosting have a temporal component. It is critical that objective video quality assessment algorithms are able to account for the perceptual effects of both spatial and temporal distortions. We seek to address this by developing a full reference framework and an algorithm for video quality assessment known as the MOtion-based Video Integrity Evaluation (MOVIE) index. MOVIE integrates both spatial and temporal aspects of distortion assessment. MOVIE captures temporal distortions in the video by evaluating the quality of the distorted video along the motion trajectories of the reference video. MOVIE is shown to deliver scores that correlate quite closely with human subjective judgment, using the Video Quality Expert Group (VQEG) FR-TV Phase 1 database as a test bed.
Coffee Break 10:30 AM - 11:00 AM
No reference perceptual quality metrics: approaches and limitations
Paper 7240-2
Time: 11:00 AM - 11:20 AM
Author(s): David S. Hands, Damien Bayart, Andrew Davis, Alex Bourret, British Telecommunications plc (United Kingdom)
Objective quality measurement may be performed using a number of alternative techniques. To predict subjective quality it is necessary to develop and validate approaches that accurately predict video quality. For perceptual quality models, developers have implemented methods that utilise information from both the original and the processed signals (full reference and reduced reference methods). For many practical applications, no reference (NR) methods are required. It has been a major challenge for developers to produce no reference methods that attain the necessary predictive performance for the methods to be deployed by industry. In this paper, we present a comparison between no reference methods operating on either the decoded picture information alone or using a bit-stream / decoded picture hybrid analysis approach. Two NR models are introduced: one using decoded picture information only; the other using a hybrid approach. Validation data obtained from a subjective quality test is used to examine the predictive performance of both models. The strengths and limitations of both methods are discussed.
Subjective video quality assessment methods for recognition tasks
Paper 7240-1
Time: 11:20 AM - 11:40 AM
Author(s): Carolyn G. Ford, Mark McFarland, Irena Stange, Institute for Telecommunication Sciences (United States)
In order to develop accurate objective measurements for video quality assessment, subjective data must be collected via human subject testing. The ITU has a series of Recommendations that address methodology for performing subjective tests in a rigorous manner. These methods are targeted at the entertainment application of video. However, video is often used for many applications outside of the entertainment sector, and generally this class of video is used to perform a specific task. Examples of these applications include security, Public Safety, remote command and control, and sign language, in which the video is used to recognize objects, people or events. The existing methods, developed to assess a person's perceptual opinion of quality, are not appropriate for task-based video. The Institute for Telecommunication Sciences (ITS), under a program from the Department of Homeland Security and the National Institute of Standards and Technology's Office of Law Enforcement (NIST/OLES), has developed a subjective test method to determine a person's ability to perform recognition tasks using video, thereby rating the quality according to the usefulness of the video within its application. This new method will be presented, along with the results of two rounds of subjective testing using this method.
Image utility assessment and a relationship with image quality assessment
Paper 7240-49
Time: 11:40 AM - 12:00 PM
Author(s): David M. Rouse, Cornell Univ. (United States); Romuald Pepion, Univ. de Nantes (France); Sheila S. Hemami, Cornell Univ. (United States); Patrick Le Callet, Univ. de Nantes (France)
Natural images are meaningful to humans. Distorting a reference image through additional processing (e.g., via enhancement or compression) affects the usefulness of the information conveyed by the processed image. This paper explores the utility assessment task, which aims to quantify the usefulness of a processed image as a surrogate for a reference image. The utility assessment task quantifies an observer's understanding of the content of a processed natural image. This paper presents results from two experiments that investigate the relationship between quality assessment and utility assessment tasks. In the quality assessment task, observers evaluate processed natural images in terms of their perceptual resemblance to a reference natural image. Image utility is not directly related to image quality: lower quality images are shown to have the same utility as higher quality images. Full-reference image assessment algorithms have been evaluated for the quality assessment task but not the utility assessment task. Algorithms suitable to the quality assessment task implicitly assess utility insofar as an image of high quality is also of high utility. To predict image utility, this paper evaluates the ability of several full-reference image assessment algorithms as well as a new approach based on the degradation of image contours.
Lunch/Exhibition Break 12:00 PM - 1:30 PM
Session 7: Region of Interest, Sharpness and Blurring
Date: Wednesday 21 January
Time: 1:30 PM - 3:10 PM
Session Chair: Scott J. Daly, Sharp Labs. of America, Inc.
Optimal region-of-interest-based visual quality assessment
Paper 7240-17
Time: 1:30 PM - 1:50 PM
Author(s): Ulrich Engelke, Hans-Jürgen Zepernick, Blekinge Tekniska Högskola (Sweden)
Visual content typically exhibits regions-of-interest (ROI) that particularly attract the viewer's attention. In the context of visual quality, one may expect that quality distortions occurring in the ROI are perceived as more annoying than distortions in the background. This is particularly true given that the human visual system is highly space variant in sampling visual signals, with the highest accuracy at the central point of focus, the fovea. However, this regional attention to distortions is only seldom taken into account in visual quality metric design. In this paper, we apply region-based quality assessment based on the extraction of structural information from the image. Subjective experiments have been conducted both to support the metric design and validation and to identify ROI in a set of reference images. Multiobjective optimization has been applied to determine the optimal weighting of the metrics computed independently on the ROI and the background. It turns out that the ROI-based metric design strongly increases the quality prediction performance of the metric in terms of prediction accuracy and monotonicity.
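The combination step can be sketched as a weighted sum of a quality measure evaluated on the ROI and on the background. The weight below is a hypothetical value; the paper derives the optimal weighting by multiobjective optimization.

```python
import numpy as np

def roi_weighted_quality(quality_map, roi_mask, w=0.7):
    """Combine ROI and background quality with weight w (here a toy value)."""
    q_roi = quality_map[roi_mask].mean()
    q_bg = quality_map[~roi_mask].mean()
    return w * q_roi + (1.0 - w) * q_bg

quality = np.random.rand(32, 32)        # toy map of local quality scores
roi = np.zeros((32, 32), dtype=bool)
roi[8:24, 8:24] = True                  # toy region-of-interest mask
print(roi_weighted_quality(quality, roi))
```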
Perceptually significant spatial pooling techniques for image quality assessment
Paper 7240-36
Time: 1:50 PM - 2:10 PM
Author(s): Anush K. Moorthy, Alan C. Bovik, The Univ. of Texas at Austin (United States)
Recent image quality assessment (IQA) metrics achieve very high correlation with human perception of image quality. Naturally, it is of interest to produce even better results. One promising method is to weight image quality measurements by visual importance. To this end, we describe two strategies: visual fixation-based weighting and quality-based weighting. In contrast with some prior studies, we find that these strategies can significantly improve the correlations with subjective judgment. We demonstrate improvements on the SSIM index in both its multi-scale and single-scale versions, using the LIVE database as a test bed.
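One plausible instantiation of quality-based weighting (not necessarily the authors' exact scheme) is percentile pooling, which averages only the worst local scores of a quality map, since severe local distortions dominate perceived quality.

```python
import numpy as np

def percentile_pool(quality_map, worst_fraction=0.06):
    """Average only the lowest-quality local scores of, e.g., an SSIM map."""
    values = np.sort(quality_map.ravel())
    k = max(1, int(len(values) * worst_fraction))
    return values[:k].mean()

ssim_map = np.clip(np.random.normal(0.95, 0.05, (32, 32)), 0, 1)  # toy local SSIM map
print(percentile_pool(ssim_map))  # lower than ssim_map.mean(): worst regions dominate
```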
A methodology for coupling a visual enhancement device to human visual attention
Paper 7240-57
Time: 2:10 PM - 2:30 PM
Author(s): Aleksandar Todorovic, John A. Black, Jr., Sethuraman Panchanathan, Arizona State Univ. (United States)
The Human Variation Model views disability as simply "an extension of the natural physical, social, and cultural variability of mankind." Given this human variation, it can be difficult to distinguish between a prosthetic device such as a pair of glasses (which extends limited visual abilities into the "normal" range) and a visual enhancement device (such as a pair of binoculars) that extends visual abilities beyond the "normal" range. Indeed, there is no inherent reason why the design of visual prosthetic devices should be limited to just providing "normal" vision. One obvious enhancement to human vision would be the ability to visually "zoom" in on objects that are of particular interest to the viewer. Indeed, it could be argued that humans already have a limited zoom capability, which is provided by their high-resolution foveal vision. However, humans find additional zooming useful, as evidenced by their purchases of binoculars equipped with mechanized zoom features. These zoom features, however, are manually controlled. This raises two questions: (1) Could a visual enhancement device be developed to monitor attention and control visual zoom automatically? (2) If such a device were developed, would its use be experienced by users as a simple extension of their natural vision? This paper details the results of work with two research platforms (called the Remote Visual Explorer and the Interactive Visual Explorer) that were developed specifically to answer these two questions.
Analysis of sharpness increase by image noise
Paper 7240-21
Time: 2:30 PM - 2:50 PM
Author(s): Takehito Kurihara, Naokazu Aoki, Hiroyuki Kobayashi, Chiba Univ. (Japan)
Motivated by the increase in sharpness by image noise reported in preceding experiments, in this research we further examined the effect. To avoid the difficulty of ordinary natural images, which contain very fine texture, we used the simplest stimuli possible: images consisting of only one frequency. We prepared achromatic 1D gratings and 2D uni-frequency images of different frequencies and added different levels of achromatic white Gaussian noise to them. We then asked human observers to evaluate the sharpness of each image. The result showed that for 1D grating images sharpness seemed independent of noise level, while for 2D uni-frequency images it increased with increasing noise. This means image noise is capable of increasing the sharpness of 2D texture, whereas it has an adverse effect on 1D shapes such as edges. The result also showed that lower-frequency patterns seemed to gain more sharpness than higher-frequency patterns. More detailed investigation of individual observers' responses showed disagreement in the judgments between observers: some perceived more sharpness with increasing noise while others did not. We think this can be attributed to different interpretations of sharpness between observers, which is reported in our preliminary experiment.
Psychophysical study of LCD motion blur perception
Paper 7240-51
Time: 2:50 PM - 3:10 PM
Author(s): Sylvain Tourancheau, Patrick Le Callet, Univ. de Nantes (France); Kjell Brunnström, Börje Andrén, Acreo AB (Sweden)
Motion blur is still an important issue on liquid crystal displays (LCD). In recent years, effort has gone into the characterization and measurement of this artefact. These methods make it possible to picture the blurred profile of a moving edge. The blur extent varies with the scrolling speed and with the grey-to-grey transition considered. Recently, a few works have addressed the problem of LCD motion blur perception, but only a few speeds and transitions have been tested. In this paper, we explore motion blur perception over 20 grey-to-grey transitions and up to 10 scrolling speeds. Moreover, we used three different displays to explore the influence of the luminance range as well as the blur shape on perception. A blur-matching experiment was designed to obtain the relation between measurements and perception: observers must adjust a stationary blur (simulated from measurements) until it matches their perception of the blur occurring on a moving edge. Results show that the adjusted blur is always lower than the measured blur, which could be related to the motion sharpening phenomenon.
Coffee Break 3:10 PM - 3:40 PM
Session 8: Image Analysis and Perception
Date: Wednesday 21 January
Time: 3:40 PM - 6:00 PM
Session Chair: Thrasyvoulos N. Pappas, Northwestern Univ.
Pattern masking investigations of the 2nd order visual mechanisms
Paper 7240-14
Time: 3:40 PM - 4:00 PM
Author(s): Pi-Chun Huang, Chien-Chung Chen, National Taiwan Univ. (Taiwan)
The human visual system is sensitive to both first-order and second-order variations in an image. The latter is especially important for digital image processing, as it allows human observers to perceive the envelope of the pixel intensities as a smooth surface instead of discrete pixels. Here we used a pattern masking paradigm to measure the detection threshold of contrast modulated (CM) stimuli, which comprise the modulation of the contrast of horizontal gratings by a vertical Gabor function, under different modulation depths of the CM stimuli. The threshold function showed a typical dipper shape: the threshold decreased with modulation depth (facilitation) at low pedestal depth modulations and then increased (suppression) at high pedestal modulations. The data was well explained by a modified divisive inhibition model that operates on depth modulation rather than contrast in the input images. Hence divisive inhibition, determined by both the first- and the second-order information in the stimuli, is necessary to explain the discrimination between two second-order stimuli.
Parsed and fixed block representations of visual information for image retrieval
Paper 7240-47
Time: 4:00 PM - 4:20 PM
Author(s): Soo Hyun Bae, Biing-Hwang Juang, Georgia Institute of Technology (United States)
The theory of linguistics teaches us the existence of a hierarchical structure in linguistic expressions. By applying syntax and semantics beyond words, one can further recognize the grammatical relationship among words and the meaning of a sequence of words. A class of techniques with a similar nature to linguistic parsing is found in the Lempel-Ziv incremental parsing scheme. Based on a new class of multidimensional incremental parsing algorithms extended from Lempel-Ziv incremental parsing, a new framework for image retrieval was proposed recently. With the incremental parsing technique, a given image is decomposed into a number of patches, called a parsed representation. In this work, we examine the properties of two-dimensional parsed representations in the context of imagery information retrieval and in contrast to vector quantization, i.e. fixed square-block representations with minimum average distortion criteria. We implemented four image retrieval systems for the comparative study: three, called IPSILON image retrieval systems, use parsed representations with different perceptual distortion thresholds, and one uses conventional vector quantization. We compare the effectiveness of the parsed representations under the latent semantic analysis paradigm. The results demonstrate the superiority of the parsed representation.
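The one-dimensional Lempel-Ziv incremental parse that the multidimensional scheme generalizes can be sketched in a few lines (an illustrative LZ78-style parse, not the authors' 2D algorithm):

```python
def lz_incremental_parse(sequence):
    """LZ78-style incremental parsing: each new phrase is the shortest
    prefix of the remaining input not yet in the dictionary."""
    phrases, seen, current = [], set(), ""
    for symbol in sequence:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        phrases.append(current)  # trailing, possibly repeated phrase
    return phrases

print(lz_incremental_parse("abababcbababa"))
# ['a', 'b', 'ab', 'abc', 'ba', 'bab', 'a']: phrases grow as structure repeats
```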
Efficient construction of saliency map
Paper 7240-45
Time: 4:20 PM - 4:40 PM
Author(s): Wen-Fu Li, Tai-Hsiang Huang, Yi-Hsin Huang, Mei-Lan Chu, Homer H. Chen, National Taiwan Univ. (Taiwan)
The saliency map is useful for many applications such as image compression, display, and visualization. The bottom-up model used in most saliency map construction methods is computationally expensive. The purpose of this paper is to present an efficient method for automatic construction of the saliency map of an image while preserving its accuracy. In particular, we remove the contrast sensitivity function and the visual masking component of the bottom-up visual attention model and retain the components related to perceptual decomposition and center-surround interaction that are critical properties of human visual system. The simplified model is verified by performance comparison with the ground truth. In addition, a salient region enhancement technique is adopted to enhance the contour of the salient areas, and the saliency maps of three color channels are fused to enhance the prediction accuracy. Experimental results show that the average correlation between our algorithm and the ground truth is close to that between the original model and the ground truth, while the computational complexity is reduced by 98%.
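The center-surround step the simplified model retains can be illustrated with a difference-of-Gaussians sketch; the filter scales and the plain averaging fusion below are assumptions, not the authors' exact model.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround(channel, center_sigma=2.0, surround_sigma=8.0):
    """Difference-of-Gaussians approximation of center-surround interaction."""
    return np.abs(gaussian_filter(channel, center_sigma) -
                  gaussian_filter(channel, surround_sigma))

def fuse_channels(channels):
    """Normalize each channel's conspicuity map, then fuse by averaging."""
    maps = [center_surround(c) for c in channels]
    maps = [m / (m.max() + 1e-12) for m in maps]
    return np.mean(maps, axis=0)

rgb = np.random.rand(64, 64, 3)  # toy image
saliency = fuse_channels([rgb[..., i] for i in range(3)])
```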
Unsupervised image segmentation by adaptive gradient thresholding for dynamic region growth in the CIE L*a*b* color space
Paper 7240-3
Time: 4:40 PM - 5:00 PM
Author(s): Sreenath Rao Vantaram, Eli Saber, Vincent Amuso, Rochester Institute of Technology (United States); Mark Q. Shaw, Ranjit Bhaskar, Hewlett-Packard Co. (United States)
In this paper, we propose a novel unsupervised color image segmentation algorithm named GSEG. This Gradient-based SEGmentation method is initialized by a vector gradient calculation in the CIE L*a*b* color space. The obtained gradient map is utilized for initially clustering low-gradient content, as well as for adaptively generating thresholds for a computationally efficient dynamic region growth procedure, to classify areas of subsequently higher gradient densities in the image. The resultant classification is combined with an entropy-based texture model in a statistical merging procedure to obtain the final segmentation. Qualitative and quantitative evaluation of our results on several hundred images, utilizing a recently proposed evaluation technique called the Normalized Probabilistic Rand index, shows that GSEG is robust to various image scenarios and performs favorably against published segmentation techniques.
Harmonic analysis for cognitive vision: perisaccadic vision
Paper 7240-19
Time: 5:00 PM - 5:20 PM
Author(s): Jacek Turski, Univ. of Houston (United States)
Projective Fourier analysis gives a data model for image representation that is well adapted to perspective transformations and the retino-cortical mapping. Here we model the first aspects of the human visual process, in which the understanding of a scene is built up in a sequence of attentional visual-information acquisitions, each followed by a fast saccadic eye movement that repositions the fovea on the next target. This sequence, called the scanpath, is the most basic feature of foveate vision. We make three saccades per second with a maximum eyeball speed of 700 deg/sec. Visual sensitivity is reduced during saccadic movements, so we do not see moving images on the retinas. Therefore, three times per second, there are instant large changes in the retinal images, with almost no information consciously carried between images. The inverse projective Fourier transform is computable by FFT in coordinates given by a complex logarithm that also approximates the retino-cortical mapping. Thus it gives the cortical image representation, and a simple translation in log coordinates brings the presaccadic scene into the postsaccadic reference frame, eliminating the need to start processing anew three times per second at each fixation. It also builds up perceptual continuity across fixations in the scanpath.
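The key property of the complex-log mapping can be shown in a few lines: magnification (or rotation) about the fixation point becomes a simple translation in log coordinates, so the representation can be aligned by shifting rather than recomputing. This toy sketch is not the projective Fourier implementation itself.

```python
import numpy as np

def cortical(z, fixation=0 + 0j):
    """Complex-log approximation of the retino-cortical mapping: w = log(z)."""
    return np.log(z - fixation)

scene = np.array([1 + 1j, 2 + 0.5j, 3 - 1j])  # toy points in the visual field
w = cortical(scene)

# Magnification about the fixation point becomes a constant shift in log
# coordinates, so the mapped scene aligns with the original by translation.
w_zoomed = cortical(scene * 1.5)
print(w_zoomed - w)  # constant offset log(1.5) for every point
```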
Improved colour to greyscale via integrability correction
Paper 7240-40
Time: 5:20 PM - 5:40 PM
Author(s): Mark S. Drew, Simon Fraser Univ. (Canada); David Connah, Graham D. Finlayson, Univ. of East Anglia Norwich (United Kingdom); Marina Bloj, Univ. of Bradford (United Kingdom)
For several years now there has been a sustained interest in the problem of converting colour images to greyscale. The classical solution to this problem is to code the luminance signal as a grey value image. However, the problem with this approach is that the detail at equiluminant edges vanishes, and in the worst case the greyscale reproduction of an equiluminant image is a single uniform grey value. The solution to this problem, adopted by all algorithms in the field, is to try to code colour difference (or contrast) in the greyscale image. Significantly, a preference experiment presented at CIC 15 showed that incorporating colour contrast in reproductions was consistently preferred by observers when compared to luminance for a variety of images. In this paper we make two contributions. First, on the algorithm side, we reconsider the Socolinsky and Wolff algorithm for colour to greyscale conversion. This algorithm, which is the most mathematically elegant, often scored well in preference experiments, but occasionally it introduces artefacts which spoil the appearance of the final image. These artefacts are intrinsic to the method and stem from the underlying approach, which computes a greyscale image by a) calculating approximate luminance-type derivatives for the colour image and b) reintegrating these to obtain a greyscale image. Unfortunately, the sign of the derivative vector is occasionally unknown and, in the current theory, is set arbitrarily. However, choosing the wrong sign can lead to unnatural contrast gradients (not apparent in the colour original). Our first contribution is to show how this sign problem can be ameliorated using a generalised definition of luminance and Markov Random Field theory. In the second part of this paper we report on a large set of preference experiments which involve more images and more algorithms than reported at CIC 15. We also pay particular attention to testing our modified Socolinsky and Wolff algorithm. Based on initial results we expect improved preference scores.
Preserving visual saliency in image to sound substitution systems
Paper 7240-48
Time: 5:40 PM - 6:00 PM
Author(s):
Discussion Session: Image Analysis and Quality
Date: Wednesday 21 January
Time: 6:00 PM - 7:00 PM
Joint Discussion Session with Conference 7242, Image Quality and System Performance
Thursday 22 January
Session 9: 3D Perception, Environments, and Applications
Date: Thursday 22 January
Time: 9:20 AM - 12:00 PM
Session Chair: Bernice E. Rogowitz, IBM Thomas J. Watson Research Ctr.
Model based evaluation of human perception of stereoscopically visualized semi-transparent surfaces
(Invited Paper)
Paper 7240-28
Time: 9:20 AM - 9:50 AM
Author(s): Michael Kleiber, Carsten Winkelholz, Verena Kinder, Research Establishment for Applied Science (Germany)
Depicting three-dimensional surfaces in such a way that distances between these surfaces or underlying opaque surfaces can be estimated quickly and accurately is a challenging task. Traditional visualizations use transparent surfaces which are blended together; this allows the user to see all the layers. However, there is a conflict between the amount of transparency and the ability to perceive structure and shape which cannot be resolved. Another approach is the use of semi-transparent textures, i.e. only some parts of the surface are colored. We conducted an experiment to determine the performance of subjects in perceiving distances between an opaque ground surface and specific points on an overlaid surface which was visualized using isolines, curvature-oriented strokes, and the traditional shaded transparent surfaces. For the experiments we used a stereoscopic visualization system. We found a 70% improvement in the accuracy of distance estimation with semi-opaque surfaces compared to purely transparent surfaces. In addition, the results show that response times for curvature-oriented strokes were faster compared to isolines. For a trusted interpretation of these results, a plausible explanation has to be given. We hypothesize that users visually integrate the available three-dimensional positions and thereby come to an estimate. Different patterns on the surface guide visual attention differently, leading to different strategies in estimating and integrating the spatial relations within the structure. We describe it as a process of several attention shifts during which inter-object relations are represented as noisy values with a specific variance.
|
 |
Influence of chroma variations on naturalness and image quality in stereoscopic images
Paper 7240-85
Time: 9:50 AM - 10:10 AM
Author(s): André Kuijsters, Marc Lambooij, Wijnand A. Ijsselsteijn, Technische Univ. Eindhoven (Netherlands); Ingrid E. J. Heynderickx, Philips Research (Netherlands)
|
|
 |
Coffee Break 10:10 AM - 10:40 AM
|
Color rendering indices in global illumination methods
Paper 7240-8
Time: 10:40 AM - 11:00 AM
Author(s): David Geisler-Moroder, Arne Dür, Univ. Innsbruck (Austria)
|
|
Show Abstract
|
Human perception of material colors depends heavily on the nature of the light sources used for illumination. One and the same object can cause highly different color impressions when lit by a vapor lamp or by daylight, respectively. Based on state-of-the-art colorimetric methods, we present a modern approach to calculating the color rendering indices (CRI) defined by the CIE to characterize the color reproduction properties of illuminants. We update the standard CIE method in three main points: firstly, we use the CIELAB color space; secondly, we apply a Bradford transformation for chromatic adaptation; and finally, we evaluate color differences using the CIEDE2000 total color difference formula. Moreover, within a real-world scene, light incident on a measurement surface is composed of a direct and an indirect part. Neumann and Schanda have shown for the cube model that interreflections can influence the CRI of an illuminant. We analyze how color rendering indices vary in a real-world scene with mixed direct and indirect illumination and recommend using a spectral rendering engine rather than an RGB-based renderer for accurate CRI calculations.
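A minimal sketch of the updated pipeline the abstract outlines, in Python. The Bradford matrix and the XYZ-to-CIELAB conversion are standard; for brevity the sketch scores samples with plain Euclidean CIELAB distance where the paper substitutes CIEDE2000, and the 4.6 scaling constant is the CIE value for U*V*W* differences, not necessarily the one the authors use:

    import numpy as np

    # Standard Bradford adaptation matrix.
    BRADFORD = np.array([[ 0.8951,  0.2664, -0.1614],
                         [-0.7502,  1.7135,  0.0367],
                         [ 0.0389, -0.0685,  1.0296]])

    def xyz_under(spd, refl, cmf):
        # XYZ of a reflectance sample under an illuminant SPD.
        # spd, refl: (N,) spectra on one wavelength grid; cmf: (N, 3) observer.
        k = 100.0 / (spd @ cmf[:, 1])
        return k * ((spd * refl) @ cmf)

    def bradford_adapt(xyz, white_src, white_dst):
        # Von Kries-style adaptation carried out in the Bradford cone space.
        gain = (BRADFORD @ white_dst) / (BRADFORD @ white_src)
        return np.linalg.solve(BRADFORD, gain * (BRADFORD @ xyz))

    def xyz_to_lab(xyz, white):
        f = lambda t: np.where(t > (6/29)**3, np.cbrt(t),
                               t / (3 * (6/29)**2) + 4/29)
        fx, fy, fz = f(xyz / white)
        return np.array([116*fy - 16, 500*(fx - fy), 200*(fy - fz)])

    def colour_rendering_index(test_spd, ref_spd, samples, cmf, scale=4.6):
        # Average of per-sample special indices, CRI-style.
        ones = np.ones_like(test_spd)
        w_t = xyz_under(test_spd, ones, cmf)
        w_r = xyz_under(ref_spd, ones, cmf)
        indices = []
        for refl in samples:        # e.g. the CIE test colour samples
            xyz_t = bradford_adapt(xyz_under(test_spd, refl, cmf), w_t, w_r)
            d = np.linalg.norm(xyz_to_lab(xyz_t, w_r) -
                               xyz_to_lab(xyz_under(ref_spd, refl, cmf), w_r))
            indices.append(100.0 - scale * d)
        return np.mean(indices)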
|
 |
Luminance, disparity, and range statistics in 3D natural scenes
Paper 7240-46
Time: 11:00 AM - 11:20 AM
Author(s): Yang Liu, Alan C. Bovik, Lawrence K. Cormack, The Univ. of Texas at Austin (United States)
|
|
Show Abstract
|
Recent laser range scanners produce co-registered range and luminance images simultaneously, an ideal tool for studying human depth perception. Using the range maps, and assuming a 6.5 cm interocular distance and an empirically measured fixation-distance distribution, we can convert ranges into disparity maps, which represent the natural binocular disparities experienced by a human observer in 3D natural scenes. We are interested in the co-location statistics of edge discontinuities in corresponding luminance and disparity images, a topic that has not been studied much but may prove quite useful. We designed a Laplacian of Gaussian (LoG) filterbank with center frequencies roughly matching human contrast sensitivity. The LoG filters were used to detect edges in the grayscale intensity images, the co-registered range maps, and the derived disparity maps. At all detected luminance, disparity, and range edges, we studied the pair-wise linear correlation among the luminance, disparity, and range gradients. We found that the strongest correlation (about 0.37) exists between the luminance and range gradients, while the linear correlation between the luminance and disparity gradients was low. The correlation between luminance and range gradients suggests that the visual system may use luminance gradients in absolute distance estimation.
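The range-to-disparity conversion the abstract mentions follows from vergence geometry. A minimal sketch, assuming symmetric fixation straight ahead and the stated 6.5 cm interocular distance:

    import numpy as np

    def range_to_disparity(range_m, fixation_m, iod_m=0.065):
        # Disparity (radians) as the vergence-angle difference between the
        # fixated distance and a point at the given range; crossed (nearer)
        # disparities come out positive.
        vergence = lambda d: 2.0 * np.arctan(iod_m / (2.0 * d))
        return vergence(range_m) - vergence(fixation_m)

    # Example: a point at 2 m while fixating at 4 m ->
    # np.degrees(range_to_disparity(2.0, 4.0)) * 60  ~  56 arcmin, crossed.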
|
 |
Three-dimensional visualization of geographical terrain data using temporal parallax difference induction
Paper 7240-26
Time: 11:20 AM - 11:40 AM
Author(s): Christopher A. Mayhew, Craig M. Mayhew, Vision III Imaging, Inc. (United States)
|
|
Show Abstract
|
Vision III Imaging, Inc. (the Company) has developed Parallax Image Display (PID) software tools to critically align and display aerial images with parallax differences, rendering terrain features obvious to the viewer. The inclusion of digital elevation models in geographic data browsers now allows true three-dimensional parallax to be acquired from virtual globe programs like Google Earth. The authors have successfully applied PID methods to visualizing three-dimensional geographical terrain data. It is known that animals and people can determine the relative spatial depth of a scene by moving one eye from side to side: the oscillating eye movement presents motion-parallax depth information over time, allowing depth order to be determined from the relative movement of objects in the scene. Once a flight path is determined, each image frame is offset in its parallax position and dynamically, critically aligned to the others down to the sub-pixel level based on various scene parameters. Geographical data can then be presented on standard displays to reveal otherwise obscure irregularities and unrecognized terrain features.
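The PID tools themselves are the Company's; as a generic stand-in for the critical-alignment step, the sketch below uses phase correlation (a standard registration technique, and only to whole-pixel precision where PID works at the sub-pixel level). Displaying the aligned pair in alternation then leaves residual parallax as the only difference the viewer sees:

    import numpy as np

    def frame_shift(ref, mov):
        # Whole-pixel shift between two frames via phase correlation.
        R = np.fft.fft2(ref) * np.conj(np.fft.fft2(mov))
        corr = np.fft.ifft2(R / (np.abs(R) + 1e-12)).real
        dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
        h, w = ref.shape
        return ((dy + h // 2) % h - h // 2, (dx + w // 2) % w - w // 2)

    def aligned_pair(ref, mov):
        # Shift the second frame onto the first so that only parallax
        # differences remain between the two.
        dy, dx = frame_shift(ref, mov)
        return ref, np.roll(mov, (dy, dx), axis=(0, 1))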
|
 |
Measuring hand, head, and vehicle motions in commuting environments
Paper 7240-22
Time: 11:40 AM - 12:00 PM
Author(s): Feng Li, Jeff B. Pelz, Rochester Institute of Technology (United States); Scott J. Daly, Sharp Labs. of America, Inc. (United States)
|
|
Show Abstract
|
Viewing video on mobile devices is becoming increasingly common. The small field of view and the vibrations in common commuting environments present challenges (hardware and software) for the imaging community. By monitoring the vibration of the display, it may be possible to stabilize an image by shifting a portion of a larger image with the display (a field-of-view expansion approach). However, the image should not be shifted exactly in step with the display motion, because eye movements have a 'self-adjustment' ability that partially or completely compensates for external motion, which can make a perfect compensation appear to overshoot. In this work, accelerometers were used to measure the motion of a range of vehicles, and of observers' heads and hands as they rode in those vehicles, to support the development of display motion compensation algorithms.
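A minimal sketch of how such a compensation algorithm might turn accelerometer samples into image shifts; the leaky-integrator constants and the under-compensation gain are our assumptions, chosen to reflect the abstract's point that perfect compensation appears to overshoot:

    import numpy as np

    def compensation_offsets(accel, dt, gain=0.7, leak=0.98):
        # accel: one-axis display acceleration samples (m/s^2), spacing dt (s).
        # Leaky double integration keeps drift bounded; gain < 1 because eye
        # movements already absorb part of the external motion.
        vel = pos = 0.0
        offsets = []
        for a in accel:
            vel = leak * vel + a * dt
            pos = leak * pos + vel * dt
            offsets.append(-gain * pos)  # shift the crop window against motion
        return np.asarray(offsets)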
|
 |
Lunch Break 12:00 PM - 1:30 PM
|
Session 10:
Art and Perception
Date: Thursday 22 January
Time: 1:30 PM
- 4:40 PM
Session Chairs: Elena Federovskaya; Hawley K. Rising, III, Sony Electronics Inc.; David G. Stork, Ricoh Innovations, Inc.; Michael H. Brill, Datacolor
|
Aesthetic science: understanding human preferences for spatial composition
(Invited Paper)
Paper 7240-80
Time: 1:30 PM - 2:10 PM
Author(s): Stephen Palmer, Univ. of California, Berkeley (United States)
|
|
No abstract available
|
|
 |
Visually representing reality: aesthetics and accessibility aspects
Paper 7240-15
Time: 2:10 PM - 2:30 PM
Author(s): Floris L. van Nes, Technische Univ. Eindhoven (Netherlands)
|
|
No abstract available
|
|
 |
Estimating the position of illuminants in paintings under weak model assumptions: an application to the works of two Baroque masters
Paper 7240-23
Time: 2:30 PM - 2:50 PM
Author(s): David Kale, Stanford Univ. (United States); David G. Stork, Ricoh Innovations, Inc. (United States) and Stanford Univ. (United States)
|
|
Show Abstract
|
The problem of estimating the position of illumination in realist paintings has recently been addressed using algorithms from computer vision. These algorithms fall into two general categories. In model-independent methods (cast-shadow analysis, occluding-contour analysis), there is no need to know or assume a three-dimensional geometry of the scene. In model-dependent methods (shape-from-shading, full computer-graphics synthesis), one needs to know or assume a three-dimensional geometry, for instance the direction of the normal vector at each point on a complex surface. We explore the intermediate, or weak-model, condition, where the three-dimensional object rendered is so simple that one can confidently assume its three-dimensional properties. For instance, we can assume that a floor and a wall are both flat and are horizontal and vertical, respectively. We derived the maximum-likelihood estimator for the spatial location of a point source as a function of the pattern of lightness over such a planar surface. We applied our methods to two Baroque paintings for which the question of the illuminant position is of great interest to art historians: Caravaggio's "The calling of St. Matthew" and Georges de la Tour's "Christ in the carpenter's studio." The pattern of light on the rear wall in "The calling" implies the illumination source is local, a few meters outside the picture frame, thereby rebutting art-historical claims that the source is solar (and hence distant). The pattern of light on the floor in "Christ" implies the illumination source is very close to the depicted candle, thereby rebutting art-historical claims that the source is in the place of the figures. Ours is the first application of weak-model methods for inferring the location of illuminants in realist paintings.
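To make the weak-model idea concrete: under i.i.d. Gaussian noise, the maximum-likelihood estimate of a point source from lightness sampled on a known plane reduces to a least-squares fit of an inverse-square Lambertian model. The sketch below illustrates this reduction and is not the paper's exact estimator:

    import numpy as np
    from scipy.optimize import least_squares

    def fit_point_source(pts, lightness):
        # pts: (N, 2) sample positions on the plane z = 0 (normal +z);
        # lightness: (N,) observed values. Fits source position (sx, sy, sz)
        # and strength P under the model L = P * cos(theta) / r^2.
        def residual(p):
            sx, sy, sz, power = p
            d = np.column_stack([sx - pts[:, 0], sy - pts[:, 1],
                                 np.full(len(pts), sz)])
            r2 = (d ** 2).sum(axis=1)
            model = power * (sz / np.sqrt(r2)) / r2
            return model - lightness
        p0 = [pts[:, 0].mean(), pts[:, 1].mean(), 1.0, float(lightness.max())]
        return least_squares(residual, p0).x  # estimated (sx, sy, sz, power)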
|
 |
Intensity statistics of artwork: connections to human visual perception
Paper 7240-77
Time: 2:50 PM - 3:10 PM
Author(s): Daniel J. Graham, Dartmouth College (United States); Jay Friedenberg, Manhattan College (United States); Daniel N. Rockmore, Dartmouth College (United States); David J. Field, Cornell Univ. (United States)
|
|
Show Abstract
|
Studies of categorization for natural images have demonstrated relationships between certain content classes and statistics relevant to early visual system coding. Paintings of known authorship and/or provenance also show consistent spatial and intensity statistics. However, intensity statistics are markedly different for paintings and scenes. Transforming scene luminances into painting luminances requires nonlinear scaling, much as the retina must perform nonlinear scaling. Recently, we tested the hypothesis that perceived similarity of art can be predicted using intensity statistics to which the early visual system is attuned. We employed multidimensional scaling analysis of observers' similarity ratings for paired paintings, and we compared resulting dimensions to a host of statistical measures, modeled neural responses, and semantic variables derived from image metadata. Three image sets, classified earlier as "landscapes," "portraits/still-life," and "abstract," were tested separately. In all cases, one of the first two similarity dimensions was highly correlated with a measure of intensity distribution sparseness, and for landscapes this correlation explained a greater portion of data variance than did semantic variables. We discuss these results in the context of previous studies of similarity and in terms of statistical regularities in natural scenes. We also discuss ongoing work to determine artists' nonlinear luminance scaling functions.
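A short sketch of the two analysis steps the abstract describes, with our own stand-ins: excess kurtosis is one common proxy for intensity-distribution sparseness (the paper's exact measure may differ), and metric MDS recovers similarity dimensions from the observers' ratings:

    import numpy as np
    from scipy.stats import kurtosis
    from sklearn.manifold import MDS

    def intensity_sparseness(img):
        # Excess kurtosis of the pixel intensity distribution.
        return kurtosis(img.ravel())

    def similarity_dimensions(dissim, n_dims=2):
        # dissim: (N, N) symmetric dissimilarity matrix derived from
        # observers' paired-comparison ratings.
        return MDS(n_components=n_dims, dissimilarity='precomputed',
                   random_state=0).fit_transform(dissim)

    # Each recovered dimension can then be correlated with per-image
    # sparseness, e.g. np.corrcoef(coords[:, 0], sparseness_values)[0, 1].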
|
 |
Coffee Break 3:10 PM - 3:40 PM
|
Painted or printed? Correlation analysis of the brickwork in Jan van der Heyden's View of Oudezijds Voorburgwal with the Oude Kerke in Amsterdam
Paper 7240-78
Time: 3:40 PM - 4:00 PM
Author(s): David G. Stork, Ricoh Innovations, Inc. (United States); Sean Meador, Stanford Univ. (United States)
|
|
Show Abstract
|
It has been speculated that fine-art painters have evolved an open methodology that exploits specific human vision and perception techniques, one that has only recently been validated through modern cognitive and biological scientific methods. While painting is seemingly a qualitative task, artists use known techniques such as edge and textural "sharpness" to create a centre of interest, edges to guide the viewer's gaze, and other devices to filter and emphasize in pursuit of their varied goals. We give an overview of our interdisciplinary, parameterized knowledge-domain system for understanding these perception-based portrait-painting techniques and re-implementing them in a computer system. This computer-based 'painterly' rendering system allows us to parameterize the open, cognition- and vision-based methodology that human artists have intuitively evolved over centuries into a domain toolkit for exploring aesthetic realizations and interdisciplinary questions about the act of portrait painting. These experiments and questions can be explored by traditional and new-media artists, art historians, cognitive scientists, and other scholars.
|
 |
Chiasmus
Paper 7240-53
Time: 4:20 PM - 4:40 PM
Author(s): Stephen Cady, The Univ. of Advancing Technology (United States)
|
|
Show Abstract
|
Chiasmus is a responsive, dynamically reflective, two-sided volumetric projection surface that embodies phenomenological issues such as the formation and reception of images, observer and machine perception, and the dynamics of the screen as a space of image reception. It investigates the behaviour of projected light as it interacts with different depths of a dynamically mobile surface, the resulting effects on the projected imagery, and its apprehension. It consists of a square grid of 64 individually motorized acrylic cube elements engineered to move linearly. Each cube is controlled by custom software that analyzes video imagery for luminance values and sends these values to the motor-control mechanisms to coordinate the individual movements. The individual movements give the sculptural screen a resolution that allows its volume to alter dynamically in depth and shape, offering an observer novel and unique perspectives on its mobile form.
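A minimal sketch of the luminance-to-motion mapping the abstract describes, for the 8 x 8 grid of cubes; the travel range and the linear luminance-to-depth law are our assumptions, not the artist's specification:

    import numpy as np

    def cube_depths(frame, grid=8, travel_mm=150.0):
        # frame: (H, W) grayscale video frame in [0, 1]. Average luminance
        # over a grid x grid array of blocks, one per motorized cube, and map
        # it linearly onto each cube's travel range.
        h, w = frame.shape
        bh, bw = h // grid, w // grid
        blocks = frame[:bh * grid, :bw * grid].reshape(grid, bh, grid, bw)
        return blocks.mean(axis=(1, 3)) * travel_mm  # brighter -> further out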
|
 |
Panel Discussion on Art and Perception
Date: Thursday 22 January
Time: 4:50 PM
Panel Members: Stephen Palmer, Univ. of California, Berkeley; David G. Stork, Ricoh Innovations, Inc.; Stephen Cady, The Univ. of Advancing Technology
|
 |