Organizers: Brian J. White & Douglas P. Munoz; Centre for Neuroscience Studies, Queen’s University, Kingston, ON, Canada
Presenters: Jared Abrams, Wolfgang Einhäuser, Brian J. White, Michael Dorr, Neil Mennie
Symposium Description
Understanding how we perceive and act upon complex natural environments is one of the most pressing challenges in visual neuroscience, with implications ranging from our understanding of the brain, machine vision, and artificial intelligence to clinical applications such as the detection of visual or mental disorders and neuro-rehabilitation. Until recently, the study of active vision – how visual stimuli give rise to eye movements, and conversely how eye movements influence vision – has largely been restricted to simple stimuli in artificial laboratory settings. Historically, much work on the visual system has been accomplished in this way, but to fully understand vision it is essential to measure behavior under the conditions in which visual systems naturally evolved. This symposium covers some of the latest research on vision and eye movements in natural environments. The talks explore methods of quantifying natural vision and compare behavior across various levels of stimulus complexity and task constraint: visual search in natural scenes (Abrams, Bradley & Geisler), unconstrained viewing of natural dynamic video in humans (Dorr, Wallis & Bex) and in non-human primates during single-cell recording (White, Itti & Munoz), and real-world gaze behavior measured with portable eye-tracking (Einhäuser & ‘t Hart; Mennie, Zulkifli, Mahadzir, Miflah & Babcock). The symposium should therefore be of interest to a wide audience, from visual psychophysicists to oculomotor neurophysiologists and cognitive/computational scientists.
Presentations
Fixation search in natural scenes: a new role for contrast normalization
Speaker: Jared Abrams; Center for Perceptual Systems, University of Texas, Austin, USA
Authors: Chris Bradley, Center for Perceptual Systems, University of Texas, Austin; Wilson S. Geisler, Center for Perceptual Systems, University of Texas, Austin.
Visual search is a fundamental behavior, yet little is known about search in natural scenes. Previously, we introduced the ELM (entropy limit minimization) fixation selection rule, which selects fixations that maximally reduce uncertainty about the location of the target. This rule closely approximates the Bayesian optimal decision rule but is computationally simpler, making the ELM rule a useful benchmark for characterizing human performance. We previously found that the ELM rule predicts several aspects of fixation selection in naturalistic (1/f) noise, including the distributions of fixation location, saccade magnitude, and saccade direction. However, the ELM rule is only optimal when the detectability of the target (the visibility map) falls off from the point of fixation in the same way for all potential fixation locations, which holds for backgrounds with relatively constant spatial structure, like statistically stationary 1/f noise. Most natural scenes do not satisfy this assumption; they are highly non-stationary. By combining empirical measurements of target detectability in natural backgrounds with a straightforward mathematical analysis, we arrive at a generalized ELM rule (nELM rule) that is optimal for non-stationary backgrounds. The nELM searcher divides (normalizes) the current target probability map (posterior-probability map) by the estimated local contrast at each location in the map. It then blurs (convolves) this normalized map with the visibility map for a uniform background. The peak of the blurred map is the optimal location for the next fixation. We will describe the predictions and performance of the nELM searcher.
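As an illustration, a minimal sketch of the nELM selection step is given below (Python with NumPy/SciPy). The array names, the local-contrast estimator, and the use of a single convolution are simplifying assumptions for exposition, not the authors' implementation.

import numpy as np
from scipy.ndimage import uniform_filter
from scipy.signal import fftconvolve

def estimate_local_contrast(image, window=32):
    # One simple choice of estimator: local RMS contrast in a square window.
    mean = uniform_filter(image, window)
    mean_sq = uniform_filter(image ** 2, window)
    return np.sqrt(np.maximum(mean_sq - mean ** 2, 1e-8))

def nelm_next_fixation(posterior, local_contrast, visibility_map):
    # 1. Normalize the posterior-probability map by the estimated local contrast.
    normalized = posterior / local_contrast
    # 2. Blur (convolve) the normalized map with the visibility map for a uniform background.
    blurred = fftconvolve(normalized, visibility_map, mode='same')
    # 3. The peak of the blurred map is the optimal location for the next fixation.
    return np.unravel_index(np.argmax(blurred), blurred.shape)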
Eye movements in natural scenes and gaze in the real world
Speaker: Wolfgang Einhäuser; Philipps-University Marburg, Department of Neurophysics, Marburg, Germany
Authors: Bernard Marius ‘t Hart, Philipps-University Marburg, Department of Neurophysics, Marburg, Germany.
Gaze is widely considered a good proxy for spatial attention. We address whether such “overt attention” is related to other measures of attention in natural scenes, and to what extent laboratory results on eye movements transfer to real-world gaze orienting. We find that the probability of a target being detected in a rapid serial visual presentation task correlates with its probability of being fixated during prolonged viewing, and that both measures are similarly affected by modifications to the target’s contrast. This shows a direct link between covert attention in time and overt attention in space for natural stimuli. Especially in the context of computational vision, the probability of an item being fixated (“salience”) is frequently equated with its “importance”, the probability of it being recalled during scene description. While we confirm a relation between salience and importance, we dissociate these measures by changing an item’s contrast: whereas salience is affected by the actual features, importance is driven by the observer’s expectations about these features based on scene statistics. Using a mobile eye-tracking device, we demonstrate that eye-tracking experiments in typical laboratory conditions have limited predictive power for real-world gaze orienting. Laboratory data fail to capture the substantial effects of implicit tasks that the environment imposes on the participant to avoid severe costs (e.g., tripping), and typically fail to include the distinct contributions of eye, head, and body to orienting gaze. Finally, we provide some examples of applications of mobile gaze-tracking for ergonomic workplace design and for aiding medical diagnostics.
Visual coding in the superior colliculus during unconstrained viewing of natural dynamic video
Speaker: Brian J. White; Centre for Neuroscience Studies, Queen’s University, Kingston, ON, Canada
Authors: Laurent Itti, Dept of Computer Science, University of Southern California, USA; Douglas P. Munoz, Centre for Neuroscience Studies, Queen’s University, Kingston, ON, Canada
The superior colliculus (SC) is a multilayered midbrain structure with visual representations in the superficial layers (SCs) and sensorimotor representations linked to the control of eye movements/attention in the intermediate layers (SCi). Although we have extensive knowledge of the SC from studies with simple stimuli, we know little about how the SC behaves during active vision of complex natural stimuli. We recorded single units in the monkey SC during unconstrained viewing of natural dynamic video. We used a computational model to predict visual saliency at any retinal location and any point in time. We parsed fixations into tertiles according to the average model-predicted saliency value (low, medium, high) in the response field (RF) around the time of fixation (50-400 ms post-fixation). The results showed a systematic increase in post-fixation discharge with increasing saliency. We then examined a subset of the fixations based on the direction of the next saccade (into vs. opposite the RF), under the assumption that saccade direction coarsely indicates the top-down goal of the animal (the “value” of the goal-directed stimulus). SCs neurons showed the same enhanced response to greater saliency irrespective of next-saccade direction, whereas SCi neurons only showed an enhanced response to greater saliency when the stimulus that evoked it was the goal of the next saccade (i.e., was of interest/value). This implies that saliency is controlled closer to the output of the saccade circuit, where priority (the combined representation of saliency and relevancy) is presumably signaled and the saccade command is generated. The results support functionally distinct roles of SCs and SCi, whereby the former fits the role of a visual saliency map and the latter a priority map.
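As a rough illustration of this analysis, the sketch below (Python/pandas) bins fixations into saliency tertiles and averages the post-fixation discharge per bin; the column names and data layout are assumptions for exposition, not the recorded data format.

import pandas as pd

def discharge_by_saliency_tertile(fixations: pd.DataFrame) -> pd.Series:
    # 'rf_saliency': model-predicted saliency averaged over the RF around fixation onset.
    # 'rate_50_400': spike rate 50-400 ms after fixation onset.
    tertile = pd.qcut(fixations['rf_saliency'], 3, labels=['low', 'medium', 'high'])
    return fixations.groupby(tertile, observed=True)['rate_50_400'].mean()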
Visual sensitivity under naturalistic viewing conditions
Speaker: Michael Dorr; Schepens Eye Research Institute, Dept of Ophthalmology, Harvard Medical School, and Institute for Neuro- and Bioinformatics, University of Lübeck, Germany
Authors: Thomas S Wallis, Schepens Eye Research Institute, Dept of Ophthalmology, Harvard Medical School, and Centre for Integrative Neuroscience and Department of Computer Science, The University of Tübingen, Tübingen, Germany; Peter J Bex, Schepens Eye Research Institute, Dept of Ophthalmology, Harvard Medical School.
Psychophysical experiments typically use very simple stimuli, such as isolated dots and gratings on uniform backgrounds, and allow no eye movements or only very stereotyped ones. While these viewing conditions are highly controllable, they are not representative of real-world vision, which is characterized by a complex, broadband input and several eye movements per second. We performed a series of experiments in which subjects freely watched high-resolution nature documentaries and TV shows on a gaze-contingent display. Eye tracking at 1000 Hz and fast video-processing routines allowed us to precisely modulate the stimulus in real time and in retinal coordinates. The task was to locate either bandpass contrast changes or geometric distortions that briefly appeared in one of four locations relative to the fovea every few seconds. We confirmed the well-known loss of sensitivity when video modulations occurred around the time of eye movements, i.e., around episodes of high-speed retinal motion. However, we found that replicating the same retinal input in a passive condition, in which subjects maintained central fixation and the video was shifted on the screen, led to a comparable loss in sensitivity. We conclude that no process of active, extra-retinal suppression is needed to explain peri-saccadic visual sensitivity under naturalistic conditions. We further found that the detection of spatial modifications depends on the spatio-temporal structure of the underlying scene, such that distortions are harder to detect in areas that vary rapidly across space or time. These results highlight the importance of naturalistic assessment for understanding visual processing.
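To make the retinal-coordinate manipulation concrete, the toy sketch below computes where on the screen a probe must be drawn so that it appears at a fixed offset from the fovea on each frame; the offsets, units, and function names are illustrative assumptions, not the actual display code.

# Four candidate probe locations, in degrees of visual angle relative to the fovea (assumed values).
PROBE_OFFSETS_DEG = [(-5.0, 0.0), (5.0, 0.0), (0.0, -5.0), (0.0, 5.0)]

def probe_screen_position(gaze_x_deg, gaze_y_deg, location_index):
    # The probe is anchored to the current gaze sample (1000 Hz), so its position
    # stays fixed in retinal coordinates even while the eyes move over the video.
    dx, dy = PROBE_OFFSETS_DEG[location_index]
    return gaze_x_deg + dx, gaze_y_deg + dy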
Spatio-temporal dynamics of the use of gaze in natural tasks by a Sumatran orangutan (Pongo abelii)
Speaker: Neil Mennie; University of Nottingham, Malaysia Campus, Malaysia
Authors: Nadia Amirah Zulkifli, University of Nottingham Malaysia Campus; Mazrul Mahadzir, University of Nottingham Malaysia Campus; Ahamed Miflah, University of Nottingham Malaysia Campus; Jason Babcock, Positive Science LLC, New York, USA.
Studies have shown that in natural tasks, where actions are often programmed sequentially, human vision is an active, task-specific process (Land et al., 1999; Hayhoe et al., 2003). Vision plays an important role in the supervision of these actions, and knowledge of our surroundings and of spatial relationships within the immediate environment is vital for successful task scheduling and the coordination of complex action. However, little is known about the use of gaze in natural tasks by great apes. Orangutans usually live high in the canopy of the rainforests of Borneo and Sumatra, where good spatial knowledge of their immediate surroundings must be important to an animal that can accurately reach and grasp with four limbs and move along branches. We trained a 9-year-old captive-born Sumatran orangutan to wear a portable eye tracker and recorded her use of gaze in a number of different tasks, such as locomotion, visual search, and tool use, in an enclosure at the National Zoo of Malaysia. We found that her gaze was task specific, with different eye movement metrics in different tasks. Second, we found that this animal made anticipatory, look-ahead eye movements to future targets (Mennie et al., 2007) when picking up sultanas from a board with her upper limbs. This semi-social animal is thus likely capable of high-level use of gaze similar to that of a social species of Hominidae: humans.