Eye Movements: Gaze strategies

Talk Session: Monday, May 19, 2025, 10:45 am – 12:15 pm, Talk Room 1

Talk 1, 10:45 am

The interpretation attributed to observed gaze shifts affects their attention cueing effect

Shlomit Yuval-Greenberg1,2, Amit Zehngut1; 1School of Psychological Sciences, Tel Aviv University, 2Sagol School of Neuroscience, Tel Aviv University

In environments rich with stimuli, individuals must rely on attentional cues to identify the most relevant targets. As social beings, a key strategy involves observing where others focus their attention and following their lead, under the assumption that areas attended to by others are more likely to be of interest. Since gaze shifts represent a visible indicator of attention, mirroring the gaze of others serves as an effective social-attentional strategy. Indeed, research has shown that observing others' gaze redirection triggers a reflexive shift in attention, improving perceptual performance for objects located at the gazed-at positions. This phenomenon, known as the Gaze Cueing Effect (GCE), is well-documented and often regarded as reflexive. However, in social interactions, gaze shifts do not always signify attentional orientation. For instance, people often avert their gaze during effortful cognitive processing. Correctly interpreting gaze shifts is essential for effective gaze-based attention shifts. Here, we challenge the reflexive nature of the GCE by examining its dependency on gaze interpretation. Across two preregistered experiments (total N = 110), participants watched videos of gaze shifts while performing a perceptual task. The experimental context was manipulated: one group was primed to interpret the gaze shifts as reflecting cognitive processing rather than overt attentional shifts, and the other was not. Results revealed that the GCE was suppressed in the group primed to view gaze shifts as cognitive processing. This finding suggests that the GCE is influenced by the social interpretation of observed gaze shifts. We conclude that the Gaze Cueing Effect is modulated by social context and is not purely reflexive. These findings highlight how social interpretation can significantly shape fundamental attentional mechanisms.
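To make the cueing measure concrete, below is a minimal Python sketch of how a Gaze Cueing Effect (GCE) could be quantified per participant and compared between two priming groups. This is not the authors' analysis code; the trial numbers, RT distributions, and group sizes are synthetic placeholders used only for illustration.

```python
# Minimal sketch (not the authors' code): the GCE as the reaction-time cost of
# targets at non-gazed-at vs. gazed-at locations, compared between two groups.
# All data below are synthetic placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def gaze_cueing_effect(rt_congruent, rt_incongruent):
    """GCE = mean RT on gaze-incongruent trials minus mean RT on gaze-congruent trials."""
    return np.mean(rt_incongruent) - np.mean(rt_congruent)

def simulate_participant(true_gce_ms):
    """Synthetic participant: 40 congruent and 40 incongruent trials (RTs in seconds)."""
    rt_con = rng.normal(0.500, 0.050, size=40)
    rt_inc = rng.normal(0.500 + true_gce_ms / 1000.0, 0.050, size=40)
    return gaze_cueing_effect(rt_con, rt_inc) * 1000.0  # back to milliseconds

# Hypothetical groups: primed to read gaze as attention shifts vs. as cognitive processing.
attention_group = [simulate_participant(20) for _ in range(55)]
cognitive_group = [simulate_participant(0) for _ in range(55)]

t, p = stats.ttest_ind(attention_group, cognitive_group)
print(f"mean GCE (attention prime): {np.mean(attention_group):.1f} ms")
print(f"mean GCE (cognitive prime): {np.mean(cognitive_group):.1f} ms")
print(f"independent-samples t-test: t = {t:.2f}, p = {p:.3g}")
```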

The study was funded by ISF grant 1960/19 to S.Y.-G. and by BSF grant 2020308 to S.Y.-G. and Prof. Robert Knight.

Talk 2, 11:00 am

Scene viewing from kindergarten to retirement - learning canonical gaze

Ben de Haas1, Marcel Linka1; 1Experimental Psychology, Justus-Liebig-Universität Giessen

Two adults viewing the same scene tend to fixate overlapping parts of it. This observation has driven decades of computational modeling aimed at predicting average fixation densities. However, individual fixation patterns deviate from this average in highly reliable ways. How do we get to this point? Do young children fixate scenes in stereotypical ways and acquire individual preferences over time? Or is children’s gaze idiosyncratic before becoming more canonical? We present eye-tracking data from >6,500 participants in a museum. Participants ranged from 5 to 72 years of age and freely viewed 40 complex scenes. This large dataset allowed us to trace the development of individual differences by estimating average pairwise correlations of fixation patterns, separately for two-year age bins. We find that preschool children tend to fixate fewer elements of a scene and agree on those to an appreciable degree. For older children, image exploration and the number of fixated elements rapidly increase. In parallel, pairwise similarities drop steeply. Children’s gaze becomes increasingly idiosyncratic until age 14. Then the trend reverses, and patterns of gaze become more and more similar despite a continued increase in image exploration. Pairwise similarity plateaus, at its highest level, only in the early twenties. These results show that the degree to which adult gaze is canonical takes decades to develop. I will speculate on the reasons for this protracted development and on its potential relationship with scene understanding.
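The core statistic here, average pairwise similarity of fixation patterns within an age bin, can be sketched as follows. This is not the authors' pipeline; the fixation-density maps, ages, and bin handling below are illustrative placeholders, assuming each participant's gaze on a scene has been summarized as a 2D density map.

```python
# Minimal sketch (not the authors' pipeline): mean pairwise Pearson correlation
# of fixation-density maps, computed separately for two-year age bins.
# Maps are random placeholders standing in for smoothed fixation densities.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

def pairwise_similarity(density_maps):
    """Average Pearson correlation between all pairs of flattened density maps."""
    flat = [m.ravel() for m in density_maps]
    rs = [np.corrcoef(a, b)[0, 1] for a, b in combinations(flat, 2)]
    return float(np.mean(rs))

# Placeholder data: (age, density_map) for a set of participants viewing one scene.
participants = [(rng.integers(5, 73), rng.random((48, 64))) for _ in range(200)]

bin_edges = np.arange(5, 75, 2)  # two-year age bins
for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
    maps = [m for age, m in participants if lo <= age < hi]
    if len(maps) >= 2:
        print(f"ages {lo}-{hi - 1}: mean pairwise r = {pairwise_similarity(maps):.3f}")
```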

ERC StG INDIVISUAL

Talk 3, 11:15 am

Quantifying spatiotemporal gaze dynamics using perceptual segmentation

Concetta Brusco1, Sophie Molholm1, Nathaniel Killian1, Ruben Coen-Cagli1; 1Albert Einstein College of Medicine

Eye movements reveal the spatiotemporal dynamics of visual attention. Image saliency is a well-known predictor of gaze location. However, it has been proposed that spatiotemporal gaze patterns are also organized by the spatial boundaries of visual objects. Here, we test this rigorously with natural images for the first time. We leverage our novel measurements of perceptual segmentation and concurrent eye-tracking to test the hypothesis that subjectively perceived segments predict the spatial and temporal dynamics of scanpaths. In each of 35 sessions, participants (n=8) viewed a natural or texture image for a few seconds, then made a series of quick perceptual judgments that we used to algorithmically reconstruct their perceptual segmentation maps (PSMs, i.e., the most probable segment for each pixel). First, we tested whether PSMs predict gaze location by calculating the mutual information (MI) between PSMs and gaze density during initial image viewing. We found significant MI for 12/22 natural image sessions (p<0.05, two-sided permutation test with 1000 shuffles of gaze data). Comparatively, all sessions had significant MI between gaze and the image’s saliency map generated by DeepGaze IIE, a state-of-the-art fixation-density prediction model. However, the PSMs had greater information gain than the saliency maps (162% vs 56% increase over shuffled gaze data, averaged across the 12 sessions), suggesting that subjectively perceived segments influence the spatial density of the scanpath. We also investigated temporal dynamics: on average, segments were viewed for 910-msec periods containing 2 consecutive fixations and 2-3 saccades. These values did not depend on segment size but were significantly shorter for natural images than for textures. This suggests that semantic features of segments, and not just the spatial area they occupy, influence temporal gaze dynamics. Overall, our findings indicate that perceptual segments are meaningful spatial divisions that shape spatiotemporal gaze and attentional dynamics, and could potentially improve individualized scanpath prediction.
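The MI-plus-permutation logic can be sketched in a few lines. This is not the authors' analysis code: a binary fixated/not-fixated mask stands in for a smoothed gaze-density map, the segmentation map and fixations are synthetic placeholders, and the permutation null simply scatters the same number of fixations uniformly over the image.

```python
# Minimal sketch (not the authors' code): MI between a discrete perceptual
# segmentation map (PSM) and gaze, with a 1000-shuffle permutation null.
import numpy as np

rng = np.random.default_rng(2)
H, W = 60, 80

def fixation_mask(fix_rc, shape):
    """Binary map marking pixels that received at least one fixation."""
    m = np.zeros(shape, dtype=int)
    for r, c in fix_rc:
        m[r, c] = 1
    return m

def mutual_information(labels, mask):
    """MI (bits) between segment label and fixated/not-fixated, computed over pixels."""
    l, g = labels.ravel(), mask.ravel()
    joint = np.zeros((l.max() + 1, 2))
    np.add.at(joint, (l, g), 1)
    p = joint / joint.sum()
    px, py = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz])))

# Placeholder PSM with three vertical segments; fixations cluster in the middle one.
psm = np.zeros((H, W), dtype=int)
psm[:, W // 3: 2 * W // 3] = 1
psm[:, 2 * W // 3:] = 2
fixations = [(rng.integers(0, H), rng.integers(W // 3, 2 * W // 3)) for _ in range(50)]

observed = mutual_information(psm, fixation_mask(fixations, (H, W)))
null = [mutual_information(psm, fixation_mask(
            [(rng.integers(0, H), rng.integers(0, W)) for _ in fixations], (H, W)))
        for _ in range(1000)]
p_value = (1 + np.sum(np.array(null) >= observed)) / (1 + len(null))
print(f"MI = {observed:.4f} bits, permutation p = {p_value:.3f}")
```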

R01EY031166 (RCC), Rose F. Kennedy Intellectual and Developmental Disabilities Research Center, Albert Einstein College of Medicine (RCC)

Talk 4, 11:30 am

Eye motion improves acuity under emulated cone loss

Hannah K. Doyle1, James Fong1, Ren Ng1, Austin Roorda1; 1University of California, Berkeley

Retinal degenerative diseases degrade vision through cone loss. Prior work has probed the visual function of patients with these diseases on measures such as acuity (Ratnam 2013, Foote 2018), contrast sensitivity (Alexander 1992), and motion detection (Turano 1992). However, this area of study is limited by the difficulty of recruiting patients with such diseases. We have developed a system for performing optical stimulation on a cone-by-cone basis, which we used to emulate cone loss in healthy subjects and study the impact of eye motion on acuity under retinal degeneration. We used an adaptive optics scanning light ophthalmoscope to simultaneously image the retina at 840 nm and deliver cone-by-cone stimulation at 543 nm. Subjects performed a 4AFC Landolt C task with “cone dropout,” in which a random percentage of cones was excluded from the stimulation, emulating cone loss that remained fixed to the retina as it moved across the stimulus. For a separate “image dropout” condition, we removed pixels fixed to the stimulus itself, preventing the subject from gathering new information about the stimulus through eye motion. First, we measured acuity thresholds for varying percentages of cone and image dropout using an interleaved staircase procedure. Second, we varied stimulus duration and compared performance between the cone and image dropout conditions. For all 3 subjects tested, acuity declined logarithmically with the fraction of cones removed from the mosaic. Above 50% dropout, acuity was better under cone dropout than under image dropout. This benefit was also apparent in the experiments varying stimulus duration, where acuity improved with duration under cone dropout and showed an advantage over image dropout, especially at longer durations. Our results show that the visual system makes use of information gathered through eye motion to improve acuity when sampling with a degraded cone mosaic.
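The key distinction between the two dropout conditions, that eye motion lets a retina-fixed dropout mask sample new stimulus pixels while a stimulus-fixed mask deletes the same pixels forever, can be illustrated with a toy simulation. This is not the authors' psychophysics code; the grid size, dropout fraction, and fixational shifts below are arbitrary placeholders.

```python
# Minimal toy sketch (not the authors' code): stimulus coverage accumulated over
# a few fixational eye-motion offsets under retina-fixed "cone dropout" vs.
# stimulus-fixed "image dropout". All values are made up.
import numpy as np

rng = np.random.default_rng(3)
H, W = 64, 64
dropout_fraction = 0.7                                   # e.g. 70% of cones/pixels removed
alive_retina = rng.random((H, W)) > dropout_fraction     # retina-fixed surviving cones
alive_image = rng.random((H, W)) > dropout_fraction      # stimulus-fixed surviving pixels

shifts = [(0, 0), (1, -2), (-3, 1), (2, 2), (-1, -1)]    # fixational offsets in pixels

seen_cone = np.zeros((H, W), dtype=bool)                 # cone-dropout condition
seen_image = np.zeros((H, W), dtype=bool)                # image-dropout condition
for dy, dx in shifts:
    # The retinal mask slides over the stimulus: a different subset of stimulus
    # pixels lands on surviving cones at each fixation.
    seen_cone |= np.roll(alive_retina, (dy, dx), axis=(0, 1))
    # Deleted stimulus pixels stay deleted no matter where the eye looks.
    seen_image |= alive_image

print(f"stimulus coverage, cone dropout : {seen_cone.mean():.2f}")
print(f"stimulus coverage, image dropout: {seen_image.mean():.2f}")
```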

Funded by Air Force Office of Scientific Research grants FA9550-20-1-0195 and FA9550-21-1-0230, and National Institutes of Health grant R01EY023591.

Talk 5, 11:45 am

Language-Guided Search with a Multi-Modal Foveated Deep Neural Network Finds Objects at Unexpected Locations

Parsa Madinei1, Miguel P Eckstein1; 1University of California, Santa Barbara

Introduction: Human eye movements during visual search are guided by scene context and object co-occurrence. Search accuracy deteriorates and response times increase when the target appears at an unexpected location (e.g., a toothbrush on a toilet seat). In everyday life, observers can overcome the detrimental effects of out-of-context target placement when linguistic instructions guide their search: “The toothbrush on the toilet seat”. Here, we present a foveated language-guided search model (FLGSM) that combines a multi-modal transformer with a foveated architecture (Freeman & Simoncelli, 2011) and a reinforcement learning agent to locate objects in real-world scenes using language instructions. We assess FLGSM’s accuracy for images with targets at expected and unexpected locations accompanied by informative language. Methods: The FLGSM uses a transformer to perform visual grounding from referring expressions and foveated feature maps, while an Advantage Actor-Critic agent with a Long Short-Term Memory architecture selects fixation points and optimizes detection performance (rather than mimicking human scanpaths; Yang et al., 2020). Results: FLGSM’s detection accuracy (Area under the ROC, AUC) was significantly higher when the target appeared at expected (AUC=0.766) vs. unexpected locations (AUC=0.597). When the FLGSM received language input about the location of the target, the performance gap between expected and unexpected locations substantially decreased (AUC=0.722 vs. 0.694, difference=0.028). Conclusions: These results demonstrate that integrating language guidance with foveated visual processing enables more robust object detection, particularly for targets in unexpected locations. Our model's performance suggests that combining linguistic information with strategic eye movements can help overcome the limitations of context-based visual search, more closely matching human visual search capabilities.
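To give a flavor of what "foveated" input means here, below is a crude eccentricity-dependent blur applied around the current fixation. This is only a stand-in for the Freeman & Simoncelli (2011) pooling the model actually uses, and none of the FLGSM architecture (multi-modal transformer, Advantage Actor-Critic fixation agent) is shown; the function name and parameters are illustrative assumptions.

```python
# Minimal sketch (not the FLGSM implementation): a toy foveated transform that
# blurs an image progressively more with eccentricity from the fixation point.
import numpy as np
from scipy.ndimage import gaussian_filter

def foveate(image, fixation_rc, sigmas=(0.0, 1.0, 2.0, 4.0, 8.0)):
    """Blend progressively blurred copies of `image` according to eccentricity."""
    h, w = image.shape
    rows, cols = np.mgrid[0:h, 0:w]
    ecc = np.hypot(rows - fixation_rc[0], cols - fixation_rc[1])
    ecc = ecc / ecc.max()                                   # 0 at fixation, 1 at far corner
    band = np.minimum((ecc * len(sigmas)).astype(int), len(sigmas) - 1)
    blurred = [image if s == 0 else gaussian_filter(image, s) for s in sigmas]
    out = np.zeros_like(image, dtype=float)
    for i, b in enumerate(blurred):
        out[band == i] = b[band == i]                       # sharper near fixation
    return out

# Usage: a random grayscale "scene" foveated at its center.
rng = np.random.default_rng(4)
scene = rng.random((128, 128))
print(foveate(scene, (64, 64)).shape)
```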

Talk 6, 12:00 pm

Distinct human eye fields unraveled by fMRI visual field and oculomotor mapping tasks

Uriel Lascombes1, Sina Kling1, Guillaume S Masson1, Martin Szinte1; 1Institut de Neurosciences de la Timone, CNRS, Aix-Marseille Université, Marseille, France

Eye movements, primarily saccades and pursuit, are essential to active visual perception and are therefore intimately intertwined with visual neural representations. While research in animals has elucidated the roles of the frontal eye fields (FEF) and parietal eye fields (PEF) in controlling these eye movements, the topography of their human homologues remains poorly understood. To comprehensively map the human eye fields, we used advanced neuroimaging and modeling techniques. We first used high-field functional MRI to localize the cortical frontal and parietal regions involved in generating saccadic and smooth pursuit eye movements in 20 participants. We then explored the visuospatial characteristics of these regions using population receptive field mapping, identifying visuomotor clusters within the parietal and frontal cortices that exhibit retinotopic organization. Our findings reveal a topographically organized architecture of the human eye fields underlying the sophisticated control of saccades and smooth pursuit. Interestingly, we found differences in cortical magnification patterns between the FEF and PEF clusters, suggesting specialized functional roles. This study challenges the conventional understanding derived from non-human primates, as the human eye fields appear to be composed of distinct visuomotor areas.
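Population receptive field (pRF) mapping is commonly implemented as a model fit in the spirit of Dumoulin & Wandell (2008): each voxel's response is predicted by overlapping an isotropic 2D Gaussian with the stimulus aperture at every time point, and the Gaussian's position and size are estimated by search. The sketch below illustrates that idea only; it is not the authors' pipeline, the bar stimulus and "voxel" data are synthetic, and HRF convolution is omitted.

```python
# Minimal sketch (not the authors' pipeline): grid-search fit of an isotropic
# 2D-Gaussian pRF to a synthetic voxel time series.
import numpy as np

rng = np.random.default_rng(5)
G, T = 21, 120                                    # visual-field grid size, time points
xs = np.linspace(-10, 10, G)                      # degrees of visual angle
X, Y = np.meshgrid(xs, xs)

# Synthetic bar apertures: vertical bars sweep horizontally, then horizontal bars vertically.
apertures = np.zeros((T, G, G))
for t in range(T // 2):
    apertures[t, :, t % G] = 1.0
for t in range(T // 2, T):
    apertures[t, t % G, :] = 1.0

def prf_prediction(x0, y0, sigma):
    """Predicted response: aperture at each time point dotted with a Gaussian receptive field."""
    rf = np.exp(-((X - x0) ** 2 + (Y - y0) ** 2) / (2 * sigma ** 2))
    return apertures.reshape(T, -1) @ rf.ravel()

# Synthetic voxel with a known pRF plus noise.
data = prf_prediction(3.0, -2.0, 2.0) + rng.normal(0, 0.5, T)

# Grid search over pRF position and size, scoring by correlation with the data.
best = (None, -np.inf)
for x0 in xs[::2]:
    for y0 in xs[::2]:
        for sigma in (0.5, 1.0, 2.0, 4.0):
            r = np.corrcoef(prf_prediction(x0, y0, sigma), data)[0, 1]
            if r > best[1]:
                best = ((x0, y0, sigma), r)
print(f"best pRF (x0, y0, sigma) = {best[0]}, r = {best[1]:.2f}")
```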

This research was supported by an ANR JCJC and a Fyssen Foundation grant to MS.