Semantic information shapes gaze patterns during naturalistic movie viewing

Poster Presentation: Tuesday, May 20, 2025, 8:30 am – 12:30 pm, Pavilion
Session: Eye Movements: Cognition

Sophie Su1, Aditya Upadhyayula1; 1Washington University in Saint Louis

Visual attention during naturalistic scene viewing is guided by both high-level, knowledge-based features and low-level, saliency-based features. Recent work has begun to quantify these knowledge-based effects in naturalistic images (Henderson & Hayes, 2017; 2019; 2023). Here we used a state-of-the-art transformer vision model to show that eye movement patterns in naturalistic videos are informed by underlying semantic knowledge. Participants in our experiment were eye-tracked as they watched videos of everyday activities. The same videos were then fed frame by frame into OpenAI's CLIP transformer model to generate an embedding for each frame. Linear regression models were trained on these embeddings to predict the gaze heatmaps obtained from participants (n = 101), and were then tested on unseen gaze data to compute the correlation between the predicted and observed gaze distributions. To quantify the contribution of semantic knowledge to gaze prediction, we repeated the same procedure with CLIP embeddings of inverted (flipped) scenes: prior work has shown that scene inversion disrupts semantic processing while preserving low-level features (Shore & Klein, 2000). We therefore hypothesized that if gaze prediction during movie viewing is driven by semantic knowledge, the correlations between model-generated and observed gaze distributions should differ between the intact and flipped video conditions. A linear mixed effects model showed a significant difference between the correlations for the intact and flipped conditions (beta = 0.04, t = 6.116, p < 0.001). These results suggest that gaze patterns during naturalistic movie viewing are informed by underlying semantic knowledge. This work provides groundwork for further exploration of knowledge-based effects on gaze and their role in event cognition and memory.
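
The sketch below illustrates the kind of pipeline the abstract describes: CLIP embeddings of individual video frames, a linear regression from embeddings to gaze heatmaps, and a held-out correlation between predicted and observed distributions for intact versus flipped frames. It is a minimal illustration, not the authors' code; the ViT-B/32 backbone, the 32x32 heatmap resolution, the 80/20 train/test split, and all variable names are assumptions not specified in the abstract.

```python
"""Minimal sketch of a CLIP-embedding -> gaze-heatmap regression pipeline.
Assumptions (not stated in the abstract): ViT-B/32 CLIP backbone, 32x32 gaze
heatmaps, an 80/20 train/test split, plain (unregularized) linear regression."""
import numpy as np
import torch
import clip                                   # pip install git+https://github.com/openai/CLIP.git
from PIL import Image, ImageOps
from sklearn.linear_model import LinearRegression
from scipy.stats import pearsonr

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def frame_embedding(frame: Image.Image) -> np.ndarray:
    """Return one CLIP image embedding for a single video frame."""
    with torch.no_grad():
        x = preprocess(frame).unsqueeze(0).to(device)
        return model.encode_image(x).squeeze(0).float().cpu().numpy()

def embed_frames(frames: list, flip: bool = False) -> np.ndarray:
    """Embed every frame; flip=True vertically inverts frames for the control condition."""
    if flip:
        frames = [ImageOps.flip(f) for f in frames]
    return np.stack([frame_embedding(f) for f in frames])

def evaluate_condition(X: np.ndarray, heatmaps: np.ndarray) -> float:
    """Fit a linear map from frame embeddings to flattened gaze heatmaps on a
    training split, then return the mean Pearson correlation between predicted
    and observed heatmaps on held-out frames."""
    n = len(X)
    split = int(0.8 * n)                       # hypothetical 80/20 split
    Y = heatmaps.reshape(n, -1)                # e.g. (n_frames, 32*32)
    reg = LinearRegression().fit(X[:split], Y[:split])
    Y_hat = reg.predict(X[split:])
    rs = [pearsonr(y_hat, y_obs)[0] for y_hat, y_obs in zip(Y_hat, Y[split:])]
    return float(np.mean(rs))

# Usage (hypothetical data for one movie):
# frames    : list of PIL frames from the movie
# heatmaps  : (n_frames, 32, 32) observed gaze density per frame
# r_intact  = evaluate_condition(embed_frames(frames), heatmaps)
# r_flipped = evaluate_condition(embed_frames(frames, flip=True), heatmaps)
```

The intact-versus-flipped comparison reported in the abstract could then be run as a linear mixed effects model over per-participant (or per-movie) correlations, for example with statsmodels; the column names here are hypothetical:

```python
import statsmodels.formula.api as smf

# df columns (hypothetical): r = heldout correlation, condition = intact/flipped,
# participant = subject identifier used as the random-effects grouping factor.
res = smf.mixedlm("r ~ condition", df, groups=df["participant"]).fit()
print(res.summary())
```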