Valence of the scene context, not the people in the scene, is captured in the visual cortex and visual artificial neural networks
Poster Presentation: Tuesday, May 20, 2025, 2:45 – 6:45 pm, Pavilion
Session: Scene Perception: Neural mechanisms
Elahe Yargholi1, Laurent Mertens2,3, Joost Vennekens2,3,5, Jan Van den Stock4, Hans Op de Beeck1; 1Department of Brain and Cognition, Leuven Brain Institute, Faculty of Psychology & Educational Sciences KU Leuven, 3000 Leuven, Belgium, 2KU Leuven, De Nayer Campus, Dept. of Computer Science J.-P. De Nayerlaan 5, 2860 Sint-Katelijne-Waver, Belgium, 3Leuven.AI - KU Leuven Institute for AI, 3000 Leuven, Belgium, 4Neuropsychiatry, Leuven Brain Institute KU Leuven, 3000 Leuven, Belgium, 5Flanders Make@KU Leuven, 3000 Leuven, Belgium
Humans can evaluate the emotional meaning of complex social interactions in real-life settings, but it is unclear how this assessment is achieved. Previous evidence from the human brain and AI models pointed to visual processing as the primary seat of emotional valence processing, but this conclusion may not generalize to complex scenes involving social interactions. Here, we prepared stimuli depicting human social interactions in emotionally loaded scene contexts (e.g., funerals). Across the full set, the valence of the people in the scene was partially dissociated from the valence of the scene context (e.g., people laughing at a funeral). Neuroimaging (fMRI) responses showed that visual areas represent the emotional valence of the scene context and not the valence of the people in the scene. Category-selective areas are not the main regions for coding valence; instead, they respond to properties of scene elements related to their category preferences. In scene-selective regions, the level of activation is significantly correlated with the valence of the scene context; this correlation is negative, consistent with negativity bias theory. However, the valence of the people in the scene is not captured in face- and body-selective regions. Neural responses selective to the valence of the people in the scene generalize across images only in the association cortex. AI responses showed that existing models for image valence processing rely mostly on the valence of the scene context, while advanced multi-modal AI models that integrate text and vision partially capture the valence of the social interactions on top of the valence of the scene context. Together, these results show that basic visual processing captures the basic emotional associations of objects and scenes. Yet higher levels of processing, in AI models and distributed across the human association cortex, are needed to capture the valence of social scenes when it is at odds with basic properties of the images.
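The logic of relating a region's activation to two partially dissociated valence ratings can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the data are synthetic, and the function names (`pearson`, `simulate`), the effect sizes, and the noise levels are all assumptions chosen only to mimic the qualitative pattern described above, namely a hypothetical scene-selective region whose activation decreases with context valence and does not depend directly on the valence of the people.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def simulate(n_images=200, seed=0):
    """Generate synthetic per-image ratings and activations (illustrative only)."""
    rng = random.Random(seed)
    # Valence of the scene context for each image (arbitrary units).
    context = [rng.gauss(0, 1) for _ in range(n_images)]
    # Valence of the people: shares only part of its variance with the
    # context, i.e. the two ratings are partially dissociated (assumed weight).
    people = [0.3 * c + rng.gauss(0, 1) for c in context]
    # Hypothetical region: activation falls as context valence rises and
    # has no direct dependence on the valence of the people (assumed weight).
    activation = [-0.8 * c + rng.gauss(0, 1) for c in context]
    return context, people, activation


context, people, activation = simulate()
r_context = pearson(activation, context)
r_people = pearson(activation, people)
print(f"activation vs. context valence: r = {r_context:.2f}")
print(f"activation vs. people valence:  r = {r_people:.2f}")
```

Because the two ratings are only partially dissociated, the simulated activation still shows a weak correlation with the valence of the people through the shared variance. A real analysis would need to account for this, for example with partial correlations or a design that decorrelates the two ratings.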
Acknowledgements: This work was funded by KU Leuven grant IDN/21/010.