Face and Body Perception: Facial expressions, social relationships

Talk Session: Saturday, May 17, 2025, 2:30 – 4:15 pm, Talk Room 2

Talk 1, 2:30 pm

Eyes vs. attentiveness: Pupil dilation widens the perceived cone of direct gaze

Clara Colombatto1, Sarah D. McCrackin2, Brian Scholl3, Jelena Ristic2; 1University of Waterloo, 2McGill University, 3Yale University

One of the most important social signals we perceive is the direction in which another person looks, especially when they are looking at us. In fact, humans perceive a range of eye-gaze deviations as directed towards (vs. away from) us, a range known as the Cone of Direct Gaze (CoDG). Does the CoDG reflect perceived eye direction per se, or might it indirectly reflect the degree to which we see another person *attending* to us? We explored this by asking whether the CoDG is affected by another salient property of others’ eyes often linked to attention—how dilated the pupils are. We reasoned that perceived pupil dilation (vs. constriction) may lead to increased perception of the gazer attending to the observer, resulting in a wider CoDG. We tested this idea in two preregistered experiments (N=150 each). Observers viewed faces with constricted, normal, or dilated pupils, embedded in eyes looking with varying eccentricities to the left, right, or directly at the observer. The manipulation of pupil dilation was entirely task-irrelevant, as observers’ task was simply to report the gaze direction that they saw. In an initial experiment, faces with dilated pupils were more likely to be perceived as gazing directly toward the observers, compared to faces with either normal or constricted pupils. When the faces were inverted in another experiment, however, this effect vanished—suggesting that the impact of pupil dilation on the CoDG is not just a function of the brute physical differences of the pupils themselves (which of course still exist even when inverted). We interpret this effect—that pupil dilation widens the CoDG—in terms of enhanced percepts of attentiveness. Judgments of gazing direction are thus influenced not just by physical properties of the eyes, but also by higher-level percepts of the cognitive states behind the eyes.
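As a concrete illustration of the measure, the sketch below (not the authors' analysis code; the data and the 50% criterion are hypothetical) estimates a CoDG width as the range of gaze eccentricities judged "direct" on at least half of trials, one simple way such a cone could be quantified.

```python
# Hypothetical sketch: estimate the Cone of Direct Gaze (CoDG) width as the
# range of gaze eccentricities judged "direct" on at least 50% of trials.
import numpy as np

# Gaze eccentricities in degrees (negative = left, positive = right) and the
# proportion of "direct" responses at each eccentricity (illustrative values).
eccentricities = np.array([-10, -8, -6, -4, -2, 0, 2, 4, 6, 8, 10], dtype=float)
p_direct = np.array([0.02, 0.05, 0.20, 0.55, 0.90, 0.98, 0.92, 0.60, 0.25, 0.06, 0.03])

def codg_width(x, p, criterion=0.5):
    """Width (deg) of the region where p('direct') >= criterion,
    estimated by linear interpolation between sampled eccentricities."""
    dense_x = np.linspace(x.min(), x.max(), 2001)
    dense_p = np.interp(dense_x, x, p)
    above = dense_x[dense_p >= criterion]
    return above.max() - above.min() if above.size else 0.0

print(f"Estimated CoDG width: {codg_width(eccentricities, p_direct):.1f} deg")
# Comparing this width across pupil conditions (dilated vs. constricted)
# would quantify the widening effect described in the abstract.
```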

This project was funded by grants from NSERC to J.R. and C.C.

Talk 2, 2:45 pm

Minimal effects of stereopsis on processing realistic faces

Camille Proszanski1, Erez Freud1, Laurie M. Wilcox1; 1York University

There is some evidence of an upper visual field advantage for face processing that has been taken as support for evolutionary pressures for face detection. However, these effects tend to be small and the outcomes variable; this may be due to the use of 2D images, which lack the volumetric 3D information present in the real world. Here, we evaluate the impact of naturalistic 3D face stimuli (relative to 2D), and their location in the visual field, on face detection and recognition. Stereopairs of photorealistic face stimuli were presented using a mirror stereoscope in 3D and 2D. In all experiments, the target was present in 50% of the trials, and proportion correct was used to compute sensitivity (d’). In Experiment 1 (N=28), we used a visual search paradigm; stimuli were presented in a semi-circular array in either the upper or lower visual field. The distractor faces were tilted 15 deg to the left (or right), and observers indicated whether the target face (tilted in the opposite direction) was present. We varied the number of distractors, location, and modality (2D vs. 3D). This low-level task showed no effect of modality and a weak effect of location. In subsequent experiments, we used the same stimuli but in high-level recognition-based tasks. We varied task difficulty and modality, and also tested both upright and inverted faces (Experiment 2, N=22; Experiment 3, N=26). We found no effect of 3D viewing or location in either experiment, nor was there an interaction. The presence of a strong face inversion effect confirmed that observers were processing the faces holistically. Our results suggest that visual field asymmetries may only occur for tasks that rely on low-level properties. Further, the lack of an effect of stereopsis implies that 2D images can be reasonable proxies for natural 3D faces.
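For readers unfamiliar with the measure, here is a minimal sketch of one standard way to compute d' in a yes/no detection design from hit and false-alarm counts; the numbers are illustrative, not the authors' data or code.

```python
# Hedged sketch: sensitivity (d') for a yes/no detection task from hit and
# false-alarm counts, with a log-linear correction for extreme rates.
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with 0.5 added to each cell
    (log-linear correction) to avoid infinite z-scores at rates of 0 or 1."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Illustrative counts for one observer in one condition (e.g., 3D, upper field)
print(round(d_prime(hits=42, misses=8, false_alarms=10, correct_rejections=40), 2))
```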

NSERC grant RGPIN-2019-06694

Talk 3, 3:00 pm

Continuous theta-burst stimulation of the posterior superior temporal sulcus and intersubject synchrony during naturalistic viewing

James Thompson1, Peter Kakalec1, Courtney Marsh1, Rebecca Roy1; 1George Mason University

Sharing an understanding of commonly experienced events is important for forming and maintaining social connections. Friends share similar patterns of functional MRI activity (intersubject correlation, or ISC) when viewing the same naturalistic stimuli, such as movies. The superior temporal sulcus (STS) is one region that might play an important role in forming the shared neural representations that underlie ISC. The STS tracks social interactions during both task-based and naturalistic viewing. Here we propose that shared representations rely on the perception of social interactions in the posterior STS (pSTS). We used TMS to examine the contribution of the STS to ISC and to the encoding of social information during naturalistic viewing. In one session, participants viewed videos of people, places, food, and objects, as well as scrambled videos, to functionally localize the pSTS. Resting motor TMS thresholds were also acquired. In a second scanning session, we administered inhibitory continuous theta-burst TMS (cTBS) to the functionally localized right pSTS or to the vertex (sham) in a between-groups design, before participants viewed a 20-min movie during multiband/multiecho fMRI scanning. After scanning, participants were asked to recall details from the movie. ISC and intersubject pattern similarity (ISPS) were calculated from fMRI responses in parcels derived from the Schaefer 200-parcel atlas. Memory accuracy for social, but not non-social, details was lower following cTBS to the pSTS relative to sham. Lower ISC and ISPS during movie watching following cTBS to the pSTS, relative to sham, were observed in limbic and cortical-limbic regions, while increased ISC following cTBS, relative to sham, was observed in parahippocampal and frontoparietal cortex. This study provides evidence for a causal contribution of the STS to ISC and ISPS across multiple brain networks during the viewing of naturalistic social stimuli, and helps identify the contribution of social perception to the formation of shared neural representations.
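As a rough illustration of the ISC measure, the sketch below implements a common leave-one-out formulation on simulated data. It is an assumption-laden stand-in rather than the authors' pipeline; parcel-wise ISPS would additionally compare multivoxel patterns across subjects.

```python
# Hedged sketch of leave-one-out intersubject correlation (ISC): for each
# subject and parcel, correlate that subject's time course with the mean
# time course of all remaining subjects. Not the authors' pipeline.
import numpy as np

def leave_one_out_isc(data):
    """data: array of shape (n_subjects, n_timepoints, n_parcels).
    Returns ISC values of shape (n_subjects, n_parcels)."""
    n_subj, n_time, n_parc = data.shape
    isc = np.zeros((n_subj, n_parc))
    for s in range(n_subj):
        others = np.delete(data, s, axis=0).mean(axis=0)   # (n_time, n_parcels)
        for p in range(n_parc):
            isc[s, p] = np.corrcoef(data[s, :, p], others[:, p])[0, 1]
    return isc

# Illustrative random data: 20 subjects, 600 TRs, 200 parcels
rng = np.random.default_rng(0)
fake_bold = rng.standard_normal((20, 600, 200))
print(leave_one_out_isc(fake_bold).mean(axis=0)[:5])  # mean ISC for first 5 parcels
```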

Talk 4, 3:15 pm

Semantic and Social Features Drive Human Grouping of Dynamic, Visual Events in Large-Scale Similarity Judgements

Kathy Garcia1, Leyla Isik1; 1Johns Hopkins University

How do humans perceive naturalistic social scenes, especially in dynamic contexts? Similarity judgments offer critical insight into the mental representations humans use to perceive actions, form categories, and predict behavior. Real-world perception is complex and dynamic, involving features that static representations cannot fully capture. While prior research has focused on static scenes, less is understood about the features driving human similarity judgments in dynamic settings and their alignment with representations learned by deep neural networks (DNNs). To address this, we collected ~20,000 triplet odd-one-out similarity judgments from ~2.5 million possible unique triplets generated from a curated dataset of 250 three-second videos depicting everyday human actions—a scale far exceeding comparable studies. We then constructed a similarity matrix for all 250 videos, computing p(i, j), the likelihood of participants selecting stimuli i and j as similar when paired with different third videos. Finally, these similarity judgments were compared to human ratings of visual and social scene features, fMRI responses, video DNN embeddings, and word embeddings derived from human-written captions, using representational similarity analysis (RSA). The results reveal striking patterns in human perception: word embeddings of video captions, reflecting how humans choose to describe the videos, showed the strongest correlation with overall similarity judgments, followed by ratings of “intimacy” (relational closeness) and video DNN embeddings. Similarity judgments correlated significantly with neural responses in regions including EBA, LOC, and STS (lateral stream) and FFA (ventral stream), but not in early visual (EVC) or scene-specific (PPA) areas, underscoring the dominance of social over scene features. Together, these findings highlight the alignment between linguistic descriptions and cognitive strategies, revealing how verbal encodings of social features capture key mental processes shaping our understanding of dynamic social scenes.
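The sketch below illustrates, with hypothetical variable names and placeholder data, how a pairwise similarity matrix p(i, j) can be assembled from triplet odd-one-out choices and then compared to a model similarity matrix via rank-correlation RSA. It is not the authors' code.

```python
# Hedged sketch: build p(i, j) from triplet odd-one-out judgments, then compare
# it to a model similarity matrix with a Spearman correlation (a simple RSA).
import numpy as np
from scipy.stats import spearmanr

def similarity_from_triplets(triplets, odd_one_out, n_items):
    """triplets: (n_trials, 3) item indices; odd_one_out: chosen item index per trial.
    p(i, j) = proportion of trials containing both i and j in which neither was
    chosen as the odd one out (i.e., they were grouped together)."""
    together = np.zeros((n_items, n_items))
    appeared = np.zeros((n_items, n_items))
    for trial, odd in zip(triplets, odd_one_out):
        for a in range(3):
            for b in range(a + 1, 3):
                i, j = trial[a], trial[b]
                appeared[i, j] += 1
                appeared[j, i] += 1
                if odd not in (i, j):
                    together[i, j] += 1
                    together[j, i] += 1
    with np.errstate(invalid="ignore"):
        return together / appeared            # NaN where a pair never appeared

def rsa(sim_a, sim_b):
    """Spearman correlation between the upper triangles of two similarity matrices."""
    iu = np.triu_indices_from(sim_a, k=1)
    mask = ~np.isnan(sim_a[iu]) & ~np.isnan(sim_b[iu])
    rho, _ = spearmanr(sim_a[iu][mask], sim_b[iu][mask])
    return rho

# Tiny illustrative run with 6 items and a handful of trials
trip = np.array([[0, 1, 2], [0, 1, 3], [2, 3, 4], [1, 4, 5]])
odd = np.array([2, 3, 4, 5])                  # item chosen as odd one out per trial
S = similarity_from_triplets(trip, odd, n_items=6)
rng = np.random.default_rng(3)
model_sim = rng.random((6, 6)); model_sim = (model_sim + model_sim.T) / 2
print(f"RSA with a random model: {rsa(S, model_sim):.2f}")
```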

Talk 5, 3:30 pm

Developing a non-human primate model to dissect neural mechanisms of human facial expression processing

Maren Wehrheim1,2,3, Shirin Taghian3, Hamidreza Ramezanpour3, Kohitij Kar3; 1Department of Computer Science, Goethe University Frankfurt, Frankfurt, Germany, 2Frankfurt Institute for Advanced Studies (FIAS), Frankfurt, Germany, 3Department of Biology, York University, Toronto, Canada

Understanding how the human brain processes facial expressions requires quantitative models that bridge neural mechanisms with behavior. While many qualitative descriptions exist, the field lacks a tight coupling between neural hypotheses and behavioral measurements. To address this, we developed a comprehensive approach combining behavioral measurements, large-scale neural recordings in macaques, and computational modeling to investigate the computations underlying facial emotion discrimination. We first established a rigorous behavioral paradigm, using a binary human facial expression discrimination task across six emotions (360 images), to compare facial emotion recognition between humans and macaques. To probe the neural mechanisms, we conducted chronic multi-electrode recordings in the macaque inferior temporal (IT) cortex during passive viewing of emotional faces. Using 205 logistic regression decoders, we tested how different transformations of the IT population activity predicted behavioral error patterns. We also evaluated a suite of artificial neural networks (ANNs) to identify computational models that mirror these neural processes. Significant image-by-image correlations (r=0.69, p<0.001) validated macaques as a suitable model for studying the neural basis of facial emotion processing. We found that macaque IT activity significantly predicted image-level behavioral responses in both humans (70-170 ms, R=0.49, p<0.001) and monkeys (70-110 ms, R=0.69, p<0.001). Additionally, we observed that traditional action-unit models of facial expression analysis are significantly less aligned with human behavior than other ANNs (e.g., ImageNet-trained models, CLIP, SimCLR). Furthermore, ANN-IT representations that most closely matched monkey IT, as assessed by representational similarity analyses, also predicted human behavior more accurately. These findings provide critical insights into the neural computations underlying facial emotion discrimination and establish macaques as a robust model for studying these processes. By integrating neural, behavioral, and computational insights, this work provides a critical step toward developing biologically plausible models of facial expression recognition.
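A minimal, hypothetical sketch of the decoding logic described above: cross-validated logistic regression decoders applied to simulated IT population responses, with image-by-image decoder accuracy compared to a behavioral vector. All shapes and data are placeholders, not the authors' recordings or pipeline.

```python
# Hedged sketch: decode facial-expression category from (simulated) IT population
# responses with cross-validated logistic regression, then correlate image-level
# decoder accuracy with a behavioral accuracy vector.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
n_images, n_sites = 360, 128                  # 360 face images, 128 IT recording sites
X = rng.standard_normal((n_images, n_sites))  # mean firing rate per image and site
y = rng.integers(0, 6, n_images)              # 6 emotion categories

decoder = LogisticRegression(max_iter=2000)
pred = cross_val_predict(decoder, X, y, cv=10)          # held-out prediction per image
image_correct = (pred == y).astype(float)               # image-level decoder accuracy

behavior = rng.random(n_images)                         # placeholder behavioral accuracies
r, p = pearsonr(image_correct, behavior)
print(f"image-by-image correlation: r={r:.2f}, p={p:.3f}")
```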

KK has been supported by funds from the Canada Research Chair Program, the Simons Foundation Autism Research Initiative (SFARI, 967073), the Brain Canada Foundation (2023-0259), the Canada First Research Excellence Fund (VISTA Program), and a Google Research Award; MW has been supported by DFG grant 414985841.

Talk 6, 3:45 pm

Intimate relationship experience predicts sensitivity to facial emotional expressions

Katherine A. Billetdeaux1, Brittany E. Woodruff1, K. Suzanne Scherf1; 1The Pennsylvania State University

Facial expressions provide critical non-verbal communicative signals in social interactions. We have shown that emerging adulthood is an important period for developing sensitivity to socially complex expressions, like those that provide key signals about the status of romantic/sexual partnerships (e.g., sexual interest; Motta-Mena & Scherf, 2017). Here, we tested the hypothesis that ongoing experience in intimate partnerships during emerging adulthood is associated with increased sensitivity to perceive these complex facial expressions. Emerging adult participants (N = 410, ages 18-25) completed a relationship questionnaire and an emotional expression perception task. The relationship questionnaire asked participants to self-report about the presence of ongoing romantic and/or sexual relationships. We derived three scores: relationship duration (in months), relationship commitment, and relationship intensity (duration x commitment). Facial emotional expression stimuli consisted of 4 basic (angry, fearful, happy, sad) and 4 socially complex (betrayed, brokenhearted, contempt, sexual interest) expressions taken from the CEED database (Benda & Scherf, 2020). Each expression was morphed with a neutral expression from the same actor to generate 12 images. On each trial, participants observed the neutral expression and one of the morphed stimuli. They picked the image that displayed “more expression.” Perceptual thresholds were computed separately for each expression. Approximately half (58.3%) of the participants reported being in a romantic and/or sexual relationship. While there were no differences at the group level in perceptual thresholds for basic or complex expressions between those in a relationship and those not, we were primarily interested in whether ongoing experience in an intimate relationship predicted expression sensitivity. Among individuals in a current relationship, we found that both the duration and intensity of the relationship negatively predicted thresholds for detecting both basic and complex expressions. These results suggest that as people gain experience in intimate partnerships, they become increasingly sensitive to both basic and complex facial expressions.
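As an illustration of how per-expression perceptual thresholds might be derived (the abstract does not specify the fitting procedure), the sketch below fits a logistic psychometric function to hypothetical choice proportions across morph levels and reads off a 75% point.

```python
# Hedged sketch: estimate a perceptual threshold for one expression by fitting a
# logistic psychometric function to the proportion of trials on which the morphed
# image was chosen as showing "more expression". Illustrative data and parameters.
import numpy as np
from scipy.optimize import curve_fit

morph_levels = np.linspace(0.05, 0.60, 12)    # 12 morph intensities (proportion of expression)
p_chose_morph = np.array([0.48, 0.50, 0.55, 0.58, 0.66, 0.72,
                          0.80, 0.86, 0.91, 0.95, 0.97, 0.99])

def psychometric(x, threshold, slope):
    """Logistic rising from 0.5 (chance) toward 1.0; equals 0.75 at `threshold`."""
    return 0.5 + 0.5 / (1.0 + np.exp(-(x - threshold) / slope))

(thresh, slope), _ = curve_fit(psychometric, morph_levels, p_chose_morph,
                               p0=[0.3, 0.1], maxfev=5000)
print(f"75% threshold: {thresh:.3f} morph proportion")
# Lower thresholds indicate greater sensitivity; per-expression thresholds could
# then be regressed on relationship duration and intensity.
```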

Talk 7, 4:00 pm

Perception of social interactions: Compositional forces as underlying mental representations

Yiling Yun1, Yi-Chia Chen1, Shuhao Fu1, Hongjing Lu1; 1University of California, Los Angeles

When one shape moves closer to another on a screen, we naturally see social interactions, such as greetings. While this type of animation can be described using low-level visual features (e.g., location, speed, distance) or high-level semantic labels, we proposed that mid-level representations of forces are how humans effectively represent social interactions. The computational advantage lies in the compositional nature of forces: Multiple forces can combine through vector addition to jointly act on an entity. We tested a force model that represents social interactions through two types of compositional forces: interactive forces, driven by interactions between agents; and self-propelled forces, driven by individual intentions. These forces are quantified using parameters that capture the strength of attraction and repulsion as well as the distance at which one transitions to the other. In Experiment 1, we used an odd-one-out task to collect human similarity judgments for 27 animations, each generated with distinct semantic labels. Relative to a noise ceiling of .811, calculated by correlating split-half human responses over 50 iterations, the force dynamics model showed a strong correlation with human similarity judgments (r = .520). Ablation analyses revealed that interactive force features contributed more strongly than self-propelled force features. Human judgments were less aligned with other control models, including a model based on low-level visual features (r = .411), a deep learning model (LSTM) trained to discriminate social interactions (r = .329), and language-based models using semantic labels (r = .146). In Experiment 2, we tested a different set of animations and replicated the same pattern of results. These findings suggest that people interpret social dynamics through compositional forces driven by distinct goals. Furthermore, this study sheds light on the development of social perception, which may build upon perceptual processes underlying intuitive physics.
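A toy sketch of the compositional-force idea, with illustrative parameters rather than the authors' fitted values: an interactive force that switches from repulsion to attraction at a transition distance is summed with a self-propelled force toward the agent's own goal, and the resulting vector drives the agent's motion.

```python
# Hedged sketch of compositional forces acting on one agent. Parameter names
# and values are illustrative, not the authors' fitted model.
import numpy as np

def interactive_force(pos_self, pos_other, strength=1.0, transition_dist=2.0):
    """Points toward the other agent when farther than `transition_dist`
    (attraction) and away from it when closer (repulsion)."""
    delta = pos_other - pos_self
    dist = np.linalg.norm(delta) + 1e-9
    direction = delta / dist
    return strength * (dist - transition_dist) * direction

def self_propelled_force(pos_self, goal, strength=0.5):
    """Points from the agent toward its individual goal location."""
    delta = goal - pos_self
    return strength * delta / (np.linalg.norm(delta) + 1e-9)

# One Euler step of an agent's motion under the vector sum of both forces
pos_a, pos_b, goal_a = np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 4.0])
velocity, dt = np.zeros(2), 0.1
total_force = interactive_force(pos_a, pos_b) + self_propelled_force(pos_a, goal_a)
velocity += total_force * dt
pos_a += velocity * dt
print(pos_a)
```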

NSF BCS 2142269