Exploring Semantic and Visual Information in Face Perception and Self-Perception with Deep Neural Networks

Poster Presentation: Saturday, May 17, 2025, 8:30 am – 12:30 pm, Pavilion
Session: Face and Body Perception: Neural

Arijit De1, Shao Feng Liu1, Kinkini Monaragala1, Adrian Nestor1; 1University of Toronto

Extensive work has evaluated the contributions of visual and semantic information to face perception, with recent efforts leveraging deep neural networks. Building on this body of work, the present study evaluates the robustness and effectiveness of various neural network models in capturing these contributions. To this end, female White participants (n = 40) rated the pairwise similarity of unfamiliar and familiar (i.e., famous) faces, including their own faces. The stimuli comprised female White young adult faces with neutral expressions. In addition, participants rated all faces for attractiveness and familiarity. Regarding semantic information, a sentence generative pre-trained transformer (SGPT; Muennighoff, 2022) reliably accounted for relevant variance in the behavioral data. Its explanatory power, as expected, was modulated by face familiarity and depended on the source of information (e.g., celebrity descriptions provided by AI conversational agents were more effective than Wikipedia entries). Regarding visual information, discriminative models (e.g., ArcFace; Deng et al., 2019) and generative models (e.g., StyleGAN; Karras et al., 2020) trained on face images provided both overlapping and complementary contributions to explaining the data. Further, we found that explanatory power varied as a function of training set and architecture (e.g., StyleGAN2 outperformed StyleGAN3 in this respect). Last, we harnessed StyleGAN2's explanatory power by mapping the behavioral data into its latent space and then using its generator to synthesize hyper-realistic approximations of unfamiliar and familiar face percepts, including the participants' own faces. These findings demonstrate the utility of combining semantic and visual models to study face perception and highlight the potential of generative networks to recover visual representations. Moreover, this approach provides a novel framework for exploring the cognitive basis of self-perception.
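
The abstract does not specify how the behavioral data were mapped into StyleGAN2's latent space. As a rough illustration only, the Python sketch below (numpy and scikit-learn) shows one plausible realization under stated assumptions: pairwise dissimilarities are embedded via metric MDS, a ridge regression maps the behavioral embedding onto W-space latent codes, and the predicted codes for held-out faces could then be decoded by a pretrained StyleGAN2 generator. All array contents, dimensions, and the regression choice are illustrative assumptions, not the authors' documented method.

```python
# Illustrative sketch only: one plausible way to map behavioral similarity
# data into a GAN latent space and predict latents for held-out faces.
# All data here are random placeholders; the pipeline (MDS + ridge
# regression) is an assumption, not the authors' documented method.
import numpy as np
from sklearn.manifold import MDS
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_faces, latent_dim, embed_dim = 60, 512, 20

# Symmetric pairwise dissimilarity matrix derived from similarity ratings.
D = rng.random((n_faces, n_faces))
D = (D + D.T) / 2
np.fill_diagonal(D, 0.0)

# Embed the faces in a low-dimensional behavioral space via metric MDS.
behav = MDS(n_components=embed_dim, dissimilarity="precomputed",
            random_state=0).fit_transform(D)

# W-space latent codes for the same faces (in practice, these would be
# obtained by inverting the stimulus images into StyleGAN2's W space).
W = rng.standard_normal((n_faces, latent_dim))

# Learn a linear map from behavioral space to latent space on training
# faces, then predict latent codes for held-out faces.
train, test = np.arange(50), np.arange(50, 60)
mapper = Ridge(alpha=1.0).fit(behav[train], W[train])
W_pred = mapper.predict(behav[test])
print(W_pred.shape)  # (10, 512)

# Feeding W_pred to a pretrained StyleGAN2 generator (not included here)
# would synthesize image approximations of the corresponding face percepts.
```

In an actual application, the latent codes for the stimulus faces would come from GAN inversion of the real images, and the predicted codes would be passed through the pretrained generator to produce the reconstructed face images.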

Acknowledgements: Natural Sciences and Engineering Research Council of Canada