Humans vs. Large Language Models: Towards Addressing Visual Qualia

Poster Presentation: Sunday, May 18, 2025, 2:45 – 6:45 pm, Banyan Breezeway
Session: Decision Making: Models

Ziqian Cui1,2, Weisa Wu1,2, Shuai Chang3, Junting Hu4, Ming Meng1,2; 1Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, China, 2School of Psychology, South China Normal University, China, 3State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, China, 4St. Paul’s School, 325 Pleasant St., Concord, NH, US.

Recent developments in large language models (LLMs) provide a unique opportunity to investigate how subjective visual experience (qualia) may function in human cognition, for example by addressing the “Mary’s room” thought experiment. The current study examines how humans, influenced by emotional and social experiences, and LLMs, relying on statistical associations, evaluate imagined faces based on semantic descriptions of facial features. A total of 2,304 face descriptions were generated by combining ten facial features (gender, brow ridge, eyes, cheekbones, nose, nasal bridge, mouth, chin, skin, and the shape of the eye and mouth corners). Human participants (N=25) were asked to imagine faces based on the descriptions and rate them on a 1–9 scale across dimensions including maturity, emotional valence, gender, physique, trustworthiness, attractiveness, extraversion, leadership, dominance, flexibility, familiarity, and memorability. We also used API calls to five LLMs (Claude 3.5, GPT-4, GPT-3.5, Kimi, and ERNIE) to perform the same task with identical instructions. Representational similarity analysis (RSA) indicated a significant correlation between human ratings and the evaluations provided by the LLMs, with Claude 3.5 showing the highest similarity to humans and GPT-3.5 the lowest. Uniform manifold approximation and projection (UMAP) dimension reduction demonstrated that both humans and LLMs distinguished faces with high and low scores on each dimension; however, the distances between high- and low-rated faces differed significantly between humans and LLMs. A mixed-effects model examining the impact of each facial feature on the rating dimensions revealed that human participants focused on the shape of the eye and mouth corners, skin condition, and eye size, whereas LLMs weighted a broader set of features. These results indicate that humans prioritize facial cues associated with emotional expression and social judgment, reflecting cognitive biases shaped by experience, whereas LLMs rely on statistical associations, producing a broader reliance on general facial features without the emotional and social stability inherent in human judgments.
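For readers unfamiliar with RSA, the sketch below illustrates the general comparison logic in Python; it is not the authors' analysis code. The array names, shapes (2,304 descriptions by 12 rating dimensions), the use of correlation distance, and the random placeholder data are all assumptions made for illustration.

    # Minimal RSA sketch (illustrative only): compare human and LLM rating structures.
    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr

    rng = np.random.default_rng(0)
    # Placeholders standing in for averaged human ratings and one LLM's ratings,
    # each shaped (2304 face descriptions x 12 rating dimensions).
    human_ratings = rng.uniform(1, 9, size=(2304, 12))
    llm_ratings = rng.uniform(1, 9, size=(2304, 12))

    # Representational dissimilarity: pairwise distances between face descriptions,
    # computed separately within the human and LLM rating spaces (condensed form).
    human_rdm = pdist(human_ratings, metric="correlation")
    llm_rdm = pdist(llm_ratings, metric="correlation")

    # RSA score: rank correlation between the two dissimilarity structures.
    rho, p = spearmanr(human_rdm, llm_rdm)
    print(f"RSA similarity (Spearman rho) = {rho:.3f}, p = {p:.3g}")

A higher rank correlation between the two dissimilarity structures would indicate that an LLM orders the imagined faces more similarly to human raters, which is the sense in which Claude 3.5 showed the highest similarity and GPT-3.5 the lowest.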

Acknowledgements: This work is supported by (1) the National Natural Science Foundation of China (Grant No. 31871136), (2) the Ministry of Science and Technology of China (Grant No. 2021ZD0204200), and (3) the Sino-German Center for Research Promotion (Grant No. M-0705).