A comparison of image naturalness perception between humans and image generative models
Poster Presentation: Monday, May 19, 2025, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Face and Body Perception: Parts and wholes
Taiki Fukiage1; 1NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Japan
Image generative models, particularly diffusion models, have rapidly advanced, producing images increasingly indistinguishable from real photographs. By learning from large datasets of natural images, these models may capture features that humans perceive as "natural." This study examined how closely their assessments align with human perception through two experiments: one employing the Thatcher illusion and another focusing on lighting inconsistencies. In the first experiment, involving the Thatcher illusion, we modified 140 face images by flipping the eyes and mouth. Human participants rated the naturalness of the original and modified images in upright and inverted orientations on a five-point scale. Two diffusion models (Stable Diffusion v1.5 and Stable Diffusion XL) estimated naturalness using the variational lower bound of the log-likelihood. For both humans and models, we computed "unnaturalness scores" as the difference in ratings or likelihoods between the modified and original images. Both humans and models exhibited the Thatcher illusion, showing significantly higher unnaturalness scores for upright than for inverted images. Moreover, human and model unnaturalness scores were significantly correlated within both the upright (r=0.52-0.59) and inverted (r=0.29-0.49) conditions, suggesting reasonable alignment in perceived facial naturalness. In the second experiment, we rendered scenes containing two or three objects using a physically based renderer. We introduced lighting inconsistencies by swapping object positions or flipping objects, yielding 288 pairs of original and modified images with varying degrees of unnaturalness. Both humans and models showed significantly positive unnaturalness scores, revealing an ability to detect lighting inconsistencies. However, humans were more sensitive to these modifications than the models, and correlations between human and model scores were modest (r=0.22-0.26), suggesting limited alignment at the level of individual images. Overall, these findings demonstrate that while generative models can detect certain unnatural modifications similarly to humans, their sensitivity and alignment with human perception remain limited.
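The abstract does not detail the likelihood estimator, so the following is only a minimal sketch of the model-side pipeline: it scores an image with the standard diffusers Stable Diffusion v1.5 pipeline, using the average noise-prediction error as a simplified proxy for the negative variational lower bound, and takes the modified-minus-original difference as the unnaturalness score. The model ID, timestep sampling, empty-prompt conditioning, and sign convention are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): scoring an image's "naturalness"
# under Stable Diffusion v1.5 via the average noise-prediction error, a
# common proxy for the negative variational lower bound.
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float32
).to(device)

@torch.no_grad()
def neg_elbo_proxy(image, prompt="", n_timesteps=50, seed=0):
    """Average epsilon-prediction MSE over sampled timesteps.

    `image` is a [1, 3, H, W] float tensor scaled to [-1, 1]. A larger
    return value means the model assigns the image a lower (less natural)
    variational bound.
    """
    gen = torch.Generator(device).manual_seed(seed)
    # Encode the image into the VAE latent space.
    latents = pipe.vae.encode(image.to(device)).latent_dist.mean
    latents = latents * pipe.vae.config.scaling_factor
    # Text conditioning; an empty prompt approximates an unconditional score.
    tokens = pipe.tokenizer(prompt, padding="max_length",
                            max_length=pipe.tokenizer.model_max_length,
                            return_tensors="pt")
    text_emb = pipe.text_encoder(tokens.input_ids.to(device))[0]

    T = pipe.scheduler.config.num_train_timesteps
    total = 0.0
    for t in torch.linspace(0, T - 1, n_timesteps).long():
        noise = torch.randn(latents.shape, generator=gen, device=device)
        noisy = pipe.scheduler.add_noise(latents, noise, t.view(1).to(device))
        pred = pipe.unet(noisy, t.to(device), encoder_hidden_states=text_emb).sample
        total += torch.mean((pred - noise) ** 2).item()
    return total / n_timesteps

# Unnaturalness score (assumed sign convention): modified minus original,
# so a positive value means the model finds the modified image less natural.
# score = neg_elbo_proxy(modified_img) - neg_elbo_proxy(original_img)
```

Averaging the squared noise-prediction error over uniformly spaced timesteps drops the per-timestep weighting of the full variational bound; that simplification, and how SDXL's dual text encoders would be handled, are left open here.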
Acknowledgements: This work was supported by JSPS KAKENHI Grant Number 24H00721