Do deep neural networks perceive contextual visual illusions?
Poster Presentation: Tuesday, May 20, 2025, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Object Recognition: Models
Hojin Jang¹·², Pawan Sinha²; ¹Department of Brain and Cognitive Engineering, Korea University, ²Department of Brain and Cognitive Sciences, MIT
Through a combination of innate circuit mechanisms and visual experience, human visual perception comes to incorporate sensitivity to contextual cues, which provide important information for interpreting the environment. While these contextual influences enable accurate perception in complex and dynamic settings, they can also give rise to systematic biases under certain conditions, as observed in phenomena like visual illusions. Classic illusions such as the Delboeuf, Ebbinghaus, Ponzo, and Müller-Lyer vividly demonstrate how context alters the perception of relative size, often leading to significant misjudgments. Modern deep neural networks (DNNs), which have shown remarkable success in emulating human perceptual behaviors, raise the following question: can artificial vision systems that incorporate some of the architectural properties of their biological counterparts, and that are trained on natural imagery, also develop susceptibility to such illusions? Answering this question is not straightforward, given the difficulty of querying the perceptual ‘experience’ of these systems. We therefore employed neuroscience-inspired methodologies, including univariate spatiotopic analysis to assess neural responses at target locations and multivariate decoding analysis to examine representational patterns across network layers. Using ImageNet-trained neural network models, our preliminary results reveal that while univariate neural responses in DNNs scale predictably with physical size and remain unaffected by contextual cues, multivariate decoding shows that illusion-like effects emerge in deeper layers, aligning with human perceptual biases. These findings suggest that hierarchical processing architectures and extensive visual training may together drive susceptibility to contextual illusions.
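The multivariate decoding analysis mentioned above can be sketched in a few lines. This is an illustrative toy only, not the authors' pipeline: the simulated "layer activations" stand in for features extracted from a layer of an ImageNet-trained network, and the function name `simulate_layer_activations` and all parameters are hypothetical. The idea is simply that a cross-validated linear decoder is trained to read out target size from the activation pattern at each layer.

```python
# Minimal sketch of a multivariate decoding analysis (illustrative only).
# We simulate unit activations for targets of two physical sizes and test
# whether a cross-validated linear decoder can recover size from the
# activation pattern, as one would per layer of a pretrained network.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def simulate_layer_activations(n_trials=200, n_units=128, size_signal=1.0):
    """Simulate activations for 'small' (0) vs 'large' (1) targets.

    size_signal scales how strongly physical size drives the units;
    in a real analysis these would be features from one network layer.
    """
    labels = rng.integers(0, 2, n_trials)
    # Each unit responds to 'large' targets with a fixed random weight.
    signal = np.outer(labels, rng.normal(size=n_units)) * size_signal
    noise = rng.normal(size=(n_trials, n_units))
    return signal + noise, labels

X, y = simulate_layer_activations()
decoder = LogisticRegression(max_iter=1000)
accuracy = cross_val_score(decoder, X, y, cv=5).mean()
print(f"decoding accuracy: {accuracy:.2f}")  # well above chance (0.5)
```

In the study's setting, running this decoder on every layer and comparing stimuli with and without contextual inducers is what reveals where illusion-like size representations emerge in the hierarchy.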
Acknowledgements: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (RS-2024-00451866).