Training convolutional neural networks with blurry images enables the learning of more human-aligned visual representations
Poster Presentation: Saturday, May 17, 2025, 8:30 am – 12:30 pm, Pavilion
Session: Theory
Ikhwan Jeon, Connor Parde, Frank Tong; Vanderbilt University
Although convolutional neural networks (CNNs) can achieve human-level recognition accuracy on natural images, research has revealed systematic deviations between CNNs and human vision, including susceptibility to visual noise (Jang et al., 2021; Geirhos et al., 2018) and insensitivity to shape information (Geirhos et al., 2019). However, recent work has shown that these deviations can be reduced by providing CNNs with auxiliary training on blurry images (Jang and Tong, 2024). In this work, we further demonstrate how blur training can improve the alignment between CNNs and human vision by evaluating the quality of metameric stimuli generated from CNNs (Feather et al., 2023). Metamers of a CNN are defined as physically distinct images that produce nearly identical responses in the network; such metamers can be generated by modifying an initially random noise image until it produces CNN responses that closely match the responses to a reference object image. To investigate the potential benefits of blur training, we generated metamers from both clear-trained (standard) and blur-trained CNNs. We also compared metamers generated from RGB- and grayscale-trained models to test the hypothesis that color information may allow CNNs to learn “shortcut” strategies that are less aligned with human vision. All metamers were generated from the responses of the final convolutional layer of a given CNN. Human observers and cross-validated CNN models then classified the metamers generated from each CNN. Across all conditions, the metamers of blur-trained models were recognized more accurately than those generated from clear-trained CNNs. This general benefit of blur training for creating recognizable CNN metamers indicates that blur training improves the alignment between the internal representations of CNNs and those of the human visual system. Moreover, this effect was more pronounced for grayscale-trained than for RGB-trained models, suggesting that color-based shortcut learning was mitigated, facilitating the learning of more canonical visual representations.
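As a concrete illustration of the metamer-generation procedure described above, the following is a minimal PyTorch sketch, not the authors' actual implementation: the backbone (VGG-16), the choice of features[28] as the final convolutional layer, the MSE feature loss, the optimizer, and all hyperparameters are illustrative assumptions. It optimizes an initially random noise image so that its final-convolutional-layer responses closely match those evoked by a reference image.

```python
# Minimal sketch of CNN metamer generation via gradient descent on the input.
# Assumptions (not from the abstract): VGG-16 backbone, features[28] taken as
# the "final convolutional layer", MSE feature loss, Adam optimizer.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
for p in model.parameters():
    p.requires_grad_(False)  # only the image is optimized, not the weights

# Capture responses from the last convolutional layer with a forward hook.
captured = {}
model.features[28].register_forward_hook(
    lambda module, inputs, output: captured.update(feat=output)
)

def final_conv_responses(image):
    model(image)  # the forward pass fills `captured` via the hook
    return captured["feat"]

def generate_metamer(reference, steps=2000, lr=0.05):
    """Modify an initially random noise image until its final-conv-layer
    responses closely match those evoked by the reference image."""
    with torch.no_grad():
        target = final_conv_responses(reference)
    metamer = torch.rand_like(reference).requires_grad_(True)
    optimizer = torch.optim.Adam([metamer], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = F.mse_loss(final_conv_responses(metamer), target)
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            metamer.clamp_(0.0, 1.0)  # keep pixel values in a valid range
    return metamer.detach()

# Usage: `reference` is a preprocessed image tensor, e.g. shape (1, 3, 224, 224).
# metamer = generate_metamer(reference)
```

In practice, the reference image would first be preprocessed to match the model's training distribution, and the blur-trained models contrasted in the abstract could be approximated by augmenting a standard training pipeline with a Gaussian blur transform on a subset of training images.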
Acknowledgements: This research was supported by NEI grants R01EY035157 to FT and P30EY008126 to the Vanderbilt Vision Research Center.