Small-scale adversarial perturbations expose key differences between ANN-based vision encoding models
Poster Presentation: Tuesday, May 20, 2025, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Object Recognition: Models
Nikolas McNeal1,3, Mainak Deb, Ratan Murty2,3; 1Machine Learning, School of Mathematics, Georgia Tech, 2Cognition and Brain Science, School of Psychology, Georgia Tech, 3Center of Excellence in Computational Cognition, Georgia Tech
Artificial neural network (ANN)-based encoding models have emerged as powerful tools in vision neuroscience, offering unprecedented accuracy in predicting responses of neurons, voxels, population patterns, and visual behaviors. But how robust are these vision encoding models to subtle perturbations in stimulus inputs? Are some models better than others? Despite their predictive success, relatively little is known about how susceptible vision encoding models are to small, targeted stimulus manipulations that would otherwise be expected to leave neural responses unaffected. Here, we focused on this previously untested property of widely used vision encoding models. To this end, we trained ANN-based encoding models for high-level visual regions using the Natural Scenes Dataset. Consistent with prior reports, all models exhibited high accuracy in predicting responses to held-out images (all R > .46, P < .0001). Next, we assessed their susceptibility to small-scale “adversarial attacks” (ε = 3/255), ensuring that the image changes were imperceptible to the human eye. To our surprise, we found that all encoding models were highly sensitive to these small-scale adversarial attacks, often changing their response predictions dramatically for nearly identical images. We then asked whether adversarial sensitivity could help identify more brain-aligned models. Our results showed that adversarial susceptibility discriminated between encoding models more effectively than prediction accuracy alone (normalized variance of .002 for accuracy versus .025 for adversarial robustness). Finally, we explored strategies to improve model robustness to targeted noise. Training models specifically for adversarial robustness increased resistance to perturbations but reduced prediction accuracy on brain data. In contrast, using sparse feature-to-brain mappings improved robustness while preserving accuracy (up to a 51% improvement in median percent change). Together, these findings expose key vulnerabilities in current ANN-based encoding models, introduce adversarial sensitivity as a complementary evaluation metric, and offer new model-to-brain mapping strategies for balancing robustness and predictive accuracy in future vision models.
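To make the modeling pipeline concrete, the following is a minimal sketch of the kind of voxel-wise encoding model described above, assuming precomputed ANN features X (images x features) and voxel responses Y (images x voxels). The ANN backbone, feature layer, regularization, and Natural Scenes Dataset preprocessing used by the authors are not specified here, so the ridge mapping and the synthetic arrays below are illustrative placeholders only.

import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Placeholder data standing in for ANN features (images x features) and voxel
# responses (images x voxels); real use would extract features from a
# pretrained vision model and load NSD response estimates.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 512))
W = rng.standard_normal((512, 50))
Y = X @ W + rng.standard_normal((1000, 50))

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

# Linear feature-to-voxel mapping (ridge regression is one common choice)
mapping = Ridge(alpha=1.0).fit(X_tr, Y_tr)
Y_hat = mapping.predict(X_te)

# Held-out prediction accuracy per voxel (Pearson r), the accuracy metric
# reported in the abstract
r = np.array([pearsonr(Y_te[:, v], Y_hat[:, v])[0] for v in range(Y.shape[1])])
print(f"median held-out r = {np.median(r):.3f}")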
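The abstract does not spell out the attack procedure beyond the ε = 3/255 budget, so the following is a hedged sketch of one standard way to probe an image-to-response encoding model with a small L-infinity perturbation: projected gradient ascent on the change in predicted responses, starting from a random point inside the ε ball. The encoder module, step count, and step size are assumptions for illustration, not the poster's exact procedure.

import torch
import torch.nn.functional as F

def perturb_predictions(encoder, image, eps=3/255, steps=10, alpha=1/255):
    """Search for an imperceptible perturbation (L-inf norm <= eps) that
    maximally changes the encoder's predicted responses to `image`."""
    clean_pred = encoder(image).detach()
    # Random start inside the eps ball so the first gradient is informative
    delta = (torch.rand_like(image) * 2 - 1) * eps
    delta.requires_grad_(True)
    for _ in range(steps):
        pred = encoder((image + delta).clamp(0, 1))
        change = F.mse_loss(pred, clean_pred)   # how far predictions have moved
        change.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # ascend on prediction change
            delta.clamp_(-eps, eps)             # project back into the eps ball
        delta.grad = None
    return (image + delta.detach()).clamp(0, 1)

# Toy differentiable stand-in for an image-to-voxel encoding model
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 20))
image = torch.rand(1, 3, 64, 64)
adv = perturb_predictions(encoder, image)
print(f"max pixel change: {(adv - image).abs().max().item():.4f}")   # stays within eps
print(f"prediction shift: {F.mse_loss(encoder(adv), encoder(image)).item():.4f}")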
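The sparse feature-to-brain mapping reported to improve robustness while preserving accuracy is likewise not specified in detail; an L1-regularized (lasso) readout is one plausible way to realize such sparsity. The snippet below reuses the placeholder X_tr and Y_tr arrays from the first sketch and is an assumption-laden illustration, not the authors' method.

import numpy as np
from sklearn.linear_model import Lasso

# Reusing X_tr / Y_tr from the first sketch; alpha controls how few ANN
# features each voxel's readout retains (larger alpha -> sparser mapping).
sparse_map = Lasso(alpha=0.05, max_iter=5000).fit(X_tr, Y_tr)
kept = np.mean(sparse_map.coef_ != 0)
print(f"fraction of feature weights retained: {kept:.2%}")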
Acknowledgements: This work was funded by an NIH Pathway to Independence Award (R00EY032603) and a startup grant from Georgia Tech (to NARM).