Feature accentuation along the encoding axes of IT neurons uncovers hidden differences in model-brain alignment

Poster Presentation: Tuesday, May 20, 2025, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Object Recognition: Models

Jacob S. Prince1, Binxu Wang1,2,3, Akshay V. Jagadeesh3, Thomas Fel1,2, Emily Lo1, George A. Alvarez1, Margaret S. Livingstone3, Talia Konkle1,2; 1Harvard University, 2Kempner Institute for the Study of Natural and Artificial Intelligence, 3Harvard Medical School

While deep neural network (DNN) encoding models increasingly achieve high predictivity of neural responses to natural images, it remains unclear whether these scores indicate algorithmic or mechanistic alignment between models and neural systems. Here we introduce a novel paradigm for rigorously testing DNN encoding models based on how well they can control neural responses. As a case study, we consider a ResNet-50 and an adversarially robust variant, whose encoding models of IT neural responses to natural images achieve nearly identical R² predictivity. However, using an explainable AI (XAI) technique called feature accentuation, we found dramatic differences in these models' ability to control neural responses. Specifically, for each neural site, we synthesized image sets predicted to parametrically drive neural activity along the encoding axes in the target model's feature space, a procedure that critically relies on the hierarchical computations and mechanisms of the target model. We presented these accentuated stimuli to the same monkey under identical recording conditions the day after synthesis. In this test of "parametric control," stimuli from the robust model achieved precise modulation of neural firing: responses aligned reliably and predictably with each accentuation level. In contrast, stimuli derived from the baseline ResNet showed far weaker parametric control. Qualitatively, the robust model's accentuations enhanced cohesive object contours, such as face-like curvatures, whereas the baseline model's accentuations predominantly altered textural features, such as fur-like patterns. These results suggest that adversarially robust training may naturally pressure models to learn more brain-relevant features than standard objectives do. More broadly, they show that models with similar encoding predictivity for natural images can be distinguished through targeted tests of fine-grained parametric control along the encoding axes, revealing that some models offer better controllability than others. By bridging NeuroAI and XAI, this approach emphasizes mechanistic alignment as a key goal for linking DNNs and brains.
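To make the parametric-control test concrete, the sketch below illustrates one plausible way to synthesize accentuated stimuli along a site's encoding axis: starting from a natural seed image, the image is optimized so that the encoding model's predicted response hits a series of target activation levels while staying close to the seed. This is a minimal illustrative sketch, not the authors' implementation; `backbone`, `w`, `b`, `seed`, and `target_levels` are hypothetical names for a fixed feature extractor (e.g., a robust ResNet-50), a fitted linear encoding weight vector and bias for one neural site, a natural seed image, and the desired response levels.

```python
# Minimal sketch (assumed, not the authors' code) of feature-accentuation-style
# synthesis of stimuli that parametrically move a predicted neural response
# along one site's encoding axis.
import torch
import torch.nn.functional as F

def accentuate(backbone, w, b, seed, target_levels, steps=256, lr=0.05, lam=1.0):
    """For each target activation level, optimize an image so the encoding model's
    predicted response (w @ features + b) matches that level, while a pixel-space
    penalty keeps the image close to the natural seed."""
    stimuli = []
    for target in target_levels:
        x = seed.clone().requires_grad_(True)
        opt = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            feats = backbone(x).flatten()          # model features for the current image
            pred = feats @ w + b                   # predicted response of the neural site
            loss = (pred - target) ** 2 + lam * F.mse_loss(x, seed)
            opt.zero_grad()
            loss.backward()
            opt.step()
            x.data.clamp_(0, 1)                    # keep pixel values in a valid range
        stimuli.append(x.detach())
    return stimuli  # one image per target level, spanning the encoding axis
```

Presenting the resulting image set to the recorded site and checking whether measured firing rates track `target_levels` is, under these assumptions, the essence of the parametric-control test described above.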

Acknowledgements: This research was supported by NSF CAREER BCS-1942438 (TK) and an NDSEG grant (JSP).