Investigating temporal integration in primate visual cortex using naturalistic video stimuli

Poster Presentation: Tuesday, May 20, 2025, 8:30 am – 12:30 pm, Pavilion
Session: Temporal Processing: Neural mechanisms, models

Nathan C L Kong*1, Akshay V Jagadeesh*2, Margaret S Livingstone2; 1University of Pennsylvania, 2Harvard Medical School *Equal Contribution

Animal behavior is supported by dynamic neural representations that encode information from continuous visual experience. The vast majority of object vision research has examined representations of static images and behaviors that can be performed with static stimuli, such as object categorization. However, numerous behaviors, such as action recognition, causal attribution, or invariant object representation learning, may require temporally integrating information over continuous visual experience. It remains unknown whether the ventral visual cortex, widely thought to support object vision, is capable of such temporal integration. Here, we investigate the role of the ventral visual cortex in encoding and integrating information over time. Using chronically implanted microelectrode arrays in the visual cortex of macaque monkeys, we collected 30 hours of neural responses from inferior temporal (IT) cortex while subjects viewed 960 three-second videos from the Moments in Time dataset. These naturalistic videos contain a wide variety of objects and actions and were presented either in their original form, in reverse, or statically (as a single video frame). We found that action decoding from time-averaged neural responses to original videos was significantly above chance, but not significantly different from decoding performance using responses to static frames, suggesting that decoding was driven primarily by differences in visual features. To assess temporal integration in IT cortex, we evaluated how well responses at each timepoint predict responses at every other timepoint and found that responses were significantly predictive of each other only within a relatively narrow temporal band. Finally, we evaluated a variety of computer vision models, including static image models and video models that integrate over video frames, and found substantial gaps between all models' neural predictivity and the noise ceiling.
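The timepoint-to-timepoint prediction analysis described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the data here are synthetic, the array shapes, the ridge regularization strength, and the train/test split are all assumptions, and `resp` stands in for trial-by-timepoint-by-unit IT population responses.

```python
import numpy as np

# Toy stand-in for IT population responses: (trials, timepoints, units).
# Real recordings would be much larger; sizes here are illustrative only.
rng = np.random.default_rng(0)
n_trials, n_time, n_units = 100, 6, 20
resp = rng.standard_normal((n_trials, n_time, n_units))

def ridge_r2(X_tr, Y_tr, X_te, Y_te, alpha=1.0):
    """Fit a linear (ridge) map X -> Y on training trials;
    return held-out R^2 pooled over units."""
    W = np.linalg.solve(X_tr.T @ X_tr + alpha * np.eye(X_tr.shape[1]),
                        X_tr.T @ Y_tr)
    pred = X_te @ W
    ss_res = ((Y_te - pred) ** 2).sum()
    ss_tot = ((Y_te - Y_te.mean(axis=0)) ** 2).sum()
    return 1.0 - ss_res / ss_tot

# Cross-timepoint prediction matrix: R[i, j] is how well the population
# response at timepoint i predicts the response at timepoint j.
train, test = np.arange(80), np.arange(80, 100)
R = np.zeros((n_time, n_time))
for i in range(n_time):
    for j in range(n_time):
        R[i, j] = ridge_r2(resp[train, i], resp[train, j],
                           resp[test, i], resp[test, j])
```

On real data, a band of high values around the diagonal of `R` (and low values far from it) would correspond to the narrow temporal window of mutual predictability reported above; with this synthetic noise, only the diagonal is predictable.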
Our results provide preliminary evidence that temporal integration in IT cortex is limited to a relatively narrow temporal window.