A dynamic spatiotemporal normalization model for continuous vision
Poster Presentation: Tuesday, May 20, 2025, 8:30 am – 12:30 pm, Pavilion
Session: Temporal Processing: Neural mechanisms, models
Angus Chapman¹, Rachel Denison¹; ¹Boston University
Motivation: How does the visual system process dynamic inputs? Both neural responses to and perception of a given stimulus are affected by temporal context from both past and future stimuli. Some effects of temporal context have been modeled using temporal normalization—the divisive suppression of neural activity by activity occurring at other points in time. However, existing models do not compute responses in real time, limiting biological feasibility, and do not pool suppression across neurons as is common in static normalization models. Here we ask whether the effects of temporal context on neural responses can be captured by a unified spatiotemporal receptive field structure that implements divisive normalization across space and time.
Methods: We developed a dynamic spatiotemporal normalization model (D-STAN) that implements temporal normalization through excitatory and suppressive drives that depend on the recent history of stimulus input, controlled by separate exponential temporal windows. D-STAN extends previous models with a recurrent neural network architecture that simulates sensory processing and decision-making in real time, and with spatiotemporal pooling of suppressive drives that allows temporal context to exert effects both forward and backward in time.
Results: Reverse correlation analysis of D-STAN’s sensory responses uncovered effective temporal receptive fields that followed a half “Mexican hat” profile, similar to empirical findings. This response profile was not built directly into D-STAN but emerged from the interaction between the excitatory and suppressive windows and the normalization computation. D-STAN also reproduced several non-linear properties of neural responses that depend on temporal context, including subadditivity, repetition suppression, and backward masking. Finally, D-STAN predicted changes in perception, capturing bidirectional contrast-dependent suppression between stimuli at different times.
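The core computation described above—excitatory and suppressive drives that integrate stimulus history over separate exponential windows, combined through divisive normalization—can be illustrated with a minimal sketch. This is not the authors' implementation: the update rule, parameter values (`tau_e`, `tau_s`, `sigma`, `n`), and the single-unit (non-spatial) setup are all illustrative assumptions, and the spatiotemporal pooling and decision stages of D-STAN are omitted.

```python
import numpy as np

def temporal_normalization_sketch(stimulus, tau_e=5.0, tau_s=20.0,
                                  sigma=0.1, n=2.0, dt=1.0):
    """Illustrative single-unit temporal normalization.

    Excitatory (e) and suppressive (s) drives are leaky integrators of the
    stimulus with separate exponential time constants; the response is the
    excitatory drive divisively normalized by the suppressive drive.
    Parameter values are arbitrary, chosen only for demonstration.
    """
    e = s = 0.0
    responses = []
    for x in stimulus:
        # Recursive, real-time updates: each drive is an exponentially
        # weighted window over recent stimulus history.
        e += dt / tau_e * (x - e)       # fast excitatory window
        s += dt / tau_s * (x - s)       # slower, broader suppressive window
        r = e**n / (sigma**n + s**n)    # divisive normalization
        responses.append(r)
    return np.array(responses)

# A sustained pulse: the response shows an initial transient that decays
# as suppression accumulates, one signature of temporal normalization.
pulse = np.concatenate([np.zeros(10), np.ones(50), np.zeros(40)])
resp = temporal_normalization_sketch(pulse)
```

Because the suppressive window is slower than the excitatory one, the response peaks early in the pulse and then settles to a lower sustained level, a qualitative analogue of the subadditivity effects described in the Results.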
Conclusions: Temporal normalization within a population of neurons with spatiotemporal feature tuning can account for a wide range of neural and behavioral effects. D-STAN is a step toward dynamic movie-computable models for continuous vision.
Acknowledgements: Startup funds from Boston University to R.D.