Motion: Models, Neural mechanisms

Talk Session: Tuesday, May 20, 2025, 2:45 – 4:45 pm, Talk Room 2

Talk 1, 2:45 pm

Duration of occlusion effect is better explained by feedforward than recurrent computations

Giuliana Martinatti Giorjiani1,2, Rosanne L. Rademaker1; 1Ernst Strüngmann Institute for Neuroscience in Cooperation with the Max Planck Society, 2Department of Cognitive Neuroscience, Vrije Universiteit Amsterdam, The Netherlands

The ability to predict future locations of a moving object from its past trajectory underpins a wide range of complex behaviors. Already in the retina, motion extrapolation is found in ganglion cells that fire in anticipation of a stimulus entering their receptive fields, presumably compensating for neuronal transmission delays on short time scales (milliseconds) in a feedforward manner. In contrast, goal-oriented behaviors like target interception or tracking under occlusion rely on information accumulated from an object’s past trajectory over longer time scales (seconds), requiring a storage component in addition to feedback. Recurrent processing could serve as such a storage mechanism for goal-oriented motion extrapolation. We tested this by comparing the performance of feedforward and recurrent neural networks with that of human participants in a motion extrapolation task involving visual occlusion. Human participants covertly and continuously tracked a target moving along a circular trajectory at constant speed. An invisible occluder masked half of the trajectory. A “shot” (red dot) could briefly appear either ahead of, on top of, or behind the (visible or occluded) target location at different time points. Participants judged the shot’s position relative to the target in a two-alternative forced-choice task. Results show a decay in precision with increasing occlusion time, accompanied by an acceleration bias in target location estimates. The networks were trained on a synthetic dataset of 640,830 periodic trajectories and were then tested by predicting the “occluded” segment of each trajectory from its “visible” segment. Both networks exhibited a decline in precision over time, with the feedforward network more closely resembling the decay found in the human data. However, neither network replicated the human acceleration bias. These findings suggest that the decay in precision may stem from feedforward computations, whereas the acceleration bias likely reflects learned priors, potentially carried by feedback rather than feedforward or recurrent processing.
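
The abstract does not specify the network architectures or training setup; the following is a minimal sketch of the described comparison, assuming PyTorch, a small MLP as the feedforward network, a GRU as the recurrent network, and toy circular trajectories standing in for the 640,830-trajectory dataset.

# Minimal sketch (assumptions: PyTorch, toy architectures and data; not the authors' code).
# Both networks see the "visible" half of a circular trajectory and must
# predict the "occluded" half, mirroring the task described in the abstract.
import torch
import torch.nn as nn

T_VIS, T_OCC = 50, 50          # timesteps of visible / occluded trajectory

def make_batch(n=64):
    """Synthetic circular trajectories with random phase and angular speed."""
    phase = torch.rand(n, 1) * 2 * torch.pi
    speed = 0.05 + 0.05 * torch.rand(n, 1)
    t = torch.arange(T_VIS + T_OCC).float()
    angle = phase + speed * t                      # (n, T_vis + T_occ)
    xy = torch.stack([torch.cos(angle), torch.sin(angle)], dim=-1)
    return xy[:, :T_VIS], xy[:, T_VIS:]            # visible, occluded segments

class FeedforwardExtrapolator(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                          # concatenate visible timesteps
            nn.Linear(T_VIS * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, T_OCC * 2))
    def forward(self, vis):
        return self.net(vis).view(-1, T_OCC, 2)

class RecurrentExtrapolator(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(2, hidden, batch_first=True)
        self.readout = nn.Linear(hidden, T_OCC * 2)
    def forward(self, vis):
        _, h = self.rnn(vis)                       # summary of the visible segment
        return self.readout(h[-1]).view(-1, T_OCC, 2)

for model in (FeedforwardExtrapolator(), RecurrentExtrapolator()):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for step in range(200):
        vis, occ = make_batch()
        loss = nn.functional.mse_loss(model(vis), occ)
        opt.zero_grad(); loss.backward(); opt.step()
    # Precision decay over occlusion time: prediction error as a function of timestep.
    vis, occ = make_batch(256)
    with torch.no_grad():
        err = (model(vis) - occ).norm(dim=-1).mean(dim=0)
    print(type(model).__name__, err[::10].numpy().round(3))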

Talk 2, 3:00 pm

Conflicting heading biases explained by different reference frames

Renate Reisenegger1,2, Ambika Bansal3, Laurence R Harris3, Frank Bremmer1,2; 1Applied Physics and Neurophysics, Philipps-Universität Marburg, Karl-v-Frisch Str 8a, 35043 Marburg, Germany, 2Center for Mind, Brain and Behavior, Universities of Marburg, Giessen, and Darmstadt, Hans-Meerwein-Str. 6, 35032 Marburg, Germany, 3Center for Vision Research, York University, Toronto, Ontario, Canada

Walking or moving around the world generates a characteristic visual stimulus called optic flow. Optic flow provides essential information about our self-motion, including direction (heading). Heading perception has been studied extensively, with some studies showing a bias towards straight ahead (centripetal) and others reporting the opposite bias, away from straight ahead (centrifugal). It was recently suggested that response ranges cause the opposing biases, with smaller ranges generating centripetal and larger ranges producing centrifugal biases (Sun et al., bioRxiv, 2024). However, we noticed that centripetal biases are observed when participants are asked to “point” to their perceived heading along a horizontal line in an egocentric reference frame. In contrast, centrifugal biases arise when participants imagine themselves from above (a “bird’s eye view”) and report their heading as an angle on a circle or arc in an allocentric reference frame. This suggests that egocentric versus allocentric reference frames may induce different heading biases. We assessed heading perception using the “Edgeless Graphics Geometry display” (EGG), a very large (~224° field of view) edgeless display. Participants viewed an optic flow stimulus (600 ms) simulating self-motion across a ground plane in various directions within ±55° of straight ahead. They then reported their perceived heading by positioning a target on either (i) a horizontal line spanning ±110° (egocentric condition) or (ii) an arc spanning ±110° as seen from above (the “bird’s eye view” or allocentric condition). Results showed a centripetal bias in the egocentric condition and a centrifugal bias in the allocentric condition. These findings challenge the recent proposal that heading biases are driven solely by response range: both of our conditions had the same large (±110°) response range, which would predict centrifugal biases, yet we observed a centripetal bias in the egocentric condition. Instead, we show that biases in heading perception are primarily driven by the reference frame used.
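
As a minimal, purely illustrative sketch (this is an assumed operationalization, not the authors' analysis code), the sign of the heading bias can be expressed as the difference in eccentricity between reported and true heading, with positive values centrifugal and negative values centripetal:

# Illustrative sketch (assumption: this operationalization of bias sign).
import numpy as np

def eccentricity_bias(true_heading_deg, reported_heading_deg):
    """Signed bias relative to straight ahead (0 deg):
    positive = centrifugal, negative = centripetal."""
    true_heading_deg = np.asarray(true_heading_deg, dtype=float)
    reported_heading_deg = np.asarray(reported_heading_deg, dtype=float)
    return np.abs(reported_heading_deg) - np.abs(true_heading_deg)

# True heading 40 deg right reported at 30 deg -> -10 (centripetal);
# true heading -40 deg reported at -50 deg -> +10 (centrifugal).
print(eccentricity_bias([40, -40], [30, -50]))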

Supported by the Deutsche Forschungsgemeinschaft (DFG: IRTG-1901 and CRC/TRR-135), the Hessian Ministry of Higher Education, Science, Research and the Arts (HMWK: Clusterproject The Adaptive Mind), and the EU (PLACES). The EGG display was provided by the Canadian Foundation for Innovation.

Talk 3, 3:15 pm

Comparing young with older adults in terms of causal inference during complex motion perception

Maeve Silverman1, Xinyi Yuan1, Sabyasachi Shivkumar2, Ralf M. Haefner1; 1University of Rochester, 2Columbia University

Surround suppression has been documented neurally and psychophysically in multiple sensory modalities and brain areas (Tadin et al. 2003). Interestingly, prior work has found that the strength of psychophysical surround suppression correlates with IQ (Melnick et al. 2013), mental disorders, and age (Betts et al. 2005, 2009). Surround suppression is also conjectured to be of critical importance for scene segmentation. The strongest visual cue for segmentation is common motion (“common fate”). Recently, motion perception for center-surround stimuli was formalized as Bayesian causal inference over reference frames and shown to accurately capture the range of percepts reported by human observers for different center-surround configurations (Shivkumar et al. 2024). Here, we leverage the previously developed subjective motion direction estimation task to study potential differences in motion perception and causal inference between young and older adults. Fitting a causal inference model to the data allowed us to characterize differences not just in terms of the raw data (e.g. strength of bias) but also in terms of the underlying model parameters. We collected data from 10 college-age observers (ages 18-25) and 10 older adults (over 65). Each observer contributed 330 perceived-direction reports of coherently moving dots surrounded by either moving or stationary dots. By varying the relative direction of center and surround, we could constrain the underlying causal inference process that characterizes the transition from integration to segmentation. Interestingly, we found no significant differences in response biases between younger and older adults. However, we found significant differences in two model parameters: sensory noise and computational noise were almost an order of magnitude higher in older adults than in younger adults (p<0.01 and p<0.05, respectively), implying that our task is highly sensitive to age-related changes, while the beliefs underlying causal inference show no noticeable change over the lifespan.
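
As a minimal sketch of the causal-inference computation this task is designed to constrain (assuming Gaussian likelihoods on direction and a toy parameterization; this is not the fitted reference-frame model of Shivkumar et al. 2024, and the computational/decision noise parameter is omitted):

# Toy causal-inference sketch (assumptions noted above; directions in degrees).
import numpy as np
from scipy.stats import norm

def causal_inference_estimate(m_center, m_surround,
                              sigma_sensory=10.0,   # sensory noise (deg)
                              sigma_prior=30.0,     # spread of independent directions (deg)
                              p_common=0.5):        # prior belief in a common cause
    """Posterior probability of a common cause and the model-averaged
    estimate of the center direction."""
    d = m_center - m_surround
    # Likelihood of the measured direction difference under a shared cause
    # (difference driven by sensory noise only) vs. independent causes.
    like_common = norm.pdf(d, 0, np.sqrt(2) * sigma_sensory)
    like_indep = norm.pdf(d, 0, np.sqrt(2 * sigma_sensory**2 + 2 * sigma_prior**2))
    post_common = (p_common * like_common /
                   (p_common * like_common + (1 - p_common) * like_indep))
    # Integration pulls the center estimate toward the surround; segmentation keeps it.
    integrated = 0.5 * (m_center + m_surround)
    segregated = m_center
    return post_common, post_common * integrated + (1 - post_common) * segregated

print(causal_inference_estimate(m_center=20.0, m_surround=5.0))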

We acknowledge funding support from NIH/U19 NS118246, and NSF/CAREER IIS-2143440.

Talk 4, 3:30 pm

V1-independent development of direction tuning in higher order visual cortex

Brandon R. Nanfito1,2,3, Kristina J. Nielsen1,2,3; 1Johns Hopkins School of Medicine, 2Zanvyl Krieger Mind/Brain Institute, 3Kavli Neuroscience Discovery Institute

Early postnatal visual experience drives immature cortical circuits to refine their tuning for features like direction of motion. The network-level mechanisms that underlie these changes remain unclear. Visual cortical development is widely believed to occur sequentially, beginning in primary visual cortex (V1). However, early projections from first-order visual thalamus to more rostral visual areas suggest that parallel streams of input could support V1-independent development of higher order visual cortex. In the present study, we used two processing stages in the ferret visual motion pathway, V1 and higher motion area PMLS, as a platform to test the necessity of canonically ‘first-order’ visual inputs for the functional development of higher order visual cortex. Visually naïve kits (postnatal day (P) 28-32; ferrets open their eyes ~P30) were anesthetized and exposed to an acute experience paradigm previously observed to induce rapid functional development of direction tuning in ferret visual cortex. Using simultaneous electrophysiological recordings of spiking activity in both areas, we assessed initial direction tuning before showing bidirectional drifting gratings to the kits for 8 hours. We inactivated V1 during stimulus presentations with localized hypothermia to assess the contributions of its inputs to the stimulus-driven changes in PMLS response properties. After the 8 hours of visual experience, direction tuning was assessed again in both areas. Analysis of single- and multi-unit responses using conventional tuning metrics showed increased direction tuning in PMLS, but not in V1, suggesting that functional development of direction tuning in PMLS can occur independently of V1 inputs. Preliminary analysis using representational distances of population responses hints that, despite no significant change at the single-unit level, orientation tuning in V1 may still increase at the population level. This would suggest that the encoding of stimulus orientation in V1 responses can improve independently of visually driven activity in V1.
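
The abstract refers to "conventional tuning metrics" without naming them; purely as an illustrative assumption, a direction selectivity index (DSI) of the following form is one commonly used metric:

# Illustrative sketch (assumption: DSI as one conventional tuning metric;
# the abstract does not specify which metrics were used).
import numpy as np

def direction_selectivity_index(rates, directions_deg):
    """DSI = (R_pref - R_null) / (R_pref + R_null) from a direction tuning curve."""
    rates = np.asarray(rates, dtype=float)
    directions_deg = np.asarray(directions_deg, dtype=float)
    i_pref = int(np.argmax(rates))
    null_dir = (directions_deg[i_pref] + 180.0) % 360.0
    # Index of the sampled direction closest (circularly) to the null direction.
    i_null = int(np.argmin(np.abs((directions_deg - null_dir + 180) % 360 - 180)))
    r_pref, r_null = rates[i_pref], rates[i_null]
    return (r_pref - r_null) / (r_pref + r_null) if (r_pref + r_null) > 0 else 0.0

# Example: responses to 8 directions spaced 45 deg apart.
dirs = np.arange(0, 360, 45)
print(direction_selectivity_index([12, 30, 9, 4, 3, 4, 6, 10], dirs))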

This work was supported by the NIH (1R01EY035807) and the Kavli NDI.

Talk 5, 3:45 pm

The role of active vision in the primary visual cortex of freely-moving marmosets

Jingwen Li1, Vikram Singh1, Jude Mitchell2, Alexander Huk3, Cory Miller1; 1UC San Diego, 2University of Rochester, 3UC Los Angeles

Historically, studies of visual cortex have been performed while nonhuman primates are head-fixed, viewing visual stimuli on a screen. In the real world, however, visual processing must accommodate how we actively explore the environment. Despite its significance, little is known about how the primate visual system supports natural, active vision in freely moving animals. To address this problem, we leveraged an innovative, head-mounted eye-tracking system developed for marmosets in our lab while simultaneously recording the activity of single neurons in V1 to examine the effects of eye, head, and body movements on visual representations. In these experiments, monkeys are first head-fixed to characterize receptive fields and tuning properties with traditional visual stimuli, and then allowed to freely explore a large arena in which high-contrast visual stimuli are shown on the walls. We first successfully recapitulated the receptive fields and tuning properties of V1 neurons in the head-restrained condition. In the freely-moving condition, we found that the primate V1 population responds consistently to gaze shifts and fixations with suppression followed by enhancement at successive latencies. A model of the gaze response built upon this mechanism recapitulates the observed data well. Locomotion, in contrast, modulates the baseline of neural activity in a speed-correlated manner. To further dissect the neural responses driven by visual input versus movement, we repeated the experiment in darkness. Surprisingly, most responses to gaze movements were disrupted, and the effect of locomotion was no longer significant in the dark. Preliminary analyses of the visual scenes show that locomotion alters the statistics of the visual input, influencing V1 neurons differently depending on their tuning properties. These data are the first to examine the neural basis of active vision in a freely-moving primate and have significant implications for our conceptions of natural vision.
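
A toy sketch of the kind of gaze-and-locomotion response model described here (assuming Gaussian suppression/enhancement kernels and a linear speed term; parameter values are illustrative and this is not the authors' fitted model):

# Toy sketch (assumptions noted above): speed-scaled baseline plus a
# suppression-then-enhancement transient locked to each gaze shift.
import numpy as np

def gaze_locomotion_rate(t, gaze_times, speed, baseline=10.0,
                         a_sup=6.0, lat_sup=0.05, w_sup=0.03,
                         a_enh=8.0, lat_enh=0.15, w_enh=0.06,
                         k_speed=0.5):
    """Firing rate (Hz) as a function of time, gaze-shift times, and locomotion speed."""
    rate = baseline * (1.0 + k_speed * speed)          # speed-correlated baseline
    for tg in gaze_times:
        dt = t - tg
        rate -= a_sup * np.exp(-0.5 * ((dt - lat_sup) / w_sup) ** 2)   # early suppression
        rate += a_enh * np.exp(-0.5 * ((dt - lat_enh) / w_enh) ** 2)   # later enhancement
    return np.clip(rate, 0.0, None)

t = np.arange(0.0, 1.0, 0.001)                  # 1 s at 1 ms resolution
speed = np.full_like(t, 0.2)                    # locomotion speed (arbitrary units)
print(gaze_locomotion_rate(t, gaze_times=[0.3, 0.7], speed=speed).mean().round(2))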

The research is funded by NIH UF1 NS116377, NIH R01 NS118457, AFOSR 19RT0316, Kavli Institute for Brain and Mind Postdoctoral Award.

Talk 6, 4:00 pm

Motion-Induced Object Position Adaptation in Macaque IT Cortex: Temporal Processing Limitations of Neural Networks

Elizaveta Yakubovskaya1, Hamidreza Ramezanpour1, Kohitij Kar1; 1York University

Recent studies have demonstrated that the macaque inferior temporal (IT) cortex, a key area in the ventral visual pathway, supports not only object identification but also object position estimation, a function previously attributed to dorsal-stream mechanisms. In parallel, artificial neural networks (ANNs) optimized for object recognition replicate this positional decoding capability. Such findings invite an intriguing question: if these ventral-stream-aligned ANNs can recapitulate positional coding, can they also exhibit systematic positional biases induced by adaptation to motion, analogous to the well-known human aftereffects? We tested this by simulating adaptation in ANNs through the exponential decay of model features (Vinken et al., 2020). Using "brain-mapped" ANN architectures pre-trained on ImageNet (feedforward convolutional networks such as AlexNet and VGG, networks with skip connections such as ResNet, and transformer-based models such as ViT), we analyzed responses from their most "IT cortex-like" layers to naturalistic test images. While these ANNs robustly decoded object positions under static conditions, none exhibited motion-adaptation-induced positional shifts. In contrast, when we presented rightward- or leftward-moving adapting gratings (3000 ms) to two passively fixating macaques and recorded large-scale IT responses to subsequent test images (40 images containing 1 of 8 objects, with varying latent parameters, embedded in naturalistic backgrounds), the positions decoded from IT population activity showed directionally specific biases matching human perceptual aftereffects. These results suggest that the neural mechanisms in IT supporting adaptive shifts in perceived object position are not fully captured by current ANN models. We therefore hypothesized that additional history-dependent, nonlinear transformations might explain these dynamic adaptation effects. However, testing a state-of-the-art dynamic video recognition model (SlowFast with a ResNet-50 backbone) showed that even this more temporally sophisticated model failed to reproduce adaptation-induced positional aftereffects. Our results reveal a key gap: while current ANNs can decode position, they lack the temporal processing needed to replicate the adaptive positional biases observed in the macaque ventral stream.
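
A minimal sketch of the feature-adaptation mechanism referenced above (an activation-history adaptation state with exponential decay, loosely following Vinken et al., 2020; the subtractive form and parameter values are illustrative assumptions):

# Minimal sketch of activation-history adaptation applied to ANN features
# (assumptions noted above; not the authors' implementation).
import numpy as np

def adapt_features(feature_sequence, alpha=0.1, beta=0.7):
    """feature_sequence: (T, n_units) activations to a sequence of frames.
    Each unit carries an adaptation state that accumulates with its own
    recent output and decays exponentially, suppressing later responses."""
    T, n = feature_sequence.shape
    state = np.zeros(n)
    adapted = np.empty_like(feature_sequence, dtype=float)
    for t in range(T):
        r = np.clip(feature_sequence[t] - state, 0.0, None)   # suppressed response
        adapted[t] = r
        state = beta * state + alpha * r                       # decay + accumulation
    return adapted

# Example: a unit driven at a constant level habituates over repeated frames.
seq = np.tile([[1.0, 0.2]], (10, 1))
print(adapt_features(seq)[:, 0].round(3))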

KK was supported by the Canada Research Chair Program, Simons Foundation Autism Research Initiative (SFARI, 967073), Brain Canada Foundation (2023-0259), CFREF (VISTA Program), and an NSERC DG. EY was supported by an NSERC CGS-M and CFREF (Connected Minds).

Talk 7, 4:15 pm

The modulatory effect of intentionality on neural tuning to perceptual cues of interactivity

Sajjad Torabian1, John A Pyles2, Hongjing Lu3, Emily D Grossman1; 1University of California, Irvine, 2University of Washington, 3University of California, Los Angeles

Introduction. Humans share strong intuitions about interactive behaviors that imply intentions and goals. Perceived intentionality exists on a spectrum ranging from minimal intentionality to clear attributions of goal-directedness. Here, we use a computational model that extracts two key attributes of interactions as conveyed in decontextualized animations: the degree of intentionality and the extent to which the movements violate the laws of physics (Shu et al., 2021). We then evaluate how these high-level attributes, together with perceptual features, best explain the neural representational similarity structure in the multivariate patterns within social cognitive brain systems. Methods. In a rapid event-related fMRI design, participants rated short animations (3 s each) of two moving shapes as either interacting agents or physical objects. Each animation was characterized by the degree of interactivity conveyed in perceptual features, such as the distance between the shapes, as well as low-level properties (e.g. speed) and high-level attributes indexing the degree of intentionality and violations of physics. The similarity structure derived from the feature scores was then compared against the neural similarity structure elicited by the animations. Results. Each parcel in the superior temporal sulcus (STS) reflected a unique weighting of the features defining interactivity, perceptual properties, or human ratings that accounted for variance in the similarity structure. The attribute capturing the extent of violations of physics best explained neural activity in the anterior intraparietal sulcus (AIP). Considering only the similarity of the agentic interactions revealed that the similarity structure in the same STS parcels (but not the AIP) was further weighted by the degree of intentionality. Conclusion. Together, these results demonstrate that neural tuning for features conveying interactivity is modulated by the extent to which those features convey intentionality. These results are consistent with a model in which the salience of social behaviors is enhanced by their implied goal-directedness.
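
A minimal sketch of the representational similarity comparison described in the Methods (assuming correlation-distance neural RDMs and a Spearman model-neural comparison; the abstract does not specify the exact RSA variant, and the data below are random placeholders):

# Minimal RSA sketch (assumptions noted above; placeholder data).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def neural_rdm(patterns):
    """Condition-by-condition dissimilarities (1 - Pearson r), vectorized."""
    return pdist(patterns, metric='correlation')

rng = np.random.default_rng(0)
neural = rng.normal(size=(30, 200))         # animations x voxels in one STS parcel
intent_scores = rng.normal(size=(30, 1))    # intentionality score per animation

# Model RDM: pairwise differences in the attribute scores.
model_rdm = pdist(intent_scores, metric='euclidean')
rho, p = spearmanr(neural_rdm(neural), model_rdm)
print(f"model-neural RDM correlation: rho={rho:.2f}, p={p:.3f}")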

BCS-1658560 to EG, BCS-1658078 to JP, and NSF BCS-2142269 to HL

Talk 8, 4:30 pm

For MSTd autoencoding is all you need

Oliver Layton1, Scott Steinmetz2; 1Colby College, 2Sandia National Labs

Neurons in brain area MSTd demonstrate tuning to optic flow patterns that resemble those encountered during navigation. Computational neural models have been developed to elucidate the link between MSTd and self-motion perception, most of which possess a hierarchical design wherein MSTd-like optic flow tuning emerges after successive stages. For example, models tend to contain a V1-like stage in which local motion is extracted, an MT-like stage in which direction and speed are estimated over extended regions of space, and an MSTd-like stage that contains tuning to optic flow patterns. Deep neural networks (DNNs) likewise adopt this hierarchical design, and when trained to accurately classify natural images, they have been successful at modeling brain areas along the primate ventral stream. Interestingly, when we trained DNNs to perform the analogous task of accurately estimating self-motion from optic flow (Layton & Steinmetz, 2024), we found poor correspondence with MSTd optic flow tuning properties compared to the simpler non-negative matrix factorization (NNMF) model of Beyeler et al. (2016). Rather than attempting to accurately estimate self-motion, NNMF reconstructs MT motion inputs from a low-dimensional representation. To determine whether this difference in computational objective accounts for the discrepancy, here we investigate whether MSTd-like optic flow tuning emerges in autoencoders, artificial neural networks that share a common computational objective with NNMF. While we find that autoencoders produce more MSTd-like tuning than accuracy-optimized DNNs, the correspondence with MSTd is weaker than with NNMF. Training autoencoders on an MT-like representation rather than a motion vector representation of optic flow substantially improves the alignment with MSTd properties. Making the same adjustment to accuracy-optimized DNNs does not improve the correspondence. Our results suggest that the computational objective of autoencoders aligns more closely with that of MSTd, and that the motion representation in MT may critically shape optic flow tuning in MSTd.
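
A minimal sketch contrasting the two reconstruction objectives discussed here (assuming sklearn's NMF, a small PyTorch autoencoder, and toy non-negative stand-ins for MT-like population responses; not the authors' models or data):

# Minimal sketch of the shared reconstruction objective (assumptions noted above).
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
mt = rng.random((500, 400)).astype(np.float32)    # flow samples x MT-like units

# Non-negative matrix factorization: reconstruct MT responses from a
# low-dimensional non-negative basis (as in Beyeler et al., 2016).
nmf = NMF(n_components=16, init='nndsvda', max_iter=500)
H = nmf.fit_transform(mt)                          # low-dimensional activations
print("NNMF reconstruction error:", nmf.reconstruction_err_)

# Autoencoder: same computational objective (reconstruct the MT input from a
# low-dimensional bottleneck), trained by gradient descent.
ae = nn.Sequential(nn.Linear(400, 16), nn.ReLU(), nn.Linear(16, 400))
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
x = torch.from_numpy(mt)
for step in range(500):
    loss = nn.functional.mse_loss(ae(x), x)
    opt.zero_grad(); loss.backward(); opt.step()
print("autoencoder reconstruction MSE:", float(loss))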