Macaque spatiotemporal neural dynamics during perception of object-object occlusion images
Poster Presentation: Monday, May 19, 2025, 8:30 am – 12:30 pm, Pavilion
Session: Object Recognition: Categories
Wenxuan Guo1, Matthew Ainsworth2, Tim Kietzmann3, Marieke Mur4, Nikolaus Kriegeskorte1; 1Columbia University, 2University of Oxford, 3University of Osnabrück, 4Western University
The dynamic neural mechanisms underlying recognition of partially occluded objects are not fully understood. Previous studies often used partially visible object fragments (Tang et al., 2014) or occluded geometric shapes (Bushnell et al., 2011; Namima & Pasupathy, 2021). We investigated neural dynamics under more ecologically valid conditions in which objects occlude one another, as in natural scenes. We recorded neural activity with Utah arrays implanted in foveal V4 and posterior TE of two macaque monkeys. Fixating monkeys viewed eight single objects and 56 object-object occlusion stimuli (250 ms duration), formed by pairing each object with each of the other seven as occluder and occluded (8 × 7 = 56). Instantaneous firing rates were estimated by counting spikes in a 50 ms sliding window. Using cross-temporal decoding (Meyers et al., 2008) with linear SVMs, we characterized the temporal dynamics of neural population coding for occluder (front) and occluded (back) objects. Decodability of a front object versus the other front objects was computed by averaging decoding accuracy across stimulus pairs sharing the same back object; discriminability of each occluded object versus the others was assessed analogously. Representations of front objects emerged earlier and were more decodable and more stable over time than representations of occluded objects. In addition, representations in posterior TE lagged behind those in V4 but exhibited a more stably decodable temporal code. To assess whether spatial and temporal representations are separable, we applied tensor component analysis (TCA; Williams et al., 2018), modeling the neural data tensor as a combination of a spatial mode (object-specific neural patterns) and two temporal modes (capturing the dynamics of front and back objects separately). We computed cross-validated variance explained (R²) relative to a baseline model. TCA models explained significantly more variance in V4 than in posterior TE, indicating that spatial and temporal representations are more separable in V4. Intriguingly, posterior TE exhibited less space-time-separable dynamics but more sustained decodability of front and back objects.
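
The following is a minimal illustrative sketch of the decoding analysis, not the authors' code: it assumes simulated spike-count data, a hypothetical 10 ms window step, a simple random train/test split, and scikit-learn classifiers, and shows how 50 ms sliding-window spike counts and a cross-temporal generalization matrix of linear-SVM accuracies could be computed.

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Hypothetical data: 1 ms spike bins for n_trials trials, n_units channels,
# and t_ms time points; labels give the identity of the front (occluder) object.
n_trials, n_units, t_ms = 400, 96, 500
spikes = rng.poisson(0.02, size=(n_trials, n_units, t_ms))
labels = rng.integers(0, 8, size=n_trials)

win, step = 50, 10  # 50 ms sliding window; the 10 ms step is an assumption
starts = np.arange(0, t_ms - win + 1, step)

# Instantaneous firing-rate estimate: spike count within each 50 ms window.
rates = np.stack([spikes[:, :, s:s + win].sum(axis=2) for s in starts], axis=2)

train = rng.random(n_trials) < 0.8  # simple random train/test split (assumed)
n_win = len(starts)
acc = np.zeros((n_win, n_win))  # train-time x test-time accuracy matrix

for t_tr in range(n_win):
    clf = make_pipeline(StandardScaler(), LinearSVC(max_iter=5000))
    clf.fit(rates[train, :, t_tr], labels[train])
    for t_te in range(n_win):
        acc[t_tr, t_te] = clf.score(rates[~train, :, t_te], labels[~train])

# The diagonal of acc is standard time-resolved decoding; off-diagonal entries
# test whether a code learned at one time generalizes to other times,
# i.e., the temporal stability of the population code.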
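
A similarly hedged sketch of the space-time separability analysis follows. It assumes the tensorly library for a generic rank-R CP decomposition and a grand-mean baseline; the authors' actual model, with one spatial mode plus two separate temporal modes for front and back objects, is not reproduced here.

import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)

# Hypothetical dimensions: 96 channels x 60 time bins x 56 occlusion conditions,
# with independent "train" and held-out "test" repetitions of each condition.
n_units, n_time, n_cond = 96, 60, 56
train = rng.normal(size=(n_units, n_time, n_cond))
test = rng.normal(size=(n_units, n_time, n_cond))

rank = 5  # number of TCA components (assumed)
cp = parafac(tl.tensor(train), rank=rank, n_iter_max=200)
recon = tl.to_numpy(tl.cp_to_tensor(cp))

# Cross-validated R^2: variance in held-out data explained by the low-rank model,
# relative to a baseline that predicts the grand-mean response (assumed baseline).
baseline = train.mean()
ss_res = np.sum((test - recon) ** 2)
ss_base = np.sum((test - baseline) ** 2)
r2_cv = 1.0 - ss_res / ss_base
print(f"cross-validated R^2 (rank {rank}): {r2_cv:.3f}")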