Representation of Object Depth within the Macaque Inferior Temporal Cortex

Poster Presentation: Tuesday, May 20, 2025, 8:30 am – 12:30 pm, Pavilion
Session: Object Recognition: Features and parts

Esna Mualla Gunay1, Kohitij Kar1; 1Department of Biology, York University, Toronto, Canada

Recent studies have demonstrated that the inferior temporal (IT) cortex of macaques encodes various attributes of visual objects (Hong et al., 2016). However, the representation of object depth with respect to the background scene within IT has remained relatively unexplored. In this study, we investigate whether monocular depth information (via 2D images) can be decoded from macaque IT activity and from artificial neural network (ANN) models of the macaque ventral stream. Using 465 single-object images from the Microsoft COCO dataset, annotated with depth information across ten object categories, we found that ANN models of IT (VGG-16) significantly predict object depth (R = .77, p < 0.001). Partial correlation analysis, controlling for object size, confirmed that depth is independently represented. Interestingly, ANNs predicted object depth better for distant objects (R = .57, p < 0.001) than for near objects (R = .23, p < 0.001). To test whether macaque IT shows similar patterns, we recorded neural activity from 576 IT sites in two macaques as they passively viewed the same images (100 ms presentations). Consistent with the ANNs, we significantly decoded depth information from the IT population activity (R = .71, p < 0.001). Furthermore, IT activity also better predicted the depth of distant objects (R = .55, p < 0.001) than that of near objects (R = .25, p < 0.001). Interestingly, we also observed that size and depth are entangled in the ANN representations at earlier layers and become increasingly disentangled with network depth (as object identity becomes more linearly separable). Future work will test these hypotheses with recordings across the ventral stream. These findings highlight the capacity of both macaque IT and ANNs to encode depth information, revealing parallels in their representational hierarchies.
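The decoding analysis described above can be sketched as a cross-validated linear readout of depth from a population feature matrix, followed by a partial correlation that regresses out object size. The sketch below uses synthetic stand-in data (the feature matrix, depth labels, and size labels are all hypothetical placeholders, not the study's ANN activations or IT recordings), a closed-form ridge regressor, and a simple residual-based partial correlation; none of these choices is confirmed by the abstract as the authors' exact pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: in the study these would be VGG-16 features
# or responses from 576 IT sites to 465 COCO images with annotated depth.
n_images, n_features = 465, 100
features = rng.normal(size=(n_images, n_features))
true_w = rng.normal(size=n_features)
depth = features @ true_w + rng.normal(scale=0.5, size=n_images)
size = 0.3 * depth + rng.normal(size=n_images)  # size partly tracks depth

def ridge_fit(X, y, lam=1.0):
    # Closed-form ridge regression: w = (X'X + lam*I)^-1 X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def pearson(a, b):
    a, b = a - a.mean(), b - b.mean()
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# 5-fold cross-validated depth decoding from the feature matrix
idx = rng.permutation(n_images)
folds = np.array_split(idx, 5)
pred = np.empty(n_images)
for k in range(5):
    test = folds[k]
    train = np.concatenate([folds[j] for j in range(5) if j != k])
    w = ridge_fit(features[train], depth[train])
    pred[test] = features[test] @ w

r = pearson(pred, depth)  # decoding accuracy, analogous to the reported R

def partial_corr(x, y, z):
    # Correlation between x and y after linearly regressing out z
    Z = np.column_stack([z, np.ones_like(z)])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return pearson(rx, ry)

# Depth decoding accuracy with object size partialed out
r_partial = partial_corr(pred, depth, size)
print(f"R = {r:.2f}, partial R (size controlled) = {r_partial:.2f}")
```

Because the synthetic depth signal is linearly embedded in the features, the cross-validated R is high here; on real ANN or neural data the same procedure yields the more modest correlations reported above.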

Acknowledgements: KK has been supported by funds from the Canada Research Chair Program, the Simons Foundation Autism Research Initiative (SFARI, 967073), Brain Canada Foundation (2023-0259), the Canada First Research Excellence Funds (VISTA Program), and a Google Research Award.