Re-Modeling the Inverted Face Effect for Unfamiliar Faces

Poster Presentation: Tuesday, May 20, 2025, 8:30 am – 12:30 pm, Pavilion
Session: Face and Body Perception: Features

Garrison Cottrell¹, Mrigankshi Kapoor²; ¹UCSD, ²Computer Science and Engineering

Subjects perform poorly at recognizing upside-down faces. Previously, we presented an anatomically inspired model with a foveated retina and the log-polar transform from the visual field to V1, followed by a standard CNN. The log-polar transform causes changes in scale to appear as horizontal translations, while rotation in the image plane appears as vertical translations. When fed into a standard convnet, this provides rotation and scale invariance. However, because V1 is not a torus, rotated features “fall off” the top of the planar representation and reappear at the bottom, disrupting the configuration of the features while preserving the features themselves, leading to the inverted face effect (IFE). A standard CNN fails to model the effect, being overly disrupted by inversion. Because the model was trained on these faces, this represents the IFE for familiar faces. To model the same effect for unfamiliar faces, we created a simple memory model by storing noisy vector representations of the novel study faces from the penultimate layer of the network, without any weight updates. With noise added at storage and/or inspection, the model makes a few errors on upright faces it has “studied” in this way. However, inverted faces also show only small errors, because the stored and probe representations differ only by the noise. This failure to model the inversion effect on unfamiliar faces caused us to rethink the model. To bias the model toward upright faces, we trained the penultimate layer to be an attractor network for upright faces. Now, noisy inverted faces no longer match the novel inverted face activations well after processing by the attractor network, restoring the inverted face effect. It is therefore important to consider dynamical representations of faces – in the form of attractor networks – in order to faithfully model the behavioral data.
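
To make the geometry concrete, here is a minimal NumPy sketch (our illustration, not the authors' code) of the log-polar property the abstract relies on: scaling shifts the representation horizontally by log s, while rotation in the image plane shifts it vertically; for inversion the shift wraps around the theta boundary, which is the configural disruption described above.

```python
import numpy as np

def log_polar(x, y, eps=1e-8):
    """Map a point (x, y), relative to fixation, to (log r, theta)."""
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)              # theta lies in (-pi, pi]
    return np.log(r + eps), theta

x, y = 3.0, 4.0
u0, v0 = log_polar(x, y)

# Scaling by s: log(s * r) = log s + log r, a pure horizontal shift.
s = 2.0
u1, v1 = log_polar(s * x, s * y)
assert np.isclose(u1 - u0, np.log(s)) and np.isclose(v1, v0)

# Inversion (rotation by pi): r is unchanged, so log r is too, but
# theta0 + pi exceeds pi, so the feature "falls off" the top of the
# strip and reappears at theta0 - pi: a wrapped, not rigid, shift.
u2, v2 = log_polar(-x, -y)
assert np.isclose(u2, u0)
assert np.isclose(v2, v0 - np.pi)
```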
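The memory model for unfamiliar faces can likewise be sketched in a few lines. The sketch below is an assumption-laden stand-in (the noise level, cosine similarity measure, and vector dimension are ours, not from the abstract): studied penultimate-layer vectors are stored with additive Gaussian noise, and a probe is recognized if its noisy vector is most similar to the stored trace of the same face. Because the rotation-invariant front end makes an inverted face's representation differ from the upright one only slightly, inverted accuracy stays nearly as high, reproducing the failure described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def study(vectors, noise_sd=0.1):
    """Store noisy copies of study-face activations; no weight updates."""
    return vectors + rng.normal(0.0, noise_sd, vectors.shape)

def recognize(probe, memory, noise_sd=0.1):
    """Index of the stored trace most cosine-similar to the noisy probe."""
    noisy = probe + rng.normal(0.0, noise_sd, probe.shape)
    sims = (memory @ noisy) / (np.linalg.norm(memory, axis=1)
                               * np.linalg.norm(noisy))
    return int(np.argmax(sims))

# Toy penultimate-layer activations for 20 studied faces (dimension 128).
faces = rng.normal(size=(20, 128))
memory = study(faces)

# Upright test: noise at storage and inspection causes only a few errors.
upright_acc = np.mean([recognize(faces[i], memory) == i for i in range(20)])

# With a rotation-invariant front end, an inverted face's representation
# is modeled here as the upright vector plus a small perturbation, so it
# still matches its own trace: the model shows no inversion effect.
inverted = faces + rng.normal(0.0, 0.05, faces.shape)
inverted_acc = np.mean([recognize(inverted[i], memory) == i for i in range(20)])
print(upright_acc, inverted_acc)       # both near 1.0
```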
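Finally, a Hopfield network is one concrete way to realize the attractor layer; the abstract does not specify the dynamics, so the sketch below is an illustrative stand-in rather than the authors' architecture. Patterns stored by Hebbian learning play the role of upright face representations: a noisy upright probe settles back to its attractor, while a probe from outside the trained set (standing in for an inverted face) is pulled away from its own veridical representation, so its post-attractor state no longer matches well, restoring the inversion effect.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 256, 10                         # units, stored upright patterns

# Binary (+/-1) stand-ins for upright penultimate-layer representations.
upright = rng.choice([-1.0, 1.0], size=(P, N))
W = (upright.T @ upright) / N          # Hebbian outer-product weights
np.fill_diagonal(W, 0.0)

def settle(v, sweeps=5):
    """Asynchronous Hopfield updates; settles toward a fixed point."""
    v = v.copy()
    for _ in range(sweeps):
        for i in rng.permutation(N):
            v[i] = 1.0 if W[i] @ v >= 0 else -1.0
    return v

def corrupt(v, k=20):
    """Flip k random units: noise added at inspection."""
    out = v.copy()
    out[rng.choice(N, size=k, replace=False)] *= -1.0
    return out

# Noisy upright probe: cleaned up toward its stored attractor.
up = settle(corrupt(upright[0]))
print("upright match:", np.mean(up == upright[0]))       # ~1.0

# "Inverted" probe: a pattern the attractor was never trained on drifts
# toward unrelated attractors, so it no longer matches itself well.
inverted = rng.choice([-1.0, 1.0], size=N)
inv = settle(corrupt(inverted))
print("inverted match:", np.mean(inv == inverted))       # well below 1.0
```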

Acknowledgements: This work was supported by NSF CRCNS grant #2208362