Learning Same-Different Relations of Visual Properties by Humans and Deep Convolutional Neural Networks
Poster Presentation: Monday, May 19, 2025, 8:30 am – 12:30 pm, Pavilion
Session: Perceptual Organization: Parts, wholes, shapes and objects
Philip Kellman1, Nicholas Baker2, Austin Phillips1, Patrick Garrigan3, Hongjing Lu1; 1University of California, Los Angeles, 2Loyola University Chicago, 3St. Joseph's University
Considerable research suggests that deep convolutional neural networks (DCNNs) trained for image classification do not access the abstract relational properties needed for perception of global shape (Baker, Lu, Erlikhman, & Kellman, 2018) or for detection of relations, including same-different relations (Puebla & Bowers, 2022; Baker, Garrigan, Phillips, & Kellman, 2023). Because its architecture is inherently comparative, we hypothesized that a twin network, built on AlexNet sub-networks and trained on same-different comparison tasks, might show better capability for learning and generalizing same-different relations. We tested the network (pre-trained on ImageNet classification) and human participants on same-different learning for pairs of objects that could match or differ in color, texture, or shape. In each condition, "same" pairs matched on either of two relevant dimensions, producing three conditions: color-texture (shape irrelevant), color-shape (texture irrelevant), and shape-texture (color irrelevant). Human participants were trained on 256 trials. Human asymptotic learning performance in the color-shape and shape-texture conditions was reliably greater than in the color-texture condition. The model showed the opposite pattern, learning better in the color-texture (shape irrelevant) condition than in the color-shape and shape-texture conditions. This difference highlights the network's limited capability to compare shape information in same-different judgments. Generalization tests using new stimulus values on the trained dimensions in each condition showed poor model performance, with a maximum accuracy of .62 on color-texture test stimuli after color-texture training and no other accuracy above .55. In contrast, human participants generalized well to new stimulus values (.80), slightly exceeding their performance at the end of training. These results suggest two salient limitations of same-different learning in a twin network: poor performance when shape is the basis of comparison, and difficulty generalizing the same-different relation, even to new stimulus values within a trained stimulus dimension.
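The abstract does not report implementation details, but a twin ("Siamese") network of the kind described can be sketched as two weight-shared AlexNet feature extractors whose embeddings feed a same-different comparison head. The sketch below is a minimal illustration in PyTorch under that assumption; the class name TwinAlexNet, the head dimensions, and the training objective are illustrative choices, not the authors' code.

```python
# Minimal sketch (assumed, not the authors' implementation) of a twin network
# built on weight-shared AlexNet sub-networks pre-trained on ImageNet,
# producing a same/different logit for a pair of images.
import torch
import torch.nn as nn
from torchvision import models

class TwinAlexNet(nn.Module):
    def __init__(self):
        super().__init__()
        # A single AlexNet backbone applied to both images; sharing the
        # weights makes the two sub-networks identical encoders.
        backbone = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
        self.features = backbone.features
        self.avgpool = backbone.avgpool
        self.flatten = nn.Flatten()
        # Comparison head (illustrative sizes): concatenated embeddings
        # (2 x 256 x 6 x 6 after AlexNet's conv stack) -> one logit.
        self.head = nn.Sequential(
            nn.Linear(2 * 256 * 6 * 6, 512),
            nn.ReLU(),
            nn.Linear(512, 1),
        )

    def embed(self, x):
        return self.flatten(self.avgpool(self.features(x)))

    def forward(self, x1, x2):
        z1, z2 = self.embed(x1), self.embed(x2)
        # Positive logit -> "same", negative -> "different"
        return self.head(torch.cat([z1, z2], dim=1))

# Usage: score one pair of 224x224 RGB images.
model = TwinAlexNet()
a = torch.randn(1, 3, 224, 224)
b = torch.randn(1, 3, 224, 224)
p_same = torch.sigmoid(model(a, b))  # probability the pair is "same"
```

Such a model would typically be trained on labeled same/different pairs with a binary cross-entropy loss (e.g., nn.BCEWithLogitsLoss); the weight sharing guarantees both images are encoded identically, which is what makes the architecture inherently comparative.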
Acknowledgements: We gratefully acknowledge support for this research from National Institutes of Health grant R01CA236791 to PK.