What does perceived similarity measure? A systematic comparison of eight similarity tasks

Poster Presentation: Tuesday, May 20, 2025, 8:30 am – 12:30 pm, Pavilion
Session: Object Recognition: Features and parts

Martin N Hebart1,2,3, Malin Styrnal2, Laura Stoinski1, Philipp Kaniuth1,2; 1Justus Liebig University Giessen, 2Max Planck Institute for Human Cognitive and Brain Sciences Leipzig, 3Center for Mind, Brain and Behavior, Universities of Marburg, Giessen and Darmstadt

Similarity tasks are widely used in vision science and cognitive science for studying mental representations, object recognition, semantic processing, and categorization. Despite the many variants of similarity tasks, to date little is known about how these tasks relate, the degree to which they measure the same underlying construct, and how efficiently they measure it when reliability is taken into account. To address these questions, here we systematically compared eight different similarity tasks: (1) pairwise ratings, (2) pile sorting, (3) single and (4) multiple arrangement, (5) triplet odd-one-out judgments, (6) sequential forced choice similarity judgments, (7) speeded visual search, and (8) speeded same-different judgments. We collected data from 100 online participants for each task using three sets of stimuli, ranging from natural objects embedded in scenes to abstract shapes. For the natural object images, we analyzed the similarity estimates between tasks and their alignment with deep neural network representations and a semantic embedding. The results showed that these tasks can be grouped into three types: (1) tasks that primarily capture visual features (response time tasks), (2) tasks that primarily capture semantic features (sorting tasks), and (3) tasks that capture both visual and semantic features (choice tasks). Within each group of tasks, we additionally determined their reliability and efficiency to support researchers in their choice of paradigm. For more abstract stimuli, differences between tasks became smaller, indicating that in this case the tasks tend to measure a similar construct. Thus, when choosing a task one should consider not only what features are of interest, but also what stimuli are being used.
Together, this work reveals the nature of the representations measured by different similarity tasks, provides suggestions for choosing one task over another, and highlights the role of task in the assessment of perceived similarity.