Human and machine perception of material similarities
Poster Presentation: Saturday, May 17, 2025, 2:45 – 6:45 pm, Pavilion
Session: Color, Light and Materials: Surfaces and materials
Maarten Wijntjes, Yuguang Zhao; Delft University of Technology
Large Multimodal Models can be subjected to psychophysical paradigms similar to those used with human observers, which allows a direct comparison between human and machine vision. In this context, we explored material perception. We created 32 stimuli with a constant 3D shape but varying material properties, and presented them in 1193 triplets in an odd-one-out task to both humans (N=18) and a machine. The machine judgements were obtained from gpt-4o, which has vision capabilities. The triplet data were analysed directly and were also used to create perceptual embeddings with Soft Ordinal Embedding (SOE). The raw triplet data revealed a notable commonality between human and machine judgements when we compared the ‘popularity scores’ of the odd ones out: a group of 6 stimuli was substantially more different from the remaining 26 stimuli. Furthermore, the human and gpt-4o data agreed on 47% of the triplet judgements, well above the chance level of 33% (one of three possible choices). The SOE analysis revealed that the accuracy (the agreement between the raw triplet data and the multidimensional embedding) was substantially higher for the machine than for the human data, indicating a higher degree of internal consistency. Moreover, accuracy for the machine data saturated fully at 6 dimensions: the embedding accounted for all triplets. Besides various commonalities, the embeddings themselves revealed some peculiar differences. First, translucent stimuli lay close together in the human embedding but far apart in the machine embedding. Second, the machine embedding showed a clear cluster of achromatic stimuli that was entirely absent from the human data. This suggests that computers use colour for material perception, while humans do not. With some imagination, one could argue that human material perception partly prepares for physical interaction, for which colour is irrelevant, while the algorithm does not (yet) have a body with which to interact with the outside world.
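The triplet analyses described above (popularity scores, human-machine agreement against the 1/3 chance level, and embedding accuracy after SOE) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the `(a, b, odd)` trial format, the normalisation of the popularity score by the number of appearances, the split of each odd-one-out trial into two ordinal constraints, and the plain gradient-descent optimiser are all hypothetical choices made for illustration, and `popularity_scores`, `agreement`, `soe_fit` and `triplet_accuracy` are illustrative names rather than a published API. The embedding step minimises the soft ordinal embedding objective, a squared hinge max(0, d(i, j) + δ − d(i, k))² summed over all constraints d(i, j) < d(i, k).

```python
import numpy as np

# Hypothetical trial format (not from the abstract): (a, b, odd) means stimuli
# a and b were grouped together and stimulus `odd` was chosen as the odd one out.


def popularity_scores(trials, n_stimuli):
    """Assumed definition: times chosen as odd one out / times shown."""
    chosen = np.zeros(n_stimuli)
    shown = np.zeros(n_stimuli)
    for a, b, odd in trials:
        shown[[a, b, odd]] += 1
        chosen[odd] += 1
    return chosen / np.maximum(shown, 1)  # avoid 0/0 for stimuli never shown


def agreement(odd_a, odd_b):
    """Fraction of shared triplets on which two observers picked the same odd one out.

    If one observer picks uniformly at random among the three stimuli, the
    expected agreement is 1/3, whatever the other observer does.
    """
    return float(np.mean(np.asarray(odd_a) == np.asarray(odd_b)))


def to_constraints(trials):
    """Assumed conversion: one trial gives d(a, b) < d(a, odd) and d(b, a) < d(b, odd)."""
    return np.array([c for a, b, odd in trials for c in ((a, b, odd), (b, a, odd))])


def soe_fit(trials, n_stimuli, dim, delta=1.0, lr=0.1, n_iter=5000, seed=0):
    """Minimise sum(max(0, d(i, j) + delta - d(i, k)) ** 2) by plain gradient descent."""
    i, j, k = to_constraints(trials).T
    X = np.random.default_rng(seed).normal(size=(n_stimuli, dim))
    for _ in range(n_iter):
        vij, vik = X[i] - X[j], X[i] - X[k]
        dij = np.linalg.norm(vij, axis=1) + 1e-12  # small offset avoids division by zero
        dik = np.linalg.norm(vik, axis=1) + 1e-12
        slack = 2.0 * np.maximum(0.0, dij + delta - dik)  # derivative of the squared hinge
        gij = (slack / dij)[:, None] * vij  # loss gradient w.r.t. X[i] via d(i, j)
        gik = (slack / dik)[:, None] * vik  # minus the loss gradient w.r.t. X[i] via d(i, k)
        grad = np.zeros_like(X)
        np.add.at(grad, i, gij - gik)  # unbuffered accumulation over repeated indices
        np.add.at(grad, j, -gij)
        np.add.at(grad, k, gik)
        X -= lr * grad / len(i)  # step on the mean loss per constraint
    return X


def triplet_accuracy(X, trials):
    """Share of trials whose observed odd one out is also the odd one out in X,
    i.e. the stimulus left over once the closest pair is removed."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    hits = [D[a, b] < D[a, odd] and D[a, b] < D[b, odd] for a, b, odd in trials]
    return float(np.mean(hits))
```

Calling `soe_fit` with increasing `dim` and recording `triplet_accuracy` for each value gives an accuracy-versus-dimensionality curve of the kind used to locate saturation; a curve that reaches 1.0 corresponds to the full saturation reported for gpt-4o at 6 dimensions.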