Comparing Deep Neural Network Architectures as Models of Human Lightness and Illusion Perception
Poster Presentation: Monday, May 19, 2025, 8:30 am – 12:30 pm, Pavilion
Session: Color, Light and Materials: Lightness and brightness
Jaykishan Patel1,2, Richard Murray1,2, Javier Vazquez-Corral3,4, Konstantinos Derpanis5; 1York University, 2Department of Psychology and Center for Vision Research, 3Department of Computer Science, Universitat Autònoma de Barcelona, 4Computer Vision Center, 5Department of Electrical Engineering and Computer Science
Inferring surface reflectance from luminance images is challenging, as identical luminance patterns can result from different combinations of illumination, reflectance, and 3D shape. Classical models have struggled with this ambiguity, but convolutional neural networks and vision transformers show promise in estimating surface color under varying illumination. This study tests how model size, architecture, and the transferability of features pretrained on a different task affect reflectance estimation, and whether different models use similar features for illusion perception. Using ResNet34, VGG13, MobileNetV3, and the Dense Prediction Transformer (DPT) as encoders in a UNet architecture, we trained decoders to estimate reflectance from luminance images in a custom Blender-generated dataset. Encoders initialized with pretrained ImageNet or depth-prediction weights were fine-tuned for reflectance estimation; frozen encoders were also evaluated to test whether their features transfer to reflectance estimation without fine-tuning. Both frozen and fine-tuned models performed well on reflectance estimation, with frozen models being slightly less accurate. Model responses were computed for several illusions, including the argyle, Koffka-Adelson, snake, simultaneous contrast, White's, and checkerboard assimilation illusions. Model responses were consistent with illusions perceived by human observers. Furthermore, illusion-like responses were weaker in control conditions, except for the argyle and assimilation illusions. Low-parameter models performed as well as high-parameter models on illusion perception but were less accurate in reflectance estimation, challenging the hypothesis that illusions arise from efficient coding. Saliency analysis showed that for all models, similar image regions were responsible for perceived illusions, often focusing on shadowed areas.
Saliency maps showed high correlation across all models for the regions contributing to illusion perception, with slightly lower agreement for frozen models. These results suggest that lightness illusions arise from the visual system's use of natural scene statistics to generate an accurate perceptual correlate of reflectance, and they highlight the potential of deep learning architectures for modeling human lightness and color perception.