Graphical Perception: Alignment of Vision-Language Models to Human Performance
Poster Presentation: Sunday, May 18, 2025, 2:45 – 6:45 pm, Banyan Breezeway
Session: Decision Making: Models
Jenna Kang1, Grace Guo2, Raj Sanjay Shah3, Hanspeter Pfister2, Sashank Varma3; 1New York University, 2Harvard University, 3Georgia Institute of Technology
Vision-Language Models (VLMs) show promise in chart comprehension tasks that integrate visual and textual information. However, their alignment with human cognitive behaviors in graphical perception is not fully understood. This study evaluates the performance of a VLM, GPT-4o-mini, on seven graphical perception tasks from seminal behavioral studies (Heer & Bostock, 2010; Cleveland & McGill, 1984). The tasks assess the model's ability to extract and compare numerical values embedded in visualizations across variations in stimulus design, prompt structure, and task difficulty. Results from 315 visualization stimuli reveal that the VLM achieves human-like accuracy in some conditions. The strongest alignment with human judgments (ρ = 0.90) was observed for default stimuli paired with color cues or explanation-augmented prompts; in these conditions, the model matched the task difficulty profiles of humans. The model was sensitive to visual design elements such as segment contiguity and color usage: even when the underlying numerical information remained unchanged, these variations decreased model accuracy by up to 41% relative to the default stimuli. Accuracy improved when segments were visually distinct and when prompts explicitly referenced color or were augmented with explanations. These findings offer insights into the alignment of VLMs with human graphical perception and suggest the potential of VLMs for the design and evaluation of data visualizations. The observed decline in accuracy under specific stimulus conditions, such as contiguous versus separated segments, motivates hypotheses for ongoing behavioral studies. These studies aim to explore whether humans exhibit similar biases and to assess the feasibility of using VLMs to simulate human visual processing of graphical data. This research paves the way for further work integrating AI into the design and evaluation of data visualizations, showing the promise of work at the intersection of the vision science and data visualization communities.
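The abstract does not include code, but the core evaluation it describes, querying a VLM about a chart stimulus and correlating the model's per-task error profile with human performance, can be illustrated with a minimal sketch. The prompt wording, file names, and the numeric error values below are hypothetical placeholders (the authors' actual pipeline is not specified); the sketch assumes the OpenAI Python SDK and SciPy.

```python
"""Illustrative sketch (not the authors' code): query GPT-4o-mini about a chart
image and correlate a per-task error profile with human data."""
import base64
from openai import OpenAI
from scipy.stats import spearmanr

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def ask_proportion(image_path: str, prompt: str) -> str:
    """Send one chart image plus an elementary-perception question to GPT-4o-mini."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


# A prompt in the spirit of Cleveland & McGill's proportion-judgment task;
# the exact wording used in the study is not given in the abstract.
prompt = ("Two segments in this chart are marked. "
          "What percentage is the smaller of the larger? Answer with a number.")

# Placeholder per-task error scores (one value per graphical perception task).
# Replace with measured model errors and the published human error midmeans.
model_errors = [1.2, 1.5, 1.9, 2.4, 2.6, 3.0, 3.4]   # hypothetical values
human_errors = [1.1, 1.6, 1.8, 2.3, 2.8, 2.9, 3.5]   # hypothetical values

rho, p = spearmanr(model_errors, human_errors)
print(f"Spearman rho between model and human difficulty profiles: {rho:.2f}")
```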