Visual Cognition in Vision-Language Models
Poster Presentation: Saturday, May 17, 2025, 8:30 am – 12:30 pm, Pavilion
Session: Theory
Krista A. Ehinger1; 1The University of Melbourne
Large language models (LLMs) show human-level performance on a range of language tasks such as question answering, text editing, and text composition. These models are trained on massive text datasets and show an impressive ability to flexibly recombine what they have learned in novel ways to perform arbitrary tasks (Brown et al., 2020). The latest generation of LLMs is multimodal, able to process images as well as text. Do these vision-language models (VLMs) learn similarly flexible representations for visual tasks?