Visual Identity of Objects Facilitates Early Auditory Processing of Congruent Sound

Poster Presentation: Saturday, May 17, 2025, 2:45 – 6:45 pm, Pavilion
Session: Multisensory Processing: Audiovisual integration

Mincong Wu1,2, Andrew Marin1,2, Jacob Momsen2,3, Viola Störmer4, Seana Coulson2, Leslie Carver2; 1Columbia University, 2University of California, San Diego, 3Yale University, 4Dartmouth College

An object’s visual identity can elicit expectations about contingent sounds. We asked whether auditory responses are sensitive to auditory contingencies linked to the visual identity of an object, guided by the hypothesis that behavioral and neural responses are facilitated when sensory input matches expectations relative to when it violates them. Twenty neurotypical adults were exposed to audio-visual (AV) contingencies in an exposure–test design. During exposure, participants viewed pairs of shapes that spun to generate high- or low-pitch tones, with pitch predicted by the object's shape. Participants performed a two-alternative forced-choice (2-AFC) pitch classification task while high-density EEG was recorded. During test, participants were shown three conditions: audio-only, AV-match, and AV-mismatch (3-level between-subjects factor). AV-match trials maintained the original shape-sound pairings from exposure, while AV-mismatch trials switched these pairings on 20% of trials. We fit three linear mixed-effects models with the amplitudes of the P50, N100, and P200 ERP components as dependent measures, and a fourth mixed-effects model to assess reaction times from the 2-AFC task across individual trials of the test phase. The reaction-time analysis identified a main effect of condition (p=.04), such that RTs were faster in the AV-match condition than in the AV-mismatch condition, while the audio-only condition did not differ from either. In the EEG data, we observed a main effect of condition on P50 amplitudes (p<.001), with larger amplitudes in the AV-mismatch condition than in the AV-match condition (p=.002). We also observed a main effect of condition on P200 amplitudes (p<.001), with smaller amplitudes in the AV-match condition than in the audio-only (p<.001) and AV-mismatch (p=.02) conditions. These findings show that an object’s visual identity can facilitate early sensory processing of sounds linked to that object, advancing our understanding of how visual cognition affects auditory perception.
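
A minimal sketch of how such trial-level mixed-effects models could be specified, assuming a long-format trial table with illustrative column names (subject, condition, rt, p200_amp) and a random intercept per participant; this is not the authors' actual pipeline, only an example using Python's statsmodels:

# Hypothetical sketch of the trial-level mixed-effects analyses described above.
# Column names (subject, condition, rt, p200_amp) and the file name are
# illustrative assumptions, not the authors' variables or data.
import pandas as pd
import statsmodels.formula.api as smf

# Long-format data: one row per test-phase trial, condition coded as a
# 3-level factor (audio-only, AV-match, AV-mismatch).
trials = pd.read_csv("test_phase_trials.csv")  # hypothetical file

# Reaction-time model: fixed effect of condition, random intercept for subject.
rt_model = smf.mixedlm("rt ~ C(condition)", data=trials,
                       groups=trials["subject"]).fit()
print(rt_model.summary())

# ERP amplitude model (e.g., P200); analogous models would be fit for P50 and N100.
p200_model = smf.mixedlm("p200_amp ~ C(condition)", data=trials,
                         groups=trials["subject"]).fit()
print(p200_model.summary())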