Detection of artifacts in clean and corrupted video pairs is influenced by artifact type and presentation modality

Poster Presentation: Sunday, May 18, 2025, 2:45 – 6:45 pm, Banyan Breezeway
Session: Spatial Vision: Natural image statistics, texture

Niall L. Williams1, Anatolii Evdokimov2, Budmonde Duinkharjav1, Anjul Patney3, Qi Sun1, Jae-Hyun Jung3, Ruth Rosenholtz3; 1New York University, 2University of Richmond, 3Nvidia

Modern computer-generated videos display a variety of artifacts. While image-computable metrics exist to quantify the visibility of artifacts in images and videos, designers often rely in part on human observers to find artifacts and assess video quality. Furthermore, human labeling of artifacts is often an essential component of building image and video quality metrics. Yet relatively little research has studied how different video comparison interfaces affect an observer’s strategies and ability to detect different artifact types. Different presentation modalities may impose a higher memory load or may make differences more visible. For example, presenting videos side-by-side forces the viewer to saccade between matching regions of the two videos to compare them; showing one video at a time and allowing the user to toggle between them emphasizes small differences; and a split-screen view with a movable seam between the two videos affords precise inspection of specific regions. Here we study how artifact search performance and behavior change as a function of the video playback interface and the type of artifact. Five participants identified and labeled the locations of artifacts (ghosting, compression, or added noise) in pairs of videos, one with and one without artifacts, using one of three interfaces: side-by-side simultaneous viewing, temporal toggling between videos, or split-screen simultaneous viewing with a movable sliding seam. Results showed that observers correctly located ~25% of all corrupted pixels regardless of the viewing interface, but artifact type had a significant effect on detection rate: ghosting artifacts were harder to detect than compression and noise artifacts. The side-by-side viewing condition caused viewers to scan a larger percentage of the display, while the split-screen condition produced more mouse movements (although the distribution of cursor positions remained consistent across viewing conditions). Finally, task completion time was not significantly different across presentation modalities, except for one participant.
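
One way to read "correctly located ~25% of all corrupted pixels" is as a pixel-level hit rate: the share of truly corrupted pixels that fall inside an observer's labeled regions. The abstract does not say how labels were scored, so the sketch below is only an illustration under that assumption. The binary mask format and the `detection_rate` function are hypothetical, not the authors' analysis code.

```python
from typing import Sequence

# Hypothetical format: one row per image row, one truthy/falsy flag per pixel.
Mask = Sequence[Sequence[int]]


def detection_rate(corrupted: Mask, labeled: Mask) -> float:
    """Return the fraction of corrupted pixels that the observer also labeled."""
    if len(corrupted) != len(labeled):
        raise ValueError("masks must have the same number of rows")
    hits = 0
    total = 0
    for corrupted_row, labeled_row in zip(corrupted, labeled):
        if len(corrupted_row) != len(labeled_row):
            raise ValueError("mask rows must have the same length")
        for is_corrupted, is_labeled in zip(corrupted_row, labeled_row):
            if is_corrupted:
                total += 1
                if is_labeled:
                    hits += 1
    if total == 0:
        raise ValueError("ground-truth mask contains no corrupted pixels")
    return hits / total


# Toy example: four corrupted pixels, one of which the observer labeled.
# The label at row 0, column 3 is a false alarm and does not affect the rate.
corrupted = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
labeled = [
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
rate = detection_rate(corrupted, labeled)
```

Under this definition, labels placed on uncorrupted pixels do not lower the score; a fuller analysis would likely report false alarms separately.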