Predicting pathologist attention during cancer-image readings

Poster Presentation: Monday, May 19, 2025, 8:30 am – 12:30 pm, Pavilion
Session: Attention: Temporal

Gregory Zelinsky1, Souradeep Chakraborty1, Joel Saltz1, Dimitris Samaras1; 1Stony Brook University

We recorded the attention behavior of pathologists conducting cancer readings of whole-slide images (WSIs) of the prostate. WSIs are gigapixel in size and comparable in scale to geospatial imagery, which precludes their exhaustive inspection. The pathology task is therefore more comparable to expert navigation through a 3D space in targeted pursuit of information than to standard visual search. Accordingly, we measured the x, y, and m (magnification) movement of a pathologist's viewport as they navigated through a WSI using a digital microscope, and we call this record their attention trajectory. We did this for 43 pathologists providing Gleason gradings for 123 prostate WSIs, from which we obtained 1,016 attention trajectories. To pre-process the data for model training, we developed a fixation-extraction algorithm that converts the densely sampled attention trajectories into sparser scanpaths of attention “fixations”. We trained a single two-stage model on these attention fixations to predict the spatio-temporal attention scanpaths of pathologists in disjoint test data. In the first stage, we train a vanilla vision transformer to predict the attention heatmaps computed for multiple magnification levels, and we show that this produces new state-of-the-art (SOTA) performance in attention heatmap prediction for WSI readings. In the second stage, we propose a new transformer-based model that takes the multi-magnification attention heatmaps predicted in the first stage and uses them as feature representations to predict the attention scanpath of the pathologist. It does this autoregressively, predicting each fixation of the scanpath in sequence, starting from the WSI center, until the entire scanpath is obtained. We show that our model outperforms baseline models, thereby establishing SOTA performance in the new task of pathologist attention-scanpath prediction.
Tools developed from this model could assist pathology trainees in learning to allocate their attention during WSI reading like an expert.
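
The abstract does not specify how fixations are extracted from the attention trajectories, so the following is only an illustrative sketch of one plausible approach, not the authors' algorithm. It assumes a trajectory is a list of (time, x, y, m) viewport samples and groups consecutive samples into a "fixation" while the magnification is unchanged and the viewport stays near where the run began. The names `Fixation` and `extract_fixations`, and the thresholds `max_shift` and `min_duration`, are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Assumed sample layout: (time in seconds, x, y, magnification).
Sample = Tuple[float, float, float, float]


@dataclass
class Fixation:
    x: float         # mean viewport x over the run
    y: float         # mean viewport y over the run
    m: float         # magnification shared by the run
    start: float     # timestamp of the first sample in the run
    duration: float  # last timestamp minus first timestamp


def _summarise(run: Sequence[Sample], min_duration: float) -> Optional[Fixation]:
    """Turn a run of samples into a Fixation, or None if it is too brief."""
    if not run:
        return None
    duration = run[-1][0] - run[0][0]
    if duration < min_duration:
        return None  # treat brief runs as transit between fixations
    n = len(run)
    return Fixation(
        x=sum(s[1] for s in run) / n,
        y=sum(s[2] for s in run) / n,
        m=run[0][3],
        start=run[0][0],
        duration=duration,
    )


def extract_fixations(
    samples: Sequence[Sample],
    max_shift: float = 50.0,    # hypothetical threshold, in slide pixels
    min_duration: float = 0.5,  # hypothetical threshold, in seconds
) -> List[Fixation]:
    """Convert a dense (t, x, y, m) trajectory into a sparser scanpath."""
    fixations: List[Fixation] = []
    run: List[Sample] = []
    for s in samples:
        if run:
            _, x0, y0, m0 = run[0]
            shift = ((s[1] - x0) ** 2 + (s[2] - y0) ** 2) ** 0.5
            # A zoom change or a large pan ends the current run.
            if s[3] != m0 or shift > max_shift:
                fix = _summarise(run, min_duration)
                if fix is not None:
                    fixations.append(fix)
                run = []
        run.append(s)
    fix = _summarise(run, min_duration)
    if fix is not None:
        fixations.append(fix)
    return fixations
```

In this sketch, a pan of more than `max_shift` or any change in magnification starts a new run, and runs shorter than `min_duration` are discarded as transit. A real pipeline would need to tune both thresholds to the viewer's sampling rate and to the magnification level, since the same pixel shift covers very different tissue areas at different zooms.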

Acknowledgements: This work was supported by NSF grants IIS-2212046 and IIS-2123920 and grants UH3-CA225021, U24-CA215109, and U24-CA180924 from the NCI and NIH.