Deep Reinforcement Learning's Struggle with Visuospatial Reasoning: Insights from the Same-Different Task

Poster Presentation: Sunday, May 18, 2025, 2:45 – 6:45 pm, Banyan Breezeway
Session: Decision Making: Actions

Markus Solbach¹, John Tsotsos¹; ¹York University

Deep learning has dramatically changed the landscape of computational visual systems. One prominent example is deep reinforcement learning, a machine-learning approach applied to problems ranging from game playing to finance, health care, natural language processing, and embodied agents. We are interested in embodied agents that are free to visually examine their 3D environment, i.e., active observers. We will show that deep reinforcement learning struggles to learn a fundamental visuospatial capability that is effortless for humans, birds, rodents, and even insects. To gather evidence for this claim, we created a 3D physical version of the classic Same-Different task: are two stimuli the same? Human subjects solved the task easily, with high accuracy from the first trial. Using human performance as the baseline, we sought to determine whether reinforcement learning could also solve the task. We explored several reinforcement learning frameworks, including Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO), imitation learning, and curriculum learning. Curriculum learning emerged as the only viable approach, and even then only when the task was simplified so drastically that it bore only distant relevance to the original human task. Moreover, the learned strategies differed markedly from human behaviour: models exhibited a strong preference for a very limited set of viewpoints, often fixating on the same location repeatedly, and lacked the flexibility and efficiency of human visuospatial problem-solving. Conversely, the outcomes of the human experiment were instrumental in developing a curriculum lesson plan that improved learning performance. Our human subjects appeared to adopt correct strategies from the first trial and then, over additional trials, became more efficient, not more accurate. Reinforcement learning methods do not seem to have the foundation to match such human abilities.
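To make the promotion-based idea behind a curriculum lesson plan concrete, the sketch below is a hypothetical toy, not the actual experimental setup: a stand-in Same-Different environment whose difficulty can be raised, and a scheduler that advances the agent to a harder lesson once its accuracy clears a threshold. All names (`SameDifferentEnv`, `run_curriculum`, the thresholds) are illustrative assumptions.

```python
import random

class SameDifferentEnv:
    """Toy stand-in for the 3D task: the agent must answer whether two
    integer 'stimuli' are equal. Higher difficulty widens the value range."""
    def __init__(self, difficulty):
        self.difficulty = difficulty

    def sample(self):
        hi = 2 + self.difficulty                 # larger range = harder lesson
        a = random.randint(0, hi)
        b = a if random.random() < 0.5 else random.randint(0, hi)
        return (a, b), (a == b)                  # stimuli, ground-truth label

def run_curriculum(agent, max_difficulty=5, episodes_per_eval=200,
                   promote_at=0.9, seed=0):
    """Advance to the next lesson once accuracy reaches `promote_at`;
    stop at the first lesson the agent cannot pass."""
    random.seed(seed)
    difficulty = 0
    while difficulty < max_difficulty:
        env = SameDifferentEnv(difficulty)
        correct = 0
        for _ in range(episodes_per_eval):
            stimuli, label = env.sample()
            correct += (agent(stimuli) == label)
        if correct / episodes_per_eval >= promote_at:
            difficulty += 1                      # promote to a harder lesson
        else:
            break                                # agent is stuck at this lesson
    return difficulty

# An 'oracle' agent clears every lesson; a constant-answer agent stalls at lesson 0.
oracle = lambda s: s[0] == s[1]
print(run_curriculum(oracle))                    # → 5
```

The schedule only measures whether the agent passes each lesson; the learning algorithm plugged in as `agent` is deliberately left abstract.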

Acknowledgements: This research was supported by grants to the senior author (John K. Tsotsos) from the following sources: Air Force Office of Scientific Research USA, The Canada Research Chairs Program, and the NSERC Canadian Robotics Network.