Time/Room: Friday, May 13, 2016, 2:30 – 4:30 pm, Talk Room 1-2
Organizer(s): Radoslaw Martin Cichy; Department of Psychology and Education, Free University Berlin, Berlin, Germany
Presenters: Kendrick Kay, Seyed-Mahdi Khaligh-Razavi, Daniel Yamins, Radoslaw Martin Cichy, Tomoyasu Horikawa, Kandan Ramakrishnan
Symposium Description
Visual cognition in humans is mediated by complex, hierarchical, multi-stage processing of visual information, propagated rapidly as neural activity in a distributed network of cortical regions. Understanding visual cognition in cortex thus requires a predictive and quantitative model that captures the complexity of the underlying spatio-temporal dynamics and explains human behavior. Very recently, brain-inspired deep neural networks (DNNs) have taken center stage as an artificial computational model for understanding human visual cognition. A major reason for their emerging dominance is that DNNs achieve near human-level performance on tasks such as object recognition (Russakovsky et al., 2014). While DNNs were initially developed by computer scientists to solve engineering problems, research comparing visual representations in DNNs and primate brains has found a striking correspondence, creating excitement in vision research (Kriegeskorte 2015, Ann Rev Vis; Keynote VSS 2014, Bruno Olshausen; Jones 2014, Nature). The aim of this symposium is three-fold. The first aim is to describe cutting-edge research efforts that use DNNs to understand human visual cognition. The second aim is to establish which results reproduce across studies and thus create common ground for further research. The third aim is to provide a venue for critical discussion of the theoretical implications of the results. To introduce and frame the debate for a wide audience, Kendrick Kay will start with a thorough introduction to the DNN approach and formulate questions and challenges to which the individual speakers will respond in their talks. The individual talks will report on recent DNN-related biological vision research.
The talks will cover a wide range of results: brain data recorded in different species (human, monkey), with different techniques (electrophysiology, fMRI, M/EEG), for static as well as movie stimuli, using a wide range of analysis techniques (decoding and encoding models, representational similarity analysis). Major questions addressed will be: What do DNNs tell us about visual processing in the brain? What is the theoretical impact of finding a correspondence between DNNs and representations in human brains? Do these insights extend to visual cognition such as imagery? What analysis techniques and methods are available to relate DNNs to human brain function? What novel insights can be gained from comparing DNNs to human brains? What effects reproduce across studies? A final 20-min open discussion between speakers and the audience will close the symposium, encouraging discussion on what aims the DNN approach has reached already, where it fails, what future challenges lie ahead, and how to tackle them. As DNNs address visual processing across low-, mid-, and high-level vision, we believe this symposium will be of interest to a broad audience, including students, postdocs and faculty. This symposium is a grass-roots first-author-based effort, bringing together junior researchers from around the world (US, Germany, Netherlands, and Japan).
Presentations
What are deep neural networks and what are they good for?
Speaker: Kendrick Kay; Center for Magnetic Resonance Research, University of Minnesota, Twin Cities
In this talk, I will provide a brief introduction to deep neural networks (DNNs) and discuss their usefulness with respect to modeling and understanding visual processing in the brain. To assess the potential benefits of DNN models, it is important to step back and consider generally the purpose of computational modeling and how computational models and experimental data should be integrated. Is the only goal to match experimental data? Or should we derive understanding from computational models? What kinds of information can be derived from a computational model that cannot be derived through simpler analyses? Given that DNN models can be quite complex, it is also important to consider how to interpret these models. Is it possible to identify the key feature of a DNN model that is responsible for a specific experimental effect? Is it useful to perform ‘in silico’ experiments with a DNN model? Should we strive to perform meta-modeling, that is, developing a (simple) model of a (complex DNN) model in order to help understand the latter? I will discuss these and related issues in the context of DNN models and compare DNN modeling to an alternative modeling approach that I have pursued in past research.
Mixing deep neural network features to explain brain representations
Speaker: Seyed-Mahdi Khaligh-Razavi; CSAIL, MIT, MA, USA
Authors: Linda Henriksson, Department of Neuroscience and Biomedical Engineering, Aalto University, Aalto, Finland; Kendrick Kay, Center for Magnetic Resonance Research, University of Minnesota, Twin Cities; Nikolaus Kriegeskorte, MRC-CBU, University of Cambridge, UK
Higher visual areas present a difficult explanatory challenge and can be better studied by considering the transformation of representations across the stages of the visual hierarchy from lower- to higher-level visual areas. We investigated the progress of visual information through the hierarchy of visual cortex by comparing the representational geometry of several brain regions with a wide range of object-vision models, ranging from unsupervised to supervised, and from shallow to deep models. The shallow unsupervised models tended to have higher correlations with early visual areas, and the deep supervised models were more correlated with higher visual areas. We also present a new framework for assessing the pattern-similarity of models with brain areas, mixed representational similarity analysis (RSA), which bridges the gap between RSA and voxel-receptive-field modelling, both of which have been used separately but not in combination in previous studies (Kriegeskorte et al., 2008a; Nili et al., 2014; Khaligh-Razavi and Kriegeskorte, 2014; Kay et al., 2008, 2013). Using mixed RSA, we evaluated the performance of many models across several brain areas. We show that higher visual representations (i.e., lateral occipital region, inferior temporal cortex) were best explained by the higher layers of a deep convolutional network after appropriate mixing and weighting of its feature set. This shows that deep neural network features form the essential basis for explaining the representational geometry of higher visual areas.
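For readers new to representational similarity analysis, the following is a minimal sketch of plain RSA, not the mixed RSA framework presented in this talk. It assumes synthetic data, correlation distance for the dissimilarity matrices, and Spearman correlation for comparing them; all function and variable names are illustrative and do not come from the authors.

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between rows."""
    return 1.0 - np.corrcoef(patterns)

def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries as a flat vector."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]

def rank(values):
    """Ordinal ranks (ties are not averaged, adequate for continuous data)."""
    return np.argsort(np.argsort(values)).astype(float)

def rsa_score(model_features, brain_patterns):
    """Spearman correlation between the model RDM and the brain RDM."""
    model_vec = upper_triangle(rdm(model_features))
    brain_vec = upper_triangle(rdm(brain_patterns))
    return np.corrcoef(rank(model_vec), rank(brain_vec))[0, 1]

# Synthetic stand-ins (assumption): rows are images, columns are units or voxels.
rng = np.random.default_rng(0)
n_images = 20
latent = rng.normal(size=(n_images, 5))             # shared stimulus structure
layer_features = latent @ rng.normal(size=(5, 100)) # hypothetical DNN layer
voxels = latent @ rng.normal(size=(5, 100)) + 0.1 * rng.normal(size=(n_images, 100))
unrelated = rng.normal(size=(n_images, 100))        # model with no shared structure

score_related = rsa_score(layer_features, voxels)
score_unrelated = rsa_score(unrelated, voxels)
print(f"related model: {score_related:.2f}, unrelated model: {score_unrelated:.2f}")
```

In this toy setting the model that shares latent structure with the simulated voxels should score higher than the unrelated one. Real analyses would add noise ceilings and statistical inference, which are omitted here.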
Using DNNs To Compare Visual and Auditory Cortex
Speaker: Daniel Yamins; Department of Brain and Cognitive Sciences, MIT, MA, USA
Authors: Alex Kell, Department of Brain and Cognitive Sciences, MIT, MA, USA
A slew of recent studies have shown how deep neural networks (DNNs) optimized for visual tasks make effective models of neural response patterns in the ventral visual stream. Analogous results have also been discovered in auditory cortex, where optimizing DNNs for speech-recognition tasks has produced quantitatively accurate models of neural response patterns in auditory cortex. The existence of computational models within the same architectural class for two apparently very different sensory representations raises several intriguing questions: (1) To what extent do visual models predict auditory response patterns, and to what extent do auditory models predict visual response patterns? (2) In what ways are the visual and auditory models similar, and in what ways do they diverge? (3) What do the answers to the above questions tell us about the relationships between the natural statistics of these two sensory modalities, and about the underlying generative processes behind them? I’ll describe several quantitative and qualitative modeling results, involving electrophysiology data from macaques and fMRI data from humans, that shed some initial light on these questions.
Deep Neural Networks explain spatio-temporal dynamics of visual scene and object processing
Speaker: Radoslaw Martin Cichy; Department of Psychology and Education, Free University Berlin, Berlin, Germany
Authors: Aditya Khosla, CSAIL, MIT, MA, USA; Dimitrios Pantazis, McGovern Institute of Brain and Cognitive Sciences, MIT, MA, USA; Antonio Torralba, CSAIL, MIT, MA, USA; Aude Oliva, CSAIL, MIT, MA, USA
Understanding visual cognition means knowing what is happening in the brain, and where and when, as we see. To address these questions in a common framework we combined deep neural networks (DNNs) with fMRI and MEG by representational similarity analysis. We will present results from two studies. The first study investigated the spatio-temporal neural dynamics during visual object recognition. Combining DNNs with fMRI, we showed that DNNs predicted a spatial hierarchy of visual representations in both the ventral and the dorsal visual stream. Combining DNNs with MEG, we showed that DNNs predicted a temporal hierarchy with which visual representations emerged. These results 1) indicate that DNNs predict the hierarchy of visual brain dynamics in space and time, and 2) provide novel evidence for object representations in parietal cortex. The second study investigated how abstract visual properties, such as scene size, emerge in the human brain in time. First, we identified an electrophysiological marker of scene size processing using MEG. Then, to explain how scene size representations might emerge in the brain, we trained a DNN on scene categorization. Representations of scene size emerged naturally in the DNN without it ever being trained to represent scene size, and the DNN accounted for scene size representations in the human brain. These results 1) indicate that DNNs are a promising model for the emergence of representations of abstract visual properties in the human brain, and 2) give rise to the idea that the cortical architecture in human visual cortex is the result of task constraints imposed by visual tasks.
Generic decoding of seen and imagined objects using features of deep neural networks
Speaker: Tomoyasu Horikawa; Computational Neuroscience Laboratories, ATR, Kyoto, Japan
Authors: Yukiyasu Kamitani; Graduate School of Informatics, Kyoto University, Kyoto, Japan
Object recognition is a key function in both human and machine vision. Recent studies suggest that a deep neural network (DNN) can be a good proxy for the hierarchically structured feed-forward visual system for object recognition. While brain decoding has enabled the prediction of mental contents represented in our brain, the prediction is limited to training examples. Here, we present a decoding approach for arbitrary objects seen or imagined by subjects by employing DNNs and a large image database. We assume that an object category is represented by a set of features rendered invariant through hierarchical processing, and show that visual features can be predicted from fMRI patterns and that greater accuracy is achieved for low/high-level features with lower/higher-level visual areas, respectively. Furthermore, visual feature vectors predicted by stimulus-trained decoders can be used to identify seen and imagined objects (extending beyond decoder training) from a set of computed features for numerous objects. Successful object identification for imagery-induced brain activity suggests that feature-level representations elicited in visual perception may also be used for top-down visual imagery. Our results demonstrate a tight link between the cortical hierarchy and the levels of DNNs, and the utility of this link for brain-based information retrieval. Because our approach enabled us to predict arbitrary object categories seen or imagined by subjects without pre-specifying target categories, we may be able to apply our method to decode the contents of dreaming. These results contribute to a better understanding of the neural representations of the hierarchical visual system during perception and mental imagery.
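The two-step logic described above (predict feature vectors from brain patterns, then identify the object by matching against candidate feature vectors) can be sketched as follows. This is a toy illustration on synthetic data, assuming a ridge-regression decoder and correlation-based matching; the abstract does not specify the decoder, and every name and parameter here is hypothetical.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge weights mapping X (samples x voxels) to Y (samples x features)."""
    n_voxels = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_voxels), X.T @ Y)

def identify(predicted, candidates):
    """Index of the candidate feature vector most correlated with the prediction."""
    pred = predicted - predicted.mean()
    cand = candidates - candidates.mean(axis=1, keepdims=True)
    corr = (cand @ pred) / (np.linalg.norm(cand, axis=1) * np.linalg.norm(pred))
    return int(np.argmax(corr))

rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 200, 10, 40, 80

# Synthetic assumption: voxel responses are a noisy linear mixture of DNN features.
mixing = rng.normal(size=(n_features, n_voxels))
train_features = rng.normal(size=(n_train, n_features))
test_features = rng.normal(size=(n_test, n_features))  # objects absent from training
train_voxels = train_features @ mixing + 0.5 * rng.normal(size=(n_train, n_voxels))
test_voxels = test_features @ mixing + 0.5 * rng.normal(size=(n_test, n_voxels))

# Step 1: train the decoder on stimulus-evoked data, predict features for new data.
weights = fit_ridge(train_voxels, train_features, alpha=10.0)
predicted_features = test_voxels @ weights

# Step 2: identify each test object among candidates it was never trained on.
hits = sum(identify(predicted_features[i], test_features) == i for i in range(n_test))
accuracy = hits / n_test
print(f"identification accuracy: {accuracy:.2f} (chance = {1 / n_test:.2f})")
```

Because identification works in feature space, the candidate set can contain objects the decoder never saw during training, which is the property the abstract calls generic decoding.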
Mapping human visual representations by deep neural networks
Speaker: Kandan Ramakrishnan; Intelligent Sensory Information Systems, UvA, Netherlands
Authors: H. Steven Scholte, Department of Psychology, Brain and Cognition, UvA, Netherlands; Arnold Smeulders, Intelligent Sensory Information Systems, UvA, Netherlands; Sennay Ghebreab, Intelligent Sensory Information Systems, UvA, Netherlands
A number of recent studies have shown that deep neural networks (DNNs) map to the human visual hierarchy. However, based on a large number of subjects and accounting for the correlations between DNN layers, we show that there is no one-to-one mapping of DNN layers to the human visual system. This suggests that the depth of a DNN, which is also critical to its impressive performance in object recognition, has to be investigated for its role in explaining brain responses. On the basis of EEG data collected from a large set of natural images, we analyzed different DNN architectures (a 7-layer, a 16-layer, and a 22-layer DNN), using Weibull distributions for the representations at each layer. We find that the DNN architectures reveal temporal dynamics of object recognition, with early layers driving responses earlier in time and higher layers driving the responses later in time. Surprisingly, the layers from the different architectures explain brain responses to a similar degree. However, by combining the representations of the DNN layers we observe that we explain more brain activity in the higher brain areas. This suggests that the higher areas in the brain are composed of multiple non-linearities that are not captured by the individual DNN layers. Overall, while DNNs form a highly promising model to map the human visual hierarchy, the representations in the human brain go beyond the simple one-to-one mapping of the DNN layers to the human visual hierarchy.
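One common way to account for correlations between layers when asking how much each explains is variance partitioning with held-out data. The sketch below is an assumption about how such an analysis could look, not the authors' pipeline: it uses synthetic data, ordinary least squares, and two hypothetical correlated layers.

```python
import numpy as np

def heldout_r2(train_X, train_y, test_X, test_y):
    """Fit ordinary least squares on the training split, return R^2 on the test split."""
    def add_bias(X):
        return np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(add_bias(train_X), train_y, rcond=None)
    residual = test_y - add_bias(test_X) @ coef
    return 1.0 - np.sum(residual ** 2) / np.sum((test_y - test_y.mean()) ** 2)

rng = np.random.default_rng(0)
n_samples, n_units = 300, 5

# Synthetic assumption: the later layer is partly a function of the earlier one,
# so the two layers are correlated, and the response depends on both.
early = rng.normal(size=(n_samples, n_units))
late = 0.6 * early @ rng.normal(size=(n_units, n_units)) + rng.normal(size=(n_samples, n_units))
response = early @ np.ones(n_units) + late @ np.ones(n_units) + 0.5 * rng.normal(size=n_samples)

train, test = slice(0, 200), slice(200, None)
both = np.column_stack([early, late])

r2_early = heldout_r2(early[train], response[train], early[test], response[test])
r2_late = heldout_r2(late[train], response[train], late[test], response[test])
r2_both = heldout_r2(both[train], response[train], both[test], response[test])

# Unique contribution of each layer: what the combined model gains over the other layer alone.
unique_early = r2_both - r2_late
unique_late = r2_both - r2_early
print(f"early: {r2_early:.2f}, late: {r2_late:.2f}, combined: {r2_both:.2f}")
print(f"unique early: {unique_early:.2f}, unique late: {unique_late:.2f}")
```

Evaluating on a held-out split matters here: in-sample R^2 can only increase when regressors are added, so an in-sample gain from combining layers would not by itself be informative.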