Carnegie Mellon University


October 16, 2024

Thesis Defense: Ruogu Lin | October 17, 2024 | 11am

Title: Bridging visual representations in deep neural networks and the human brain

Ruogu Lin
 
Thursday, October 17th
11:00 AM EST
GHC 7101
 
Committee:
Leila Wehbe, Chair, CMU
Tai Sing Lee, CMU
J. Patrick Mayo, PITT
Michael J. Tarr, CMU - Dept of Psychology 

Abstract: 
Deep neural networks have revolutionized computer vision in the past decade due to their remarkable ability to discern patterns in vast, complex datasets. This success has spurred numerous research into using deep neural networks to model the human visual system. However, fundamental questions persist regarding the nature and extent of the correspondence between visual representations in these networks and the human brain. Specifically, under what conditions do components of deep neural networks truly mimic the behavior and function of the human visual system? How do variations in network architecture and training impact this correspondence? And what are the inherent similarites and differences in how visual information is represented by brains and machines? Current research predominantly focuses on optimizing deep neural networks for brain encoding and decoding, often in simplified scenarios. This highlights the need for novel approaches and research paradigms to address these open questions.
This thesis proposes a novel framework to encode and interpret brain activity using ensemble learning and structured variance partitioning. This approach is specifically designed to address the complexities inherent in the high-dimensional representation spaces generated by deep neural networks. By leveraging the strengths of ensemble learning, we aim to capture a wider range of neural representations and improve the accuracy of brain encoding models. Additionally, structured variance partitioning allows us to systematically analyze the contributions of different deep neural network features to the observed brain responses.
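To make the variance-partitioning idea concrete, the sketch below illustrates the standard logic on synthetic data (this is an illustrative example, not the thesis's actual pipeline or data): ridge encoding models are fit on two feature spaces separately and jointly, and the difference in held-out R² between the joint and single-space models gives each space's unique explained variance.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic stand-ins for two DNN feature spaces and one voxel's responses.
n_stim, d = 200, 10
feat_a = rng.normal(size=(n_stim, d))  # e.g., an early-layer feature space
feat_b = rng.normal(size=(n_stim, d))  # e.g., a late-layer feature space
# The simulated voxel is driven by both feature spaces, plus noise.
y = (feat_a @ rng.normal(size=d)
     + feat_b @ rng.normal(size=d)
     + 0.1 * rng.normal(size=n_stim))

def heldout_r2(X, y, train=150):
    """Fit ridge regression on a train split; return R^2 on the held-out split."""
    model = Ridge(alpha=1.0).fit(X[:train], y[:train])
    return r2_score(y[train:], model.predict(X[train:]))

r2_a = heldout_r2(feat_a, y)
r2_b = heldout_r2(feat_b, y)
r2_ab = heldout_r2(np.hstack([feat_a, feat_b]), y)

# Variance partitioning: unique and shared explained variance.
unique_a = r2_ab - r2_b          # variance only feature space A explains
unique_b = r2_ab - r2_a          # variance only feature space B explains
shared = r2_a + r2_b - r2_ab     # variance both spaces explain
```

In real analyses this is repeated per voxel with cross-validated regularization; here a single split and a fixed alpha keep the sketch short.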

Using this framework, we investigate the correspondence between deep neural network representations and human brain responses measured through functional magnetic resonance imaging (fMRI). We systematically evaluate the impact of different network architectures and training paradigms on this correspondence. This analysis will provide valuable insights into the conditions under which deep neural networks effectively mimic the behavior and function of the human visual system. Furthermore, we explore the similarities and differences in how visual information is encoded across different brain regions and deep neural network layers.

To gain a more comprehensive understanding of brain dynamics, we extend our method to incorporate temporal information from magnetoencephalography (MEG) and electroencephalography (EEG). This allows us to investigate not only the spatial patterns of brain activity but also their temporal evolution. By integrating data from fMRI, MEG, and EEG, we aim to obtain a comprehensive spatio-temporal picture of visual processing in the brain and its relationship to deep neural network representations. These findings will contribute to a deeper understanding of how deep learning can aid in modeling the brain, how we can potentially advance deep learning itself, and how we can better understand visual processes in the brain.
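The temporal extension can be pictured with a time-resolved encoding analysis, a common approach for MEG/EEG (again an illustrative sketch on synthetic data, not the thesis's actual method): a separate model is fit at each time point, and the time course of held-out prediction accuracy shows when stimulus features are reflected in the measured signal.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)

# Synthetic MEG-like data: trials x sensors x time, with DNN features per trial.
n_trials, n_sensors, n_times, d = 120, 8, 20, 10
feats = rng.normal(size=(n_trials, d))
weights = rng.normal(size=(d, n_sensors))
meg = 0.5 * rng.normal(size=(n_trials, n_sensors, n_times))
# Let the features drive the sensors only in a mid-latency window (t = 8..12),
# mimicking a transient stimulus-evoked response.
for t in range(8, 13):
    meg[:, :, t] += feats @ weights

train = 90
scores = np.zeros(n_times)
for t in range(n_times):
    # Fit one encoding model per time point, predicting all sensors at once.
    model = Ridge(alpha=1.0).fit(feats[:train], meg[:train, :, t])
    pred = model.predict(feats[train:])
    # Score: mean prediction-response correlation across sensors.
    scores[t] = np.mean([np.corrcoef(pred[:, s], meg[train:, s, t])[0, 1]
                         for s in range(n_sensors)])

peak_t = int(np.argmax(scores))  # accuracy should peak inside the evoked window
```

The resulting accuracy time course localizes when a given feature space carries predictive information, complementing the spatial maps obtained from fMRI.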