Carnegie Mellon University

October 01, 2024

Thesis Defense: Monica Dayao | October 7, 2024 | 11am

Title: Machine learning methods for the analysis and modeling of highly multiplexed spatial proteomics data

Monday, October 7, 2024
11am ET
GHC 4405
Zoom details (see email)

Committee:
Ziv Bar-Joseph, Chair, CMU
Matthew Ruffalo, CMU
Shikhar Uttam, University of Pittsburgh
James Zou, Stanford University

Abstract:
A comprehensive 3D molecular map of the human body at single-cell resolution would provide information critical for studying human biological processes and systems such as development, aging, and disease. Toward the goal of constructing such a map, multidisciplinary consortia, including the Human BioMolecular Atlas Program (HuBMAP) and the Human Cell Atlas (HCA), have developed technologies for profiling the transcriptome and proteome in single cells. Among these technologies, methods for highly multiplexed single-cell spatial proteomics have only recently been developed; for example, recent advances in multiplexed imaging have enabled the profiling of tens of proteins per cell. While this generation of spatial proteomics data promises to revolutionize our ability to study cell-cell interactions and the spatial distributions of cells, it also raises several computational and modeling challenges. Cell segmentation remains a long-standing problem that usually requires tailored solutions for each bioimaging experiment. Even after cells are segmented and assigned to cell types based on expression values, it is still unclear how best to leverage the rich information in the image data to further our understanding of disease and patient outcomes. Previous approaches provide limited insight into the specific cell-cell interactions or spatial relationships between cells that are relevant to clinical outcomes. Finally, acquisition of spatial proteomics data can be cost-prohibitive for many, especially compared to lower-cost histopathology imaging. A mapping between spatial proteomics and histopathology imaging would enable more powerful inferences from histopathology imaging alone.

In this thesis, we present a set of computational methods for analyzing and modeling highly multiplexed spatial proteomics data, both to support its use in building molecular maps of the human body and to give insight into disease outcomes. First, we introduce a method for Ranking Markers for CEll Segmentation (RAMCES) to improve cell segmentation in this data. Next, we present a framework for deriving spatially relevant features from spatial proteomics data for cancer patient outcome prediction. We apply this framework to a dataset of head and neck squamous cell carcinoma (HNSCC) and reveal novel insights into cell-cell interactions in the tumor microenvironment. Finally, we develop a method that leverages datasets with both spatial proteomics and histology (H&E) imaging to enhance cell type annotations in histology-only datasets. Together, these contributions address key challenges in spatial proteomics analysis and offer promising avenues for future research. The computational methods presented here provide a foundation for advancing spatial omics, with potential applications in personalized medicine and disease characterization.