Carnegie Mellon University

Wendy Yang thesis defense details

June 17, 2025

Thesis Defense: Wendy Yang | June 26, 2025 | 2pm

CBD and CPCB are proud to announce the following thesis defense:

Title: Harnessing Machine Learning to Decode Human Genome Sequence-Structure-Function Relationships
Wendy Yang

GHC 7501
2:00 PM ET

Committee:

Jian Ma (Chair, Carnegie Mellon University) 
Russell Schwartz (Carnegie Mellon University) 
Harinder Singh (University of Pittsburgh) 
Christina Leslie (Memorial Sloan Kettering Cancer Center)

Abstract:
The human genome is intricately organized within the nucleus, where the spatial organization of chromatin plays essential roles in regulating genome function. While DNA sequence and epigenomic features are known to influence genome structure and function, the principles linking sequence, structure, and function remain poorly understood. This dissertation develops machine learning frameworks to decode how DNA sequence and 3D nuclear organization shape genome function across diverse cell types and species. First, I introduce UNADON, a transformer-based model that predicts chromatin spatial positioning relative to nuclear bodies using sequence and epigenomic data, achieving high accuracy across cell types and identifying key determinants of chromatin positioning. Second, I present TEMPURA, a graph neural network model that integrates sequence features and 3D chromatin interactions to predict structural and functional genomic signals across human cell types and non-human primates. TEMPURA enables robust cross-cell-type and cross-species predictions. Third, leveraging newly assembled telomere-to-telomere (T2T) genomes in primates, I conduct a cross-species analysis of replication timing, identifying conserved and lineage-specific patterns of genome organization, including in previously inaccessible genomic regions. Finally, I develop CHANGE-net, a convolutional neural network model that predicts CRISPR-Cas9 off-target activity and the impact of genetic variation, generalizing across gRNAs and technologies. Together, these contributions advance our understanding of how genome sequence encodes higher-order genome structure and function, and provide robust computational frameworks for predictive modeling and interpretation in genomics.