Carnegie Mellon University
All students are required to take three graduate elective courses (3 credits/9 units each): a life sciences elective, a specialization elective, and an open elective.

Life Sciences Electives (3 credits/9 units)

Students with previous experience in graduate-level Life Sciences courses may convert this elective to an Open Elective with the approval of the Program Directors.

This course will provide an introduction to genomics, epigenetics, and their application to problems in neuroscience. The rapid advances in genomic technology are in the process of revolutionizing how we conduct molecular biology research. These new techniques have given us an appreciation for the role that epigenetic modifications of the genome play in gene regulation, development, and inheritance. In this course, we will cover the biological basis of genomics and epigenetics, the basic computational tools to analyze genomic data, and the application of those tools to neuroscience. Through programming assignments and reading primary literature, the material will also serve to demonstrate important concepts in neuroscience, including the diversity of neural cell types, neural plasticity, the role that epigenetics plays in behavior, and how the brain is influenced by neurological and psychiatric disorders. Although the course focuses on neuroscience, the material is accessible and applicable to a wide range of topics in biology.

 

Some of the most serious public health problems we face today, from drug-resistant bacteria to cancer and even COVID-19, arise from a fundamental property of living systems: their ability to evolve. Evolution permeates every system in flux, and since Darwin’s theory of natural selection was first proposed, we have begun to understand how heritable differences in reproductive success drive the adaptation of living systems. This makes it intuitive and tempting to view evolution from an optimization perspective. However, genetic drift, trade-offs, constraints, and changing environments are among the many factors that may limit the optimizing force of natural selection. This tug-of-war between selection and drift, between the forces that produce variation in a population and the forces suppressing this variation, makes the theory of evolution much more complex than previously thought and leaves our understanding far from complete.
The aim of this class is to provide an introduction to how biological systems are shaped by the forces and constraints driving evolutionary dynamics. I will also introduce population genetic theory as a lens for understanding and interpreting modern datasets. By the end of the course, you should have learned to appreciate the power of simple population genetic models, as well as the basic differences between idealized models and the data you might encounter in real life. The class is project-based, and you will also work together to build your own models and explore open questions in evolutionary biology.

 

This course covers principles and applications of optical methods in the study of structure and function in biological systems. Topics to be covered include: absorption and fluorescence spectroscopy; interaction of light with biological molecules, cells, and systems; design of fluorescent probes and optical biosensor molecules; genetically expressible optical probes; photochemistry; optics and image formation; transmitted-light and fluorescence microscope systems; laser-based systems; scanning microscopes; electronic detectors and cameras; image processing; multi-mode imaging systems; microscopy of living cells; and the optical detection of membrane potential, molecular assembly, transcription, enzyme activity, and the action of molecular motors. This course is particularly aimed at students in science and engineering interested in gaining in-depth knowledge of modern light microscopy.

The purpose of this course is to review key cellular and molecular phenomena in biological pathways, with a strong emphasis on the latest experimental techniques used in applications including, but not limited to, disease diagnosis, therapeutics, and large-scale genomic and proteomic analysis. Knowledge gained from this course will be both conceptual and analytical. Students will periodically write extensive research reports on select topics and give oral presentations on a select few, while critically analyzing primary literature.

 

This course considers selected current topics in genetics at an advanced level. The emphasis is on classroom discussion of research papers, supplemented with individual and group exercises. Topics change yearly. Recent topics have included genome imprinting in mammals, chromatin boundaries and long-distance gene regulation, learning and memory in Drosophila, and the kinetochore complex in yeast. Students must obtain a minimum grade of B in 03-330 to take this course.
This is a special topics course in which selected topics in biochemistry will be analyzed in depth with emphasis on class discussion of papers from the recent research literature. Topics change yearly. Recent topics have included single molecule analysis of catalysis and conformational changes; intrinsically disordered proteins; cooperative interactions of aspartate transcarbamoylase; and the mechanism of ribosomal protein synthesis.
This course covers fourteen topics in which significant recent advances or controversies have been reported. For each topic there is a background lecture by the instructor, student presentations of the relevant primary research articles and a general class discussion. Example topics are: extracellular matrix control of normal and cancer cell cycles, force generating mechanisms in trans-membrane protein translocation, signal transduction control of cell motility, and a molecular mechanism for membrane fusion.
The structure and expression of eukaryotic genes are discussed, focusing on model systems from a variety of organisms including yeast, flies, worms, mice, humans, and plants. Topics discussed include (1) genomics, proteomics, and functional proteomics and (2) control of gene expression at the level of transcription of mRNA from DNA, splicing of pre-mRNA, export of spliced mRNA from the nucleus to the cytoplasm, and translation of mRNA.
This course examines current topics in developmental biology at an advanced level. The course is team-taught by faculty from Carnegie Mellon University, the University of Pittsburgh Department of Biological Sciences, and the University of Pittsburgh Medical School. Each year several areas of current research are examined. Previous topics have included pattern formation, molecular signaling pathways, morphogen gradients, cell movements, and stem cells. Emphasis is on critical reading of original research papers and classroom discussion, with supporting lectures by faculty.
This course will use both lectures and current research literature in the area of Microbiology and Infectious Diseases to introduce such topics as prokaryotic cytoskeletal functions, the human microbiome and its impact, metabolic engineering, transposon mutagenesis for gene function elucidation, synthetic genome construction and applications, pathogenicity islands, functional and expression-based identification of pathogenicity determinants, horizontal gene transfer, regulatory RNAs, biofilm formation, quorum sensing, and antimicrobial drug development.
This course is an introduction to human physiology and includes units on all major organ systems. Particular emphasis is given to the musculoskeletal, cardiovascular, respiratory, digestive, excretory, and endocrine systems. Modules on molecular physiology, tissue engineering, and physiological modeling are also included. Due to the close interrelationship between structure and function in biological systems, each functional topic will be introduced through a brief exploration of anatomical structure. Basic physical laws and principles will be explored as they relate to physiologic function.
This course will cover fundamental findings and approaches in cognitive neuroscience, with the goal of providing an overview of the field at an advanced level. Topics will include high-level vision, spatial cognition, working memory, long-term memory, learning, language, executive control, and emotion. Each topic will be approached from a variety of methodological directions, for example, computational modeling, cognitive assessment in brain-damaged humans, non-invasive brain monitoring in humans, and single-neuron recording in animals. Lectures will alternate with sessions in seminar format. Prerequisites: Graduate standing or two upper-level psychology courses from the areas of developmental psychology, cognitive psychology, computational modeling of intelligence, neuropsychology or neuroscience.
Evolution is a fundamental unifying principle of biology. This class takes a broad approach to illustrate how an evolutionary perspective augments medical research and practice. Topics covered range from the evolution of human populations, to antibiotic resistance, and include medical conditions as diverse as diabetes, cardiovascular disease, cancer or aging. Computational methods are not a focus of the class.
This course will survey and discuss current literature pertaining to advances in understanding how cells regulate complex behaviors such as migration, cell polarity, protein/membrane trafficking, cell and tissue morphology, and cell proliferation/survival, and how lesions in these processes result in human disease.
This course examines the electrical properties of nerve cells and the mechanisms by which nerve cells communicate. The following topics will be covered: electrical principles used by nerve cells, the basis of the resting potential, the function of voltage-dependent ionic channels, the mechanisms by which action potentials are generated, neurotransmitter receptor function, and the physiology of fast synaptic communication.
Sequencing technology is continually progressing, and genome sequences from different species and populations continue to become available in increasing numbers. Such data allows questions about molecular function and evolution to be addressed in new and exciting ways. This course introduces students to the evolutionary analysis of DNA and amino acid sequences. Lectures on theory will be accompanied by practical instruction in the use of contemporary computational methods and software. Topics include: population genetics of selection and mutation, models of sequence evolution, phylogenetic models, analysis of multiple sequence alignments for rates and patterns of divergence, inference of natural selection, and coevolution between proteins. Emphasis is placed on quantitative modeling and the biological principles underlying observed patterns of molecular evolution. Interested students should have a basic grasp of molecular biology and calculus.
This course is concerned primarily with the structure and functions of proteins and nucleic acids. These are large polymers whose structure and function are determined by the sequence of monomeric units. Topics will include the physical and chemical properties of the monomer units (amino acids/nucleotides); the determination of the linear sequence of these units; the size, shape and general properties of the biopolymers in aqueous systems; and the relation between structure and function, particularly in transport (hemoglobin) and in catalysis (enzymes).

This course will focus on advanced topics in Systems Immunology. Topics will include discussion of systems approaches, the general framework for applying approaches to biological problems, and specific immunologically relevant examples. The course is primarily intended to teach systems immunology techniques to graduate students in their second and third years so that they can adopt these approaches in their own research.

 

Comprehensive Immunology (2 credits)

This is a lecture course that will introduce students to the fundamental concepts of modern immunology. The course will cover the cells, tissues and organs of the immune system. It will also present an in-depth analysis of the development, activation, effector functions and regulation of the immune response.

Experimental Basis in Immunology (2 credits)

This course will expose students to classical and contemporary literature in modern immunology. Emphasis will be on paper analysis and critical evaluation of primary data. This course will parallel the topics presented in the Comprehensive Immunology lecture course, which must be taken before or concurrently with Experimental Basis in Immunology.

 

Specialization: Bioimage Informatics (3 credits/9 units)

Bioimage Informatics draws upon advances in signal processing, optics, probe chemistry, molecular biology and machine learning to provide answers to biological questions from the growing numbers of biological images acquired in digital form. Microscopy is one of the oldest biological methods, and for centuries it has been paired with visual interpretation to learn about biological phenomena. With the advent of sensitive digital cameras and the dramatic increase in computer processing speeds over the past two decades, it has become increasingly common to collect large volumes of biological image data that create a need for sophisticated image processing and analysis. In addition, dramatic advances in machine learning during the same period set the stage for converting imaging from an observational to a computational discipline and allow the direct generation of biological knowledge from images.

With the rapid advance of bioimaging techniques and the fast accumulation of bioimage data, computational bioimage analysis and modeling are playing an increasingly important role in the understanding of complex biological systems. The goal of this course is to provide students with the ability to understand a broad set of practical and cutting-edge computational techniques for extracting knowledge from bioimages.
This course covers principles and applications of optical methods in the study of structure and function in biological systems. Topics to be covered include: absorption and fluorescence spectroscopy; interaction of light with biological molecules, cells, and systems; design of fluorescent probes and optical biosensor molecules; genetically expressible optical probes; photochemistry; optics and image formation; transmitted-light and fluorescence microscope systems; laser-based systems; scanning microscopes; electronic detectors and cameras; image processing; multi-mode imaging systems; microscopy of living cells; and the optical detection of membrane potential, molecular assembly, transcription, enzyme activity, and the action of molecular motors. This course is particularly aimed at students in science and engineering interested in gaining in-depth knowledge of modern light microscopy.
This course introduces the fundamental techniques used in computer vision, that is, the analysis of patterns in visual images to reconstruct and understand the objects and scenes that generated them. Topics covered include image formation and representation, camera geometry and calibration, multi-view geometry, stereo, 3D reconstruction from images, motion analysis, image segmentation, and object recognition. The material is based on graduate-level texts augmented with research papers, as appropriate. Evaluation is based on homework assignments and a final project. The homework assignments involve considerable MATLAB programming.
The fundamentals of computational medical image analysis will be explored, leading to current research in applying geometry and statistics to segmentation, registration, visualization, and image understanding. Students will develop practical experience through projects using the National Library of Medicine Insight Toolkit (ITK), a new software library developed by a consortium of institutions including CMU. In addition to image analysis, the course will describe the major medical imaging modalities and include interaction with practicing radiologists at UPMC.
A graduate course in Computer Vision with emphasis on representation and reasoning for large amounts of data (images, videos and associated tags, text, GPS locations, etc.) toward the ultimate goal of Image Understanding. We will be reading an eclectic mix of classic and recent papers on topics including: Theories of Perception, Mid-level Vision (Grouping, Segmentation, Poselets), Object and Scene Recognition, 3D Scene Understanding, Action Recognition, Contextual Reasoning, Image Parsing, Joint Language and Vision Models, etc. We will be covering a wide range of supervised, semi-supervised and unsupervised approaches for each of the topics above.
The emergence of contemporary artificial intelligence (AI) and machine learning (ML) methods has the potential to substantially alter and enhance the role of computers in science. At the heart of deep learning (DL) applications lie statistical algorithms whose performance, much like that of a scholar, improves with training. There is a growing infrastructure of machine learning tools for generating, testing and refining scientific models. Such techniques are suitable for addressing complex problems in chemistry and computational biology that involve vast combinatorial spaces or complex processes, which conventional procedures either cannot solve or can tackle only at great computational cost. The course focuses on the practice and applications of deep learning by exploring foundational concepts, structuring popular networks and implementing models through modern technologies (Python, Jupyter notebooks and PyTorch). Other topics may include microscopy image recognition, natural language processing for sequence data, parallelism, GPU distributed computing, cloud technologies, inference and parameter fitting in deep networks. The course uses large datasets hosted by PSC. Throughout the course, we emphasize application of ML methods to chemical, physical and biological data. A notable aspect of the course is the hands-on use of Python Jupyter notebooks to introduce modern ML/statistical packages.
Biomedical modeling and visualization play an important role in mathematical modeling and computer simulation of real/artificial life for improved medical diagnosis and treatment. This course integrates mechanical engineering, biomedical engineering, computer science, and mathematics together. Topics to be studied include medical imaging, image processing, geometric modeling, visualization, computational mechanics, and biomedical applications. The techniques introduced are applied to examples of multi-scale biomodeling and simulations at the molecular, cellular, tissue, and organ level scales.
High-throughput techniques are revolutionizing biomedical research. From whole genome sequencing, to RNA-Seq transcriptome profiling, to high-throughput mass spectrometry for protein profiling, to high-throughput biochemical screening, to flow cytometry for cell profiling, to high-content screening, to literature analysis and electronic medical records, from molecule to patient, modern techniques generate vast quantities of data. In order to be effective, biomedical researchers require the appropriate computational tools to correctly interpret and utilize this data. As machine learning is the science of finding and applying patterns in data, it is an essential tool for turning data into knowledge and actionable insights and has been rising in prominence in biomedical research. This course will focus on the practical aspects of effectively applying state-of-the-art machine learning methods to biomedically relevant datasets. Topics covered include mathematical foundations, practical coding skills, classical machine learning, deep learning, and generative modeling.
This course will introduce students to micron, nanometer, and sub-nanometer scale imaging-based approaches in experimental, computational and systems biology. Emphasis will be placed on understanding the fundamentals of image formation, processing, and analysis of brightfield, phase-contrast, highly multiplexed, super-resolution, live-cell, cryo-EM, and computational imaging, along with imaging-based spatial transcriptomic methods. Applications of these methods in studying biological phenomena in space and time will also be discussed. Python will be the programming language in which the image processing, machine learning, and systems biology methods introduced in the course will be implemented. Students are expected to know Python basics. It is also expected that students have a basic understanding of linear algebra and calculus.

Specialization: Cellular and Systems Modeling (3 credits/9 units)

Cellular and Systems Modeling undertakes the ambitious task of studying the dynamics of biological and biomedical processes from a whole system point of view. The observed systems range over orders of magnitude, from tissue to cells to molecular assemblies! Engineering tools are used along with genome-scale information in mathematical and/or computational models that usually adopt a top-down approach. Modeling diseases, entire ‘virtual’ cells, or subcellular networks of interactions are among typical tasks. Major research topics include the modeling of complex signaling and regulatory networks, transport mechanisms, spatio-temporal evolution of microphysiological events, as well as establishing the links between the development of complex phenotypes and the seemingly unrelated molecular events.

This course covers a variety of computational methods important for modeling and simulation of biological systems. It is intended for graduates and advanced undergraduates with either biological or computational backgrounds who are interested in developing computer models and simulations of biological systems. The course will emphasize practical algorithms and algorithm design methods drawn from various disciplines of computer science and applied mathematics that are useful in biological applications. The general topics covered will be models for optimization problems, simulation and sampling, and parameter tuning. Course work will include problem sets with significant programming components and independent or group final projects.
Modern medical research increasingly relies on the analysis of large patient datasets to enhance our understanding of, and our ability to treat, human diseases. This course will focus on the computational problems that arise from studies of human diseases and the translation of research to the bedside to improve human health. The topics to be covered include computational strategies for advancing personalized medicine, pharmacogenomics for predicting individual drug responses, metagenomics for learning the role of the microbiome in human health, mining electronic medical records to identify disease phenotypes, and case studies in complex human diseases such as cancer and asthma. We will discuss how machine learning and other computational methods are being used by clinicians. Class sessions will consist of lectures, discussions of papers from the literature, and guest presentations by clinicians and other domain experts. Students enrolled in 02-518 will be graded based on homework assignments, paper summaries, and a course project. Students enrolled in 02-718 will be graded based on in-class presentations, written summaries of papers, and a course project.
Proteomics and metabolomics are the large-scale study of proteins and metabolites, respectively. In contrast to genomes, proteomes and metabolomes vary with time and with the specific stress or conditions an organism is under. Applications of proteomics and metabolomics include determination of protein and metabolite functions (including in immunology and neurobiology) and discovery of biomarkers for disease. These applications require advanced computational methods to analyze experimental measurements, create models from them, and integrate them with information from diverse sources. This course specifically covers computational mass spectrometry, structural proteomics, proteogenomics, metabolomics, genome mining and metagenomics.
Automated science and engineering combines Robotics, Machine Learning, and Artificial Intelligence to accelerate the pace of discovery and rational design. This course introduces students to the Machine Learning and Artificial Intelligence algorithms that enable this emerging paradigm. Emphasis is placed on techniques for sequential analysis (i.e., model discovery and hypothesis generation), design of experiments, and optimization to maximize the return on research capital. Specific approaches will include Active Learning, Reinforcement Learning, and Bayesian Optimization. Examples of automated science and engineering from the literature will be studied.

Robotic scientific instruments are already used to decrease costs and increase reproducibility. Automated science and engineering take this one step further by leveraging Artificial Intelligence and Machine Learning to interpret data and select experiments in a closed-loop fashion. This emerging paradigm is motivated by the fact that most systems are too complex for humans to truly understand. Artificial Intelligence and Machine Learning can manage this complexity and find the most efficient paths to discovery and rational design by avoiding the costs of performing experiments where the outcome can already be predicted accurately.


Healthcare is not only the largest sector in the US economy, accounting for 20% of the GDP, but it also has a profound impact on our lives. Machine learning (ML) is experiencing explosive growth in healthcare, and is now top of mind for leaders at hospitals, insurance companies, and pharmaceutical firms. This course offers a survey of ML in healthcare today. The course will cover how ML is impacting care delivery, in particular in radiology, pathology, and ophthalmology. Students will learn how to apply causal inference, anomaly detection, Bayesian statistics, natural language processing, and large language models to important problems in healthcare, such as diagnosing a patient with a complex set of conditions, and predicting how long a patient will remain in the hospital. Students will gain firsthand experience extracting knowledge from electronic health records, time-series medical data, and other healthcare data sources. As a recurring theme, the course will cover the many challenges of working responsibly with healthcare data, including potential biases and inconsistencies, and provide strategies for identifying and mitigating these issues.
This course is an in-depth study of information processing in real neural systems from a computer science perspective. We will examine several brain areas, such as the hippocampus and cerebellum, where processing is sufficiently well understood that it can be discussed in terms of specific representations and algorithms. We will focus primarily on computer models of these systems, after establishing the necessary anatomical, physiological, and psychophysical context. There will be some neuroscience tutorial lectures for those with no prior background in this area.
This course provides a survey of basic statistical methods, emphasizing motivation from underlying principles and interpretation in the context of neuroscience and psychology. Though 36-746 assumes only passing familiarity with school-level statistics, it moves faster than typical university-level first courses. Vectors and matrices will be used frequently, as will basic calculus. Topics include Probability, Random Variables, and Important Distributions (binomial, Poisson, and normal distributions; the Law of Large Numbers and the Central Limit Theorem); Estimation and Uncertainty (standard errors and confidence intervals; the bootstrap); Principles of Estimation (mean squared error; maximum likelihood); Models, Hypotheses, and Statistical Significance (goodness-of-fit, p-values; power); General methods for testing hypotheses (permutation, bootstrap, and likelihood ratio tests); Linear Regression (simple linear regression and multiple linear regression); Analysis of Variance (one-way and two-way designs; multiple comparisons); Generalized Linear and Nonlinear Regression (logistic and Poisson regression; generalized linear models); and Nonparametric regression (smoothing scatterplots; smoothing histograms).
Causal connections are usually more informative and helpful than associational information, especially in understanding, control, decision-making, and prediction in changing environments. In the past decades, interesting advances were made in machine learning and statistics for tackling long-standing causality problems, such as how to discover causal knowledge from purely or partly observational data and how to infer the effect of interventions using such data. Furthermore, recently it has been shown that causal information can facilitate understanding and solving various machine learning problems, including transfer learning and semi-supervised learning. This course explores how causality is different from and related to association, recent machine learning methods for causal discovery, and why and how the causal perspective helps in machine learning and other tasks. We will study the difference between causal and non-causal systems and make an attempt to characterize the former. Apart from identification of causal effects, we will mainly explore two other causality-related areas. One is causal discovery, i.e., revealing causal information by analyzing purely observational data. It is well known that “correlation does not imply causality,” but we will make this statement more precise by asking what information in the data and what assumptions enable causal discovery. This will cover constraint-based causal discovery, causal discovery based on structural equation models, causal discovery from time series, difficulties in practical causal discovery, causality in neuroscience, causality in biology, and causality in economics and finance. The other is how to properly make use of causal information. This includes counterfactual reasoning, improving machine learning in light of causal knowledge, and forecasting in nonstationary environments.
We will have the opportunity to solve causality-related problems in various fields: students are encouraged to bring any causal problems they are interested in, and we will work together to find potential solutions.
This course covers computational and mathematical neuroscience. The class will model and analyze the complex dynamics of single neurons and large-scale networks.

This course will focus on advanced topics in Systems Immunology. Topics will include discussion of systems approaches, the general framework for applying approaches to biological problems, and specific immunologically relevant examples. The course is primarily intended to teach systems immunology techniques to graduate students in their second and third years so that they can adopt these approaches in their own research.

 

This course offers an introduction to modeling methods in neuroscience. Topics range from modeling the firing patterns of single neurons to using computational methods to understand neural coding. Some systems-level modeling is also covered.
This course introduces a number of modeling methods for biological systems. We will examine a number of problems from cell biology, immunology, population biology, physiology and molecular genetics. The main tools will be techniques from ordinary and partial differential equations. Discrete and delay-differential equations will also be used; however, the background for these will not be assumed. We will take models from current and classic papers in the field.
This course is focused on particular topics of great biologic complexity in critical illness, where modeling has the potential to translate into improved patient care. Lectures are provided by basic (biological and mathematical sciences) and clinical faculty, in conjunction with members of industry and speakers from outside institutions. This information will be communicated within the framework of defined themes that describe the complexity of inflammation in acute and chronic illnesses.
Drug discovery is an interdisciplinary science that seeks to identify small molecular and/or biologic entities for therapeutic intervention and to understand integrated biological systems and processes at the functional and molecular levels. This course will discuss various topics that are relevant to current approaches and principles in drug discovery including drug origins, drug target identification, and validation, biochemical and cell-based screening approaches, proteomic approaches to drug discovery, computational chemistry and biology, and quantitative systems pharmacology, as well as other topics in preclinical and clinical drug development, personalized medicine, Chinese herbal medicines and intellectual property. The course will include case studies intended to aid students in a full understanding of the drug discovery process.

By precisely controlling the movement of fluids at the microscale, microfluidics presents a unique opportunity to accurately establish mechanical and biochemical conditions that mimic the dynamic tissue environment in healthy and diseased tissue states. This course covers principles of biofluids, solid mechanics and mass transport that are applied to the design and analysis of biomedical microfluidic systems. Modeling of both healthy and diseased tissue microenvironments will be presented. Topics include tissue morphogenesis (e.g., epithelial layer and vascular pattern), tissue homeostasis (e.g., regeneration and wound healing) and cancer (angiogenesis and interactions with inflammatory cells). Lectures, in-class journal paper discussion and projects will focus on how microfluidic engineering can elucidate mechanistic understanding of cellular function and diseased tissues.

Specialization: Computational Genomics (3 credits/9 units)

Computational Genomics entails efforts to digest the daunting quantity of genomic and proteomic data now available by systematic development and application of probability and statistics theories, information technologies and data mining techniques. Linguistics methods are viewed as promising tools towards elucidating sequence-structure-function relations, and complementing computational genomics studies. Computational genomics targets understanding gene/protein function, identifying and characterizing cellular regulatory networks and discerning the link between genes and diseases. Discovery and processing of this information is pivotal in the development of novel gene therapy strategies and tools.

Provides an in-depth look at modern algorithms used to process string data, particularly those relevant to genomics. The course will cover the design and analysis of efficient algorithms for processing enormous collections of strings. Topics will include string search; inexact matching; string compression; string data structures such as suffix trees, suffix arrays, and searchable compressed indices; and the Burrows-Wheeler transform. Applications of these techniques in genomics will be presented, including genome assembly, transcript assembly, whole-genome alignment, gene expression quantification, read mapping, and search of large sequence databases. No knowledge of biology is assumed; programming proficiency is required.

The course will focus on describing algorithms that work with strings and string-like data in a rigorous way. We will typically describe why the algorithms are correct and give proofs (sometimes abbreviated or sketched) for runtime. For each major topic, we will describe at least one application from genomics that motivates the developed algorithms. We will include examples from other application areas as well. We have the following objectives:

  • Learn various algorithmic techniques and data structures for efficient processing of string data, including suffix trees, suffix arrays, and Burrows-Wheeler transforms.
  • Understand why these algorithms and data structures work.
  • Learn to apply and extend these algorithms and data structures.
  • Learn about the practical application of these techniques, especially in genomics.
  • At the end of this class, you should be familiar with much of the state-of-the-art in algorithms for strings, have familiarity with their use in practice, and have experience applying them to new problems.
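
As a rough illustration of two of the structures named in the objectives above, the following is a minimal Python sketch of a suffix array and the Burrows-Wheeler transform. It is not taken from the course materials: the function names and the `$` sentinel are assumptions made for this example, and the naive sorting-based constructions shown here are far slower than the algorithms such a course would analyze.

```python
def suffix_array(text):
    """Return the suffix array of text by sorting suffix start positions.

    Naive construction (sorts whole suffixes); for illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text, sentinel="$"):
    """Return the Burrows-Wheeler transform of text.

    Assumes the sentinel does not occur in text and sorts before every
    other character (true for "$" versus ASCII letters).
    """
    s = text + sentinel
    sa = suffix_array(s)
    # The BWT character for each sorted suffix is the character that
    # precedes it; index -1 wraps around to the sentinel.
    return "".join(s[i - 1] for i in sa)


def inverse_bwt(transformed, sentinel="$"):
    """Invert the BWT by repeatedly prepending and sorting (naive method)."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(c + row for c, row in zip(transformed, table))
    # The row that ends with the sentinel is the original text plus sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    print(suffix_array("banana$"))
    print(bwt("banana"))
    print(inverse_bwt(bwt("banana")))
```

The transform tends to group identical characters together, which is what makes it useful for compression and for the searchable compressed indices used in read mapping.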

Many of the problems in artificial intelligence, statistics, computer systems, computer vision, natural language processing, and computational biology, among many other fields, can be viewed as the search for a coherent global conclusion from local information. The probabilistic graphical models framework provides a unified view for this wide range of problems, enabling efficient inference, decision-making and learning in problems with a very large number of attributes and huge datasets. This graduate-level course will provide you with a strong foundation for both applying graphical models to complex problems and for addressing core research topics in graphical models.

The class will cover three aspects: the core representation, including Bayesian and Markov networks, and dynamic Bayesian networks; probabilistic inference algorithms, both exact and approximate; and learning methods for both the parameters and the structure of graphical models. Students entering the class should have a pre-existing working knowledge of probability, statistics, and algorithms, though the class has been designed to allow students with a strong mathematical background to catch up and fully participate.

Modern medical research increasingly relies on the analysis of large patient datasets to enhance our understanding of, and our ability to treat, human diseases. This course will focus on the computational problems that arise from studies of human diseases and the translation of research to the bedside to improve human health. The topics to be covered include computational strategies for advancing personalized medicine, pharmacogenomics for predicting individual drug responses, metagenomics for learning the role of the microbiome in human health, mining electronic medical records to identify disease phenotypes, and case studies in complex human diseases such as cancer and asthma. We will discuss how machine learning and other computational methods are being used by clinicians. Class sessions will consist of lectures, discussions of papers from the literature, and guest presentations by clinicians and other domain experts. Students enrolled in 02-518 will be graded based on homework, paper summaries, and a course project. Students enrolled in 02-718 will be graded based on in-class presentations, written summaries of papers, and a course project.
This course will provide an introduction to genomics, epigenetics, and their application to problems in neuroscience. The rapid advances in genomic technology are in the process of revolutionizing how we conduct molecular biology research. These new techniques have given us an appreciation for the role that epigenetic modifications of the genome play in gene regulation, development, and inheritance. In this course, we will cover the biological basis of genomics and epigenetics, the basic computational tools to analyze genomic data, and the application of those tools to neuroscience. Through programming assignments and reading primary literature, the material will also serve to demonstrate important concepts in neuroscience, including the diversity of neural cell types, neural plasticity, the role that epigenetics plays in behavior, and how the brain is influenced by neurological and psychiatric disorders. Although the course focuses on neuroscience, the material is accessible and applicable to a wide range of topics in biology.
Proteomics and metabolomics are the large-scale study of proteins and metabolites, respectively. In contrast to genomes, proteomes and metabolomes vary with time and the specific stress or conditions an organism is under. Applications of proteomics and metabolomics include determination of protein and metabolite functions (including in immunology and neurobiology) and discovery of biomarkers for disease. These applications require advanced computational methods to analyze experimental measurements, create models from them, and integrate with information from diverse sources. This course specifically covers computational mass spectrometry, structural proteomics, proteogenomics, metabolomics, genome mining and metagenomics.

Some of the most serious public health problems we face today, from drug-resistant bacteria to cancer and even COVID-19, all arise from a fundamental property of living systems: their ability to evolve. Evolution permeates every system in flux, and since Darwin’s theory of natural selection was first proposed, we have begun to understand how heritable differences in reproductive success drive the adaptation of living systems. This makes it intuitive and tempting to view evolution from an optimization perspective. However, genetic drift, trade-offs, constraints, and changing environments are among the many factors that may limit the optimizing force of natural selection. This tug-of-war between selection and drift, between the forces that produce variation in a population and the forces suppressing this variation, makes the theory of evolution much more complex than previously thought and leaves our understanding far from complete.
The aim of this class is to provide an introduction to how biological systems are shaped by the forces and constraints driving evolutionary dynamics. I will also introduce population genetic theory as a lens for the understanding and interpretation of modern datasets. By the end of the course, you should have learned to appreciate the power of simple population genetic models, as well as the basic differences between idealized models and the data you might encounter in real life. The class is project-based and you will also work together to build your own models and explore open questions in evolutionary biology.
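
To give a sense of what a simple population genetic model can look like, here is a minimal Python sketch of genetic drift under a haploid Wright-Fisher model with no selection or mutation. It is an illustrative assumption rather than course material: the function name, parameters, and default seed are hypothetical choices made for this example.

```python
import random


def wright_fisher(pop_size, init_freq, generations, seed=0):
    """Simulate an allele frequency trajectory under pure genetic drift.

    Hypothetical helper: haploid Wright-Fisher model, no selection or
    mutation. Each generation, every one of the pop_size offspring
    inherits the allele with probability equal to its current frequency.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    freq = init_freq
    trajectory = [freq]
    for _ in range(generations):
        # Binomial sampling of the next generation's allele count.
        count = sum(1 for _ in range(pop_size) if rng.random() < freq)
        freq = count / pop_size
        trajectory.append(freq)
    return trajectory


if __name__ == "__main__":
    traj = wright_fisher(pop_size=50, init_freq=0.5, generations=200)
    print(traj[:5], "...", traj[-1])
```

Even without selection, the frequency wanders from generation to generation, and once it reaches 0 or 1 it stays there; smaller populations tend to reach one of these states sooner.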

 

Automated science and engineering combines Robotics, Machine Learning, and Artificial Intelligence to accelerate the pace of discovery and rational design. This course introduces students to the Machine Learning and Artificial Intelligence algorithms that enable this emerging paradigm. Emphasis is placed on techniques for sequential analysis (i.e., model discovery and hypothesis generation), design of experiments, and optimization to maximize the return on research capital. Specific approaches will include Active Learning, Reinforcement Learning, and Bayesian Optimization. Examples of automated science and engineering from the literature will be studied.

Robotic scientific instruments are already used to decrease costs and increase reproducibility. Automated science and engineering takes this one step further by leveraging Artificial Intelligence and Machine Learning to interpret data and select experiments in a closed-loop fashion. This emerging paradigm is motivated by the fact that most systems are too complex for humans to truly understand. Artificial Intelligence and Machine Learning can manage this complexity and find the most efficient paths to discovery and rational design by avoiding the costs of performing experiments where the outcome can already be predicted accurately.

An advanced introduction to computational molecular biology, using an applied algorithms approach. The first part of the course will cover established algorithmic methods, including pairwise sequence alignment and dynamic programming, multiple sequence alignment, fast database search heuristics, hidden Markov models for molecular motifs and phylogeny reconstruction. The second part of the course will explore emerging computational problems driven by the newest genomic research. Course work includes four to six problem sets, one midterm and final exam. A project based on recent results from the genomics literature will be required of students taking 03-711.
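
As an illustration of the dynamic programming approach to pairwise sequence alignment mentioned above, the sketch below computes a global alignment score in the style of Needleman-Wunsch. It is a minimal example under stated assumptions, not the course's own implementation: the function name and the scoring values (match +1, mismatch -1, linear gap penalty -1) are arbitrary choices for demonstration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Needleman-Wunsch style dynamic programming with a linear gap
    penalty. The scoring parameters are illustrative defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)

    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Each table cell depends only on its three neighbors, so the table is filled in time proportional to the product of the sequence lengths; recovering the alignment itself would additionally require a traceback through the table.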

The overarching aim of this course is to teach evolutionary concepts and bioinformatic skills that are central to research in molecular, cell, developmental and microbiology. Evolutionary trees (phylogenies) model the evolutionary history of descent with modification from a shared ancestor, evolutionary relatedness, and patterns of divergence. Originally introduced to model species evolution, phylogenetics is now a primary tool for understanding the evolution of genes and proteins. Evolutionary models of protein evolution are of great practical importance because shared ancestry is a strong predictor of shared function. This assumption underlies many sequence‐based bioinformatics applications. Model organism research rests on the assumption that genes that share common ancestry (orthologs) perform the same function in related species. Reconstruction of evolutionary relationships in protein families is central to identifying the appropriate target of study in an animal model.

The objective of the course is to make the theory and practice of evolutionary bioinformatics accessible to a broad biological audience and to accommodate students with a range of computational backgrounds and skills. Students in 03‐327/727 acquire “tree thinking” skills required for critical interpretation of phylogenetic analyses and figures in the literature; a detailed understanding of phylogenetic inference methods without relying upon formal mathematics; hands‐on experience working with sequence data repositories, bioinformatic tools for database retrieval, sequence analysis, and tree building; the knowledge required to apply those tools correctly to messy, genuine data sets; and the ability to think critically about abstract evolutionary models and evaluate alternate hypotheses in light of bioinformatic analyses. Students will acquire the knowledge and skills to carry out a phylogenetic analysis independently after completing the course.

Causal connections are usually more informative and helpful than associational information, especially in understanding, control, decision-making, and prediction in changing environments. In the past decades, interesting advances were made in machine learning and statistics for tackling long-standing causality problems, such as how to discover causal knowledge from purely or partly observational data and how to infer the effect of interventions using such data. Furthermore, recently it has been shown that causal information can facilitate understanding and solving various machine learning problems, including transfer learning and semi-supervised learning. This course explores how causality is different from and related to association, recent machine learning methods for causal discovery, and why and how the causal perspective helps in machine learning and other tasks. We will study the difference between causal and non-causal systems and make an attempt to characterize the former. Apart from identification of causal effects, we will mainly explore two other causality-related areas. One is causal discovery, i.e., revealing causal information by analyzing purely observational data. It is well known that “correlation does not imply causality,” but we will make this statement more precise by asking what information in the data and what assumptions enable causal discovery. This will cover constraint-based causal discovery, causal discovery based on structural equation models, causal discovery from time series, difficulties in practical causal discovery, causality in neuroscience, causality in biology, and causality in economics and finance. The other is how to properly make use of causal information. This includes counterfactual reasoning, improving machine learning in light of causal knowledge, and forecasting in nonstationary environments.
We will have the opportunity to solve causality-related problems in various fields: students are encouraged to bring any causal problems they are interested in, and we will work together to find potential solutions.
This course provides an intermediate-level understanding of statistical foundations to prepare students for the competent use of data analysis methods in common practice in bioinformatics. Statistical ideas covered include probability distributions, likelihood theory, Bayesian and frequentist concepts, estimation, hypothesis testing and significance testing, multiplicity adjustments, the EM and MCMC algorithms, random walks, Poisson processes and Markov chains. Application areas include biological sequence analysis and microarray analysis. Students will learn the R statistical language. The R packages Bioconductor and BRB array tools for microarray analysis will be studied.
This survey course covers the principles of population genetics as applicable to human populations, including (1) the laws of inheritance that govern the organization of the genomes in populations, (2) the evolutionary forces and phenomena that impact genetic diversity in human populations, and (3) the foundational concepts of genetic epidemiology and gene discovery.
Sequencing technology is continually progressing, and genome sequences from different species and populations continue to become available in increasing numbers. Such data allows questions about molecular function and evolution to be addressed in new and exciting ways. This course introduces students to the evolutionary analysis of DNA and amino acid sequences. Lectures on theory will be accompanied by practical instruction in the use of contemporary computational methods and software. Topics include: population genetics of selection and mutation, models of sequence evolution, phylogenetic models, analysis of multiple sequence alignments for rates and patterns of divergence, inference of natural selection, and co-evolution between proteins. Emphasis is placed on quantitative modeling and the biological principles underlying observed patterns of molecular evolution. Interested students should have a basic grasp of molecular biology and calculus.

Specialization: Biological Physics (3 credits/9 units)

Biological physics encompasses a multidisciplinary approach that uses principles from physics to gain insights into the fundamental processes underlying living systems. Concepts from statistical physics, dynamical systems, and fluid dynamics are applied to investigate phenomena such as cell state transitions, cell motility, tissue morphogenesis, and evolution. Biological physicists probe these multi-scale phenomena using approaches from theoretical analyses, quantitative modeling, machine learning, and experimental measurements of forces and fields. This field aims to unravel the intricate workings of life from a unique perspective, which can also lead to new discoveries in physics and biology.

Note: The 700 level version of the course should be taken if available.

Computer modeling is playing an increasingly important role in chemical, biological and materials research. This course provides an overview of computational chemistry techniques including molecular mechanics, molecular dynamics, electronic structure theory and continuum medium approaches. Sufficient theoretical background is provided for students to understand the uses and limitations of each technique. An integral part of the course is hands-on experience with state-of-the-art computational chemistry tools running on graphics workstations. 3 hrs. lec.

Many of the problems in artificial intelligence, statistics, computer systems, computer vision, natural language processing, and computational biology, among many other fields, can be viewed as the search for a coherent global conclusion from local information. The probabilistic graphical models framework provides a unified view for this wide range of problems, enabling efficient inference, decision-making and learning in problems with a very large number of attributes and huge datasets. This graduate-level course will provide you with a strong foundation for both applying graphical models to complex problems and for addressing core research topics in graphical models.

The class will cover three aspects: the core representation, including Bayesian and Markov networks, and dynamic Bayesian networks; probabilistic inference algorithms, both exact and approximate; and learning methods for both the parameters and the structure of graphical models. Students entering the class should have a pre-existing working knowledge of probability, statistics, and algorithms, though the class has been designed to allow students with a strong mathematical background to catch up and fully participate.

Like all branches of physical science, physical virology encompasses a search for simplifying generalities. However, viruses display a kaleidoscopic diversity that imposes limits on any generalization and provides tremendous opportunity for discovery.

The course covers the latest methods in biological physics as well as fundamentals in the physics of DNA, protein self-assembly and membranes, using viruses as physical objects. This course also provides introductory-level biochemistry and molecular biology lectures so that students with any background can participate in the course. As an interdisciplinary and up-to-date research field involving fundamental theory and numerous applications, the emerging field of physical virology aims to attract students from any of the natural science disciplines (physics, chemistry and biology).

This course develops the methods of statistical mechanics and uses them to calculate observable properties of systems in thermodynamic equilibrium. Topics treated include the principles of classical thermodynamics, canonical and grand canonical ensembles for classical and quantum mechanical systems, partition functions and statistical thermodynamics, fluctuations, ideal gases of quanta, atoms and polyatomic molecules, degeneracy of Fermi and Bose gases, chemical equilibrium, ideal paramagnetics and introduction to simple interacting systems. 3 hrs. lecture, 1 hr. recitation. Typical Texts: Reif, Statistical and Thermal Physics; Pathria, Statistical Mechanics.
This course provides an intermediate-level understanding of statistical foundations to prepare students for the competent use of data analysis methods in common practice in bioinformatics. Statistical ideas covered include probability distributions, likelihood theory, Bayesian and frequentist concepts, estimation, hypothesis testing and significance testing, multiplicity adjustments, the EM and MCMC algorithms, random walks, Poisson processes and Markov chains. Application areas include biological sequence analysis and microarray analysis. Students will learn the R statistical language. The R packages Bioconductor and BRB array tools for microarray analysis will be studied.
Basic quantum mechanics, with emphasis on the theory of chemical structure and dynamics.
Development of equilibrium statistical mechanics and thermodynamics. Applications to chemical systems. These applications include solutions, phase transitions (Ising model) and reaction theory.
This course deals with the elements of polymer science and engineering necessary for entry-level understanding of polymer technology. While the chemistry determines macromolecular microstructure, an understanding of polymer manufacture and processing requires the addition of physical chemistry and transport phenomena. The essential material covered in this class includes the elements of polymer thermodynamics, rheology, mechanical behavior and equipment design.
This course covers the basic as well as certain selected topics pertaining to the physicochemical origins of architecture and motility of biological cells. It is aimed at graduate students pursuing degrees in various fields of biology (and also in mathematics, physics, chemistry, or engineering), who have taken university-level courses in mathematics, physics, and chemistry. This course material draws upon a variety of quantitative disciplines but maintains a biological perspective. Physical properties and chemical kinetics that determine the structure and function of the cytoskeleton (the assembly of non-covalent polymers at the base of the cellular architecture) will be covered, as will the physicochemical mechanisms of motility driven by biological force-generating macromolecules. The final grade will be based on homework problems and on a closed-book exam. The didactic material will be presented from the perspective of a practical researcher, and the problem sets will emphasize developing a sense of what makes for a good research strategy.
The main subject matter of this course will be a survey of group theory methods and their applications in various fields of physics. Selected topics involving analytic functions, operator algebra, and solutions of the differential and integral equations of physics will be addressed. Some numerical analysis and computational work will also be incorporated.
This is the first term of a 2-term course with emphasis on statistical mechanics. Discussion of microcanonical, canonical, and grand canonical ensembles, the passage to quantum mechanics, and the use of the density matrix. The Gibbs approach to the second law. Fermi-Dirac and Bose-Einstein statistics, in both weak and strong degeneracy approximations. Transport phenomena including the fluctuation-dissipation theorem and the master equation.

Specialization: Computational Structural Biology (3 credits/9 units)

Computational Structural Biology aims at establishing biomolecular sequence-structure-function relations using fundamental principles of physical sciences in theoretical models and simulations of structure and dynamics. After the advances in complete genome sequencing, it became evident that structural information is needed for understanding the origin and mechanisms of biological interactions, and for designing and controlling function. Computational Structural Biology emerged as a tool for efficient identification of structure and dynamics in many applications. Major research topics include protein folding; protein dynamics, with emphasis on large complexes and assemblies; and protein-protein, protein-ligand and protein-DNA interactions and their functional implications. Drug design and protein engineering represent applications of note.

Proteomics and metabolomics are the large-scale study of proteins and metabolites, respectively. In contrast to genomes, proteomes and metabolomes vary with time and the specific stress or conditions an organism is under. Applications of proteomics and metabolomics include determination of protein and metabolite functions (including in immunology and neurobiology) and discovery of biomarkers for disease. These applications require advanced computational methods to analyze experimental measurements, create models from them, and integrate with information from diverse sources. This course specifically covers computational mass spectrometry, structural proteomics, proteogenomics, metabolomics, genome mining and metagenomics.

Automated science and engineering combines Robotics, Machine Learning, and Artificial Intelligence to accelerate the pace of discovery and rational design. This course introduces students to the Machine Learning and Artificial Intelligence algorithms that enable this emerging paradigm. Emphasis is placed on techniques for sequential analysis (i.e., model discovery and hypothesis generation), design of experiments, and optimization to maximize the return on research capital. Specific approaches will include Active Learning, Reinforcement Learning, and Bayesian Optimization. Examples of automated science and engineering from the literature will be studied.

Robotic scientific instruments are already used to decrease costs and increase reproducibility. Automated science and engineering takes this one step further by leveraging Artificial Intelligence and Machine Learning to interpret data and select experiments in a closed-loop fashion. This emerging paradigm is motivated by the fact that most systems are too complex for humans to truly understand. Artificial Intelligence and Machine Learning can manage this complexity and find the most efficient paths to discovery and rational design by avoiding the costs of performing experiments where the outcome can already be predicted accurately.

Note: The 700 level version of the course should be taken if available.

Computer modeling is playing an increasingly important role in chemical, biological and materials research. This course provides an overview of computational chemistry techniques including molecular mechanics, molecular dynamics, electronic structure theory and continuum medium approaches. Sufficient theoretical background is provided for students to understand the uses and limitations of each technique. An integral part of the course is hands-on experience with state-of-the-art computational chemistry tools running on graphics workstations. 3 hrs. lec.

This course develops the methods of statistical mechanics and uses them to calculate observable properties of systems in thermodynamic equilibrium. Topics treated include the principles of classical thermodynamics, canonical and grand canonical ensembles for classical and quantum mechanical systems, partition functions and statistical thermodynamics, fluctuations, ideal gases of quanta, atoms and polyatomic molecules, degeneracy of Fermi and Bose gases, chemical equilibrium, ideal paramagnetics and introduction to simple interacting systems. 3 hrs. lecture, 1 hr. recitation. Typical Texts: Reif, Statistical and Thermal Physics; Pathria, Statistical Mechanics.
This course deals with the elements of polymer science and engineering necessary for entry-level understanding of polymer technology. While the chemistry determines macromolecular microstructure, an understanding of polymer manufacture and processing requires the addition of physical chemistry and transport phenomena. The essential material covered in this class includes the elements of polymer thermodynamics, rheology, mechanical behavior and equipment design.
Basic quantum mechanics, with emphasis on the theory of chemical structure and dynamics.
Development of equilibrium statistical mechanics and thermodynamics. Applications to chemical systems. These applications include solutions, phase transitions (Ising model) and reaction theory.
This course covers the basic as well as certain selected topics pertaining to the physicochemical origins of architecture and motility of biological cells. It is aimed at graduate students pursuing degrees in various fields of biology (and also in mathematics, physics, chemistry, or engineering), who have taken university-level courses in mathematics, physics, and chemistry. This course material draws upon a variety of quantitative disciplines but maintains a biological perspective. Physical properties and chemical kinetics that determine the structure and function of the cytoskeleton (the assembly of non-covalent polymers at the base of the cellular architecture) will be covered, as will the physicochemical mechanisms of motility driven by biological force-generating macromolecules. The final grade will be based on homework problems and on a closed-book exam. The didactic material will be presented from the perspective of a practical researcher, and the problem sets will emphasize developing a sense of what makes for a good research strategy.
This course consists of a series of lectures and tutorial sessions which focus on the general principles of pharmacology. Major topics are principles of pharmacokinetics (including drug absorption, distribution, and metabolism) and pharmacodynamics (quantitation of drug-receptor interactions).
Drug discovery is an interdisciplinary science that seeks to identify small-molecule and/or biologic probes and to understand at the molecular level how these probes affect macromolecular processes. This course will discuss various topics that are relevant to current approaches and principles in drug discovery, including target validation, drug origins, cell-based screening, high-throughput screening, proteomic approaches to drug discovery, computational biological aspects of drug discovery, and pharmacoinformatics, as well as topics in preclinical drug development and intellectual property. The course will include case studies intended to give students a full understanding of the drug discovery process.
This course examines molecular mechanisms of drug interactions, with an emphasis on drugs that modulate cell signaling and on cellular responses to drugs. The course will include student participation through presentations and discussion of relevant contemporary scientific literature. Topics include cell cycle checkpoints and anti-cancer drugs, therapeutic control of ion channels and blood glucose, anti-inflammatory agents, and nuclear receptor signaling.
The main subject matter of this course will be a survey of group theory methods and their applications in various fields of physics. Selected topics involving analytic functions, operator algebra, and solutions of the differential and integral equations of physics will be addressed. Some numerical analysis and computational work will also be incorporated.
This is the first term of a 2-term course with emphasis on statistical mechanics. Discussion of microcanonical, canonical, and grand canonical ensembles, the passage to quantum mechanics, and the use of the density matrix. The Gibbs approach to the second law. Fermi-Dirac and Bose-Einstein statistics, in both weak and strong degeneracy approximations. Transport phenomena, including the fluctuation-dissipation theorem and the master equation.