CS8395 Special Topics in Computational Biology
🧭 Description
This graduate seminar examines how computational methods are designed, assessed, and advanced to tackle modern biological questions, with emphasis on applying machine learning and AI to model, integrate, and interpret single-cell and spatial transcriptomic data. In particular, the course explores how choices in data representation, model specification, and algorithm design shape what can be inferred from such datasets.
Through discussions and student presentations of recent work from conferences (e.g., ISMB, RECOMB, NeurIPS, MLCB) and journals (e.g., Nature Methods, Nature Biotechnology), students will learn to critically evaluate existing models, identify their underlying assumptions and limitations, and develop their own ideas. The course also includes hands-on mini-projects in which students design, implement, and assess a computational method inspired by the papers discussed in class.
💡 Beyond running analyses, we focus on how to think like a method developer — turning ideas into models that connect computation and biology.
Logistics
- Course Code: CS8395-04
- Term: Spring 2026
- Class Times: Tuesdays & Thursdays, 02:45 PM - 04:00 PM
- Location: Olin Hall 131
🧬 Computational Biology in a Nutshell
🎯 Learning Goals
By the end of this course, students will be able to:
- Define computational biology and its scope across computer science and life sciences
- Understand major biological data types — single-cell, spatial, proteomic, genomic, and multi-omics
- Formulate computational methods to address biological problems
- Evaluate and reconstruct mathematical models from research literature
- Appreciate the role of simulation and benchmarking in method development
- Translate mathematical models into efficient, reproducible implementations
🗓️ Lecture Outline
| Lecture/Date | Theme / Discussion Paper(s) | Computational Topics | Hands-on Notebook |
|---|---|---|---|
| L1/ Tue, Jan 06, 2026 | What is computational biology — Way et al., Eraslan et al.; How to read a paper | ||
| L2/ Thu, Jan 08, 2026 | In Class Project Discussion | How do we present a paper effectively? What not to do? | TBD |
| L3/ Tue, Jan 13, 2026 | Mathematical Foundation, Representation of data — Scanpy | Distribution, Linear Algebra | play_with_distribution.ipynb |
| L4/ Thu, Jan 15, 2026 | Dimensionality Reduction - Sun et al.; Yin et al. | PCA, SVD, t-SNE, UMAP, unsupervised/supervised clustering | pca_tsne.ipynb |
| L5/ Tue, Jan 20, 2026 | Clustering Algorithms - Zappia et al.; Yang et al. | Leiden and Louvain algorithms | hvg_selection.ipynb |
| L6/ Thu, Jan 22, 2026 | Clustering Algorithms - Eraslan et al.; Bengio et al. | Leiden and Louvain algorithms | representation_learning.ipynb |
| L7/ Tue, Jan 27, 2026 | Canceled because of school closure | | |
| L8/ Thu, Jan 29, 2026 | Canceled because of school closure | | |
| L9/ Tue, Feb 03, 2026 | Spatial Transcriptomics — Squidpy, Palla et al. 2022, Nature Methods | Spatial data structures, spot deconvolution | TBD |
| Feb 05, 2026 | Initial Proposal Submission (1 page) | Will be graded | TBD |
| L10/ Thu, Feb 05, 2026 | VAE (continued) | TBD | |
| L11/ Tue, Feb 10, 2026 | Probabilistic spatial deconvolution — DestVI, Lopez et al. 2022, Nature Biotech; RCTD, Cable et al. 2019, Nature Biotech | NMF/PNMF, VAEs for spatial data | TBD |
| L12/ Thu, Feb 12, 2026 | GNN foundations - GCN, Kipf & Welling 2016, ICLR; GraphSAGE, Hamilton et al. 2017, NeurIPS; GAT, Veličković et al. 2018, ICLR | Graph representations, GNNs | gnn_basics.ipynb |
| L13/ Tue, Feb 17, 2026 | GNN/GCN for spatial transcriptomics data - GraphST; STAGATE | Graph representations, GNNs, spatial autocorrelation | TBD |
| L14/ Thu, Feb 19, 2026 | Bridging Histology and Spatial Transcriptomics - Tangram; SpaGCN; iStar | | histology.ipynb |
| L15/ Tue, Feb 24, 2026 | Spatial Transcriptomics (continued), Spatial Statistics - SpatialDE; SPARK; DESpace | Detecting spatially variable genes | spatial_cov.ipynb |
| L16/ Thu, Feb 26, 2026 | Segmentation I - Cellpose; CellViT | Cell segmentation basics | spatial_cov.ipynb |
| L17/ Tue, Mar 03, 2026 | Segmentation II - Baysor | Bayesian view of cell segmentation | spatial_cov.ipynb |
| L18/ Thu, Mar 05, 2026 | Multi-omics integration - Papers TBD | TBD | |
| Tue, Mar 10, 2026 | No class — Spring Break | TBD | |
| Thu, Mar 12, 2026 | No class — Spring Break | ||
| Mar 16, 2026 | Deadline for midterm report submission | Will be graded | TBD |
| Tue, Mar 17, 2026 | Mid semester project presentations I | Will be graded | TBD |
| Thu, Mar 19, 2026 | Mid semester project presentations II | Will be graded | TBD |
| L19/ Tue, Mar 24, 2026 | Simulating multi-omics data - Papers TBD | TBD | TBD |
| L20/ Thu, Mar 26, 2026 | Simulation and benchmarking - Papers TBD | TBD | TBD |
| L21/ Tue, Mar 31, 2026 | Benchmarking - Papers TBD | TBD | |
| L22/ Thu, Apr 02, 2026 | Advanced topics - Neural flow | TBD | |
| L23/ Tue, Apr 07, 2026 | Advanced topics - Neural ODE and other physics-inspired models I | TBD | |
| L24/ Thu, Apr 09, 2026 | Advanced topics - Neural ODE and other physics-inspired models II | TBD | |
| L25/ Tue, Apr 14, 2026 | Final Project discussion | TBD | |
| L26/ Thu, Apr 16, 2026 | Final Group Presentations I and Report submission | Will be graded | TBD |
| L27/ Tue, Apr 21, 2026 | Final Group Presentations II and Report submission | Will be graded | TBD |
🧩 Core Topics (Summary)
- Foundations
  - What is computational biology?
  - Designing methods vs. running tools
- Data Representation
  - Single-cell, spatial, and multi-omics data
  - Structuring biological datasets
- Exploratory Analysis
  - Visualization, dimensionality reduction, clustering
  - Feature selection in multi-omics contexts
- Problem Formulation
  - Translating biological questions into computational problems
  - Case studies from recent literature
- Modeling & Software
  - Statistical, probabilistic, and ML frameworks
  - Coding (Python, R, C++, Rust), efficiency, and reproducibility
- Simulation & Benchmarking
  - Role of simulations and benchmark datasets
  - Designing fair, reproducible evaluations
🏗️ Course Structure
Lectures/Discussions: Weekly topics are introduced through short lectures and literature discussions, held either back-to-back or on two days of the same week.
Readings: Research papers, book chapters, and review articles.
Assignments: Focused problem sets and small coding projects.
Final Project: Students will select a computational biology problem, survey existing literature, and either (1) reconstruct/improve an existing method or (2) propose/implement a novel approach.
👥 Evaluation (Groups)
| Component | Weight |
|---|---|
| Paper Presentation | 20% |
| - Presentation | |
| - Critical Evaluation | |
| Initial Proposal Presentation | 15% |
| Midterm Project Report | 15% |
| Midterm Project Presentation | 15% |
| Final Project Report | 15% |
| Final Project Presentation | 15% |
| Peer Review | 10% |
| Attendance (counted only after February 19, per the course announcement) | 10% |
📚 Suggested References
- Compeau & Pevzner, Bioinformatics Algorithms: An Active Learning Approach
- John Tukey, Exploratory Data Analysis
- Pevzner & Shamir, Computational and Systems Biology
- Blum, Hopcroft & Kannan, Foundations of Data Science
- Single-cell Best Practices
- Selected research papers (to be assigned weekly)