| Date | Topic | Contents | Slides | Videos | |
|---|---|---|---|---|---|
| 09/02 | Lecture 1: Introduction | Course logistics; Applications of multimodal foundation models | | | |
| ——— | Architecture | ||||
| 09/04 | Lecture 2: Transformer | Tokenization; Sequence-to-sequence architecture | | | |
| 09/09 | Lecture 3: Transformer | Attention | But What Are Transformers? | ||
| 09/11 | Lecture 4: Positional embedding | Absolute positional encodings; Relative positional encoding; Rotary positional encoding | How Rotary Position Encoding Supercharges Modern LLMs | | |
| 09/16 | Lecture 5: Architecture variants | Optimizer: AdamW, Muon; Normalization: LayerNorm, RMSNorm; Pre-vs-post norm; Activation: GELU, SwiGLU | AdamW | | |
| 09/18 | Lecture 6: Efficient Attention | Sparse and low-rank; Linear attention; Log-linear attention; Multi-head attention, Multi-query attention, Multi-head Latent Attention, TransMLA | | | |
| 09/23 | In-Class Midterm 1 | 11:00am-12:15pm ET | | | |
| 09/25 | Lecture 7: Hardware-aware attention | FlashAttention; Native Sparse Attention | How FlashAttention Accelerates Generative AI Revolution | | |
| 09/30 | Lecture 8: Beyond tokenization | SuperBPE; Byte Latent Transformer: Patches Scale Better Than Tokens; Large concept models | | | |
| ——— | Large Language Models | ||||
| 10/02 | Lecture 9: Pretraining | Scaling laws | |||
| 10/07 | Lecture 10: Prompting & Parameter-Efficient Tuning | LoRA, QLoRA, DoRA, SymLoRA; Prompt tuning, prefix tuning | | | |
| 10/09 | Lecture 11: Post-Training | Reinforcement learning from human feedback; Proximal Policy Optimization; Direct Preference Optimization | | | |
| 10/14 | No class | Fall break | |||
| 10/16 | Lecture 12: Reasoning | Chain of thought; Test-time scaling; GRPO, GSPO | | | |
| 10/21 | In-Class Midterm 2 | 11:00am-12:15pm ET | | | |
| 10/23 | No class | ICCV 2025 | |||
| 10/28 | Lecture 13: Efficient inference | Mixture of experts; Quantization; Speculative decoding; Multi-token prediction; PagedAttention; Continuous batching | | | |
| 10/30 | Lecture 14: Efficient training | Parallelism; Mixed precision training, fp16, bf16, fp8 | | | |
| 11/04 | Lecture 15: Retrieval Augmented Generation | RAG | |||
| 11/06 | Lecture 16: Agentic AI | WebGPT and NewBit Constitutional AI, DERA, ReAct, Reflexion AgentGPT, Re3 | | | |
| ——— | Multimodal Foundation Models | ||||
| 11/11 | Lecture 17: Vision Transformer | Vision transformer; Swin transformer; Transformer++ | |||
| 11/13 | Lecture 18: Self-supervised Learning; Multimodal pretraining | SimCLR; DINO; Masked autoencoder; CLIP | | | |
| 11/18 | Lecture 19: Large Multimodal Models | InternVL1.5; GLM-4.5V | | | |
| 11/20 | Lecture 20: Diffusion and Flow matching | Variational Autoencoder Training, guidance, latent Flow Matching | | | |
| 11/25 | In-Class Midterm 3 | 11:00am-12:15pm ET | | | |
| 11/27 | Lecture 21: Diffusion LLM | Diffusion LM | | | |
| 12/02 | Lecture 22: Applications - Video | | | | |
| 12/04 | Lecture 23: Applications - 3D | | | | |
| 12/09 | Lecture 24: Applications - Robotics | Vision-language-action | | | |
| 12/11 |
Lecture 25: Applications - Protein and Biology
|