Schedule

Updated lecture slides will be posted here shortly before each lecture. For easier reading, lecture category titles are color-coded in blue, and the midterm exams are highlighted in red. Please note that the schedule is subject to change as the semester progresses.

Date Topic Contents Slides Videos
09/02 Lecture 1: Introduction Course logistics
Applications of multimodal foundation models
——— Architecture
09/04 Lecture 2: Transformer Tokenization
Sequence-to-sequence architecture
09/09 Lecture 3: Transformer Attention But What Are Transformers?
09/11 Lecture 4: Positional embedding Absolute positional encodings
Relative positional encoding
Rotary positional encoding
How Rotary Position Encoding Supercharges Modern LLMs
09/16 Lecture 5: Architecture variants Optimizer: AdamW, Muon
Normalization: LayerNorm, RMSNorm
Pre-vs-post norm
Activation: GELU, SwiGLU
AdamW
09/18 Lecture 6: Efficient Attention Sparse and low-rank
Linear attention
Log-linear attention
Multi-head attention, Multi-query attention, Multi-head Latent Attention, TransMLA
09/23 In-Class Midterm 1
11:00am-12:15pm ET
09/25 Lecture 7: Hardware-aware attention FlashAttention
Native Sparse Attention
How FlashAttention Accelerates Generative AI Revolution
09/30 Lecture 8: Beyond tokenization SuperBPE
Byte Latent Transformer: Patches Scale Better Than Tokens
Large concept models
——— Large Language Models
10/02 Lecture 9: Pretraining Scaling laws
10/07 Lecture 10: Prompting & Parameter-Efficient Tuning LoRA, QLoRA, DoRA, SymLoRA
Prompt tuning, prefix tuning
10/09 Lecture 11: Post-Training Reinforcement learning from human feedback
Proximal Policy Optimization
Direct Preference Optimization
10/14 No class Fall break
10/16 Lecture 12: Reasoning Chain of thought
Test-time scaling
GRPO, GSPO
10/21 In-Class Midterm 2
11:00am-12:15pm ET
10/23 No class ICCV 2025
10/28 Lecture 13: Efficient inference Mixture of experts
Quantization
Speculative decoding
Multi-token prediction
PagedAttention
Continuous batching
10/30 Lecture 14: Efficient training Parallelism
Mixed precision training, fp16, bf16, fp8
11/04 Lecture 15: Retrieval-Augmented Generation RAG
11/06 Lecture 16: Agentic AI WebGPT and NewBit
Constitutional AI, DERA, ReAct, Reflexion
AgentGPT, Re3
——— Multimodal Foundation Models
11/11 Lecture 17: Vision Transformer Vision transformer, Swin transformer, Transformer++
11/13 Lecture 18: Self-supervised Learning Multimodal pretraining
SimCLR
DINO
Masked autoencoder
CLIP
11/18 Lecture 19: Large Multimodal Models
InternVL1.5
GLM-4.5V
11/20 Lecture 20: Diffusion and Flow matching
Variational Autoencoder
Training, guidance, latent
Flow Matching
11/25 In-Class Midterm 3
11:00am-12:15pm ET
11/27 Lecture 21: Diffusion LLM
Diffusion LM
12/02 Lecture 22: Applications - Video

12/04 Lecture 23: Applications - 3D

12/09 Lecture 24: Applications - Robotics
Vision-language-action
12/11 Lecture 25: Applications - Protein and Biology