Recent advances in machine learning are driven by training scalable models on Internet-scale data (e.g., billions of image-text pairs or trillions of text tokens). This gives rise to foundation models that demonstrate strong capabilities across diverse tasks. In this course, we will study techniques that enable such machine learning systems. We will cover foundation models for language, vision, and other modalities.
Jia-Bin Huang (jbhuang@umd.edu)
Office: 4234 IRB building
Hadi Alzayer (hadi@umd.edu), Yi-Ting Chen (ytchen@umd.edu), Yue Feng (yuefeng@umd.edu), Ji-Ze Jang (gjang@umd.edu), Yao-Chih Lee (yclee@umd.edu)
College calculus, linear algebra, and probability and statistics. Prior courses in machine learning, natural language processing, and computer vision are helpful but not required.
We will have two in-class midterm exams throughout the semester. Detailed information will be made available.
Students will work in groups of 2-3 on projects related to multimodal foundation models.
We will provide a list of recommended paper readings starting from the third lecture. For each lecture, students will turn in a one-page paper review. The review should have two sections: 1) paper summary and 2) your critiques (strengths/weaknesses of the paper, interesting insights, or questions worth discussing). The paper review is due before the class (11:00 AM on Tuesday or Thursday). No late submissions are allowed. Students must submit at least 20 paper reviews to receive the full score (40%).
Tuesday/Thursday 11:00 AM - 12:15 PM at IRB 0318
No lecture recordings. The instructor will post edited/summarized videos on selected topics for students to review. These will be posted shortly after the lectures.
We will use Piazza as the primary platform for communication. Please do not send individual emails to the TAs or the instructor, as they are difficult to track.
Several courses offered at UMD also overlap with this course.