Course Description

Recent advances in machine learning are driven by training scalable models on Internet-scale data (e.g., billions of image-text pairs or trillions of text tokens). This has given rise to foundation models that demonstrate strong capabilities across diverse tasks. In this course, we will study the techniques that enable such machine learning systems. We will cover foundation models for language, vision, and other modalities.

Instructor

Teaching Assistants

Course Logistics



Coursework

Prerequisites

College Calculus, Linear Algebra, and Probability and Statistics. Prior courses in machine learning, natural language processing, or computer vision are helpful but not required.

Midterm (30%)

There will be three in-class midterm exams throughout the semester. Detailed information will be made available later.

Final Project (30%)

Students will work in groups of 2-3 on projects related to multimodal foundation models.

Paper Review (40%)

We will provide a list of recommended paper readings starting from the third lecture. For each lecture, students will turn in a one-page paper review. The review should have two sections:

Each paper review is due before class (11:00 AM on Tuesday or Thursday). Late submissions are not accepted. Students must submit at least 20 paper reviews to receive full credit (40%).

Regrade Requests

If you believe that the course staff made an objective error in grading, you may submit a regrade request on Gradescope within 7 days of the grade release. Your request should briefly summarize why the original grading was incorrect.

Late Policy