
Course Description

Computer vision, the study of how to design artificial systems that can perform high-level tasks on image or video data (e.g., recognizing and locating objects in images and behaviors in videos, or generating photorealistic, imagined data), has seen dramatic successes in recent years. In this course, we seek to give education and social science researchers the know-how needed to apply cutting-edge computer vision algorithms in their work, as well as an opportunity to workshop applications in their domains of interest. Particular application domains of interest include (1) building computer vision-powered tools that can be applied in educational settings, and (2) analyzing human behavioral data. However, this course is meant to be useful for a wide variety of use cases, and aspects of it can be tailored to individual interests. Students will complete a project in which they apply such techniques in a domain of their choosing.

This course is part lecture (in which I survey computer vision tools and explain the basics of how they work), part seminar (in which we workshop a wide variety of application ideas, many inspired by students), and part workshop/lab (in which we prototype applications). We'll draw on students in the course for application ideas, but to give you a sense, here are a few prime examples:

  • Education and social science research often uses, or could gainfully use, video data that researchers code manually (often for high-level semantic content, e.g., engagement or affect). In some cases, computer vision can automate large parts of this process. In other cases, it can aid manual coding, greatly reducing researcher effort. In either case, it may be possible to drastically increase the scale of data analysis and to add a wide variety of quantitative metrics.
  • A wide variety of computer vision techniques (e.g., affect recognition, object recognition, and object localization) have promising applications in educational technologies. For example, in previous work, I, along with many others, worked on a computer vision-powered tool for children with autism using Google Glass. There is tremendous opportunity to prototype new tools, toys, and aids.
  • Recent and ongoing advances in foundation models, and in particular generative AI, present a wide range of opportunities for developing novel content and for students to interact with, learn from, and learn with AI in new ways.

FAQ

Who is this class for?

It's useful to have some background in programming, though we will tailor the course and scope projects so that they suit a wide variety of backgrounds. Most importantly, you should be interested in these sorts of applications! Stanford offers a number of excellent computer science courses centered on computer vision; this one is distinguished by the opportunity to workshop and prototype application ideas. In the past, this course has been useful for applications-oriented students who either do not have the bandwidth for a course like CS231N or who wish to take one concurrently or in the future.

What's computer vision?

Here is a high-level blog post about it and its many capabilities. Note: this post is from 2019, so it leaves out many recent advances! Here is a more recent blog post focused on generative AI.

Do I need to come in with a dataset or particular application in mind?

No! These are welcome, but we'll work together to find useful data to experiment with.

What level of coding is involved?

We aim to make this accessible to those with a wide variety of backgrounds. By the end, we hope you'll be able to take existing computer vision tools (either high-level tools, like those provided by Google AI Platform, or lower-level ones, like someone's GitHub repo), try them out in a new application, and troubleshoot them if they do not work. This will likely be less demanding than CS vision courses (and perhaps a good stepping stone to them). If you are, for example, comfortable manipulating and analyzing data in some language and have at least some familiarity with Python, this course could work well for you. In the past, several students built their projects on TensorFlow Colab demos, which run in the browser, whereas others found custom software or GitHub repos that worked well for their projects.

Prerequisites

This course is intended to attract a broad range of students interested in applying these methods, in particular students who do not have significant experience taking computer science courses. Project milestones will be tailored to each student's background, and we'll make sure you feel supported in all technical aspects.

Meeting details

Winter quarter, 2024
Mo, We 4:30-5:50PM
Building 200 rm 107

Office hours

Nick: Wednesday 3:10-4:10 PM, CERAS 109 (drop-in)
Merve: Monday 3:00-4:00 PM, Zoom Link, also by request

nhaber@stanford.edu mervecer@stanford.edu

Forum, project submissions

Visit our Canvas page.

Grading

Project report 20%
Project final presentation 15%
Earlier project milestones 30%
Pre-assignments 25%
Discussion participation 10%

Learning objectives & student responsibilities

Students should walk away from the course with three general skills:

Each student will complete a project in which they will apply computer vision to a domain of their choosing. See here for project details.

In addition, each student will complete pre-assignments: short homework assignments that typically consist of a short response to a reading, an open-ended exploration, or some hands-on experimentation with a demo. These are meant to be a low-lift, low-stakes way of introducing ideas before class.

Resources:

This Canvas link contains some useful resources; suggestions for additions are welcome.

Week-by-week

Topics and schedule are subject to change. Not all details of assignments appear here. See Canvas for pre-assignment discussions and the project page for project milestones.

Unit 1: Getting started

We take a first look at what is possible with computer vision with a high-level survey of current techniques. We'll point you to resources for getting started, discuss ethical considerations, and have first pitches for projects.

Unit 2: Inner workings and practicalities

The goal of this class is not for us to implement all of the algorithms surveyed above ourselves. However, when applying any of them, it is useful to have some intuition for how they work. Fortunately, many current methods share a great deal in common. Hence, we will look in some detail at how image classification works, with hands-on practice training such models. We will then have a tutorial on other techniques.

Unit 3: A selection of computer vision applications

Here, we will have a series of lectures and discussions on current applications of computer vision in education and the social sciences.