Stanford University EDUC463/CS432: Computer Vision for Education and Social Science Research

Course Description

Computer vision -- the study of how to design artificial systems that can perform high-level tasks related to image or video data (e.g. recognizing and locating objects in images and behaviors in videos, or generating photorealistic, imagined data) -- has seen dramatic successes in recent years. In this course, we seek to give education and social science researchers the know-how needed to apply cutting-edge computer vision algorithms in their work, as well as an opportunity to workshop applications in their domains of interest. Particular application domains of interest include (1) building computer vision-powered tools which can be applied in educational settings, and (2) the analysis of human behavioral data. However, this course is meant to be useful for a wide variety of use cases, and aspects of it can be tailored to individual interest. Students will complete a project component in which they apply a computer vision technique in a domain of their choosing.

This course is part lecture (in which I give a survey of computer vision tools and explain the basics of their inner workings), part seminar (in which we workshop a wide variety of application ideas, many student-inspired), and part workshop/lab (in which we prototype applications). We'll look to draw from the students taking the course for application ideas, but to give you a sense, here are a few prime examples:

Education and social science research often uses, or could gainfully use, video data, which is manually coded by researchers (often for high-level semantic content, e.g. engagement or affect). In some cases, computer vision has the capability of automating large aspects of this process. In other cases, it has the potential to aid manual coding, greatly reducing the effort of researchers. In either case, it may be possible to drastically increase the scale of data analysis as well as add a wide variety of additional quantitative metrics.
A wide variety of computer vision techniques (e.g. affect recognition, object recognition, and object localization) have promising applications in educational technologies. For example, in previous work, I and many others worked on a computer vision-powered tool for children with autism using Google Glass. There is tremendous opportunity to prototype new tools, toys, and aids.
Recent and ongoing advances in foundation models, and in particular, generative AI, present a wide range of opportunities for developing novel content, and for students to interact with, learn from, and learn with AI in new ways.

FAQ

Who is this class for?

It's useful for you to have some background in programming, though we will tailor the course and scope projects so that it is suitable for a wide variety of backgrounds. Most importantly, you should be interested in these sorts of applications! Stanford offers a number of excellent computer science courses centered on computer vision, and we'll look to distinguish this one by the opportunity to workshop and prototype application ideas. In the past, this course has been useful for applications-oriented students who either do not have the bandwidth for a course like CS231N, or who wish to take it concurrently or in the future.

What's computer vision?

Here is a high-level blog post about it and its many capabilities. Note: this is from 2019, so it leaves out many recent advances! Here is a more recent blog post specialized to generative AI.

Do I need to come in with a dataset or particular application in mind?

No! These are welcome, but we'll work together to find useful data with which to play.

What level of coding is involved?

We'll look to make this accessible to those with a wide variety of backgrounds. By the end, we hope you'll be able to take existing computer vision tools (either high-level tools, like those provided by Google AI Platform, or lower-level ones, like someone's GitHub repo), try them out in some new application, and troubleshoot if they do not work. So this will likely be less demanding than CS vision courses (and perhaps a good stepping stone to them). If you are, for example, comfortable manipulating and analyzing data in some language, and have at least some familiarity with Python, this course could work well for you. In the past, several students built their projects off of TensorFlow Colab demos, which run in the browser, whereas others found custom software or GitHub repos that worked well for their projects.

Prerequisites

This course is intended to attract a broad range of students interested in applying these methods, and in particular students who do not have significant experience taking computer science courses. Project milestones will be tailored so as to be appropriate for each student’s background, and we'll make sure you feel supported in all technical aspects.

Students should have some experience in at least one programming language. It's helpful to be able to find one's way around simple Python scripts, but if you're strongly motivated, you should be able to pick up the needed basics.

A working knowledge of linear algebra, calculus, data science, and statistics (MATH 19, 41, or 51, EDUC400b, 423, 423a, or equivalent) is useful but not at all required.

Learning objectives & student responsibilities

Students should walk away from the course with three general skills:

a basic familiarity with a wide variety of successful computer vision tools,

enough fluency in the field so that, for a particular application, they can attempt to find a suitable computer vision tool, locate a codebase or API for it, and apply it in their setting, and

the ability to troubleshoot, and in some cases adapt, their computer vision applications.

Each student will complete a project in which they will apply computer vision to a domain of their choosing. See here for project details.

In addition, each student will complete pre-assignments (short homework assignments). These typically consist of a short response on a reading, an open-ended exploration, or some tooling around with a demo. They are meant to be a low-lift, low-stakes way of introducing ideas prior to class.

Week-by-week

Topics and schedule subject to modifications. Not all details of assignments appear here. See Canvas for pre-assignment discussions, and the project page for project milestones.

Unit 1: Getting started

We take a first look at what is possible with computer vision with a high-level survey of current techniques. We'll point you to resources for getting started, discuss ethical considerations, and have first pitches for projects.

Class 1 (Jan 8): Introduction: what is computer vision? Syllabus and logistics

Class 2 (Jan 10): A whirlwind tour of computer vision, part 1

Class 3 (Jan 17): A whirlwind tour of computer vision, part 2

[Due: pre-assignment -- exploring existing tools]

Class 4 (Jan 22): Scoping projects

Class 5 (Jan 24): Getting started resources

Class 6 (Jan 29): Ethical considerations

[Due: pre-assignment -- ethics basics]

Class 7 (Jan 31): Project pitches

[Due: presentation in class (please upload slides, see format guide)]

Unit 2: Inner workings and practicalities

The goal of this class is not to be able to implement all of the above algorithms ourselves. However, in applying any of these, it is useful to have some intuition for how they work. Fortunately, many current methods have a great deal in common. Hence, here we will look in some detail at how image classification works, and practice training such algorithms. We will then have a tutorial on other techniques.

Class 8 (Feb 5): Image classification, training basics

[Due: pre-assignment -- optimization basics]

Class 9 (Feb 7): The deep learning revolution

[Due: pre-assignment -- convnets and transformers, project scoping follow-up]

Class 10 (Feb 12): Model selection, success metrics, and hyperparameter tuning

Class 11 (Feb 14): Self-supervision and large pre-trained models

[Due: project workshopping session 1 -- come prepared with a short presentation on current challenges (please upload slides, see format guide)]

Class 12 (Feb 21): Generative models

[Due: pre-assignment -- generative models]

Class 13 (Feb 26): Practicalities and pitfalls

[Due: project workshopping session 2 -- come prepared with a short presentation on current challenges (please upload slides, see format guide)]

Unit 3: A selection of computer vision applications

Here, we will have a series of lectures and discussions on current applications of computer vision in education and the social sciences.

Class 14 (Feb 28): Guest speaker 1

Class 15 (Mar 4): The Autism Glass Project

Class 16 (Mar 6): Guest speaker 2

Class 17 (Mar 11): Student final presentations, pt 1

Class 18 (Mar 13): Student final presentations, pt 2

Winter 2024

Instructor: Nick Haber

TA: Merve Cerit

Meeting times: Mo, We 4:30 - 5:50PM

Location: 200-107

Nick's Office Hours: Wednesday 3:10 - 4:10PM, CERAS 109 (drop-in)

Merve's Office Hours: Monday 3:00 - 4:00 PM, Zoom Link
