Interactive Data Science

Spring 2022
T/H 11:50am-1:10pm in Gates 4307
Adam Perer
(Instructor)
Venkat Sivaraman
(Teaching Assistant)
Aditi Sharma
(Teaching Assistant)
Jingyi Zhang
(Teaching Assistant)

Additional course information available on Canvas.

The goal of this course is to provide you with the tools to understand data and build data-driven interactive systems. You will learn to tell a story with data and explore the opportunities enabled by interactive data analysis through a combination of lectures, readings from the current literature, and practical skills development. Over the course of the semester, you will learn about data science and the entire data pipeline, from collecting and analyzing data to interacting with it. We will also cover human-centered aspects of data science and how HCI methods can enhance the interpretation of data. This course requires comfort with programming, as the required projects make use of Python and Git. A series of homework assignments helps lay the groundwork for a larger final group project.

Schedule and Readings

Subject to modification

Tue, Jan 18

Introduction and the Data Science Pipeline Slides

Thu, Jan 20

Value of Visualization Slides

Tue, Jan 25

Sketching Slides

Thu, Jan 27

Exploratory Data Analysis with Tableau Slides

Tue, Feb 01

Visual Encodings with Colab and Altair Slides

Thu, Feb 03

Data Quality Slides

Tue, Feb 08

Interactivity 1 Slides

Tue, Feb 15

Interactivity Lab Slides

Thu, Feb 17

Practical Machine Learning Slides

Thu, Mar 03

Final Project Introduction + Designing with Effective Visual Encodings Slides

Tue, Mar 08

No class - Spring Break

Thu, Mar 10

No class - Spring Break

Tue, Mar 15

Telling Stories with Data Slides

Thu, Mar 17

Telling Stories with Data Part 2 Slides

Tue, Mar 22

Natural Language Processing (guest lecture by Dr. Hendrik Strobelt @ MIT-IBM Watson AI Lab) Slides

Thu, Mar 24

Data Science Ethics Slides

  • Required: Introduction, by Catherine D'Ignazio and Lauren F. Klein, in Data Feminism
  • Optional: Chapter 1, by Catherine D'Ignazio and Lauren F. Klein, in Data Feminism

Tue, Mar 29

Critique Workshop 1 Slides

Thu, Mar 31

Critique Workshop 2 Slides

Tue, Apr 05

Final Project Feedback Session (optional) Slides

Thu, Apr 07

No class - Spring Carnival

Tue, Apr 12

Controlled Experiments + Evaluation Slides

Thu, Apr 14

Uncertainty Slides

Tue, Apr 19

Fairness (guest lecture by Prof. Ken Holstein) Slides

Thu, Apr 21

Visualization and Machine Learning (guest lecture by Dr. Fred Hohman @ Apple) Slides

Tue, Apr 26

Team Project Presentations Slides

Thu, Apr 28

Team Project Presentations Slides

Syllabus

Course Goals

The learning goals of the course are as follows:

  • To be able to analyze a dataset, evaluate potential insights, and identify specific questions.
  • To introduce the value of data visualization and its principles for designing effective interactive visualizations (e.g. human perception, color theory, storytelling techniques)
  • To have a working ability to obtain, analyze, manipulate, transform, and distribute data.
  • To introduce common problems with data such as structural problems, outliers, incomplete data, and dirty data
  • To introduce basic concepts in data interpretation including feature generation, statistical analysis and classification (e.g. assumptions of data, data quality, missing data, outliers)
  • To introduce basic concepts in data collection including data formats, parsing and sources of data (Data Structure and Storage)
  • To understand and implement basic A/B experiments and to understand experimental reliability and validity
  • To introduce human-centered data science topics including ethics, fairness, and interpretability
  • To provide practical applied examples of the data pipeline through an examination of current literature
  • To provide hands-on experience with creating data-driven applications and to produce a portfolio of such applications

Concepts

  • Structured vs unstructured data
  • Dealing with heterogeneous data
  • Sampling and bias in data collection
  • Data transformation and analysis
  • Data visualization
  • Current research in information-driven interfaces

Skills

  • Getting Web data
  • Dealing with APIs
  • Common data formats
  • Data parsing
  • Common problems with data
  • Tools for analyzing data
  • Tools for visualizing data

Some of the specific skills that will be covered in projects include:

  • Display data from an API on a data-driven application you create
  • Create interactive visualizations of data
  • Answer a series of intriguing questions from both the data and corresponding visualizations

Prerequisites

The class will involve programming and debugging. If you find programming or debugging extremely difficult, this course may not be for you, as you will have to master several very different programming languages, libraries, and concepts in very short order (projects make use of libraries and frameworks including pandas, Altair, and Streamlit, and of multiple languages including Python, JavaScript, and SQL). That said, the assignments will mostly require only Python, unless you choose to do a project in another language.

Projects

The course is project-oriented. It includes a large, group-defined final project along with two homework assignments designed to provide the stepping stones needed to complete it. Tentative due dates for these projects can be found in the schedule above. Your work will be evaluated relative to your background and level of effort. This is a graduate-level class: we assume that you are a mature and motivated student and that you will define your work so that you learn and grow, given your background.

All homework assignments are to be done individually. Students may assist each other with conceptual issues but may not provide code to one another. If you use example code, you must explicitly acknowledge this in your assignment submission. If you are unsure about these boundaries, please ask the instructors.

Work Required

This will not be an exam-heavy course; instead, most of the work will focus on projects. The course emphasizes understanding the techniques of data science and visualization by using tools to develop creative analyses and visualizations that solve defined problems.

There is no final exam in this course. Students who do well will be invited to continue on an independent project on topics related to the course, working with Prof. Perer or others in the DIG lab during a future semester.

Course Material

Readings will be made available on the schedule listed above.

You will be expected to complete the assigned readings before the lecture they pertain to. These may include textbook chapters about data or papers from the research literature. To incentivize this, each student will be required to make at least one relevant posting to the discussion group before the class for which each reading is due. This participation will count toward the Participation and Attendance portion of your grade.

All students are required to submit at least one substantive discussion post on Canvas related to the course readings for each lecture. Each student has one pass that may be used to skip a post.

Good comments typically exhibit one or more of the following:

  • Critiques of arguments made in the papers
  • Analysis of implications or future directions for work discussed in lecture or readings
  • Clarification of some point or detail presented in the class
  • Insightful questions about the readings or answers to other people’s questions
  • Links to web resources or examples that pertain to a lecture or reading

Grades

The tentative breakdown for grading is below. As a reminder, all work in this course is subject to the university policy on academic integrity.

  • 30% Homework Assignments
  • 60% Final Project
  • 10% Participation and Attendance

Respect for Diversity

It is our intent that students from all diverse backgrounds and perspectives be well served by this course, that students’ learning needs be addressed both in and out of class, and that the diversity that students bring to this class be viewed as a resource, strength and benefit. It is our intent to present materials and activities that are respectful of diversity: gender, sexuality, disability, age, socioeconomic status, ethnicity, race, and culture. Your suggestions are encouraged and appreciated. Please let us know ways to improve the effectiveness of the course for you personally or for other students or student groups. In addition, if any of our class meetings conflict with your religious events, please let us know so that we can make arrangements for you.

Accommodations for Students with Disabilities

If you have a disability and are registered with the Office of Disability Resources, we encourage you to use their online system to notify us of your accommodations and discuss your needs with us as early in the semester as possible. We will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, we encourage you to contact them at access@andrew.cmu.edu.

Health and Well-being

If you are experiencing COVID-like symptoms or have had a recent COVID exposure, do not attend class if we are meeting in person. Please email the instructors for accommodations.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help; call 412-268-2922 and visit their website at www.cmu.edu/counseling/. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help. If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:

If the situation is life-threatening, call the police. On campus, call CMU Police: 412-268-2323. Off campus: 911.

If you have questions about this or your coursework, please let the instructors know. Thank you, and have a great semester.