Interactive Data Science

Fall 2020
Adam Perer
(Instructor)
Dominik Moritz
(Instructor)
Kunal Khadilkar
(Teaching Assistant)
Aditya Anantharaman
(Teaching Assistant)
Pranav Thombre
(Teaching Assistant)

Canvas Link.

The goal of this course is to provide you with the tools to understand data and build data-driven interactive systems. You will learn to tell a story with the data and explore opportunities enabled by interactive data analysis through a combination of lectures, readings of current literature, and practical skills development. Over the course of the semester, you will learn about data science and the entire data pipeline, from collecting and analyzing data to interacting with it. We will also cover human-centered aspects of data science and how HCI methods can enhance the interpretation of data. This course requires comfort with programming, as the required projects make use of Python and Git (and, if you choose, JavaScript or other languages). A series of homework assignments helps lay the groundwork for a larger final group project.

Syllabus

Course Goals

The learning goals of the course are as follows:

  • To be able to analyze a dataset, evaluate potential insights, and identify specific questions.
  • To introduce the value of data visualization and its principles for designing effective interactive visualizations (e.g. human perception, color theory, storytelling techniques)
  • To have a working ability to obtain, analyze, manipulate, transform, and distribute data.
  • To introduce common problems with data such as structural problems, outliers, incomplete data, and dirty data
  • To introduce basic concepts in data interpretation including feature generation, statistical analysis and classification (e.g. assumptions of data, data quality, missing data, outliers)
  • To introduce basic concepts in data collection including data formats, parsing and sources of data (Data Structure and Storage)
  • To understand and implement basic A/B experiments and understand experimental reliability and validity
  • To introduce human-centered data science topics including ethics, fairness, and interpretability
  • To provide practical applied examples of the data pipeline through an examination of current literature
  • To provide hands-on experience with creating data-driven applications and to produce a portfolio of such applications
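
As a rough illustration of the A/B experiment goal above, here is a minimal sketch of a two-proportion z-test using only the Python standard library. The function name and the conversion counts are made up for illustration and are not course materials. In practice you would likely use a statistics library and also check the test's assumptions (independent samples, reasonably large counts).

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a and conv_b / n_b are the conversions and sample sizes of
    variants A and B (hypothetical inputs for this sketch).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative numbers only: 120/1000 conversions for A, 150/1000 for B.
z, p = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the difference is unlikely to be due to chance alone. Whether the experiment is reliable and valid still depends on how the samples were collected.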

Concepts

  • Structured vs unstructured data
  • Dealing with heterogeneous data
  • Sampling and Bias in Data Collection
  • Sensed Data
  • Data transformation and analysis
  • Data Visualization
  • Current research in information-driven interfaces

Skills

  • Getting Web data
  • Dealing with APIs
  • Common data formats
  • Data parsing
  • Common problems with data
  • Tools for analyzing data
  • Tools for visualizing data
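
To give a sense of the level of the data format and parsing skills listed above, here is a small sketch that reads the same kind of record from CSV and JSON text and flags missing values. It uses only the Python standard library. The sample data, field names, and helper functions are invented for illustration.

```python
import csv
import io
import json

# Hypothetical sample data: the same records in two common formats.
CSV_TEXT = """city,temp_c
Pittsburgh,21.5
Boston,
Seattle,18.0
"""

JSON_TEXT = '[{"city": "Pittsburgh", "temp_c": 21.5}, {"city": "Boston", "temp_c": null}]'


def parse_csv(text):
    """Parse CSV text into dicts; an empty temp_c cell becomes None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        raw = row["temp_c"].strip()
        rows.append({"city": row["city"], "temp_c": float(raw) if raw else None})
    return rows


def parse_json(text):
    """Parse JSON text; JSON null is already mapped to None."""
    return json.loads(text)


def cities_missing(rows, field):
    """Return the cities whose record lacks a value for the given field."""
    return [r["city"] for r in rows if r.get(field) is None]


csv_rows = parse_csv(CSV_TEXT)
json_rows = parse_json(JSON_TEXT)
print(cities_missing(csv_rows, "temp_c"))
print(cities_missing(json_rows, "temp_c"))
```

Note that CSV carries no type information, so the empty cell has to be converted explicitly, while JSON distinguishes numbers from null on its own.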

Some of the specific skills that will be covered in projects include:

  • Display data from an API on a data-driven application you create
  • Create interactive visualizations of data
  • Answer a series of intriguing questions from both the data and corresponding visualizations

Prerequisites

The class will involve programming and debugging. You should not take the course if you find programming or debugging extremely difficult, because you will have to master several very different programming languages and concepts in very short order (projects make use of libraries and frameworks including Pandas, Altair, and Streamlit, and multiple languages including Python, JavaScript, and SQL). That being said, the assignments will mostly require Python unless you choose to do a project in another language.

List of Topics

  • Introduction / The Data Science Pipeline
  • The Value of Visualization
  • Sketching Data
  • Perception
  • Color
  • Tableau Workshop
  • Visualization Design
  • Data quality + Introduction to Notebooks
  • Databases and SQL
  • Data Storage
  • Data Sampling
  • Streamlit Workshop
  • Interaction + Altair Workshop
  • Interaction + Altair Workshop
  • Telling Stories with Data
  • Final Project Discussion
  • Visualization Ethics
  • Data Science Tooling
  • ML workshop
  • ML workshop
  • Data in ML
  • Crowdsourcing
  • Quantitative Evaluation
  • Controlled Experiments
  • Qualitative Evaluation
  • No Class - Thanksgiving Holiday
  • ML Interpretability
  • ML Fairness

Projects

The course is project-oriented. It includes a large final group-defined project along with three homework assignments designed to provide the stepping stones needed to complete the final project. Tentative due dates for these projects can be found at the bottom of this syllabus under the ‘Course Summary’ heading. Your work will be evaluated relative to your background and level of effort. This is a graduate class, and the assumption is that you are a mature and motivated student, and that you will define your work so that you learn and grow, given your background. Students who are taking this course as a part of a technical requirement (such as the computer science course requirement in the HCI PhD) will need to do more advanced or ambitious projects, and should consult with the instructor to make sure they are meeting this bar.

All homework assignments are to be done as individual work. It is expected that students may assist each other with conceptual issues, but not provide code. If you use example code, you must explicitly acknowledge this in your assignment submission. If you are unsure about these boundaries, ask.

Work Required

This will not be an exam-heavy course. Instead, much of the work will focus on projects. The course will focus on understanding the techniques of data science and visualization through developing creative analyses and visualizations using tools to solve defined problems.

There is no final exam in this course. Students who do well will be invited to continue on an independent project on topics related to the course, working with Prof. Perer or Prof. Moritz during a future semester.

Course Material

Readings will be made available on this CMU Canvas site.

Readings

You will be expected to read assigned readings before the lecture they pertain to. These may include chapters drawn from textbooks about data, or readings from the research literature. To incentivize this, each student will be required to make at least one relevant post to the discussion group before the class for which each reading is due. This participation will count toward the Participation and Attendance portion of the grade.

All students are required to submit at least 1 substantive discussion post per week on Piazza related to the course readings. Each student has 1 pass for skipping comments.

Good comments typically exhibit one or more of the following:

  • Critiques of arguments made in the papers
  • Analysis of implications or future directions for work discussed in lecture or readings
  • Clarification of some point or detail presented in the class
  • Insightful questions about the readings or answers to other people’s questions
  • Links to web resources or examples that pertain to a lecture or reading

Grades

The tentative breakdown for grading is below. As a reminder, here is the university policy on academic integrity.

  • 40% Homework Assignments
  • 50% Final Project
  • 10% Participation and Attendance

Respect for Diversity

It is our intent that students from all diverse backgrounds and perspectives be well served by this course, that students’ learning needs be addressed both in and out of class, and that the diversity that students bring to this class be viewed as a resource, strength and benefit. It is our intent to present materials and activities that are respectful of diversity: gender, sexuality, disability, age, socioeconomic status, ethnicity, race, and culture. Your suggestions are encouraged and appreciated. Please let us know ways to improve the effectiveness of the course for you personally or for other students or student groups. In addition, if any of our class meetings conflict with your religious events, please let us know so that we can make arrangements for you.

Accommodations for Students with Disabilities

If you have a disability and are registered with the Office of Disability Resources, we encourage you to use their online system to notify us of your accommodations and discuss your needs with us as early in the semester as possible. We will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, we encourage you to contact them at access@andrew.cmu.edu.

Health and Well-being

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at http://www.cmu.edu/counseling/. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help.

If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:

  • CaPS: 412-268-2922
  • Re:solve Crisis Network: 888-796-8226

If the situation is life threatening, call the police:

  • On campus: CMU Police: 412-268-2323
  • Off campus: 911

If you have questions about this or your coursework, please let the instructors know. Thank you, and have a great semester.