Assignment 1: Exploratory Visual Analysis

In this assignment, you will begin to work with a public dataset. You will conduct exploratory analysis using visualization, with a focus on understanding the shape & structure of the data, investigating initial questions, and developing preliminary insights & hypotheses.

Due: Thursday 9/12, 11:59 pm ET
Submit on Canvas →

Your Tasks

  1. In this assignment, you will conduct exploratory data analysis on a public dataset from Allegheny County’s Health Department. Your goal is to explore the data to aid in understanding trends of an ongoing health crisis: fatal accidental overdoses from a variety of drugs in the county. The Western Pennsylvania Regional Data Center publishes a monthly dataset that describes fatal accidental overdose incidents in Allegheny County, denoting age, gender, race, drugs present, zip code of incident and zip code of residence.

  2. Preview the Fatal Accidental Overdoses dataset and download the file in CSV format using the download button or this direct link.

  3. Prior to beginning any visual analysis, and based on your preview of the data file, write down at least three overall questions you would like to investigate for your chosen subtheme. For each question, briefly jot down your motivation for posing the question. For instance, did something you read in the news prompt the question? Or does it stem from your own experiences, or from stories you’ve heard from friends and family? Cite these sources of evidence for your motivation.

  4. Next, perform an exploratory analysis of the dataset using any visualization and data transformation tools of your choice (we offer some recommendations below). You should work through two different phases of exploration:

    1. In the first phase, you should seek to gain an overview of the shape & structure of the dataset. What variables does the dataset contain? How are they distributed? Are there any notable data quality issues? Are there any surprising relationships among the variables? Be sure to also perform “gut checks” for patterns you expect to see!

    2. In the second phase, you should investigate each of your overall questions. For each question, start by creating a visualization that might provide a useful answer. Then refine the visualization (e.g., by adding additional variables, changing sorting or axis scales, transforming your data by filtering or subsetting it, etc.) to develop better perspectives, explore unexpected observations, or sanity check your assumptions.

      As you conduct this analysis, new questions are likely to arise. You should follow up on these questions with additional visual analysis until you feel you have a sufficient understanding of the overall question and are ready to proceed to the next one.

      Repeat this process for each of your overall questions, but feel free to revise your questions or branch off to explore new questions as warranted.
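If you choose a programming-based tool rather than a graphical one, the two phases above might start along the following lines. This is only a minimal sketch using Python's standard library: the inline sample data and the column names (`age`, `sex`, `race`, `incident_zip`) are assumptions for illustration, not the actual header of the downloaded CSV, so check the real file before adapting it.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the downloaded CSV.
# The column names are assumptions, not the dataset's real header.
SAMPLE_CSV = """age,sex,race,incident_zip
34,M,W,15210
,F,B,15206
51,M,W,
29,F,W,15210
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def is_blank(value):
    """Treat None and whitespace-only strings as missing."""
    return value is None or value.strip() == ""


def missing_counts(rows):
    """Phase 1: count missing cells per column as a basic quality check."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if column is None:
                continue  # surplus cells beyond the header; ignored here
            if is_blank(value):
                counts[column] += 1
    return dict(counts)


def value_counts(rows, column, where=None):
    """Phase 2: tally a column, optionally after filtering rows with `where`."""
    selected = rows if where is None else [r for r in rows if where(r)]
    return Counter(r[column] for r in selected if not is_blank(r[column]))


rows = load_rows(SAMPLE_CSV)
print(list(rows[0].keys()))                      # which variables exist
print(missing_counts(rows))                      # where values are missing
print(value_counts(rows, "sex").most_common())   # distribution of one field
# Refinement step: the same tally restricted to one (hypothetical) zip code.
print(value_counts(rows, "sex",
                   where=lambda r: r["incident_zip"] == "15210").most_common())
```

The tallies produced this way can then be charted in whichever visualization tool you prefer. The `where` argument mirrors the filtering and subsetting refinements described in the second phase.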

Final Deliverable: HTML Report

Your final submission should take the form of an HTML report—similar to a slide show—that consists of captioned visualizations detailing your most important insights. Your “insights” can include important surprises or issues (such as data quality problems affecting your analysis) as well as responses to your analysis questions.

This is a rich and complex dataset. As a result, we are not expecting your submission to be an exhaustive report but rather an initial exploration of areas of interest that could inform your final project pitch. To help you gauge the scope of this assignment, we have collated three example reports below.

Each visualization image should be a screenshot exported from a visualization tool, accompanied by a descriptive title and a caption (2-4 sentences long) that unpacks the insight(s) learned from that view. Provide sufficient detail in each caption so that anyone could read through your report and understand what you’ve learned. You are free, but not required, to annotate your images to draw attention to specific features of the data. You may perform highlighting within the visualization tool itself, or draw annotations on the exported image.

The end of your report should include a brief summary of main lessons learned.

To help you put together the report, we’re providing a basic HTML template for you to fill in. You will need to edit the HTML page to add your captions and correctly link your images (for simplicity, we recommend exporting image files to the same local directory as your HTML file).

Please deploy your HTML report to a publicly accessible URL. We recommend making your A1 submission part of your portfolio from the programming labs (e.g., as a subdirectory). Once deployed, please double check that your web page appears and renders correctly at the publicly accessible URL (e.g., that there are no broken images or links).
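Before deploying, one way to catch broken image links is to scan the report for `<img>` tags and confirm that each relative path points at a file on disk. The sketch below uses only Python's standard library. It checks local files only, not the deployed URL, and the function name `missing_local_images` is a hypothetical helper introduced here, not part of the course template.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse


class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def missing_local_images(html_text, base_dir):
    """Return relative image paths in the HTML that do not exist under base_dir."""
    collector = ImageSrcCollector()
    collector.feed(html_text)
    missing = []
    for src in collector.sources:
        parsed = urlparse(src)
        # Skip absolute URLs and root-relative paths; this sketch only
        # verifies paths that are relative to the report's own directory.
        if parsed.scheme or parsed.netloc or src.startswith("/"):
            continue
        if not (Path(base_dir) / parsed.path).is_file():
            missing.append(src)
    return missing
```

For a report saved as `index.html` (an assumed file name), calling `missing_local_images(Path("index.html").read_text(), ".")` would list any image files the page references but cannot find. You should still open the deployed URL in a browser afterwards, since a path that works locally can fail once hosted.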

Finally, submit this URL on Canvas by the due date: Thursday 9/12, 11:59 pm ET.

Grading

The assignment score is out of a maximum of 20 points. Submissions that squarely meet the requirements (i.e., the “Satisfactory” column in the rubric below) will receive a score of 16/20 (80%). We will determine scores by judging the breadth and depth of your analysis, whether visualizations meet the expressiveness and effectiveness principles, and how well-written and synthesized your insights are. We will use the following rubric to grade your assignment.

| Component | Excellent | Satisfactory | Poor |
| --- | --- | --- | --- |
| Data Overview & Quality | A thorough overview of the data was achieved, with extensive profiling of fields and records to assess data quality. (2.5 points) | Simple checks were conducted on only a handful of fields or records. (1.5 points) | Little or no evidence that data quality was assessed. (0.5 points) |
| Breadth of Exploration | Interesting overall questions target substantially different portions/aspects of the data. (5.5 points) | A nice set of overall questions is posed, but there is some overlap. (4 points) | Fewer than 3 overall questions were posed of the data, or there is significant conceptual overlap between them. (2 points) |
| Depth of Exploration | A sufficient number of follow-up questions were asked to yield rich insights that provide a robust understanding of the overall questions. (7 points) | Some follow-up questions were asked and the seeds of insightful analysis planted, but further work could have pushed the analysis more deeply. (6 points) | Few, if any, follow-up questions were asked after answering the overall questions. Only shallow analysis was conducted with any follow-up questions. (4 points) |
| Data Transformation | More advanced transformations were used to extend the dataset in interesting or useful ways. (2.5 points) | Simple transforms (e.g., sorting, filtering) were primarily used. (1.5 points) | The raw dataset was used directly, with little to no additional transformation. (0.5 points) |
| Captions | Captions richly describe the visualizations and contextualize the insight within the overall analysis. (2.5 points) | Captions do a good job describing the visualizations, but could better connect prior or subsequent steps of the analysis. (2 points) | Captions are missing, overly brief, or shallow in their analysis of visualizations. (0.5 points) |
| Creativity & Originality | You exceeded the parameters of the assignment, with original insights or particularly engaging visualizations. (+1 bonus point) | You met all the parameters of the assignment. (0 points) | - |
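The Data Transformation row distinguishes simple transforms such as sorting and filtering from transforms that extend the dataset. As one possible illustration of the latter, the sketch below derives a year field and reshapes several per-record drug columns into per-drug counts. It is only a sketch under stated assumptions: the sample records are invented, and the column names (`death_date`, `drug_1`, `drug_2`, `drug_3`) and the date format are hypothetical, so they will need adjusting to match the real CSV. It is not a statement of what the graders will count as advanced.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Invented sample records; column names and date format are assumptions.
SAMPLE_CSV = """death_date,drug_1,drug_2,drug_3
2019-03-14,Fentanyl,Cocaine,
2019-11-02,Heroin,,
2020-06-21,Fentanyl,Heroin,Alcohol
"""

DRUG_COLUMNS = ["drug_1", "drug_2", "drug_3"]  # hypothetical column names


def add_derived_fields(row):
    """Extend one record with a year, a list of drugs, and a drug count."""
    drugs = [row[c].strip() for c in DRUG_COLUMNS if row[c] and row[c].strip()]
    year = datetime.strptime(row["death_date"], "%Y-%m-%d").year
    return {**row, "year": year, "drugs": drugs, "drug_count": len(drugs)}


def drug_mentions_by_year(rows):
    """Reshape wide drug columns into (year, drug) -> number of records."""
    counts = Counter()
    for row in rows:
        for drug in row["drugs"]:
            counts[(row["year"], drug)] += 1
    return counts


rows = [add_derived_fields(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
by_year = drug_mentions_by_year(rows)
for (year, drug), n in sorted(by_year.items()):
    print(year, drug, n)
```

Derived fields such as `year` and `drug_count` did not exist in the raw file, which is what makes them usable as new axes or color encodings in a chart.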

Resources

Example EDA Reports

To help you gauge the scope of this assignment, our colleagues at MIT have provided example reports, two of which were sourced from a similar course at MIT. Note: these examples were written for an assignment with a slightly different set of instructions. Moreover, even the two exemplary reports are not perfect: they contain some mistakes or missed opportunities. Nevertheless, we consider them exemplary because they demonstrate how to engage in systematic and rigorous exploratory data analysis and, thus, earned a full 100% grade.

Tableau

You are free to use any visualization tools for this assignment. However, if you are not already familiar with visualization software, we strongly encourage you to use Tableau as it provides a graphical interface focused on the task of visual data exploration, and has a friendlier learning curve than some of the other tools listed here. You will (with rare exceptions) be able to complete an initial data exploration more quickly and comprehensively than with a programming-based tool.

  • Tableau Desktop. Available for both Windows and macOS, with free licenses for CMU students.
    • On Tuesday, Sep 3, we will use lecture time as a hands-on tutorial with Tableau.
    • Tableau also has extensive documentation with plenty of video tutorials to help you along.
    • To easily export images from Tableau, use the Worksheet > Export > Image… menu item.