Interactive Systems for Scalable Visualization and Analysis

Dominik’s PhD thesis.

Abstract

While computers can help us manage data, human judgment and domain expertise are what turn it into understanding. Meeting the challenges of increasingly large and complex data requires methods that richly integrate the capabilities of both people and machines. In response to these challenges, this thesis contributes new languages and models for visualization design that power interactive systems for scalable data analysis.

In these languages, users can be imprecise about low-level design decisions as the system leverages this ambiguity to optimize the visual design and necessary computation. Vega-Lite is a high-level declarative language for rapidly creating interactive visualizations, while also providing a convenient yet powerful representation for tools that generate visualizations. Vega-Lite uses smart defaults to fill in low-level details to create effective designs. The declarative design facilitates optimization of the required data processing. Draco is a model of visualization design that extends Vega-Lite with shareable design guidelines, formal reasoning over the design space, and visualization recommendation. We show how we can use Draco to construct increasingly sophisticated automated visualization design and recommendation systems, including systems based on weights learned directly from the results of graphical perception experiments.
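To make the idea of a high-level declarative specification concrete, the following is a minimal sketch of a Vega-Lite scatter plot written as a Python dictionary and serialized to JSON. The dataset path and the field names (`Horsepower`, `Miles_per_Gallon`, `Origin`) are assumptions chosen for illustration and do not come from the thesis. The point is what the specification omits: it names a mark and a few encodings, and leaves scales, axes, and legends to the defaults.

```python
import json

# Hypothetical dataset path and field names, used only for illustration.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"url": "data/cars.json"},
    "mark": "point",
    "encoding": {
        "x": {"field": "Horsepower", "type": "quantitative"},
        "y": {"field": "Miles_per_Gallon", "type": "quantitative"},
        "color": {"field": "Origin", "type": "nominal"},
    },
}

# Low-level details that this specification does not state for the x channel.
# Leaving them out is what lets the system choose them.
omitted_for_x = [key for key in ("scale", "axis") if key not in spec["encoding"]["x"]]

# The JSON text is what would be handed to a Vega-Lite renderer.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Because the specification is plain data, a tool can generate or modify it programmatically, which is the property that recommendation systems rely on.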

We take a user-centric perspective on systems for scalable exploratory analysis. Considering both the backend and frontend concerns, we present Falcon, an interactive crossfilter application where users can interact with billions of records without latencies that negatively affect their exploration. To scale beyond billions of records, we present Pangloss, a visual analysis system that uses approximate query processing but provides eventual guarantees using Optimistic Visualization. In this concept, we treat approximate query processing as a user experience problem to address users' primary concern: trust in their exploration results. Falcon and Pangloss contribute techniques for scalable interaction and exploration of large data volumes by making principled trade-offs among people's latency tolerance, precomputation, and the level of approximation.
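The approximate-first, verify-later idea can be illustrated with a small sketch. This is not Pangloss's implementation: it assumes uniform random sampling, a single mean aggregate, and a fixed absolute tolerance, and the names `approximate_mean`, `verify`, and `Verification` are hypothetical. It shows an estimate being produced quickly from a sample and then checked against the exact answer once the full computation finishes.

```python
import random
from dataclasses import dataclass


@dataclass
class Verification:
    approximate: float
    exact: float
    within_tolerance: bool


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean from a uniform random sample (fast, shown first)."""
    if not values:
        raise ValueError("values must not be empty")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    sample = rng.sample(values, min(sample_size, len(values)))
    return sum(sample) / len(sample)


def verify(values, approximate, tolerance):
    """Compute the exact mean (slow, done later) and compare with the estimate."""
    if not values:
        raise ValueError("values must not be empty")
    exact = sum(values) / len(values)
    return Verification(approximate, exact, abs(approximate - exact) <= tolerance)


values = list(range(1000))

# Step 1: the user sees an estimate right away and keeps exploring.
estimate = approximate_mean(values, sample_size=100)

# Step 2: the exact query completes later and confirms or flags the estimate.
result = verify(values, estimate, tolerance=50.0)
print(result)
```

The design choice worth noting is that the check is reported back to the user rather than hidden, since the stated concern is trust in the exploration results.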

Materials