Clustervision: Visual Supervision of Unsupervised Clustering
Published at VAST | Phoenix, AZ | 2017
Abstract
Clustering, the process of grouping together similar items into distinct
partitions, is a common type of unsupervised machine learning that can be useful
for summarizing and aggregating complex multi-dimensional data. However, data
can be clustered in many ways, and there exists a large body of algorithms
designed to reveal different patterns. While having access to a wide variety of
algorithms is helpful, in practice, it is quite difficult for data scientists to
choose and parameterize algorithms to get the clustering results relevant for
their dataset and analytical tasks. To alleviate this problem, we built
Clustervision, a visual analytics tool that helps ensure data scientists find
the right clustering among the large number of techniques and parameters
available. Our system clusters data using a variety of clustering techniques and
parameters and then ranks clustering results utilizing five quality metrics. In
addition, users can guide the system to produce more relevant results by
providing task-relevant constraints on the data. Our visual user interface
allows users to find high-quality clustering results, explore the clusters using
several coordinated visualization techniques, and select the clustering result that
best suits their task. We demonstrate this novel approach using a case study
with a team of researchers in the medical domain and show that our system
empowers users to choose an effective representation of their complex data.
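
The cluster-then-rank idea described in the abstract can be illustrated with a minimal sketch. This is not the Clustervision implementation: the abstract does not name its techniques or its five quality metrics, so the sketch assumes a single stand-in technique (plain k-means), a single stand-in metric (the mean silhouette coefficient), and synthetic two-dimensional data. The function names `kmeans`, `silhouette`, and `rank_clusterings` are hypothetical.

```python
import math
import random


def kmeans(points, k, seed=0, iters=50):
    """Plain Lloyd's k-means (stand-in technique); returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient (stand-in quality metric); higher is better."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treated as worst here
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def rank_clusterings(points, ks=(2, 3, 4, 5, 6)):
    """Cluster once per parameter setting, then sort results best-first."""
    results = []
    for k in ks:
        labels = kmeans(points, k)
        results.append(
            {"k": k, "labels": labels, "score": silhouette(points, labels)})
    return sorted(results, key=lambda r: r["score"], reverse=True)


# Synthetic data (an assumption for illustration): three Gaussian blobs.
rng = random.Random(42)
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in [(0, 0), (6, 0), (3, 6)] for _ in range(30)]

ranked = rank_clusterings(points)
for r in ranked:
    print(f"k={r['k']}  silhouette={r['score']:.3f}")
```

A fuller version would sweep several algorithms as well as parameters and combine several metrics into one ranking; the user-supplied constraints mentioned in the abstract would then act as an additional filter or scoring term over the candidate results.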