INFUSE: Interactive Feature Selection for Predictive Modeling of High Dimensional Data
Published at VAST, Paris, France, 2014
Abstract
Predictive modeling techniques are increasingly being used by data scientists to
understand the probability of predicted outcomes. However, for high-dimensional
data, a critical step in predictive modeling is determining which
features should be included in the models. Feature selection algorithms are
often used to remove non-informative features from models. However, there are
many different classes of feature selection algorithms. Deciding which one to
use is problematic as the algorithmic output is often not amenable to user
interpretation. This limits users' ability to apply their domain
expertise during the modeling process. To address this limitation, we
developed INFUSE, a novel visual analytics system designed to help analysts
understand how predictive features are being ranked across feature selection
algorithms, cross-validation folds, and classifiers. We demonstrate how our
system can lead to important insights in a case study involving clinical
researchers predicting patient outcomes from electronic medical records.
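
To make concrete what "ranked across feature selection algorithms and cross-validation folds" can mean, the following is a minimal sketch and not the INFUSE implementation. It assumes two simple filter-style scorers (absolute Pearson correlation and the gap between class means) as stand-ins for real feature selection algorithms, a round-robin fold assignment, and a small made-up dataset. All function names and data values are hypothetical.

```python
from statistics import mean, pstdev

def abs_correlation(column, labels):
    """Absolute Pearson correlation between one feature column and the labels."""
    mx, my = mean(column), mean(labels)
    sx, sy = pstdev(column), pstdev(labels)
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information
    cov = mean((x - mx) * (y - my) for x, y in zip(column, labels))
    return abs(cov / (sx * sy))

def mean_gap(column, labels):
    """Absolute gap between the class means of one feature column."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    return abs(mean(pos) - mean(neg))

def rank_features(rows, labels, scorer):
    """Return feature indices ordered from highest to lowest score."""
    n_features = len(rows[0])
    scores = [scorer([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: -scores[j])

def rankings_across_folds(rows, labels, scorers, n_folds):
    """Rank features on the training part of each fold, once per scorer."""
    result = {}
    for fold in range(n_folds):
        # Assumption: row i is held out in fold i % n_folds.
        train = [i for i in range(len(rows)) if i % n_folds != fold]
        train_rows = [rows[i] for i in train]
        train_labels = [labels[i] for i in train]
        for name, scorer in scorers.items():
            result[(name, fold)] = rank_features(train_rows, train_labels, scorer)
    return result

def mean_rank(rankings, n_features):
    """Average position (0 = best) of each feature over all rankings."""
    return [mean(r.index(j) for r in rankings.values()) for j in range(n_features)]

# Hypothetical toy data: feature 0 tracks the label closely, the others less so.
rows = [
    [0.9, 0.5, 0.2],
    [0.1, 0.4, 0.3],
    [0.8, 0.5, 0.6],
    [0.2, 0.6, 0.1],
    [1.0, 0.4, 0.5],
    [0.0, 0.5, 0.4],
]
labels = [1, 0, 1, 0, 1, 0]
scorers = {"correlation": abs_correlation, "mean_gap": mean_gap}

rankings = rankings_across_folds(rows, labels, scorers, n_folds=3)
average = mean_rank(rankings, n_features=3)
for key in sorted(rankings):
    print(key, rankings[key])
print("mean rank per feature:", average)
```

A feature that stays near the top for every (algorithm, fold) pair is a stable candidate, while one whose position varies widely is the kind of case an analyst might want to inspect with domain knowledge.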