Abstract
Big Data is colloquially described in terms of the three Vs: Volume, Velocity,
and Variety. Volume and velocity receive a disproportionate amount of research
attention; however, variety is frequently cited by practitioners as the Big Data
problem that "keeps them up at night": the problem that resists direct attacks
in terms of new algorithms, systems, and approaches. We find that the
cloud-based data management platform attracts higher-variety workloads,
motivating a new class of High Variety Database Management Systems
(HVDBMS). This work provides an operational model of variety emphasizing the
complexity of user intent as well as the complexity of the data itself. The
proposed model captures intuitive notions of variety that are distinct from, and
broader than, conventional data integration challenges, establishes criteria for
a "High Variety benchmark" that can be used to evaluate competing systems, and
motivates new research directions in the design of HVDBMS.