High Variety Cloud Databases

Shrainik Jain

Dominik Moritz

Bill Howe

Published at ICDE 2016

Abstract

Big Data is colloquially described in terms of the three Vs: Volume, Velocity, and Variety. Volume and velocity receive a disproportionate amount of research attention; variety, however, is frequently cited by practitioners as the Big Data problem that "keeps them up at night": the problem that resists direct attacks in terms of new algorithms, systems, and approaches. We find that cloud-based data management platforms attract higher-variety workloads, motivating a new class of High Variety Database Management Systems (HVDBMS). This work provides an operational model of variety that emphasizes the complexity of user intent as well as the complexity of the data itself. The proposed model captures intuitive notions of variety that are distinct from, and broader than, conventional data integration challenges; establishes criteria for a "High Variety benchmark" that can be used to evaluate competing systems; and motivates new research directions in the design of HVDBMS.
