
SPARTAN: A model-based semantic compression system for massive data tables

Publication, Journal Article
Babu, S; Garofalakis, M; Rastogi, R
Published in: SIGMOD Record (ACM Special Interest Group on Management of Data)
January 1, 2001

While a variety of lossy compression schemes have been developed for certain forms of digital data (e.g., images, audio, video), the area of lossy compression techniques for arbitrary data tables has been left relatively unexplored. Nevertheless, such techniques are clearly motivated by the ever-increasing data collection rates of modern enterprises and the need for effective, guaranteed-quality approximate answers to queries over massive relational data sets. In this paper, we propose SPARTAN, a system that takes advantage of attribute semantics and data-mining models to perform lossy compression of massive data tables. SPARTAN is based on the novel idea of exploiting predictive data correlations and prescribed error tolerances for individual attributes to construct concise and accurate Classification and Regression Tree (CaRT) models for entire columns of a table. More precisely, SPARTAN selects a certain subset of attributes for which no values are explicitly stored in the compressed table; instead, concise CaRTs that predict these values (within the prescribed error bounds) are maintained. To restrict the huge search space and construction cost of possible CaRT predictors, SPARTAN employs sophisticated learning techniques and novel combinatorial optimization algorithms. Our experimentation with several real-life data sets offers convincing evidence of the effectiveness of SPARTAN's model-based approach - SPARTAN is able to consistently yield substantially better compression ratios than existing semantic or syntactic compression tools (e.g., gzip) while utilizing only small data samples for model inference.
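The core idea in the abstract, dropping a column and keeping a small model that predicts it within a prescribed error tolerance, can be illustrated with a minimal sketch. This is not SPARTAN's algorithm: the one-split regression tree (`fit_stump`), the explicit storage of out-of-tolerance values (`outliers`), the function names, and the sample `minutes`/`charge` table are all assumptions made for illustration, and the attribute selection and optimization steps the abstract mentions are omitted entirely.

```python
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree (a stump) of y on x by least squares."""
    best = None
    # Candidate thresholds skip the smallest x so the left side is never empty.
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # only one distinct x value: predict the overall mean
        m = mean(ys)
        return (xs[0], m, m)
    return best[1:]


def predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def compress(rows, predictor, target, tolerance):
    """Drop `target`; keep a stump plus the values it misses by more than `tolerance`."""
    xs = [r[predictor] for r in rows]
    ys = [r[target] for r in rows]
    stump = fit_stump(xs, ys)
    # Assumption of this sketch: rows the model cannot predict within the
    # tolerance are stored explicitly, keyed by row index.
    outliers = {
        i: y
        for i, (x, y) in enumerate(zip(xs, ys))
        if abs(predict(stump, x) - y) > tolerance
    }
    kept = [{k: v for k, v in r.items() if k != target} for r in rows]
    return kept, stump, outliers


def decompress(kept, stump, outliers, predictor, target):
    """Rebuild `target` from the stump, overriding with stored outliers."""
    restored = []
    for i, r in enumerate(kept):
        row = dict(r)
        row[target] = outliers.get(i, predict(stump, r[predictor]))
        restored.append(row)
    return restored


# Hypothetical table: call length in minutes and the amount charged.
rows = [
    {"minutes": 1, "charge": 10.0},
    {"minutes": 2, "charge": 11.0},
    {"minutes": 3, "charge": 10.5},
    {"minutes": 20, "charge": 50.0},
    {"minutes": 21, "charge": 51.0},
    {"minutes": 22, "charge": 49.0},
    {"minutes": 23, "charge": 80.0},
]
tolerance = 10.0
kept, stump, outliers = compress(rows, "minutes", "charge", tolerance)
restored = decompress(kept, stump, outliers, "minutes", "charge")
```

In this sketch every restored `charge` value differs from the original by at most `tolerance`, and only the rows the stump cannot approximate are stored explicitly. A real system would have to weigh the size of the model and its stored exceptions against the size of the column it replaces.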


Published In

SIGMOD Record (ACM Special Interest Group on Management of Data)

DOI

10.1145/376284.375693

ISSN

0163-5808

Publication Date

January 1, 2001

Volume

30

Issue

2

Start / End Page

283 / 294

Related Subject Headings

  • Information Systems
 

Citation

APA

Babu, S., Garofalakis, M., & Rastogi, R. (2001). SPARTAN: A model-based semantic compression system for massive data tables. SIGMOD Record (ACM Special Interest Group on Management of Data), 30(2), 283–294. https://doi.org/10.1145/376284.375693

Chicago

Babu, S., M. Garofalakis, and R. Rastogi. “SPARTAN: A model-based semantic compression system for massive data tables.” SIGMOD Record (ACM Special Interest Group on Management of Data) 30, no. 2 (January 1, 2001): 283–94. https://doi.org/10.1145/376284.375693.

ICMJE

Babu S, Garofalakis M, Rastogi R. SPARTAN: A model-based semantic compression system for massive data tables. SIGMOD Record (ACM Special Interest Group on Management of Data). 2001 Jan 1;30(2):283–94.

MLA

Babu, S., et al. “SPARTAN: A model-based semantic compression system for massive data tables.” SIGMOD Record (ACM Special Interest Group on Management of Data), vol. 30, no. 2, Jan. 2001, pp. 283–94. Scopus, doi:10.1145/376284.375693.

NLM

Babu S, Garofalakis M, Rastogi R. SPARTAN: A model-based semantic compression system for massive data tables. SIGMOD Record (ACM Special Interest Group on Management of Data). 2001 Jan 1;30(2):283–294.
