Profiling, what-if analysis, and cost-based optimization of MapReduce programs

Publication, Journal Article
Herodotou, H; Babu, S
Published in: Proceedings of the VLDB Endowment
January 1, 2011

MapReduce has emerged as a viable competitor to database systems in big data analytics. MapReduce programs are being written for a wide variety of application domains including business data processing, text analysis, natural language processing, Web graph and social network analysis, and computational science. However, MapReduce systems lack a feature that has been key to the historical success of database systems, namely, cost-based optimization. A major challenge here is that, to the MapReduce system, a program consists of black-box map and reduce functions written in some programming language like C++, Java, Python, or Ruby. We introduce, to our knowledge, the first Cost-based Optimizer for simple to arbitrarily complex MapReduce programs. We focus on the optimization opportunities presented by the large space of configuration parameters for these programs. We also introduce a Profiler to collect detailed statistical information from unmodified MapReduce programs, and a What-if Engine for fine-grained cost estimation. All components have been prototyped for the popular Hadoop MapReduce system. The effectiveness of each component is demonstrated through a comprehensive evaluation using representative MapReduce programs from various application domains. © 2011 VLDB Endowment.


Published In

Proceedings of the VLDB Endowment

DOI

10.14778/3402707.3402746

EISSN

2150-8097

Publication Date

January 1, 2011

Volume

4

Issue

11

Start / End Page

1111 / 1122

Related Subject Headings

  • 4605 Data management and data science
  • 0807 Library and Information Studies
  • 0806 Information Systems
  • 0802 Computation Theory and Mathematics
 

Citation

APA: Herodotou, H., & Babu, S. (2011). Profiling, what-if analysis, and cost-based optimization of MapReduce programs. Proceedings of the VLDB Endowment, 4(11), 1111–1122. https://doi.org/10.14778/3402707.3402746
Chicago: Herodotou, H., and S. Babu. “Profiling, what-if analysis, and cost-based optimization of MapReduce programs.” Proceedings of the VLDB Endowment 4, no. 11 (January 1, 2011): 1111–22. https://doi.org/10.14778/3402707.3402746.
ICMJE: Herodotou H, Babu S. Profiling, what-if analysis, and cost-based optimization of MapReduce programs. Proceedings of the VLDB Endowment. 2011 Jan 1;4(11):1111–22.
MLA: Herodotou, H., and S. Babu. “Profiling, what-if analysis, and cost-based optimization of MapReduce programs.” Proceedings of the VLDB Endowment, vol. 4, no. 11, Jan. 2011, pp. 1111–22. Scopus, doi:10.14778/3402707.3402746.
NLM: Herodotou H, Babu S. Profiling, what-if analysis, and cost-based optimization of MapReduce programs. Proceedings of the VLDB Endowment. 2011 Jan 1;4(11):1111–1122.
