Profiling, what-if analysis, and costbased optimization of mapreduce programs
MapReduce has emerged as a viable competitor to database systems in big data analytics. MapReduce programs are being written for a wide variety of application domains including business data processing, text analysis, natural language processing, Web graph and social network analysis, and computational science. However, MapReduce systems lack a feature that has been key to the historical success of database systems, namely, cost-based optimization. A major challenge here is that, to the MapReduce system, a program consists of black-box map and reduce functions written in some programming language like C++, Java, Python, or Ruby. We introduce, to our knowledge, the first Cost-based Optimizer for simple to arbitrarily complex MapReduce programs. We focus on the optimization opportunities presented by the large space of configuration parameters for these programs. We also introduce a Profiler to collect detailed statistical information from unmodified MapReduce programs, and a What-if Engine for fine-grained cost estimation. All components have been prototyped for the popular Hadoop MapReduce system. The effectiveness of each component is demonstrated through a comprehensive evaluation using representative MapReduce programs from various application domains. © 2011 VLDB Endowment.
Duke Scholars
Altmetric Attention Stats
Dimensions Citation Stats
Published In
DOI
EISSN
Publication Date
Volume
Issue
Start / End Page
Related Subject Headings
- 4605 Data management and data science
- 0807 Library and Information Studies
- 0806 Information Systems
- 0802 Computation Theory and Mathematics
Citation
Published In
DOI
EISSN
Publication Date
Volume
Issue
Start / End Page
Related Subject Headings
- 4605 Data management and data science
- 0807 Library and Information Studies
- 0806 Information Systems
- 0802 Computation Theory and Mathematics