MapReduce Programming and Cost-based Optimization? Crossing this Chasm with Starfish

Publication, Journal Article
Herodotou, H; Dong, F; Babu, S
Published in: Proceedings of the VLDB Endowment
August 1, 2011

MapReduce has emerged as a viable competitor to database systems in big data analytics. MapReduce programs are being written for a wide variety of application domains including business data processing, text analysis, natural language processing, Web graph and social network analysis, and computational science. However, MapReduce systems lack a feature that has been key to the historical success of database systems, namely, cost-based optimization. A major challenge here is that, to the MapReduce system, a program consists of black-box map and reduce functions written in some programming language like C++, Java, Python, or Ruby. Starfish is a self-tuning system for big data analytics that includes, to our knowledge, the first Cost-based Optimizer for simple to arbitrarily complex MapReduce programs. Starfish also includes a Profiler to collect detailed statistical information from unmodified MapReduce programs, and a What-if Engine for fine-grained cost estimation. This demonstration will present the profiling, what-if analysis, and cost-based optimization of MapReduce programs in Starfish. We will show how (non-expert) users can employ the Starfish Visualizer to (a) get a deep understanding of a MapReduce program's behavior during execution, (b) ask hypothetical questions on how the program's behavior will change when parameter settings, cluster resources, or input data properties change, and (c) ultimately optimize the program. © 2011 VLDB Endowment.

Published In

Proceedings of the VLDB Endowment

EISSN

2150-8097

Publication Date

August 1, 2011

Volume

4

Issue

12

Start / End Page

1446 / 1449

Related Subject Headings

  • 4605 Data management and data science
  • 0807 Library and Information Studies
  • 0806 Information Systems
  • 0802 Computation Theory and Mathematics
 

Citation

APA: Herodotou, H., Dong, F., & Babu, S. (2011). MapReduce programming and cost-based optimization? Crossing this chasm with Starfish. Proceedings of the VLDB Endowment, 4(12), 1446–1449.
Chicago: Herodotou, H., F. Dong, and S. Babu. “MapReduce programming and cost-based optimization? Crossing this chasm with Starfish.” Proceedings of the VLDB Endowment 4, no. 12 (August 1, 2011): 1446–49.
ICMJE: Herodotou H, Dong F, Babu S. MapReduce programming and cost-based optimization? Crossing this chasm with Starfish. Proceedings of the VLDB Endowment. 2011 Aug 1;4(12):1446–9.
MLA: Herodotou, H., et al. “MapReduce programming and cost-based optimization? Crossing this chasm with Starfish.” Proceedings of the VLDB Endowment, vol. 4, no. 12, Aug. 2011, pp. 1446–49.
NLM: Herodotou H, Dong F, Babu S. MapReduce programming and cost-based optimization? Crossing this chasm with Starfish. Proceedings of the VLDB Endowment. 2011 Aug 1;4(12):1446–1449.
