Stubby: A transformation-based optimizer for MapReduce Workflows

Publication, Journal Article
Lim, H; Herodotou, H; Babu, S
Published in: Proceedings of the VLDB Endowment
January 1, 2012

There is a growing trend of performing analysis on large datasets using workflows composed of MapReduce jobs connected through producer-consumer relationships based on data. This trend has spurred the development of a number of interfaces, ranging from program-based to query-based, for generating MapReduce workflows. Studies have shown that the gap in performance can be quite large between optimized and unoptimized workflows. However, automatic cost-based optimization of MapReduce workflows remains a challenge due to the multitude of interfaces, large size of the execution plan space, and the frequent unavailability of all types of information needed for optimization. We introduce a comprehensive plan space for MapReduce workflows generated by popular workflow generators. We then propose Stubby, a cost-based optimizer that searches selectively through the subspace of the full plan space that can be enumerated correctly and costed based on the information available in any given setting. Stubby enumerates the plan space based on plan-to-plan transformations and an efficient search algorithm. Stubby is designed to be extensible to new interfaces and new types of optimizations, which is a desirable feature given how rapidly MapReduce systems are evolving. Stubby's efficiency and effectiveness have been evaluated using representative workflows from many domains. © 2012 VLDB Endowment.

Published In

Proceedings of the VLDB Endowment

DOI

10.14778/2350229.2350239

EISSN

2150-8097

Publication Date

January 1, 2012

Volume

5

Issue

11

Start / End Page

1196 / 1207

Related Subject Headings

  • 4605 Data management and data science
  • 0807 Library and Information Studies
  • 0806 Information Systems
  • 0802 Computation Theory and Mathematics
 

Citation

APA: Lim, H., Herodotou, H., & Babu, S. (2012). Stubby: A transformation-based optimizer for MapReduce Workflows. Proceedings of the VLDB Endowment, 5(11), 1196–1207. https://doi.org/10.14778/2350229.2350239
Chicago: Lim, H., H. Herodotou, and S. Babu. “Stubby: A transformation-based optimizer for MapReduce Workflows.” Proceedings of the VLDB Endowment 5, no. 11 (January 1, 2012): 1196–1207. https://doi.org/10.14778/2350229.2350239.
ICMJE: Lim H, Herodotou H, Babu S. Stubby: A transformation-based optimizer for MapReduce Workflows. Proceedings of the VLDB Endowment. 2012 Jan 1;5(11):1196–207.
MLA: Lim, H., et al. “Stubby: A transformation-based optimizer for MapReduce Workflows.” Proceedings of the VLDB Endowment, vol. 5, no. 11, Jan. 2012, pp. 1196–207. Scopus, doi:10.14778/2350229.2350239.
NLM: Lim H, Herodotou H, Babu S. Stubby: A transformation-based optimizer for MapReduce Workflows. Proceedings of the VLDB Endowment. 2012 Jan 1;5(11):1196–1207.
