No one (cluster) size fits all: Automatic cluster sizing for data-intensive analytics

Publication, Journal Article
Herodotou, H; Dong, F; Babu, S
Published in: Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC 2011
November 30, 2011

Infrastructure-as-a-Service (IaaS) cloud platforms have brought two unprecedented changes to cluster provisioning practices. First, any (nonexpert) user can provision a cluster of any size on the cloud within minutes to run her data-processing jobs. The user can terminate the cluster once her jobs complete, and she needs to pay only for the resources used and duration of use. Second, cloud platforms enable users to bypass the traditional middleman, the system administrator, in the cluster-provisioning process. These changes give tremendous power to the user, but place a major burden on her shoulders. The user is now faced regularly with complex cluster sizing problems that involve finding the cluster size, the type of resources to use in the cluster from the large number of choices offered by current IaaS cloud platforms, and the job configurations that best meet the performance needs of her workload. In this paper, we introduce the Elastisizer, a system to which users can express cluster sizing problems as queries in a declarative fashion. The Elastisizer provides reliable answers to these queries using an automated technique that uses a mix of job profiling, estimation using black-box and white-box models, and simulation. We have prototyped the Elastisizer for the Hadoop MapReduce framework, and present a comprehensive evaluation that shows the benefits of the Elastisizer in common scenarios where cluster sizing problems arise. Copyright 2011 ACM.
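
To make the shape of a cluster sizing problem concrete, the following is a minimal sketch, not the Elastisizer's actual technique. It assumes a toy runtime model (perfectly parallel work plus a fixed startup overhead) in place of the profiling, black-box and white-box models, and simulation described in the abstract. The instance names, prices, relative speeds, and the functions `estimate_hours` and `cheapest_cluster` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class InstanceType:
    name: str           # hypothetical label, not a real cloud offering
    hourly_cost: float  # assumed price per node-hour
    speed: float        # assumed relative per-node throughput


def estimate_hours(work_units, itype, nodes):
    """Toy runtime model (an assumption): fixed startup cost plus
    perfectly parallel work spread over all nodes."""
    startup_hours = 0.1
    return startup_hours + work_units / (itype.speed * nodes)


def cheapest_cluster(work_units, types, node_counts, deadline_hours):
    """Enumerate (instance type, node count) pairs and return the cheapest
    one whose estimated runtime meets the deadline, or None if none does.
    The result is a tuple (type name, nodes, cost, hours)."""
    best = None
    for itype, nodes in product(types, node_counts):
        hours = estimate_hours(work_units, itype, nodes)
        if hours > deadline_hours:
            continue  # violates the performance requirement
        cost = hours * nodes * itype.hourly_cost
        if best is None or cost < best[2]:
            best = (itype.name, nodes, cost, hours)
    return best


if __name__ == "__main__":
    types = [InstanceType("small", 0.1, 1.0), InstanceType("large", 0.5, 4.0)]
    print(cheapest_cluster(40, types, [5, 10, 20], deadline_hours=2.0))
```

Even in this toy setting the cheapest choice depends on the deadline: relaxing it can shift the answer from a few fast nodes to many slow ones, which is the kind of trade-off a declarative sizing query is meant to resolve on the user's behalf.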

Published In

Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC 2011

DOI

10.1145/2038916.2038934

Publication Date

November 30, 2011
 

Citation

APA
Herodotou, H., Dong, F., & Babu, S. (2011). No one (cluster) size fits all: Automatic cluster sizing for data-intensive analytics. Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC 2011. https://doi.org/10.1145/2038916.2038934

Chicago
Herodotou, H., F. Dong, and S. Babu. “No one (cluster) size fits all: Automatic cluster sizing for data-intensive analytics.” Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC 2011, November 30, 2011. https://doi.org/10.1145/2038916.2038934.

ICMJE
Herodotou H, Dong F, Babu S. No one (cluster) size fits all: Automatic cluster sizing for data-intensive analytics. Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC 2011. 2011 Nov 30;

MLA
Herodotou, H., et al. “No one (cluster) size fits all: Automatic cluster sizing for data-intensive analytics.” Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC 2011, Nov. 2011. Scopus, doi:10.1145/2038916.2038934.

NLM
Herodotou H, Dong F, Babu S. No one (cluster) size fits all: Automatic cluster sizing for data-intensive analytics. Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC 2011. 2011 Nov 30;
