Provisioning and evaluating multi-domain networked clouds for Hadoop-based applications


Journal Article

This paper presents the design, implementation, and evaluation of a new system for on-demand provisioning of Hadoop clusters across multiple cloud domains. The Hadoop clusters are created on demand and are composed of virtual machines from multiple cloud sites linked with bandwidth-provisioned network pipes. The prototype uses an existing federated cloud control framework called Open Resource Control Architecture (ORCA), which orchestrates the leasing and configuration of virtual infrastructure from multiple autonomous cloud sites and network providers. ORCA enables computational and network resources from multiple clouds and network substrates to be aggregated into a single virtual "slice" of resources, built to order for the needs of the application. The experiments examine various provisioning alternatives by evaluating the performance of representative Hadoop benchmarks and applications on resource topologies with varying bandwidths. The evaluations examine conditions in which multi-cloud Hadoop deployments have significant advantages or disadvantages during Map/Reduce/Shuffle operations. Further, the experiments compare multi-cloud Hadoop deployments with single-cloud deployments and investigate Hadoop Distributed File System (HDFS) performance under varying network configurations. The results show that networked clouds make cross-cloud Hadoop deployment feasible when high-bandwidth network links connect the clouds. As expected, performance for some benchmarks degrades rapidly with constrained inter-cloud bandwidth. MapReduce shuffle patterns and certain HDFS operations that span the constrained links are particularly sensitive to network performance. Hadoop's topology-awareness feature can mitigate these penalties to a modest degree in these hybrid bandwidth scenarios.
Additional observations show that contention among colocated virtual machines is a source of irregular performance for Hadoop applications on virtual cloud infrastructure. © 2011 IEEE.
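
The topology-awareness feature mentioned in the abstract is typically driven by a user-supplied script that Hadoop calls with one or more host addresses and that prints one rack path per address. The following is a minimal sketch of such a script for a two-site deployment, not the configuration used in the paper. The subnets and the site names `/site-a` and `/site-b` are hypothetical. The script would be registered through the `net.topology.script.file.name` property (`topology.script.file.name` in Hadoop 1.x and earlier releases).

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop topology script mapping node addresses to cloud sites."""
import ipaddress
import sys

# Assumed subnet-to-site mapping; these values are illustrative only
# and do not come from the paper.
SITE_SUBNETS = {
    "10.1.0.0/16": "/site-a",
    "10.2.0.0/16": "/site-b",
}

# Fallback for unknown hosts, kept at the same depth as the site paths.
DEFAULT_RACK = "/default-rack"


def resolve_rack(host):
    """Return the rack path for one host address, or the default rack."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames and malformed input fall back to the default rack.
        return DEFAULT_RACK
    for subnet, rack in SITE_SUBNETS.items():
        if addr in ipaddress.ip_network(subnet):
            return rack
    return DEFAULT_RACK


def main(argv):
    # Hadoop passes one or more addresses as arguments and expects
    # one whitespace-separated rack path per argument, in order.
    print(" ".join(resolve_rack(host) for host in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Treating each cloud site as a rack lets Hadoop's existing rack-aware block placement and task scheduling favor nodes within the same site, which is one plausible way to reduce traffic over a constrained inter-cloud link.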

Cited Authors

  • Mandal, A; Xin, Y; Baldine, I; Ruth, P; Heerman, C; Chase, J; Orlikowski, V; Yumerefendi, A

Published Date

  • December 1, 2011

Published In

  • Proceedings 2011 3rd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2011

Start / End Page

  • 690 - 697

Digital Object Identifier (DOI)

  • 10.1109/CloudCom.2011.107

Citation Source

  • Scopus