Skip to main content

Finding connected components in map-reduce in logarithmic rounds

Publication ,  Journal Article
Rastogi, V; Machanavajjhala, A; Chitnis, L; Das Sarma, A
Published in: Proceedings - International Conference on Data Engineering
August 15, 2013

Given a large graph G = (V, E) with millions of nodes and edges, how do we compute its connected components efficiently? Recent work addresses this problem in map-reduce, where a fundamental trade-off exists between the number of map-reduce rounds and the communication of each round. Denoting d the diameter of the graph, and n the number of nodes in the largest component, all prior techniques for map-reduce either require a linear, Θ(d), number of rounds, or a quadratic, Θ (n|V| + |E|), communication per round. We propose here two efficient map-reduce algorithms: (i) Hash-Greater-to-Min, which is a randomized algorithm based on PRAM techniques, requiring O(log n) rounds and O(|V | + |E|) communication per round, and (ii) Hash-to-Min, which is a novel algorithm, provably finishing in O(log n) iterations for path graphs. The proof technique used for Hash-to-Min is novel, but not tight, and it is actually faster than Hash-Greater-to-Min in practice. We conjecture that it requires 2 log d rounds and 3(|V| + |E|) communication per round, as demonstrated in our experiments. Using secondary sorting, a standard map-reduce feature, we scale Hash-to-Min to graphs with very large connected components. Our techniques for connected components can be applied to clustering as well. We propose a novel algorithm for agglomerative single linkage clustering in map-reduce. This is the first map-reduce algorithm for clustering in at most O(log n) rounds, where n is the size of the largest cluster. We show the effectiveness of all our algorithms through detailed experiments on large synthetic as well as real-world datasets. © 2013 IEEE.

Duke Scholars

Published In

Proceedings - International Conference on Data Engineering

DOI

ISSN

1084-4627

Publication Date

August 15, 2013

Start / End Page

50 / 61
 

Citation

APA
Chicago
ICMJE
MLA
NLM
Rastogi, V., Machanavajjhala, A., Chitnis, L., & Das Sarma, A. (2013). Finding connected components in map-reduce in logarithmic rounds. Proceedings - International Conference on Data Engineering, 50–61. https://doi.org/10.1109/ICDE.2013.6544813
Rastogi, V., A. Machanavajjhala, L. Chitnis, and A. Das Sarma. “Finding connected components in map-reduce in logarithmic rounds.” Proceedings - International Conference on Data Engineering, August 15, 2013, 50–61. https://doi.org/10.1109/ICDE.2013.6544813.
Rastogi V, Machanavajjhala A, Chitnis L, Das Sarma A. Finding connected components in map-reduce in logarithmic rounds. Proceedings - International Conference on Data Engineering. 2013 Aug 15;50–61.
Rastogi, V., et al. “Finding connected components in map-reduce in logarithmic rounds.” Proceedings - International Conference on Data Engineering, Aug. 2013, pp. 50–61. Scopus, doi:10.1109/ICDE.2013.6544813.
Rastogi V, Machanavajjhala A, Chitnis L, Das Sarma A. Finding connected components in map-reduce in logarithmic rounds. Proceedings - International Conference on Data Engineering. 2013 Aug 15;50–61.

Published In

Proceedings - International Conference on Data Engineering

DOI

ISSN

1084-4627

Publication Date

August 15, 2013

Start / End Page

50 / 61