Fast Convergence to Fairness for Reduced Long Flow Tail Latency in Datacenter Networks

Publication, Conference
Snyder, J; Lebeck, AR
Published in: Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium, IPDPS 2022
January 1, 2022

Many data-intensive applications, such as distributed deep learning and data analytics, require moving vast amounts of data between compute servers in a distributed system. To meet the demands of these applications, datacenters are adopting Remote Direct Memory Access (RDMA), which offers higher bandwidth and lower latency than traditional kernel-based networking. To ensure high performance in RDMA networks, congestion control manages queue depth on switches, and it has historically focused on moderating queue depth so that small flows complete quickly. Unfortunately, one side effect of many common congestion control decisions is that large flows are starved of bandwidth. This lengthens the flow completion time (FCT) of large, bandwidth-bound flows, which are integral to the performance of data-intensive applications. The FCT is particularly impacted at the tail, which is increasingly critical for predictable application performance. We identify the root causes of the poor performance for long flows and measure their impact. We then design mechanisms that improve long flow FCT without compromising small flow performance. Our evaluations show that these improvements reduce the 99.9th percentile tail FCT of long flows by over 2x.

Published In

Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium, IPDPS 2022

DOI

10.1109/IPDPS53621.2022.00102

Publication Date

January 1, 2022

Start / End Page

1007 / 1017
 

Citation

APA: Snyder, J., & Lebeck, A. R. (2022). Fast Convergence to Fairness for Reduced Long Flow Tail Latency in Datacenter Networks. In Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium, IPDPS 2022 (pp. 1007–1017). https://doi.org/10.1109/IPDPS53621.2022.00102
Chicago: Snyder, J., and A. R. Lebeck. “Fast Convergence to Fairness for Reduced Long Flow Tail Latency in Datacenter Networks.” In Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium, IPDPS 2022, 1007–17, 2022. https://doi.org/10.1109/IPDPS53621.2022.00102.
ICMJE: Snyder J, Lebeck AR. Fast Convergence to Fairness for Reduced Long Flow Tail Latency in Datacenter Networks. In: Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium, IPDPS 2022. 2022. p. 1007–17.
MLA: Snyder, J., and A. R. Lebeck. “Fast Convergence to Fairness for Reduced Long Flow Tail Latency in Datacenter Networks.” Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium, IPDPS 2022, 2022, pp. 1007–17. Scopus, doi:10.1109/IPDPS53621.2022.00102.
NLM: Snyder J, Lebeck AR. Fast Convergence to Fairness for Reduced Long Flow Tail Latency in Datacenter Networks. Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium, IPDPS 2022. 2022. p. 1007–1017.
