Fast Convergence to Fairness for Reduced Long Flow Tail Latency in Datacenter Networks
Many data-intensive applications, such as distributed deep learning and data analytics, require moving vast amounts of data between compute servers in a distributed system. To meet the demands of these applications, datacenters are adopting Remote Direct Memory Access (RDMA), which offers higher bandwidth and lower latency than traditional kernel-based networking. To keep RDMA networks performing well, congestion control manages queue depth on switches; historically, it has focused on keeping queues shallow so that small flows complete quickly. Unfortunately, a side effect of many common design decisions is that large flows are starved of bandwidth. This degrades the flow completion time (FCT) of large, bandwidth-bound flows, which are integral to the performance of data-intensive applications. The impact is most pronounced at the tail of the FCT distribution, which is increasingly critical for predictable application performance. We identify the root causes of poor long-flow performance and measure their impact. We then design mechanisms that improve long-flow FCT without compromising small-flow performance. Our evaluations show that these improvements reduce the 99.9th-percentile tail FCT of long flows by more than 2x.