HAL: Hardware-assisted Load Balancing for Energy-efficient SNIC-Host Cooperative Computing
A typical SmartNIC (SNIC) integrates a processor comprising Arm CPU and accelerators with a conventional NIC. The processor is designed to energy-efficiently execute network functions frequently used by datacenter applications. With such a processor, the SNIC has promised to notably improve the system-wide energy efficiency of datacenter servers. Nevertheless, the latest trend of integrating accelerators into server CPUs for these functions sparks a question on the SNIC processor's superiority over a host processor (i.e., server CPU with accelerators) in system-wide energy efficiency, especially under given tail latency constraints. Answering this question, we first take an Intel Xeon processor, integrated with various accelerators (e.g., QuickAssist Technology), as a host processor, and then compare it to an NVIDIA BlueField-2 SNIC processor. This uncovers that (1) the host accelerator, coupled with a more powerful memory subsystem, can outperform the SNIC accelerator, and (2) the SNIC processor can improve system-wide energy efficiency only at low packet rates for most functions under tail latency constraints. To provide high system-wide energy efficiency without compromising tail latency at any packet rates, we propose HAL, consisting of a hardware-based load balancer and an intelligent load balancing policy implemented inside the SNIC. When HAL determines that the SNIC processor cannot efficiently process a given function beyond a specific packet rate, it limits the rate of packets to the SNIC processor and lets the host processor handle the excess. We implement a HAL-enabled SNIC with a commodity FPGA and a BlueField-2 SNIC, plug it into a commodity server, and run 10 popular network functions. Our evaluation shows that HAL can improve the system-wide energy efficiency and throughput of the server running these functions by 31% and 10%, respectively, without notably increasing the tail latency.