Designing a GPU-Accelerated Communication Layer for Efficient Fluid-Structure Interaction Computations on Heterogeneous Systems
As biological research demands simulations with increasingly large cell counts, optimizing these models for large-scale deployment on heterogeneous supercomputing resources becomes crucial. This requires redesigning fluid-structure interaction tasks that were written around distributed data structures built for CPU-based systems, where design flexibility and overall memory footprint are key considerations, so that they perform well on CPU-GPU machines. This paper describes the trade-offs of offloading communication tasks to the GPUs and the corresponding changes required to the underlying data structures, along with new algorithms that significantly reduce time-to-solution. We evaluate the at-scale performance of our GPU implementation on the Polaris and Frontier leadership systems using real-world workloads involving millions of deformable cells. We analyze the competing factors that come into play when designing a communication layer for a fluid-structure interaction code, including code efficiency, complexity, and GPU memory demands, and offer advice to developers of other high-performance computing applications facing similar decisions.