GPU data access on complex geometries for D3Q19 lattice Boltzmann method

Publication, Conference

Herschlag, G; Lee, S; Vetter, JS; Randles, A

Published in: Proceedings 2018 IEEE 32nd International Parallel and Distributed Processing Symposium IPDPS 2018

August 3, 2018

GPU performance of the lattice Boltzmann method (LBM) depends heavily on memory access patterns. When LBM is advanced with GPUs on complex computational domains, geometric data is typically accessed indirectly, and lattice data is typically accessed lexicographically in the Structure of Arrays (SoA) layout. Although a variety of access patterns exist beyond these typical choices, no study has yet examined their relative efficacy. Here, we compare a suite of memory access schemes via empirical testing and performance modeling. We find strong evidence that semi-direct addressing is the superior addressing scheme for the majority of cases examined: semi-direct addressing increases computational speed and often reduces memory consumption. For lattice layout, we find that the Collected Structure of Arrays (CSoA) layout outperforms the SoA layout. When compared to state-of-the-art practices, our recommended addressing modifications lead to performance gains of 10% to 40% across different complex geometries, fluid volume fractions, and resolutions. The modifications also decrease memory consumption by as much as 17%. Having discovered these improvements, we examine a highly resolved arterial geometry on a leadership-class system. On this system we present the first near-optimal strong-scaling results for LBM with arterial geometries run on GPUs. We also demonstrate that the above recommendations remain valid for large-scale, many-device simulations, where they increase computational speed and reduce average memory usage. To understand these observations, we employ performance modeling, which reveals that semi-direct methods outperform indirect methods due to a reduced number of total loads/stores in memory, and that CSoA outperforms SoA and bundling due to improved caching behavior.
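The abstract names the SoA and CSoA layouts without defining them. The sketch below shows one common way the two layouts are indexed, as an illustration only: it is not taken from the paper. The function names `soa_index` and `csoa_index`, the block size of 32 (chosen here to match an NVIDIA warp), and the exact CSoA formula are assumptions.

```python
# Illustrative index arithmetic for two lattice layouts (assumed definitions,
# not the paper's code). Each fluid node stores Q = 19 distribution values.
Q = 19  # number of discrete velocities in a D3Q19 lattice


def soa_index(node: int, q: int, n_nodes: int) -> int:
    """SoA: one long array per direction q, concatenated end to end."""
    return q * n_nodes + node


def csoa_index(node: int, q: int, block: int = 32) -> int:
    """CSoA (assumed form): nodes are collected into blocks of `block` nodes.

    Within a block, the values for each direction are contiguous, so
    consecutive nodes still read consecutive addresses, while all Q
    directions of a block stay close together in memory.
    """
    blk, lane = divmod(node, block)
    return blk * (Q * block) + q * block + lane


def is_permutation(indices, size: int) -> bool:
    """Check that a layout maps (node, q) pairs onto 0..size-1 exactly once."""
    return sorted(indices) == list(range(size))
```

Under these assumptions, both layouts place consecutive nodes at consecutive addresses for a fixed direction, which is what coalesced GPU loads need. The difference is that in SoA the 19 values of one node are `n_nodes` elements apart, whereas in the blocked layout they are only `block` elements apart. When `n_nodes` is a multiple of `block`, `is_permutation` can be used to confirm that either layout fills the same `Q * n_nodes` elements with no gaps or collisions.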

Duke Scholars

Gregory Joseph Herschlag, Mathematics (author)

Amanda Randles, Biomedical Engineering (author)

Published In

Proceedings 2018 IEEE 32nd International Parallel and Distributed Processing Symposium IPDPS 2018

DOI

10.1109/IPDPS.2018.00092

Publication Date

August 3, 2018

Start / End Page

825 / 834

Citation

APA: Herschlag, G., Lee, S., Vetter, J. S., & Randles, A. (2018). GPU data access on complex geometries for D3Q19 lattice Boltzmann method. In Proceedings 2018 IEEE 32nd International Parallel and Distributed Processing Symposium IPDPS 2018 (pp. 825–834). https://doi.org/10.1109/IPDPS.2018.00092

Chicago: Herschlag, G., S. Lee, J. S. Vetter, and A. Randles. “GPU data access on complex geometries for D3Q19 lattice Boltzmann method.” In Proceedings 2018 IEEE 32nd International Parallel and Distributed Processing Symposium IPDPS 2018, 825–34, 2018. https://doi.org/10.1109/IPDPS.2018.00092.

ICMJE: Herschlag G, Lee S, Vetter JS, Randles A. GPU data access on complex geometries for D3Q19 lattice Boltzmann method. In: Proceedings 2018 IEEE 32nd International Parallel and Distributed Processing Symposium IPDPS 2018. 2018. p. 825–34.

MLA: Herschlag, G., et al. “GPU data access on complex geometries for D3Q19 lattice Boltzmann method.” Proceedings 2018 IEEE 32nd International Parallel and Distributed Processing Symposium IPDPS 2018, 2018, pp. 825–34. Scopus, doi:10.1109/IPDPS.2018.00092.

NLM: Herschlag G, Lee S, Vetter JS, Randles A. GPU data access on complex geometries for D3Q19 lattice Boltzmann method. Proceedings 2018 IEEE 32nd International Parallel and Distributed Processing Symposium IPDPS 2018. 2018. p. 825–834.
