Scholars@Duke publication: Analysis of GPU Data Access Patterns on Complex Geometries for the D3Q19 Lattice Boltzmann Algorithm

Publication, Journal Article

Herschlag, G; Lee, S; Vetter, JS; Randles, A

Published in: IEEE Transactions on Parallel and Distributed Systems

October 1, 2021

GPU performance of the lattice Boltzmann method (LBM) depends heavily on memory access patterns. When LBM is implemented on GPUs for complex domains, geometric data is typically accessed indirectly and lattice data is accessed lexicographically. Although a variety of other options exist, no study has examined their relative efficacy. Here, we examine a suite of memory access schemes via empirical testing and performance modeling. We find strong evidence that semi-direct addressing is often better suited than the more common indirect addressing, increasing computational speed and reducing memory consumption. For the data layout, we find that the Collected Structure of Arrays (CSoA) and bundling layouts outperform the common Structure of Arrays (SoA) layout; on V100 and P100 devices, CSoA consistently outperforms bundling, but the relationship is more complicated on K40 devices. Compared with state-of-the-art practices, our recommendations yield speedups of 10-40 percent and reduce memory consumption by up to 17 percent. Using performance modeling and computational experimentation, we identify the mechanisms behind these accelerations. We demonstrate that our results hold across multiple GPUs on two leadership-class systems, and present the first near-optimal strong scaling results for LBM with arterial geometries run on GPUs.
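
To make the common baseline described above more concrete (geometry accessed indirectly through a neighbor table, fluid nodes stored in lexicographic order, distributions laid out as a Structure of Arrays), here is a minimal Python sketch. It rests on several assumptions: a pull-style streaming step, simple half-way bounce-back at walls, and a toy channel geometry (`tube`). All function names are hypothetical and do not come from the paper. The sketch deliberately leaves out the semi-direct addressing, CSoA, and bundling schemes the paper proposes, because their exact definitions are not given here.

```python
"""Sketch: D3Q19 pull streaming on a sparse geometry using indirect
addressing and a Structure-of-Arrays (SoA) layout. Illustrative only;
not the implementation evaluated in the paper."""

# D3Q19 lattice velocities: 1 rest, 6 face and 12 edge directions.
C = [(0, 0, 0),
     (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1),
     (1, 1, 0), (-1, -1, 0), (1, -1, 0), (-1, 1, 0),
     (1, 0, 1), (-1, 0, -1), (1, 0, -1), (-1, 0, 1),
     (0, 1, 1), (0, -1, -1), (0, 1, -1), (0, -1, 1)]
Q = len(C)  # 19 directions

# OPP[q] is the direction opposite to C[q], used for bounce-back.
OPP = [C.index((-cx, -cy, -cz)) for (cx, cy, cz) in C]


def soa(q, n, num_nodes):
    """SoA index: values of direction q for all nodes are contiguous, so
    GPU threads handling consecutive nodes n touch consecutive addresses."""
    return q * num_nodes + n


def aos(q, n):
    """Array-of-Structures index, for contrast: the 19 values of one node
    are contiguous instead."""
    return n * Q + q


def build_geometry(nx, ny, nz, is_fluid):
    """Keep only fluid nodes, in lexicographic (x, y, z) order, and map
    each coordinate to its compact index."""
    coords = [(x, y, z) for x in range(nx) for y in range(ny)
              for z in range(nz) if is_fluid(x, y, z)]
    index = {p: i for i, p in enumerate(coords)}
    return coords, index


def build_neighbors(coords, index):
    """Indirect addressing: nbr[soa(q, n, N)] is the compact index of the
    upstream site n - c_q, or -1 if that site is solid or outside."""
    num_nodes = len(coords)
    nbr = [-1] * (Q * num_nodes)
    for n, (x, y, z) in enumerate(coords):
        for q, (cx, cy, cz) in enumerate(C):
            nbr[soa(q, n, num_nodes)] = index.get((x - cx, y - cy, z - cz), -1)
    return nbr


def stream_pull(f_post, nbr, num_nodes):
    """Pull streaming of post-collision values f_post, with half-way
    bounce-back where the upstream site is not fluid."""
    f_new = [0] * (Q * num_nodes)
    for q in range(Q):
        for n in range(num_nodes):
            i = soa(q, n, num_nodes)
            src = nbr[i]
            if src >= 0:
                f_new[i] = f_post[soa(q, src, num_nodes)]
            else:
                # Wall: the population arriving along q is this node's own
                # outgoing population along the opposite direction.
                f_new[i] = f_post[soa(OPP[q], n, num_nodes)]
    return f_new


def tube(x, y, z):
    """Hypothetical toy "vessel": a straight channel, 3x3 cross-section."""
    return (y - 2) ** 2 + (z - 2) ** 2 <= 2


if __name__ == "__main__":
    coords, index = build_geometry(6, 5, 5, tube)
    N = len(coords)
    nbr = build_neighbors(coords, index)
    f = list(range(Q * N))  # distinct labels make every move traceable
    f_streamed = stream_pull(f, nbr, N)
    # Streaming plus bounce-back only permutes values, so mass is conserved.
    print(N, sorted(f_streamed) == f)  # prints: 54 True
```

The SoA index `q * N + n` is why this layout is a natural default. When each GPU thread handles one node, the threads of a warp that read direction `q` access a contiguous run of memory. In this sketch, the neighbor table `nbr` costs 19 integers per fluid node. That storage is one plausible source of the memory overhead that alternative addressing schemes try to reduce.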

Duke Scholars

Author: Gregory Joseph Herschlag (Mathematics)

Author: Amanda Randles (Biomedical Engineering)

Published In

IEEE Transactions on Parallel and Distributed Systems

DOI

10.1109/TPDS.2021.3061895

EISSN

1558-2183

ISSN

1045-9219

Publication Date

October 1, 2021

Volume

32

Issue

10

Start / End Page

2400 / 2414

Related Subject Headings

Distributed Computing
4606 Distributed computing and systems software
1005 Communications Technologies
0805 Distributed Computing
0803 Computer Software

Citation

Herschlag, G., Lee, S., Vetter, J. S., & Randles, A. (2021). Analysis of GPU Data Access Patterns on Complex Geometries for the D3Q19 Lattice Boltzmann Algorithm. IEEE Transactions on Parallel and Distributed Systems, 32(10), 2400–2414. https://doi.org/10.1109/TPDS.2021.3061895

Herschlag, G., S. Lee, J. S. Vetter, and A. Randles. “Analysis of GPU Data Access Patterns on Complex Geometries for the D3Q19 Lattice Boltzmann Algorithm.” IEEE Transactions on Parallel and Distributed Systems 32, no. 10 (October 1, 2021): 2400–2414. https://doi.org/10.1109/TPDS.2021.3061895.

Herschlag G, Lee S, Vetter JS, Randles A. Analysis of GPU Data Access Patterns on Complex Geometries for the D3Q19 Lattice Boltzmann Algorithm. IEEE Transactions on Parallel and Distributed Systems. 2021 Oct 1;32(10):2400–14.

Herschlag, G., et al. “Analysis of GPU Data Access Patterns on Complex Geometries for the D3Q19 Lattice Boltzmann Algorithm.” IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 10, Oct. 2021, pp. 2400–14. Scopus, doi:10.1109/TPDS.2021.3061895.