Scholars@Duke publication: Learning Sparse Matrix Row Permutations for Efficient SpMM on GPU Architectures

Publication, Conference

Mehrabi, A; Lee, D; Chatterjee, N; Sorin, DJ; Lee, BC; O'Connor, M

Published in: Proceedings - 2021 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2021

March 1, 2021

Achieving peak performance on sparse operations is challenging. The distribution of the non-zero elements and underlying hardware platform affect the execution efficiency. Given the diversity in workloads and architectures, no unique solution always wins. In this paper, we improve SpMM efficiency on GPUs. We propose several simple, but effective, sparse data permutations on the CSR data structure. Picking the right permutation over 1,688 datasets improves performance by 1.4×, on average, compared to plain CSR and 2.6× against NVIDIA cuSPARSE. Furthermore, we propose a set of novel features to describe sparsity patterns and their interactions with the kernel and hardware. Using these features, we develop a predictor to select the best permutation for each matrix. Predicted permutations' average gain achieves 96% of oracle gains.
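To make the idea of a row permutation on CSR concrete, the following is a minimal sketch in plain Python. The abstract does not say which permutations the paper uses, so ordering rows by descending non-zero count is an assumption chosen purely for illustration, and the function names (`permute_csr_rows`, `rows_by_descending_length`, `spmm_csr`) are hypothetical. The reference SpMM runs on the CPU and is only there to check that permuting rows and then undoing the permutation on the output leaves the product unchanged.

```python
def permute_csr_rows(indptr, indices, data, perm):
    """Build CSR arrays whose row i is row perm[i] of the input matrix."""
    new_indptr, new_indices, new_data = [0], [], []
    for r in perm:
        start, end = indptr[r], indptr[r + 1]
        new_indices.extend(indices[start:end])
        new_data.extend(data[start:end])
        new_indptr.append(len(new_indices))
    return new_indptr, new_indices, new_data


def rows_by_descending_length(indptr):
    """Illustrative permutation (an assumption): longest rows first."""
    n_rows = len(indptr) - 1
    # The key is the negated row length, so longer rows sort earlier.
    return sorted(range(n_rows), key=lambda r: indptr[r] - indptr[r + 1])


def spmm_csr(indptr, indices, data, dense):
    """Reference sparse (CSR) times dense product, one output row per CSR row."""
    n_cols = len(dense[0])
    out = []
    for r in range(len(indptr) - 1):
        acc = [0.0] * n_cols
        for k in range(indptr[r], indptr[r + 1]):
            b_row = dense[indices[k]]
            for j in range(n_cols):
                acc[j] += data[k] * b_row[j]
        out.append(acc)
    return out


# 4x4 example with row lengths 1, 3, 0, 2.
indptr = [0, 1, 4, 4, 6]
indices = [0, 1, 2, 3, 0, 2]
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
dense = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]

perm = rows_by_descending_length(indptr)
p_indptr, p_indices, p_data = permute_csr_rows(indptr, indices, data, perm)

# Multiply in permuted order, then scatter rows back to their original positions.
permuted_out = spmm_csr(p_indptr, p_indices, p_data, dense)
restored = [None] * len(perm)
for new_row, old_row in enumerate(perm):
    restored[old_row] = permuted_out[new_row]
```

Only the row order changes; column indices and values within each row are untouched, so the permutation can be undone by scattering output rows back through `perm`. On a GPU, the motivation for such a reordering would be to group rows with similar work, but how that interacts with a particular kernel is outside what this sketch shows.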

Duke Scholars

Author: Daniel J. Sorin, Electrical and Computer Engineering

Published In

Proceedings - 2021 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2021

DOI

10.1109/ISPASS51385.2021.00016

ISBN

9781728186436

Publication Date

March 1, 2021

Start / End Page

48 / 58

Citation

APA

Mehrabi, A., Lee, D., Chatterjee, N., Sorin, D. J., Lee, B. C., & O’Connor, M. (2021). Learning Sparse Matrix Row Permutations for Efficient SpMM on GPU Architectures. In Proceedings - 2021 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2021 (pp. 48–58). https://doi.org/10.1109/ISPASS51385.2021.00016

Chicago

Mehrabi, A., D. Lee, N. Chatterjee, D. J. Sorin, B. C. Lee, and M. O’Connor. “Learning Sparse Matrix Row Permutations for Efficient SpMM on GPU Architectures.” In Proceedings - 2021 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2021, 48–58, 2021. https://doi.org/10.1109/ISPASS51385.2021.00016.

ICMJE

Mehrabi A, Lee D, Chatterjee N, Sorin DJ, Lee BC, O’Connor M. Learning Sparse Matrix Row Permutations for Efficient SpMM on GPU Architectures. In: Proceedings - 2021 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2021. 2021. p. 48–58.

MLA

Mehrabi, A., et al. “Learning Sparse Matrix Row Permutations for Efficient SpMM on GPU Architectures.” Proceedings - 2021 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2021, 2021, pp. 48–58. Scopus, doi:10.1109/ISPASS51385.2021.00016.

NLM

Mehrabi A, Lee D, Chatterjee N, Sorin DJ, Lee BC, O’Connor M. Learning Sparse Matrix Row Permutations for Efficient SpMM on GPU Architectures. Proceedings - 2021 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2021. 2021. p. 48–58.
