Scholars@Duke publication: Accelerating Sparse Attention with a Reconfigurable Non-volatile Processing-In-Memory Architecture

Publication, Conference

Zheng, Q; Li, S; Wang, Y; Li, Z; Chen, Y; Li, HH

Published in: Proceedings Design Automation Conference

January 1, 2023

Attention-based neural networks have shown superior performance in a wide range of tasks. Non-volatile processing-in-memory (NVPIM) architectures have shown great potential for accelerating dense attention models. However, the unstructured and dynamic sparsity pattern of sparse attention models challenges the mapping efficiency of NVPIM architectures, because conventional NVPIM architectures are built around vector-matrix multiplication primitives. In this paper, we propose an NVPIM architecture that accelerates the dynamic, unstructured sparse computation in sparse attention. We aim to improve the mapping efficiency of both sampled dense-dense matrix multiplication (SDDMM) and sparse-dense matrix multiplication (SpMM) by introducing two vector-based primitives with a reconfigurable NVPIM bank. Building on this reconfigurable bank, we further propose a hybrid stationary data flow to hide latency. Our evaluation results show that, compared with previous NVPIM accelerators, our design delivers up to 12.36× higher performance and 3.4× higher energy efficiency across a range of vision and language tasks.
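SDDMM and SpMM are the two sparse kernels that a sparse attention layer reduces to: SDDMM computes query-key scores only at the positions kept by the sparsity mask, and SpMM multiplies the resulting sparse probability matrix by the dense value matrix. The pure-Python sketch below illustrates that decomposition only; it is not the paper's NVPIM primitives, bank design, or data flow. The per-row index-list mask format and the function names (`sddmm`, `row_softmax`, `spmm`, `sparse_attention`) are hypothetical choices made for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]
# Hypothetical mask format: for each query row, the sorted key indices kept.
Mask = List[List[int]]


def sddmm(Q: Matrix, K: Matrix, mask: Mask) -> Matrix:
    """Sampled dense-dense matmul: (Q @ K^T)[i][j] evaluated only for j in mask[i]."""
    return [
        [sum(q * k for q, k in zip(Q[i], K[j])) for j in cols]
        for i, cols in enumerate(mask)
    ]


def row_softmax(scores: Matrix) -> Matrix:
    """Numerically stable softmax over the stored (unmasked) entries of each row."""
    out = []
    for row in scores:
        if not row:
            out.append([])  # a fully masked row produces no weights
            continue
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def spmm(P: Matrix, mask: Mask, V: Matrix) -> Matrix:
    """Sparse-dense matmul: out[i] = sum over j in mask[i] of P[i][j] * V[j]."""
    d = len(V[0])
    out = []
    for weights, cols in zip(P, mask):
        acc = [0.0] * d
        for w, j in zip(weights, cols):
            for c in range(d):
                acc[c] += w * V[j][c]
        out.append(acc)
    return out


def sparse_attention(Q: Matrix, K: Matrix, V: Matrix, mask: Mask) -> Matrix:
    """Masked scaled dot-product attention: SDDMM -> scale -> row softmax -> SpMM."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    scores = [[s * scale for s in row] for row in sddmm(Q, K, mask)]
    return spmm(row_softmax(scores), mask, V)


# Toy example: each query row keeps a different subset of keys.
Q = K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mask = [[0], [1, 2], [0, 1, 2]]
out = sparse_attention(Q, K, V, mask)
# out[0] == [1.0, 2.0] (row 0 attends only to key 0)
# out[1] == [4.0, 5.0] (equal scores for keys 1 and 2, so weights 0.5 and 0.5)
print(out)
```

Because each row of `mask` selects a different, input-dependent subset of keys and values, the operands of both kernels change from row to row. This is one way to see why a mapping built around fixed vector-matrix multiplication over a dense, static array can use its capacity poorly on sparse attention.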

Duke Scholars

Author Hai "Helen" Li Pierre R. Lamond Department of Electrical and Computer Engin ...

Author Yiran Chen Pierre R. Lamond Department of Electrical and Computer Engin ...

Published In

Proceedings Design Automation Conference

DOI

10.1109/DAC56929.2023.10247908

ISSN

0738-100X

Publication Date

January 1, 2023

Volume

2023-July

Citation

APA

Zheng, Q., Li, S., Wang, Y., Li, Z., Chen, Y., & Li, H. H. (2023). Accelerating Sparse Attention with a Reconfigurable Non-volatile Processing-In-Memory Architecture. In Proceedings Design Automation Conference (Vol. 2023-July). https://doi.org/10.1109/DAC56929.2023.10247908

Chicago

Zheng, Q., S. Li, Y. Wang, Z. Li, Y. Chen, and H. H. Li. “Accelerating Sparse Attention with a Reconfigurable Non-volatile Processing-In-Memory Architecture.” In Proceedings Design Automation Conference, Vol. 2023-July, 2023. https://doi.org/10.1109/DAC56929.2023.10247908.

ICMJE

Zheng Q, Li S, Wang Y, Li Z, Chen Y, Li HH. Accelerating Sparse Attention with a Reconfigurable Non-volatile Processing-In-Memory Architecture. In: Proceedings Design Automation Conference. 2023.

MLA

Zheng, Q., et al. “Accelerating Sparse Attention with a Reconfigurable Non-volatile Processing-In-Memory Architecture.” Proceedings Design Automation Conference, vol. 2023-July, 2023. Scopus, doi:10.1109/DAC56929.2023.10247908.

NLM

Zheng Q, Li S, Wang Y, Li Z, Chen Y, Li HH. Accelerating Sparse Attention with a Reconfigurable Non-volatile Processing-In-Memory Architecture. Proceedings Design Automation Conference. 2023.