Accelerating Sparse Attention with a Reconfigurable Non-volatile Processing-In-Memory Architecture

Publication, Conference
Zheng, Q; Li, S; Wang, Y; Li, Z; Chen, Y; Li, HH
Published in: Proceedings - Design Automation Conference
January 1, 2023

Attention-based neural networks have shown superior performance in a wide range of tasks. Non-volatile processing-in-memory (NVPIM) architectures show great potential for accelerating dense attention models. However, the unstructured and dynamic sparsity pattern of sparse attention models challenges the mapping efficiency of NVPIM architectures, because conventional NVPIM architectures rely on a vector-matrix multiplication primitive. In this paper, we propose an NVPIM architecture to accelerate the dynamic and unstructured sparse computation in sparse attention. We aim to improve the mapping efficiency of both sampled dense-dense matrix multiplication (SDDMM) and sparse matrix-matrix multiplication (SpMM) by introducing two vector-based primitives with a reconfigurable NVPIM bank. Building on this reconfigurable bank, we further propose a hybrid stationary dataflow to hide latency. Our evaluation shows that, compared with previous NVPIM accelerators, our design delivers up to 12.36× performance improvement and 3.4× energy-efficiency improvement on a range of vision and language tasks.
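
For readers unfamiliar with the two kernels named in the abstract, the following is a minimal NumPy sketch of how sparse attention decomposes into an SDDMM step, a row-wise softmax, and an SpMM step. It is a software illustration only and does not model the paper's NVPIM primitives, bank design, or dataflow. The function names, the boolean `mask` input (standing in for a dynamic sparsity pattern), and the assumption that every query keeps at least one key are all hypothetical choices made for this example.

```python
import numpy as np

def sddmm(Q, K, mask):
    """Sampled dense-dense matmul: evaluate Q @ K.T only where mask is True."""
    rows, cols = np.nonzero(mask)
    scores = np.full(mask.shape, -np.inf)
    # One dot product per retained (query, key) pair; masked pairs are skipped.
    scores[rows, cols] = np.einsum("ij,ij->i", Q[rows], K[cols])
    return scores / np.sqrt(Q.shape[1])

def masked_softmax(scores):
    """Row-wise softmax; masked entries (-inf) become exactly 0.
    Assumes every row has at least one unmasked entry."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)

def spmm(P, V):
    """Sparse-dense matmul: accumulate only over the nonzero entries of P."""
    out = np.zeros((P.shape[0], V.shape[1]))
    rows, cols = np.nonzero(P)
    # Each nonzero P[r, c] adds a scaled copy of row V[c] to output row r.
    np.add.at(out, rows, P[rows, cols][:, None] * V[cols])
    return out

def sparse_attention(Q, K, V, mask):
    """SDDMM -> softmax -> SpMM, the kernel sequence named in the abstract."""
    return spmm(masked_softmax(sddmm(Q, K, mask)), V)
```

Because both `sddmm` and `spmm` iterate over individual nonzero positions rather than whole matrix tiles, the sketch also hints at why an irregular, input-dependent `mask` maps poorly onto a primitive that always multiplies a full vector by a full matrix.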

Published In

Proceedings - Design Automation Conference

DOI

10.1109/DAC56929.2023.10247908

ISSN

0738-100X

Publication Date

January 1, 2023

Volume

2023-July

Citation

APA

Zheng, Q., Li, S., Wang, Y., Li, Z., Chen, Y., & Li, H. H. (2023). Accelerating Sparse Attention with a Reconfigurable Non-volatile Processing-In-Memory Architecture. In Proceedings - Design Automation Conference (Vol. 2023-July). https://doi.org/10.1109/DAC56929.2023.10247908

Chicago

Zheng, Q., S. Li, Y. Wang, Z. Li, Y. Chen, and H. H. Li. “Accelerating Sparse Attention with a Reconfigurable Non-volatile Processing-In-Memory Architecture.” In Proceedings - Design Automation Conference, Vol. 2023-July, 2023. https://doi.org/10.1109/DAC56929.2023.10247908.

ICMJE

Zheng Q, Li S, Wang Y, Li Z, Chen Y, Li HH. Accelerating Sparse Attention with a Reconfigurable Non-volatile Processing-In-Memory Architecture. In: Proceedings - Design Automation Conference. 2023.

MLA

Zheng, Q., et al. “Accelerating Sparse Attention with a Reconfigurable Non-volatile Processing-In-Memory Architecture.” Proceedings - Design Automation Conference, vol. 2023-July, 2023. Scopus, doi:10.1109/DAC56929.2023.10247908.

NLM

Zheng Q, Li S, Wang Y, Li Z, Chen Y, Li HH. Accelerating Sparse Attention with a Reconfigurable Non-volatile Processing-In-Memory Architecture. Proceedings - Design Automation Conference. 2023.
