ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration

Publication, Conference
Yang, X; Yan, B; Li, H; Chen, Y
Published in: IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers (ICCAD)
November 2, 2020

Transformer has emerged as a popular deep neural network (DNN) model for Natural Language Processing (NLP) applications and has demonstrated excellent performance in neural machine translation, entity recognition, and other tasks. However, the scaled dot-product attention mechanism in its auto-regressive decoder creates a performance bottleneck during inference. Transformer is also computationally and memory intensive and demands a hardware acceleration solution. Although researchers have successfully applied ReRAM-based Processing-in-Memory (PIM) to accelerate convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the unique computation process of scaled dot-product attention in Transformer makes it difficult to apply these designs directly. In addition, how to handle intermediate results in Matrix-matrix Multiplication (MatMul) and how to design a pipeline at a finer granularity of Transformer remain unsolved. In this work, we propose ReTransformer, a ReRAM-based PIM architecture for Transformer acceleration. ReTransformer not only accelerates the scaled dot-product attention of Transformer using ReRAM-based PIM but also eliminates some data dependencies by avoiding writing intermediate results, using the proposed matrix decomposition technique. Moreover, we propose a new sub-matrix pipeline design for multi-head self-attention. Experimental results show that compared to GPU and PipeLayer, ReTransformer improves computing efficiency by 23.21× and 3.25×, respectively. The corresponding overall power is reduced by 1086× and 2.82×, respectively.
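
To make the bottleneck concrete, below is a minimal NumPy sketch of the scaled dot-product attention computation that ReTransformer targets. This is plain software for illustration only, not the paper's ReRAM design; the function name and toy sizes are hypothetical. The scores array is the intermediate MatMul result whose write-back the proposed matrix decomposition avoids.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the kernel the paper accelerates."""
    d_k = Q.shape[-1]
    # First MatMul: produces the intermediate result that a naive PIM mapping
    # would have to write into ReRAM crossbars before the second MatMul.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax (numerically stabilized).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Second MatMul: consumes the intermediate.
    return weights @ V

# Toy example (hypothetical sizes): sequence length 4, head dimension 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)

In a crossbar-based PIM mapping, the scores matrix normally must be programmed into ReRAM cells before it can serve as an operand of the second MatMul; that write is the data dependency the abstract says the matrix decomposition technique eliminates.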

Published In

IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers (ICCAD)

DOI

10.1145/3400302.3415640

ISSN

1092-3152

Publication Date

November 2, 2020

Volume

2020-November

Citation

APA: Yang, X., Yan, B., Li, H., & Chen, Y. (2020). ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration. In IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers (ICCAD) (Vol. 2020-November). https://doi.org/10.1145/3400302.3415640
Chicago: Yang, X., B. Yan, H. Li, and Y. Chen. "ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration." In IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers (ICCAD), Vol. 2020-November, 2020. https://doi.org/10.1145/3400302.3415640.
ICMJE: Yang X, Yan B, Li H, Chen Y. ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration. In: IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers (ICCAD). 2020.
MLA: Yang, X., et al. "ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration." IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers (ICCAD), vol. 2020-November, 2020. Scopus, doi:10.1145/3400302.3415640.
NLM: Yang X, Yan B, Li H, Chen Y. ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration. IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers (ICCAD). 2020.
