ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration

Publication, Conference
Yang, X; Yan, B; Li, H; Chen, Y
Published in: IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers (ICCAD)
November 2, 2020

Transformer has emerged as a popular deep neural network (DNN) model for Natural Language Processing (NLP) applications and has demonstrated excellent performance in neural machine translation, entity recognition, and other tasks. However, the scaled dot-product attention mechanism in its auto-regressive decoder creates a performance bottleneck during inference. Transformer is also computationally and memory intensive and demands a hardware acceleration solution. Although researchers have successfully applied ReRAM-based Processing-in-Memory (PIM) to accelerate convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the unique computation process of scaled dot-product attention in Transformer makes it difficult to apply these designs directly. In addition, how to handle intermediate results in Matrix-matrix Multiplication (MatMul) and how to design a pipeline at a finer granularity of Transformer remain unsolved. In this work, we propose ReTransformer, a ReRAM-based PIM architecture for Transformer acceleration. ReTransformer not only accelerates the scaled dot-product attention of Transformer using ReRAM-based PIM but also eliminates some data dependencies by avoiding writing intermediate results, using the proposed matrix decomposition technique. Moreover, we propose a new sub-matrix pipeline design for multi-head self-attention. Experimental results show that compared to GPU and PipeLayer, ReTransformer improves computing efficiency by 23.21× and 3.25×, respectively. The corresponding overall power is reduced by 1086× and 2.82×, respectively.
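
To make the bottleneck concrete, below is a minimal NumPy sketch of the scaled dot-product attention computation that ReTransformer targets. This is plain software for illustration only, not the paper's ReRAM design; the function name and toy sizes are hypothetical. The scores array is the intermediate MatMul result whose write-back the proposed matrix decomposition avoids.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the kernel the paper accelerates."""
    d_k = Q.shape[-1]
    # First MatMul: produces the intermediate result that a naive PIM mapping
    # would have to write into ReRAM crossbars before the second MatMul.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax (numerically stabilized).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Second MatMul: consumes the intermediate.
    return weights @ V

# Toy example (hypothetical sizes): sequence length 4, head dimension 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)

In a crossbar-based PIM mapping, the scores matrix normally must be programmed into ReRAM cells before it can serve as an operand of the second MatMul; that write is the data dependency the abstract says the matrix decomposition technique eliminates.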

Published In

IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers (ICCAD)

DOI

10.1145/3400302.3415640

ISSN

1092-3152

Publication Date

November 2, 2020

Volume

2020-November

Citation

APA: Yang, X., Yan, B., Li, H., & Chen, Y. (2020). ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration. In IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers (ICCAD) (Vol. 2020-November). https://doi.org/10.1145/3400302.3415640
Chicago: Yang, X., B. Yan, H. Li, and Y. Chen. "ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration." In IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers (ICCAD), Vol. 2020-November, 2020. https://doi.org/10.1145/3400302.3415640.
ICMJE: Yang X, Yan B, Li H, Chen Y. ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration. In: IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers (ICCAD). 2020.
MLA: Yang, X., et al. "ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration." IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers (ICCAD), vol. 2020-November, 2020. Scopus, doi:10.1145/3400302.3415640.
NLM: Yang X, Yan B, Li H, Chen Y. ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration. IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers (ICCAD). 2020.
