ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration

Publication, Conference
Yang, X.; Yan, B.; Li, H.; Chen, Y.
Published in: IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
November 2, 2020

The Transformer has emerged as a popular deep neural network (DNN) model for Natural Language Processing (NLP) applications and has demonstrated excellent performance in neural machine translation, entity recognition, and other tasks. However, the scaled dot-product attention mechanism in its auto-regressive decoder creates a performance bottleneck during inference. Transformer is also computationally and memory intensive, demanding a hardware acceleration solution. Although researchers have successfully applied ReRAM-based Processing-in-Memory (PIM) to accelerate convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the unique computation process of the scaled dot-product attention in Transformer makes it difficult to apply these designs directly. Moreover, how to handle intermediate results in matrix-matrix multiplication (MatMul) and how to design a pipeline at a finer granularity of Transformer remain unsolved. In this work, we propose ReTransformer, a ReRAM-based PIM architecture for Transformer acceleration. ReTransformer not only accelerates the scaled dot-product attention of Transformer using ReRAM-based PIM but also eliminates some data dependencies by avoiding writing intermediate results, using the proposed matrix decomposition technique. Moreover, we propose a new sub-matrix pipeline design for multi-head self-attention. Experimental results show that, compared to GPU and PipeLayer, ReTransformer improves computing efficiency by 23.21× and 3.25×, respectively, while reducing overall power consumption by 1086× and 2.82×, respectively.
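For context, the scaled dot-product attention the abstract refers to computes softmax(QK^T / sqrt(d_k))V, and the score matrix QK^T is exactly the kind of intermediate MatMul result whose write-back ReTransformer's matrix decomposition avoids. Below is a minimal NumPy sketch of that computation; the function name, variable names, and shapes are illustrative assumptions, not taken from the paper.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

        Q, K, V: (seq_len, d_k) arrays. The score matrix Q K^T is the
        intermediate MatMul result that, in a naive ReRAM PIM mapping,
        would have to be written back to the crossbar before the next
        multiplication (illustrative only; not the paper's design).
        """
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)  # intermediate (seq_len, seq_len) matrix
        # numerically stable row-wise softmax
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V

    # Example: one attention head over an 8-token sequence with d_k = 64
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((8, 64)) for _ in range(3))
    out = scaled_dot_product_attention(Q, K, V)  # shape (8, 64)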


Published In

IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD

DOI

10.1145/3400302.3415640

ISSN

1092-3152

Publication Date

November 2, 2020

Volume

2020-November
 

Citation

APA
Yang, X., Yan, B., Li, H., & Chen, Y. (2020). ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration. In IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD (Vol. 2020-November). https://doi.org/10.1145/3400302.3415640

Chicago
Yang, X., B. Yan, H. Li, and Y. Chen. “ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration.” In IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD, Vol. 2020-November, 2020. https://doi.org/10.1145/3400302.3415640.

ICMJE
Yang X, Yan B, Li H, Chen Y. ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration. In: IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD. 2020.

MLA
Yang, X., et al. “ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration.” IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD, vol. 2020-November, 2020. Scopus, doi:10.1145/3400302.3415640.

NLM
Yang X, Yan B, Li H, Chen Y. ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration. IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD. 2020.
