
TPrune: Efficient Transformer Pruning for Mobile Devices

Publication, Journal Article
Mao, J; Yang, H; Li, A; Li, H; Chen, Y
Published in: ACM Transactions on Cyber-Physical Systems
July 1, 2021

The invention of the Transformer model structure has boosted the performance of Neural Machine Translation (NMT) tasks to an unprecedented level. Much previous work has been done to make the Transformer model more execution-friendly on resource-constrained platforms. This research can be categorized into three key fields: Model Pruning, Transfer Learning, and Efficient Transformer Variants. The family of model pruning methods is popular for its simplicity in practice and promising compression rate, and has achieved great success in the field of convolutional neural networks (CNNs) for many vision tasks. Nonetheless, previous Transformer pruning works did not perform a thorough model analysis and evaluation of each Transformer component on off-the-shelf mobile devices. In this work, we analyze and prune Transformer models at line-wise granularity and also implement our pruning method on real mobile platforms. We explore the properties of all Transformer components as well as their sparsity features, which are leveraged to guide Transformer model pruning. We name our whole Transformer analysis and pruning pipeline TPrune. In TPrune, we first propose Block-wise Structured Sparsity Learning (BSSL) to analyze the Transformer model's properties. Then, based on the characteristics derived from BSSL, we apply Structured Hoyer Square (SHS) to derive the final pruned models. Compared with state-of-the-art Transformer pruning methods, TPrune achieves a higher model compression rate with less performance degradation. Experimental results show that our pruned models achieve 1.16×-1.92× speedup on mobile devices with 0%-8% BLEU score degradation compared with the original Transformer model.
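The abstract names a Structured Hoyer Square (SHS) regularizer as the objective used to derive the final pruned models. As a minimal sketch of how such a structured regularizer can be expressed, the PyTorch snippet below applies the Hoyer-Square ratio, (sum of |v_i|)^2 / sum of v_i^2, to the L2 norms of weight-matrix rows; the function name, the row-wise grouping, and the lambda_shs weighting are illustrative assumptions, not the paper's actual implementation.

import torch

def structured_hoyer_square(weight: torch.Tensor, dim: int = 1, eps: float = 1e-8) -> torch.Tensor:
    # Hoyer-Square ratio over per-row L2 norms: (sum of norms)^2 / (sum of squared norms).
    # Minimizing it drives whole rows toward zero, which is the behavior a
    # structured (block-wise) pruning objective needs.
    group_norms = weight.norm(p=2, dim=dim)        # one L2 norm per row group
    numerator = group_norms.sum() ** 2             # (sum of group norms)^2
    denominator = (group_norms ** 2).sum() + eps   # sum of squared group norms
    return numerator / denominator

# Illustrative usage: add the regularizer over the Transformer's linear layers
# to the task loss; `model`, `task_loss`, and `lambda_shs` are placeholders.
# reg = sum(structured_hoyer_square(m.weight)
#           for m in model.modules() if isinstance(m, torch.nn.Linear))
# loss = task_loss + lambda_shs * reg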


Published In

ACM Transactions on Cyber-Physical Systems

DOI

10.1145/3446640

EISSN

2378-9638

ISSN

2378-962X

Publication Date

July 1, 2021

Volume

5

Issue

3

Citation

APA: Mao, J., Yang, H., Li, A., Li, H., & Chen, Y. (2021). TPrune: Efficient Transformer Pruning for Mobile Devices. ACM Transactions on Cyber-Physical Systems, 5(3). https://doi.org/10.1145/3446640
Chicago: Mao, J., H. Yang, A. Li, H. Li, and Y. Chen. “TPrune: Efficient Transformer Pruning for Mobile Devices.” ACM Transactions on Cyber-Physical Systems 5, no. 3 (July 1, 2021). https://doi.org/10.1145/3446640.
ICMJE: Mao J, Yang H, Li A, Li H, Chen Y. TPrune: Efficient Transformer Pruning for Mobile Devices. ACM Transactions on Cyber-Physical Systems. 2021 Jul 1;5(3).
MLA: Mao, J., et al. “TPrune: Efficient Transformer Pruning for Mobile Devices.” ACM Transactions on Cyber-Physical Systems, vol. 5, no. 3, July 2021. Scopus, doi:10.1145/3446640.
NLM: Mao J, Yang H, Li A, Li H, Chen Y. TPrune: Efficient Transformer Pruning for Mobile Devices. ACM Transactions on Cyber-Physical Systems. 2021 Jul 1;5(3).
