
TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models

Publication, Conference
Li, Z; Zhuang, S; Guo, S; Zhuo, D; Zhang, H; Song, D; Stoica, I
Published in: Proceedings of Machine Learning Research
January 1, 2021

Model parallelism has become a necessity for training modern large-scale deep language models. In this work, we identify a new dimension orthogonal to existing model-parallel approaches: it is possible to perform pipeline parallelism within a single training sequence for Transformer-based language models thanks to their autoregressive property. This enables a finer-grained pipeline than previous work. With this key idea, we design TeraPipe, a high-performance token-level pipeline parallel algorithm for synchronous model-parallel training of Transformer-based language models. We develop a novel dynamic-programming-based algorithm to compute the optimal pipelining execution scheme for a given model and cluster configuration. We show that TeraPipe speeds up training of the largest GPT-3 model (175 billion parameters) by 5.0x on an AWS cluster of 48 p3.16xlarge instances compared with state-of-the-art model-parallel methods. The code for reproduction can be found at https://github.com/zhuohan123/terapipe.
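The abstract describes the dynamic-programming step only at a high level; as a rough illustration of what "computing the optimal pipelining execution scheme" can look like, here is a minimal Python sketch. It assumes a hypothetical per-slice cost function slice_time(n) (time for one pipeline stage to process a slice of n tokens) and approximates end-to-end pipeline latency as the sum of all slice times plus (num_stages - 1) times the slowest slice; this is an illustration of the idea, not the authors' implementation, which lives in the linked repository.

from functools import lru_cache

def optimal_slicing(seq_len, num_stages, slice_time):
    """Hypothetical sketch: split seq_len tokens into slices minimizing an
    approximate pipeline latency of
        sum_i slice_time(s_i) + (num_stages - 1) * max_i slice_time(s_i)."""
    best_latency, best_sizes = float("inf"), None
    # Enumerate a cap on the slowest slice, then run a prefix DP under that cap.
    for cap in sorted({slice_time(n) for n in range(1, seq_len + 1)}):

        @lru_cache(maxsize=None)
        def dp(tokens_left):
            # Minimum total slice time for the remaining tokens, using only
            # slices whose per-stage time stays within `cap`.
            if tokens_left == 0:
                return 0.0, ()
            best_total, best_split = float("inf"), ()
            for s in range(1, tokens_left + 1):
                t = slice_time(s)
                if t > cap:
                    break  # assumes slice_time is non-decreasing in s
                rest_total, rest_split = dp(tokens_left - s)
                if t + rest_total < best_total:
                    best_total, best_split = t + rest_total, (s,) + rest_split
            return best_total, best_split

        total, split = dp(seq_len)
        latency = total + (num_stages - 1) * cap
        if split and latency < best_latency:
            best_latency, best_sizes = latency, list(split)
    return best_latency, best_sizes

# Toy cost model (an assumption, not measured): a fixed per-slice launch
# overhead plus a term linear in the number of tokens.
latency, sizes = optimal_slicing(seq_len=32, num_stages=4,
                                 slice_time=lambda n: 0.5 + 0.1 * n)
print(latency, sizes)

Enumerating a cap on the slowest slice and keeping that cap in the objective is a standard way to turn a min-of-(sum plus max) objective into a family of simpler capped subproblems, each solvable by a prefix DP.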


Published In

Proceedings of Machine Learning Research

EISSN

2640-3498

Publication Date

January 1, 2021

Volume

139

Start / End Page

6543 / 6552
 

Citation

APA
Li, Z., Zhuang, S., Guo, S., Zhuo, D., Zhang, H., Song, D., & Stoica, I. (2021). TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models. In Proceedings of Machine Learning Research (Vol. 139, pp. 6543–6552).

Chicago
Li, Z., S. Zhuang, S. Guo, D. Zhuo, H. Zhang, D. Song, and I. Stoica. “TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models.” In Proceedings of Machine Learning Research, 139:6543–52, 2021.

ICMJE
Li Z, Zhuang S, Guo S, Zhuo D, Zhang H, Song D, et al. TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models. In: Proceedings of Machine Learning Research. 2021. p. 6543–52.

MLA
Li, Z., et al. “TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models.” Proceedings of Machine Learning Research, vol. 139, 2021, pp. 6543–52.

NLM
Li Z, Zhuang S, Guo S, Zhuo D, Zhang H, Song D, Stoica I. TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models. Proceedings of Machine Learning Research. 2021. p. 6543–6552.
