Ouroboros: On accelerating training of transformer-based language models

Publication, Conference

Yang, Q; Huo, Z; Wang, W; Huang, H; Carin, L

Published in: Advances in Neural Information Processing Systems

January 1, 2019

Language models are essential for natural language processing (NLP) tasks, such as machine translation and text summarization. Remarkable performance has recently been demonstrated across many NLP domains by a Transformer-based language model with over a billion parameters, verifying the benefits of model size. Model parallelism is required if a model is too large to fit on a single computing device. Current methods for model parallelism either suffer from backward locking in backpropagation or are not applicable to language models. We propose the first model-parallel algorithm that speeds up the training of Transformer-based language models. We also prove that the proposed algorithm is guaranteed to converge to critical points for non-convex problems. Extensive experiments on Transformer and Transformer-XL language models demonstrate that the proposed algorithm obtains a substantial speedup beyond data parallelism, with comparable or better accuracy. Code to reproduce the experiments is available at https://github.com/LaraQianYang/Ouroboros.

Duke Scholars

Lawrence Carin, Author, Electrical and Computer Engineering

Published In

Advances in Neural Information Processing Systems

ISSN

1049-5258

Publication Date

January 1, 2019

Volume

32

Related Subject Headings

4611 Machine learning
1702 Cognitive Sciences
1701 Psychology

Citation

APA

Yang, Q., Huo, Z., Wang, W., Huang, H., & Carin, L. (2019). Ouroboros: On accelerating training of transformer-based language models. In Advances in Neural Information Processing Systems (Vol. 32).

Chicago

Yang, Q., Z. Huo, W. Wang, H. Huang, and L. Carin. “Ouroboros: On accelerating training of transformer-based language models.” In Advances in Neural Information Processing Systems, Vol. 32, 2019.

ICMJE

Yang Q, Huo Z, Wang W, Huang H, Carin L. Ouroboros: On accelerating training of transformer-based language models. In: Advances in Neural Information Processing Systems. 2019.

MLA

Yang, Q., et al. “Ouroboros: On accelerating training of transformer-based language models.” Advances in Neural Information Processing Systems, vol. 32, 2019.

NLM

Yang Q, Huo Z, Wang W, Huang H, Carin L. Ouroboros: On accelerating training of transformer-based language models. Advances in Neural Information Processing Systems. 2019.
