Scholars@Duke publication: Cyclical annealing schedule: A simple approach to mitigating KL vanishing


Publication, Conference

Fu, H; Li, C; Liu, X; Gao, J; Celikyilmaz, A; Carin, L

Published in: NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference

January 1, 2019

Variational autoencoders (VAEs) with an auto-regressive decoder have been applied to many natural language processing (NLP) tasks. The VAE objective consists of two terms, (i) reconstruction and (ii) KL regularization, balanced by a weighting hyper-parameter β. One notorious training difficulty is that the KL term tends to vanish. In this paper, we study scheduling schemes for β and show that KL vanishing is caused by the lack of good latent codes for training the decoder at the beginning of optimization. To remedy this, we propose a cyclical annealing schedule, which repeats the process of increasing β multiple times. This procedure allows progressively more meaningful latent codes to be learned, by using the informative representations of previous cycles as warm restarts. The effectiveness of cyclical annealing is validated on a broad range of NLP tasks, including language modeling, dialog response generation, and unsupervised language pre-training.
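The abstract describes the schedule only in words (β is increased repeatedly over several cycles), so the following is a minimal sketch rather than the authors' implementation. It assumes a linear ramp from 0 to 1 during the first part of each cycle, followed by a hold at 1. The function names and the parameters `n_cycles` and `ramp_ratio` are hypothetical, chosen for illustration.

```python
import math


def cyclical_beta(step, total_steps, n_cycles=4, ramp_ratio=0.5):
    """Return the KL weight beta in [0, 1] for a 0-indexed training step.

    Assumptions (not stated in the abstract): within each cycle, beta rises
    linearly from 0 to 1 over the first `ramp_ratio` of the cycle and is then
    held at 1 until the next cycle restarts it at 0.
    """
    if total_steps <= 0 or n_cycles <= 0:
        raise ValueError("total_steps and n_cycles must be positive")
    if not 0.0 < ramp_ratio <= 1.0:
        raise ValueError("ramp_ratio must be in (0, 1]")
    cycle_len = math.ceil(total_steps / n_cycles)
    # Relative position inside the current cycle, in [0, 1).
    tau = (step % cycle_len) / cycle_len
    # Linear ramp, capped at 1 for the remainder of the cycle.
    return min(1.0, tau / ramp_ratio)


def vae_loss(reconstruction_loss, kl_divergence, beta):
    """Beta-weighted objective to minimize: reconstruction + beta * KL."""
    return reconstruction_loss + beta * kl_divergence
```

In a training loop, one would call `cyclical_beta(step, total_steps)` at each update and pass the result to `vae_loss`. Setting `n_cycles=1` reduces this sketch to a single monotonic ramp, which makes the contrast with the repeated schedule easy to test.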

Duke Scholars

Author: Lawrence Carin, Electrical and Computer Engineering


Volume: 1

Start / End Page: 240 / 250

Citation

APA: Fu, H., Li, C., Liu, X., Gao, J., Celikyilmaz, A., & Carin, L. (2019). Cyclical annealing schedule: A simple approach to mitigating KL vanishing. In NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference (Vol. 1, pp. 240–250).

Chicago: Fu, H., C. Li, X. Liu, J. Gao, A. Celikyilmaz, and L. Carin. “Cyclical annealing schedule: A simple approach to mitigating KL vanishing.” In NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference, 1:240–50, 2019.

ICMJE: Fu H, Li C, Liu X, Gao J, Celikyilmaz A, Carin L. Cyclical annealing schedule: A simple approach to mitigating KL vanishing. In: NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference. 2019. p. 240–50.

MLA: Fu, H., et al. “Cyclical annealing schedule: A simple approach to mitigating KL vanishing.” NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference, vol. 1, 2019, pp. 240–50.

NLM: Fu H, Li C, Liu X, Gao J, Celikyilmaz A, Carin L. Cyclical annealing schedule: A simple approach to mitigating KL vanishing. NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference. 2019. p. 240–250.
