Scholars@Duke publication: Scalable Bayesian learning of recurrent neural networks for language modeling

Publication, Conference

Gan, Z; Li, C; Chen, C; Pu, Y; Su, Q; Carin, L

Published in: ACL 2017 55th Annual Meeting of the Association for Computational Linguistics Proceedings of the Conference Long Papers

January 1, 2017

Recurrent neural networks (RNNs) have shown promising performance for language modeling. However, traditional training of RNNs using back-propagation through time often suffers from overfitting. One reason for this is that stochastic optimization (used for large training sets) does not provide good estimates of model uncertainty. This paper leverages recent advances in stochastic gradient Markov Chain Monte Carlo (also appropriate for large training sets) to learn weight uncertainty in RNNs. It yields a principled Bayesian learning algorithm, adding gradient noise during training (enhancing exploration of the model-parameter space) and model averaging when testing. Extensive experiments on various RNN models and across a broad range of applications demonstrate the superiority of the proposed approach relative to stochastic optimization.
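
The recipe in the abstract (inject noise into the gradient updates during training, then average predictions over the resulting weight samples at test time) can be illustrated on a toy problem. The sketch below is not the paper's implementation. It assumes stochastic gradient Langevin dynamics (SGLD), one of the simplest stochastic gradient MCMC methods, and swaps the RNN for a one-weight Bayesian linear regression with a Gaussian prior. The function names, step size, and other hyperparameters are hypothetical choices made for illustration.

```python
import math
import random


def sgld_samples(xs, ys, n_steps=3000, batch_size=32, step_size=1e-4,
                 prior_var=10.0, noise_var=0.25, burn_in=1000, thin=20, seed=0):
    """Draw approximate posterior samples of the weight w in y = w * x + noise.

    Toy assumptions: Gaussian prior N(0, prior_var) on w and Gaussian
    observation noise with known variance noise_var.
    """
    rng = random.Random(seed)
    n = len(xs)
    w = 0.0
    samples = []
    for t in range(n_steps):
        # Minibatch drawn without replacement, as in stochastic optimization.
        batch = rng.sample(range(n), batch_size)
        # Gradient of the log prior.
        grad_log_prior = -w / prior_var
        # Minibatch gradient of the log likelihood, rescaled to the full data set.
        grad_log_lik = (n / batch_size) * sum(
            xs[i] * (ys[i] - w * xs[i]) for i in batch
        ) / noise_var
        grad = grad_log_prior + grad_log_lik
        # SGLD update: half a gradient step plus Gaussian noise whose
        # variance equals the step size.
        w += 0.5 * step_size * grad + rng.gauss(0.0, math.sqrt(step_size))
        # Keep thinned samples after the burn-in phase.
        if t >= burn_in and (t - burn_in) % thin == 0:
            samples.append(w)
    return samples


def predict_averaged(x_new, samples):
    """Model averaging: mean prediction over the collected weight samples."""
    return sum(w * x_new for w in samples) / len(samples)
```

Calling `sgld_samples(xs, ys)` on two equal-length lists of floats returns a list of weight samples, and `predict_averaged(x_new, samples)` averages their predictions for a new input. Dropping the noise term turns the update back into plain stochastic gradient ascent on the log posterior, which yields a single point estimate rather than a set of samples to average over.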

Duke Scholars

Author: Lawrence Carin, Electrical and Computer Engineering

Published In

ACL 2017 55th Annual Meeting of the Association for Computational Linguistics Proceedings of the Conference Long Papers

DOI

10.18653/v1/P17-1030

Publication Date

January 1, 2017

Volume

1

Start / End Page

321 / 331

Citation

APA

Gan, Z., Li, C., Chen, C., Pu, Y., Su, Q., & Carin, L. (2017). Scalable Bayesian learning of recurrent neural networks for language modeling. In ACL 2017 55th Annual Meeting of the Association for Computational Linguistics Proceedings of the Conference Long Papers (Vol. 1, pp. 321–331). https://doi.org/10.18653/v1/P17-1030

Chicago

Gan, Z., C. Li, C. Chen, Y. Pu, Q. Su, and L. Carin. “Scalable Bayesian learning of recurrent neural networks for language modeling.” In ACL 2017 55th Annual Meeting of the Association for Computational Linguistics Proceedings of the Conference Long Papers, 1:321–31, 2017. https://doi.org/10.18653/v1/P17-1030.

ICMJE

Gan Z, Li C, Chen C, Pu Y, Su Q, Carin L. Scalable Bayesian learning of recurrent neural networks for language modeling. In: ACL 2017 55th Annual Meeting of the Association for Computational Linguistics Proceedings of the Conference Long Papers. 2017. p. 321–31.

MLA

Gan, Z., et al. “Scalable Bayesian learning of recurrent neural networks for language modeling.” ACL 2017 55th Annual Meeting of the Association for Computational Linguistics Proceedings of the Conference Long Papers, vol. 1, 2017, pp. 321–31. Scopus, doi:10.18653/v1/P17-1030.

NLM

Gan Z, Li C, Chen C, Pu Y, Su Q, Carin L. Scalable Bayesian learning of recurrent neural networks for language modeling. ACL 2017 55th Annual Meeting of the Association for Computational Linguistics Proceedings of the Conference Long Papers. 2017. p. 321–331.
