Learning Efficient Sparse Structures in Speech Recognition

Conference Paper

Recurrent neural networks (RNNs), especially long short-term memories (LSTMs), have been widely used in speech recognition and natural language processing. As RNN models grow in size for better performance, their computation cost, and therefore the required hardware resources, increase rapidly. We propose an efficient structural sparsity (ESS) learning method for acoustic modeling in speech recognition. ESS aims to generate a model that offers higher execution efficiency while maintaining accuracy. We develop a three-step training pipeline. First, we apply group Lasso regularization during training to learn a structurally sparse model from scratch. Second, we fix the learned sparse structures so that they can no longer change. Finally, we retrain the model, updating only the nonzero parameters. We applied ESS to a classic HMM+LSTM model in the Kaldi toolkit. The experimental results show that ESS can remove 72.5% of the weight groups in the weight matrices at the cost of a slight, 1.1% increase in word error rate (WER).
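The three-step pipeline described above can be illustrated on a toy problem. The sketch below is not the authors' Kaldi implementation. It uses a linear least-squares model in NumPy instead of an HMM+LSTM acoustic model. It treats each column of the weight matrix as one weight group; this is an assumption, since the grouping ESS uses for LSTM weights is not specified here. It also minimizes the group Lasso objective (data loss plus lam times the sum of per-group L2 norms) with proximal gradient descent, i.e. group soft-thresholding. That optimizer is a swapped-in choice that zeroes whole groups exactly. All function names (`group_soft_threshold`, `train_ess_sketch`, etc.) are hypothetical.

```python
import numpy as np

# Hypothetical toy version of a three-step structured-sparsity pipeline.
# Assumption: one weight group = one column of W (all weights fed by one input).


def group_norms(W):
    """L2 norm of each column group of W."""
    return np.linalg.norm(W, axis=0)


def group_lasso_penalty(W, lam):
    """Group Lasso term: lam * sum over groups of ||W_g||_2."""
    return lam * group_norms(W).sum()


def group_soft_threshold(W, thresh):
    """Proximal operator of thresh * sum_g ||W_g||_2.

    Shrinks every column toward zero and sets it exactly to zero
    when its norm is at most `thresh`.
    """
    norms = group_norms(W)
    scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
    return W * scale  # one scale per column, broadcast over rows


def mse_grad(W, X, Y):
    """Gradient w.r.t. W of 0.5 * mean_n ||W x_n - y_n||^2."""
    R = X @ W.T - Y               # residuals, shape (n, out)
    return R.T @ X / X.shape[0]   # shape (out, in), same as W


def train_ess_sketch(X, Y, lam=0.1, lr=0.1, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((Y.shape[1], X.shape[1]))
    # Step 1: learn a structurally sparse model from scratch
    # (proximal gradient descent on loss + group Lasso).
    for _ in range(steps):
        W = group_soft_threshold(W - lr * mse_grad(W, X, Y), lr * lam)
    # Step 2: fix the learned structure as a binary mask over groups.
    mask = (group_norms(W) > 0).astype(W.dtype)
    # Step 3: retrain only the surviving groups, without the penalty;
    # multiplying by the mask keeps removed groups at exactly zero.
    for _ in range(steps):
        W = (W - lr * mse_grad(W, X, Y)) * mask
    return W, mask


# Usage on synthetic data where only the first 3 of 8 inputs matter.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 8))
W_true = np.zeros((4, 8))
W_true[:, :3] = rng.standard_normal((4, 3))
Y = X @ W_true.T + 0.01 * rng.standard_normal((200, 4))

W, mask = train_ess_sketch(X, Y)
print("fraction of weight groups removed:", 1.0 - mask.mean())
```

Because the retraining loop multiplies by a fixed mask, pruned groups cannot reappear, mirroring the fixed-structure second step. In this toy setting, `1 - mask.mean()` plays the role of the fraction of removed weight groups reported in the abstract.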

Cited Authors

  • Zhang, J; Wen, W; Deisher, M; Cheng, HP; Li, H; Chen, Y

Published Date

  • May 1, 2019

Published In

  • IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Volume

  • 2019-May

Start / End Page

  • 2717 - 2721

International Standard Serial Number (ISSN)

  • 1520-6149

International Standard Book Number 13 (ISBN-13)

  • 9781479981311

Digital Object Identifier (DOI)

  • 10.1109/ICASSP.2019.8683620

Citation Source

  • Scopus