Learning Efficient Sparse Structures in Speech Recognition

Publication, Conference

Zhang, J; Wen, W; Deisher, M; Cheng, HP; Li, H; Chen, Y

Published in: ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings

May 1, 2019

Recurrent neural networks (RNNs), especially long short-term memories (LSTMs), have been widely used in speech recognition and natural language processing. As RNN models grow in size for better performance, the computation cost, and therefore the required hardware resources, increase rapidly. We propose an efficient structural sparsity (ESS) learning method for acoustic modeling in speech recognition. ESS aims to generate a model that executes more efficiently while maintaining accuracy. We develop a three-step training pipeline. First, we apply group Lasso regularization during training and learn a structurally sparse model from scratch. Second, the learned sparse structure is fixed so that it cannot change. Finally, we retrain the model, updating only its nonzero parameters. We applied ESS to a classic HMM+LSTM model in the Kaldi toolkit. The experimental results show that ESS can remove 72.5% of the weight groups in the weight matrices while increasing the word error rate (WER) by only 1.1%.
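The three-step pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it uses a toy linear model instead of an HMM+LSTM acoustic model, treats each row of the weight matrix as one weight group (the abstract does not say how groups are defined), and applies the group Lasso term through a proximal (group soft-thresholding) step. All function names, data, and hyperparameters below are hypothetical.

```python
import numpy as np

def group_lasso_penalty(W):
    """Sum of the L2 norms of the rows of W (assumption: one row = one weight group)."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def group_soft_threshold(W, t):
    """Proximal operator of t * group_lasso_penalty: shrink every row norm by t,
    setting rows whose norm is at most t exactly to zero."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return W * scale

def train(X, Y, W, lr, steps, lam=0.0, mask=None):
    """Gradient descent on mean squared error for Y ~ X @ W.
    lam > 0 adds the group Lasso proximal step; mask keeps pruned rows at zero."""
    n = X.shape[0]
    for _ in range(steps):
        grad = X.T @ (X @ W - Y) / n
        W = W - lr * grad
        if lam > 0.0:
            W = group_soft_threshold(W, lr * lam)
        if mask is not None:
            W = W * mask
    return W

# Toy data (hypothetical): only 3 of the 8 input rows carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
W_true = np.zeros((8, 4))
W_true[:3] = rng.normal(size=(3, 4))
Y = X @ W_true + 0.01 * rng.normal(size=(200, 4))

# Step 1: train from scratch with group Lasso regularization.
W_sparse = train(X, Y, np.zeros((8, 4)), lr=0.1, steps=500, lam=0.2)

# Step 2: fix the learned structure as a binary mask over groups.
mask = (np.linalg.norm(W_sparse, axis=1, keepdims=True) > 0).astype(W_sparse.dtype)

# Step 3: retrain without the penalty, updating only the surviving groups.
W_final = train(X, Y, W_sparse, lr=0.1, steps=500, mask=mask)

removed_fraction = 1.0 - float(mask.mean())  # share of weight groups pruned
```

In this sketch, retraining without the penalty lets the surviving groups recover from the shrinkage that the regularizer introduced in step 1, while the mask guarantees that pruned groups stay at zero.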

Duke Scholars

Author Hai "Helen" Li Pierre R. Lamond Department of Electrical and Computer Engin ...

Author Yiran Chen Pierre R. Lamond Department of Electrical and Computer Engin ...

Published In

ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings

DOI

10.1109/ICASSP.2019.8683620

ISSN

1520-6149

Publication Date

May 1, 2019

Volume

2019-May

Start / End Page

2717 / 2721

Citation

APA

Zhang, J., Wen, W., Deisher, M., Cheng, H. P., Li, H., & Chen, Y. (2019). Learning Efficient Sparse Structures in Speech Recognition. In ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings (Vol. 2019-May, pp. 2717–2721). https://doi.org/10.1109/ICASSP.2019.8683620

Chicago

Zhang, J., W. Wen, M. Deisher, H. P. Cheng, H. Li, and Y. Chen. “Learning Efficient Sparse Structures in Speech Recognition.” In ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 2019-May:2717–21, 2019. https://doi.org/10.1109/ICASSP.2019.8683620.

ICMJE

Zhang J, Wen W, Deisher M, Cheng HP, Li H, Chen Y. Learning Efficient Sparse Structures in Speech Recognition. In: ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings. 2019. p. 2717–21.

MLA

Zhang, J., et al. “Learning Efficient Sparse Structures in Speech Recognition.” ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, vol. 2019-May, 2019, pp. 2717–21. Scopus, doi:10.1109/ICASSP.2019.8683620.

NLM

Zhang J, Wen W, Deisher M, Cheng HP, Li H, Chen Y. Learning Efficient Sparse Structures in Speech Recognition. ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings. 2019. p. 2717–2721.