Learning Efficient Sparse Structures in Speech Recognition

Publication, Conference
Zhang, J; Wen, W; Deisher, M; Cheng, HP; Li, H; Chen, Y
Published in: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
May 1, 2019

Recurrent neural networks (RNNs), especially long short-term memory (LSTM) networks, are widely used in speech recognition and natural language processing. As RNN models grow to improve performance, their computation cost, and therefore the hardware resources they require, increases rapidly. We propose an efficient structural sparsity (ESS) learning method for acoustic modeling in speech recognition. ESS aims to produce a model that executes more efficiently while maintaining accuracy. We develop a three-step training pipeline. First, we apply group Lasso regularization during training and learn a structurally sparse model from scratch. Second, we fix the learned sparse structure so that it can no longer change. Finally, we retrain the model, updating only its nonzero parameters. We apply ESS to a classic HMM+LSTM model in the Kaldi toolkit. Experimental results show that ESS removes 72.5% of the weight groups in the weight matrices at the cost of a slight 1.1% increase in word error rate (WER).
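The three-step pipeline in the abstract can be illustrated with a minimal sketch. The abstract does not say how weight groups are defined, which optimizer is used, or how groups are driven to exactly zero, so everything below is an assumption made for illustration: groups are plain lists of weights, sparsification uses a proximal group soft-thresholding step as a stand-in for the authors' training procedure, and all function names and numeric values are hypothetical.

```python
import math


def group_norm(group):
    """L2 norm of one weight group."""
    return math.sqrt(sum(w * w for w in group))


def group_lasso_penalty(groups, lam):
    """Group Lasso term: lam times the sum of per-group L2 norms."""
    return lam * sum(group_norm(g) for g in groups)


def group_soft_threshold(group, threshold):
    """Step 1 (assumed mechanism): shrink a whole group toward zero and
    set it exactly to zero when its L2 norm is at most the threshold."""
    norm = group_norm(group)
    if norm <= threshold:
        return [0.0] * len(group)
    scale = 1.0 - threshold / norm
    return [w * scale for w in group]


def sparsity_mask(groups):
    """Step 2: record which groups survived; the mask is then frozen."""
    return [any(w != 0.0 for w in g) for g in groups]


def masked_update(groups, grads, mask, lr):
    """Step 3: gradient step on surviving groups only; pruned groups stay zero."""
    return [
        [w - lr * dw for w, dw in zip(g, dg)] if keep else list(g)
        for g, dg, keep in zip(groups, grads, mask)
    ]


# Toy values, not taken from the paper.
weights = [[3.0, 4.0], [0.03, 0.04], [0.0, 1.0]]
pruned = [group_soft_threshold(g, 0.5) for g in weights]
mask = sparsity_mask(pruned)
grads = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
retrained = masked_update(pruned, grads, mask, lr=0.1)
print(mask)  # [True, False, True]
```

The point of penalizing group norms rather than individual weights is that entire groups vanish together, which is what lets the pruned structure translate into fewer computations rather than scattered zeros.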

Published In

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

DOI

10.1109/ICASSP.2019.8683620
ISSN

1520-6149

Publication Date

May 1, 2019

Volume

2019-May

Start / End Page

2717 / 2721
 

Citation

APA: Zhang, J., Wen, W., Deisher, M., Cheng, H. P., Li, H., & Chen, Y. (2019). Learning Efficient Sparse Structures in Speech Recognition. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 2019-May, pp. 2717–2721). https://doi.org/10.1109/ICASSP.2019.8683620
Chicago: Zhang, J., W. Wen, M. Deisher, H. P. Cheng, H. Li, and Y. Chen. “Learning Efficient Sparse Structures in Speech Recognition.” In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2019-May:2717–21, 2019. https://doi.org/10.1109/ICASSP.2019.8683620.
ICMJE: Zhang J, Wen W, Deisher M, Cheng HP, Li H, Chen Y. Learning Efficient Sparse Structures in Speech Recognition. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2019. p. 2717–21.
MLA: Zhang, J., et al. “Learning Efficient Sparse Structures in Speech Recognition.” ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2019-May, 2019, pp. 2717–21. Scopus, doi:10.1109/ICASSP.2019.8683620.
NLM: Zhang J, Wen W, Deisher M, Cheng HP, Li H, Chen Y. Learning Efficient Sparse Structures in Speech Recognition. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2019. p. 2717–2721.
