Learning Efficient Sparse Structures in Speech Recognition
Recurrent neural networks (RNNs), especially long short-term memory (LSTM) networks, have been widely used in speech recognition and natural language processing. As RNN models grow larger in pursuit of better performance, their computation cost, and therefore the hardware resources they require, increase rapidly. We propose an efficient structural sparsity (ESS) learning method for acoustic modeling in speech recognition. ESS aims to generate a model that executes more efficiently while maintaining accuracy. We develop a three-step training pipeline. First, we apply group Lasso regularization during training to learn a structurally sparse model from scratch. Second, we fix the learned sparse structure so it can no longer change. Finally, we retrain the model, updating only the nonzero parameters. We applied ESS to a classic HMM+LSTM model built with the Kaldi toolkit. The experimental results show that ESS removes 72.5% of the weight groups in the weight matrices while increasing the word error rate (WER) by only 1.1%.
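
The three-step pipeline can be made concrete with a short sketch. The paper's experiments run on a Kaldi HMM+LSTM system, so the PyTorch code below is only illustrative: the column-wise grouping of weights, the regularization strength lam, the pruning threshold eps, and all function names are our assumptions for exposition, not details taken from the paper.

```python
import torch
import torch.nn as nn

def group_lasso_penalty(model: nn.Module, lam: float = 1e-4) -> torch.Tensor:
    # Step 1: sum of L2 norms over column groups of each weight matrix.
    # Treating each column as one group drives whole columns toward zero,
    # which yields the structured (group-level) sparsity ESS exploits.
    penalty = next(model.parameters()).new_zeros(())
    for w in model.parameters():
        if w.dim() == 2:  # weight matrices only; biases are left dense
            penalty = penalty + w.norm(p=2, dim=0).sum()
    return lam * penalty

def build_masks(model: nn.Module, eps: float = 1e-3) -> dict:
    # Step 2: freeze the learned structure. Columns whose L2 norm fell
    # below eps are declared zero groups; the masks stay fixed afterward.
    masks = {}
    for name, w in model.named_parameters():
        if w.dim() == 2:
            keep = (w.detach().norm(p=2, dim=0) > eps).float()  # per-column flags
            masks[name] = keep.unsqueeze(0).expand_as(w)
    return masks

def apply_masks(model: nn.Module, masks: dict) -> None:
    # Step 3: after each retraining update, zero out the pruned groups so
    # only the surviving nonzero parameters are effectively updated.
    with torch.no_grad():
        for name, w in model.named_parameters():
            if name in masks:
                w.mul_(masks[name])
```

In step 1 the penalty is simply added to the task loss, e.g. loss = criterion(logits, targets) + group_lasso_penalty(model); after convergence, build_masks fixes the structure, and during step-3 retraining apply_masks is called after each optimizer update so the removed groups remain zero.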