
Utterance-level End-to-end Language Identification Using Attention-based CNN-BLSTM

Publication, Conference
Cai, W; Cai, D; Huang, S; Li, M
Published in: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
May 1, 2019

In this paper, we present an end-to-end language identification framework: the attention-based Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BLSTM) model. The model operates at the utterance level, meaning the utterance-level decision is obtained directly from the output of the neural network. To handle speech utterances of arbitrary and potentially long duration, we combine the CNN-BLSTM model with a self-attentive pooling layer. The front-end CNN-BLSTM module acts as a local pattern extractor for variable-length inputs, and the self-attentive pooling layer built on top of it produces a fixed-dimensional utterance-level representation. We conducted experiments on the NIST LRE07 closed-set task, and the results show that the proposed attention-based CNN-BLSTM model achieves error reduction comparable to other state-of-the-art utterance-level neural network approaches on all three duration tasks (3 seconds, 10 seconds, and 30 seconds).
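The abstract does not give the pooling equations, so the following is only a minimal sketch of one common form of self-attentive pooling, not the authors' implementation. It assumes each frame-level feature h_t (standing in for the CNN-BLSTM output) is scored as v · tanh(W h_t + b), the scores are normalized with a softmax over time, and the utterance-level vector is the weighted sum of the frames. The names `self_attentive_pooling`, `W`, `b`, `v`, and the dimensions used are hypothetical, and the random features are placeholders rather than real speech data.

```python
import math
import random


def self_attentive_pooling(frames, W, b, v):
    """Pool T frame vectors (each of size D) into a single D-dim vector.

    frames: list of T lists of D floats (assumed front-end outputs)
    W: A x D projection matrix, b: A-dim bias, v: A-dim context vector
    Returns (pooled_vector, attention_weights).
    """
    # Score each frame: e_t = v . tanh(W h_t + b)  (assumed scoring function)
    scores = []
    for h in frames:
        hidden = [
            math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
            for row, b_i in zip(W, b)
        ]
        scores.append(sum(v_i * u_i for v_i, u_i in zip(v, hidden)))

    # Softmax over the time axis (shifted by the max for numerical stability)
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of frames: the output size depends on D, not on T
    dim = len(frames[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


def random_matrix(rows, cols, rng):
    """Placeholder values standing in for learned weights or real features."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
D, A = 8, 4  # hypothetical feature and attention dimensions
W = random_matrix(A, D, rng)
b = [0.0] * A
v = [rng.uniform(-1.0, 1.0) for _ in range(A)]

short_utt = random_matrix(30, D, rng)   # e.g. a short utterance, 30 frames
long_utt = random_matrix(300, D, rng)   # e.g. a long utterance, 300 frames

pooled_short, w_short = self_attentive_pooling(short_utt, W, b, v)
pooled_long, w_long = self_attentive_pooling(long_utt, W, b, v)
print(len(pooled_short), len(pooled_long))
```

Both utterances yield a vector of length `D` even though their frame counts differ, which is the property that lets a fixed-size classifier sit on top of variable-length input.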

Published In

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

DOI

10.1109/ICASSP.2019.8682386

ISSN

1520-6149

Publication Date

May 1, 2019

Volume

2019-May

Start / End Page

5991 / 5995
 

Citation

APA: Cai, W., Cai, D., Huang, S., & Li, M. (2019). Utterance-level End-to-end Language Identification Using Attention-based CNN-BLSTM. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 2019-May, pp. 5991–5995). https://doi.org/10.1109/ICASSP.2019.8682386
Chicago: Cai, W., D. Cai, S. Huang, and M. Li. “Utterance-level End-to-end Language Identification Using Attention-based CNN-BLSTM.” In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2019-May:5991–95, 2019. https://doi.org/10.1109/ICASSP.2019.8682386.
ICMJE: Cai W, Cai D, Huang S, Li M. Utterance-level End-to-end Language Identification Using Attention-based CNN-BLSTM. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2019. p. 5991–5.
MLA: Cai, W., et al. “Utterance-level End-to-end Language Identification Using Attention-based CNN-BLSTM.” ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2019-May, 2019, pp. 5991–95. Scopus, doi:10.1109/ICASSP.2019.8682386.
NLM: Cai W, Cai D, Huang S, Li M. Utterance-level End-to-end Language Identification Using Attention-based CNN-BLSTM. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2019. p. 5991–5995.