A new time-frequency attention mechanism for TDNN and CNN-LSTM-TDNN, with application to language identification

Publication, Conference
Miao, X; McLoughlin, I; Yan, Y
Published in: Proceedings of the Annual Conference of the International Speech Communication Association Interspeech
January 1, 2019

In this paper, we aim to improve the language identification (LID) performance of traditional DNN x-vector systems by employing Convolutional and Long Short-Term Memory recurrent (CLSTM) neural networks, which can strengthen feature extraction and capture longer temporal dependencies. We also propose a two-dimensional attention mechanism. Compared with conventional one-dimensional time attention, our method introduces a frequency attention mechanism that assigns different weights to different frequency bands when generating weighted means and standard deviations. This mechanism can direct attention to either time or frequency information, and can be trained or fused singly or jointly. Experimental results show, first, that CLSTM significantly outperforms a traditional DNN x-vector implementation. Second, the proposed frequency attention method is more effective than time attention, particularly when the number of frequency bands matches the feature size. Furthermore, frequency-time score merging performs best, whereas frequency-time feature merging shows improvements only for small frequency dimensions.
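As a rough illustration of the weighted means and standard deviations mentioned in the abstract, the following NumPy sketch pools frame-level features with attention weights. It is not the authors' implementation: the abstract does not say how the weights are computed or normalized, so the sketch assumes one score per frame and per frequency band, a softmax over time within each band, and random scores standing in for a learned scoring network. The names `attentive_stats`, `scores`, `T`, `D`, and `B` are hypothetical. With `B = 1` the sketch reduces to conventional time attention; with `B = D` each feature dimension gets its own weights.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_stats(h, scores):
    """Weighted mean and standard deviation over time.

    h:      (T, D) frame-level features.
    scores: (T, B) attention scores, one column per frequency band;
            B must divide D (assumed band layout: contiguous, equal width).
    Returns two arrays of shape (D,).
    """
    T, D = h.shape
    B = scores.shape[1]
    assert D % B == 0, "number of bands must divide the feature size"
    w = softmax(scores, axis=0)           # each band's weights sum to 1 over time
    w = np.repeat(w, D // B, axis=1)      # share a band's weights across its dimensions
    mean = (w * h).sum(axis=0)
    var = (w * h ** 2).sum(axis=0) - mean ** 2
    std = np.sqrt(np.clip(var, 0.0, None))
    return mean, std

rng = np.random.default_rng(0)
T, D = 50, 8
h = rng.standard_normal((T, D))

# Random scores stand in for a learned scoring network (assumption).
time_mean, time_std = attentive_stats(h, rng.standard_normal((T, 1)))  # B = 1
freq_mean, freq_std = attentive_stats(h, rng.standard_normal((T, D)))  # B = D

# One possible form of "feature merging": concatenate both sets of statistics.
merged = np.concatenate([time_mean, time_std, freq_mean, freq_std])
```

Under the same assumptions, score merging would instead train two separate systems, one per attention type, and combine their output scores, rather than concatenating the pooled statistics.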

Published In

Proceedings of the Annual Conference of the International Speech Communication Association Interspeech

DOI

10.21437/Interspeech.2019-1256

EISSN

1990-9772

ISSN

2308-457X

Publication Date

January 1, 2019

Volume

2019-September

Start / End Page

4080 / 4084
 

Citation

APA: Miao, X., McLoughlin, I., & Yan, Y. (2019). A new time-frequency attention mechanism for TDNN and CNN-LSTM-TDNN, with application to language identification. In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech (Vol. 2019-September, pp. 4080–4084). https://doi.org/10.21437/Interspeech.2019-1256
Chicago: Miao, X., I. McLoughlin, and Y. Yan. “A new time-frequency attention mechanism for TDNN and CNN-LSTM-TDNN, with application to language identification.” In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 2019-September:4080–84, 2019. https://doi.org/10.21437/Interspeech.2019-1256.
ICMJE: Miao X, McLoughlin I, Yan Y. A new time-frequency attention mechanism for TDNN and CNN-LSTM-TDNN, with application to language identification. In: Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. 2019. p. 4080–4.
MLA: Miao, X., et al. “A new time-frequency attention mechanism for TDNN and CNN-LSTM-TDNN, with application to language identification.” Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, vol. 2019-September, 2019, pp. 4080–84. Scopus, doi:10.21437/Interspeech.2019-1256.
NLM: Miao X, McLoughlin I, Yan Y. A new time-frequency attention mechanism for TDNN and CNN-LSTM-TDNN, with application to language identification. Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. 2019. p. 4080–4084.