A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification

Publication, Conference
Cai, W; Cai, Z; Zhang, X; Wang, X; Li, M
Published in: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
September 10, 2018

A novel learnable dictionary encoding layer is proposed in this paper for end-to-end language identification. It is in line with the conventional GMM i-vector approach both theoretically and practically. We imitate the mechanism of the traditional GMM training and supervector encoding procedure on top of a CNN. The proposed layer can accumulate high-order statistics from a variable-length input sequence and generate an utterance-level, fixed-dimensional vector representation. Unlike conventional methods, our new approach provides an end-to-end learning framework in which the inherent dictionaries are learned directly from the loss function. The dictionaries and the encoding representation for the classifier are learned jointly. The representation is orderless and therefore appropriate for language identification. We conducted a preliminary experiment on the NIST LRE07 closed-set task, and the results show that the proposed dictionary encoding layer achieves a significant error reduction compared with simple average pooling.
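To make the pooling idea concrete, the following is a minimal NumPy sketch of one way a dictionary encoding could turn variable-length frame features into a fixed-length, orderless vector. The abstract does not give the layer's equations, so the details here are assumptions: soft assignment of each frame to dictionary components through a softmax over scaled squared distances, followed by weighted averaging of residuals. The names `lde_encode`, `average_pool`, `centers`, and `scales` are hypothetical, and the dictionary is fixed at random values, whereas in an end-to-end model it would be trained from the loss.

```python
import numpy as np

def lde_encode(frames, centers, scales):
    """Encode (T, D) frame features into a fixed (C*D,) vector.

    Assumed formulation (not taken from the abstract):
      w[t, c] = softmax_c(-scales[c] * ||frames[t] - centers[c]||^2)
      e[c]    = (1 / T) * sum_t w[t, c] * (frames[t] - centers[c])
    """
    residuals = frames[:, None, :] - centers[None, :, :]   # (T, C, D)
    sq_dist = np.sum(residuals ** 2, axis=-1)               # (T, C)
    logits = -scales[None, :] * sq_dist
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)  # soft assignments
    # Weighted residuals summed over time, then averaged over the T frames.
    encoded = np.einsum("tc,tcd->cd", weights, residuals) / frames.shape[0]
    return encoded.reshape(-1)                               # (C * D,)

def average_pool(frames):
    """Baseline: simple average pooling over time, giving a (D,) vector."""
    return frames.mean(axis=0)

rng = np.random.default_rng(0)
D, C = 8, 4                         # feature dimension, dictionary size (illustrative)
centers = rng.normal(size=(C, D))   # would be learned in a real model
scales = np.ones(C)                 # would be learned in a real model

short_utt = rng.normal(size=(50, D))    # stand-ins for CNN output frames
long_utt = rng.normal(size=(300, D))

enc_short = lde_encode(short_utt, centers, scales)
enc_long = lde_encode(long_utt, centers, scales)
enc_shuffled = lde_encode(short_utt[rng.permutation(50)], centers, scales)

print(enc_short.shape, enc_long.shape)       # same length for both utterances
print(np.allclose(enc_short, enc_shuffled))  # frame order does not matter
```

Both utterances map to a vector of length C * D regardless of their duration, and shuffling the frames leaves the encoding unchanged, which is what "orderless" means here. Average pooling shares those two properties but keeps only a single D-dimensional mean, with no per-component statistics.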

Published In

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

DOI

10.1109/ICASSP.2018.8462025

ISSN

1520-6149

Publication Date

September 10, 2018

Volume

2018-April

Start / End Page

5189 / 5193
 

Citation

APA: Cai, W., Cai, Z., Zhang, X., Wang, X., & Li, M. (2018). A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 2018-April, pp. 5189–5193). https://doi.org/10.1109/ICASSP.2018.8462025

Chicago: Cai, W., Z. Cai, X. Zhang, X. Wang, and M. Li. “A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification.” In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2018-April:5189–93, 2018. https://doi.org/10.1109/ICASSP.2018.8462025.

ICMJE: Cai W, Cai Z, Zhang X, Wang X, Li M. A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2018. p. 5189–93.

MLA: Cai, W., et al. “A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification.” ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2018-April, 2018, pp. 5189–93. Scopus, doi:10.1109/ICASSP.2018.8462025.

NLM: Cai W, Cai Z, Zhang X, Wang X, Li M. A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2018. p. 5189–5193.
