End-to-end language identification using NetFV and NetVLAD

Conference Publication
Chen, J; Cai, W; Cai, D; Cai, Z; Zhong, H; Li, M
Published in: 2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings
July 2, 2018

In this paper, we apply the NetFV and NetVLAD layers to the end-to-end language identification task. NetFV and NetVLAD are differentiable implementations of the standard Fisher Vector and Vector of Locally Aggregated Descriptors (VLAD) methods, respectively. Both encode a sequence of feature vectors into a fixed-dimensional vector, which is essential for processing variable-length utterances. We first present the similarities and differences between the classical i-vector and these encoding schemes. We then construct a flexible end-to-end framework consisting of a convolutional neural network (CNN) architecture and an encoding layer (NetFV or NetVLAD) for the language identification task. Experimental results on the NIST LRE 2007 closed-set task show that the proposed system achieves significant equal error rate (EER) reductions over both the conventional i-vector baseline and the CNN temporal average pooling system.
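The fixed-dimensional encoding mentioned in the abstract can be illustrated with a minimal sketch of NetVLAD-style pooling. This is not the authors' implementation: it follows the commonly published NetVLAD formulation (soft assignment of each frame to K clusters, accumulation of residuals to the cluster centers, intra-normalization, then global L2 normalization). The function names, the toy centers, and the hand-set `weights` and `biases` are all hypothetical; in a trained network these parameters would be learned and the frames would come from the CNN. Plain Python lists are used for readability where a real layer would use a tensor library.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]

def netvlad_pool(frames, centers, weights, biases):
    """Encode T frames of dimension D into one vector of length K * D.

    frames:  T x D list of feature vectors (T may vary per utterance)
    centers: K x D cluster centers (learned in a real network)
    weights: K x D soft-assignment weights, biases: K soft-assignment biases
    """
    K, D = len(centers), len(centers[0])
    residual_sums = [[0.0] * D for _ in range(K)]
    for x in frames:
        # Soft assignment of this frame to each cluster.
        scores = [sum(w * xd for w, xd in zip(weights[k], x)) + biases[k]
                  for k in range(K)]
        assign = softmax(scores)
        # Accumulate assignment-weighted residuals to each center.
        for k in range(K):
            for d in range(D):
                residual_sums[k][d] += assign[k] * (x[d] - centers[k][d])
    # Intra-normalize per cluster, flatten, then L2-normalize the whole vector.
    flat = [v for row in residual_sums for v in l2_normalize(row)]
    return l2_normalize(flat)

# Hypothetical toy parameters: K = 3 clusters, D = 2 feature dimensions.
centers = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
weights = [[2.0 * c for c in row] for row in centers]
biases = [-sum(c * c for c in row) for row in centers]

short_utt = [[0.1, 0.2], [0.9, 1.1]]
long_utt = short_utt + [[-1.0, 0.8], [0.5, 0.5], [0.0, -0.3]]

enc_short = netvlad_pool(short_utt, centers, weights, biases)
enc_long = netvlad_pool(long_utt, centers, weights, biases)
print(len(enc_short), len(enc_long))  # both have length K * D = 6
```

Utterances of two and five frames both map to a vector of length K × D, which is the property that lets a fixed-size classifier follow the encoding layer.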

DOI

10.1109/ISCSLP.2018.8706687

Start / End Page

319 / 323
 

Citation

APA: Chen, J., Cai, W., Cai, D., Cai, Z., Zhong, H., & Li, M. (2018). End-to-end language identification using NetFV and NetVLAD. In 2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings (pp. 319–323). https://doi.org/10.1109/ISCSLP.2018.8706687
Chicago: Chen, J., W. Cai, D. Cai, Z. Cai, H. Zhong, and M. Li. “End-to-end language identification using NetFV and NetVLAD.” In 2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings, 319–23, 2018. https://doi.org/10.1109/ISCSLP.2018.8706687.
ICMJE: Chen J, Cai W, Cai D, Cai Z, Zhong H, Li M. End-to-end language identification using NetFV and NetVLAD. In: 2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings. 2018. p. 319–23.
MLA: Chen, J., et al. “End-to-end language identification using NetFV and NetVLAD.” 2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings, 2018, pp. 319–23. Scopus, doi:10.1109/ISCSLP.2018.8706687.
NLM: Chen J, Cai W, Cai D, Cai Z, Zhong H, Li M. End-to-end language identification using NetFV and NetVLAD. 2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings. 2018. p. 319–323.
