Scholars@Duke publication: Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System

Publication, Conference

Cai, W; Chen, J; Li, M

Published in: Speaker and Language Recognition Workshop Odyssey 2018

January 1, 2018

In this paper, we explore the encoding/pooling layer and the loss function in end-to-end speaker and language recognition systems. First, we develop a unified and interpretable end-to-end system for both speaker and language recognition. It accepts variable-length input and produces an utterance-level result. In this system, the encoding layer aggregates the variable-length input sequence into an utterance-level representation. Besides basic temporal average pooling, we introduce a self-attentive pooling layer and a learnable dictionary encoding layer to obtain the utterance-level representation. For the loss function in open-set speaker verification, center loss and angular softmax loss are introduced into the end-to-end system to obtain more discriminative speaker embeddings. Experimental results on the VoxCeleb and NIST LRE 07 datasets show that the proposed encoding layers and loss functions significantly improve the performance of the end-to-end learning system.
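As a rough illustration of what the encoding layer does, the sketch below pools a variable-length sequence of frame-level feature vectors into one fixed-size utterance-level vector, first by temporal average pooling and then by a simplified attention-weighted average. It is not the authors' implementation: the abstract does not give the exact formulation, so the dot-product scoring, the `context` vector (learned in a real system, fixed here), and all function names are assumptions made for the example.

```python
import math


def temporal_average_pooling(frames):
    """Average a variable-length list of D-dim frame vectors into one D-dim vector."""
    num_frames = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]


def self_attentive_pooling(frames, context):
    """Weighted average of frames, with softmax weights over frame/context dot products.

    `context` is a hypothetical D-dim vector; a real system would learn it
    (and would likely score frames with a small network, not a raw dot product).
    """
    scores = [sum(f * c for f, c in zip(frame, context)) for frame in frames]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]


# Two utterances of different lengths (2 and 4 frames), each frame 2-dimensional.
short_utt = [[1.0, 0.0], [3.0, 2.0]]
long_utt = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0], [0.0, 2.0]]

avg_short = temporal_average_pooling(short_utt)
avg_long = temporal_average_pooling(long_utt)

# With an all-zero context every frame scores the same, so the attention
# weights are uniform and the result matches plain average pooling.
att_uniform = self_attentive_pooling(long_utt, [0.0, 0.0])

# A non-zero context shifts weight toward frames that align with it.
att_focused = self_attentive_pooling(long_utt, [0.0, 1.0])
```

Both functions return a vector whose size depends only on the feature dimension, not on the number of frames, which is what lets the downstream classifier accept utterances of any length.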

Duke Scholars

Author Ming Li DKU Faculty

Published In

Speaker and Language Recognition Workshop Odyssey 2018

DOI

10.21437/Odyssey.2018-11

Publication Date

January 1, 2018

Start / End Page

74 / 81

Citation

APA

Cai, W., Chen, J., & Li, M. (2018). Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System. In Speaker and Language Recognition Workshop Odyssey 2018 (pp. 74–81). https://doi.org/10.21437/Odyssey.2018-11

Chicago

Cai, W., J. Chen, and M. Li. “Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System.” In Speaker and Language Recognition Workshop Odyssey 2018, 74–81, 2018. https://doi.org/10.21437/Odyssey.2018-11.

ICMJE

Cai W, Chen J, Li M. Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System. In: Speaker and Language Recognition Workshop Odyssey 2018. 2018. p. 74–81.

MLA

Cai, W., et al. “Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System.” Speaker and Language Recognition Workshop Odyssey 2018, 2018, pp. 74–81. Scopus, doi:10.21437/Odyssey.2018-11.

NLM

Cai W, Chen J, Li M. Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System. Speaker and Language Recognition Workshop Odyssey 2018. 2018. p. 74–81.
