Scholars@Duke publication: Analysis of length normalization in end-to-end speaker verification system

Analysis of length normalization in end-to-end speaker verification system

Publication, Conference

Cai, W; Chen, J; Li, M

Published in: Proceedings of the Annual Conference of the International Speech Communication Association Interspeech

January 1, 2018

Classical i-vectors and the more recent end-to-end deep speaker embeddings are the two representative categories of utterance-level representations in automatic speaker verification systems. Traditionally, once i-vectors or deep speaker embeddings are extracted, an extra length normalization step projects the representations onto the unit-length hypersphere before back-end modeling. In this paper, we explore how a neural network learns length-normalized deep speaker embeddings in an end-to-end manner. To this end, we add a length normalization layer followed by a scale layer before the output layer of a common classification network. We conduct experiments on the verification task of the VoxCeleb1 dataset. The results show that integrating this simple step into the end-to-end training pipeline significantly boosts speaker verification performance. In the testing stage of our L2-normalized end-to-end system, a simple inner product can achieve state-of-the-art performance.
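The layer ordering described in the abstract (length normalization, then a scale layer, then the output layer) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the scale value `alpha`, and the toy weights are hypothetical, chosen only to show the forward pass and inner-product scoring at test time.

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Rescale a vector to unit Euclidean length (eps guards against zero norm)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

def scale(vec, alpha):
    """Scale layer: multiply the unit-length embedding by a scalar alpha (assumed value)."""
    return [alpha * v for v in vec]

def linear(vec, weights, bias):
    """Output layer: one weight row and one bias term per speaker class."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def forward(embedding, weights, bias, alpha):
    """Length normalization -> scale -> output layer, in the order the abstract describes."""
    return linear(scale(l2_normalize(embedding), alpha), weights, bias)

def inner_product_score(enroll, test):
    """Test-time score: inner product of two length-normalized embeddings."""
    a, b = l2_normalize(enroll), l2_normalize(test)
    return sum(x * y for x, y in zip(a, b))

# Toy data, hypothetical: a 2-dimensional embedding and 3 speaker classes.
embedding = [3.0, 4.0]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.0, 0.0]
logits = forward(embedding, weights, bias, alpha=10.0)
score = inner_product_score([3.0, 4.0], [6.0, 8.0])
```

Because both embeddings are normalized before scoring, the inner product equals their cosine similarity, so two embeddings that point in the same direction score close to 1 regardless of their original lengths.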

Duke Scholars

Author Ming Li DKU Faculty

Published In

Proceedings of the Annual Conference of the International Speech Communication Association Interspeech

DOI

10.21437/Interspeech.2018-92

EISSN

1990-9772

ISSN

2308-457X

Publication Date

January 1, 2018

Volume

2018-September

Start / End Page

3618 / 3622

Citation

APA

Cai, W., Chen, J., & Li, M. (2018). Analysis of length normalization in end-to-end speaker verification system. In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech (Vol. 2018-September, pp. 3618–3622). https://doi.org/10.21437/Interspeech.2018-92

Chicago

Cai, W., J. Chen, and M. Li. “Analysis of length normalization in end-to-end speaker verification system.” In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 2018-September:3618–22, 2018. https://doi.org/10.21437/Interspeech.2018-92.

ICMJE

Cai W, Chen J, Li M. Analysis of length normalization in end-to-end speaker verification system. In: Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. 2018. p. 3618–22.

MLA

Cai, W., et al. “Analysis of length normalization in end-to-end speaker verification system.” Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, vol. 2018-September, 2018, pp. 3618–22. Scopus, doi:10.21437/Interspeech.2018-92.

NLM

Cai W, Chen J, Li M. Analysis of length normalization in end-to-end speaker verification system. Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. 2018. p. 3618–3622.
