Scholars@Duke publication: Far-field end-to-end text-dependent speaker verification based on mixed training data with transfer learning and enrollment data augmentation

Publication, Conference

Qin, X; Cai, D; Li, M

Published in: Proceedings of the Annual Conference of the International Speech Communication Association Interspeech

January 1, 2019

In this paper, we focus on the far-field end-to-end text-dependent speaker verification task, training with a small-scale far-field text-dependent dataset and a large-scale close-talking text-independent database. First, we show that simulating far-field text-independent data from the existing large-scale clean database for data augmentation can reduce the mismatch between the training data and the far-field test condition. Second, fine-tuning the deep speaker embedding model, which is pre-trained on both the simulated far-field data and the original clean text-independent data, with a small far-field text-dependent dataset can significantly improve system performance. Third, in special applications where close-talking clean utterances are used for enrollment and real far-field noisy utterances are used for testing, adding reverberant noise to the clean enrollment data can further enhance system performance. We evaluate our methods on the AISHELL ASR0009 and AISHELL 2019B-eval databases and achieve an equal error rate (EER) of 5.75% for far-field text-dependent speaker verification under noisy environments.
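The abstract does not say how the far-field data is simulated, so the following is only a generic sketch of one common approach: convolve a clean utterance with a room impulse response (RIR) and add noise at a chosen signal-to-noise ratio. The function names `simulate_far_field` and `toy_rir`, the synthetic exponentially decaying RIR, and all parameter values are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def simulate_far_field(clean, rir, noise, snr_db):
    """Reverberate a clean utterance and add noise at a target SNR (in dB).

    Hypothetical helper for illustration; not the authors' pipeline.
    Assumes `noise` is not all zeros.
    """
    # Apply the room impulse response, then trim to the original length.
    reverberant = np.convolve(clean, rir)[: len(clean)]

    # Repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]

    # Scale the noise so that reverberant power / noise power matches snr_db.
    signal_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return reverberant + gain * noise


def toy_rir(sample_rate=16000, rt60=0.5, length_s=0.3, seed=0):
    """Synthetic stand-in for a measured RIR: decaying white noise (assumption)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(sample_rate * length_s)) / sample_rate
    # Amplitude envelope that falls by 60 dB after rt60 seconds.
    decay = 10 ** (-3 * t / rt60)
    rir = rng.standard_normal(len(t)) * decay
    rir[0] = 1.0  # direct-path component
    return rir / np.max(np.abs(rir))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clean = rng.standard_normal(16000)  # placeholder for 1 s of clean speech
    noise = rng.standard_normal(4000)   # placeholder for a noise recording
    far = simulate_far_field(clean, toy_rir(), noise, snr_db=10.0)
    print(far.shape)
```

Under these assumptions, the same function could be applied either to training utterances (to build simulated far-field training data) or to clean enrollment utterances before embedding extraction, which are the two uses of augmentation the abstract mentions. A real system would typically draw on measured or simulated RIRs and recorded noise rather than the synthetic signals used here.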

Duke Scholars

Author: Ming Li, DKU Faculty

Published In

Proceedings of the Annual Conference of the International Speech Communication Association Interspeech

DOI

10.21437/Interspeech.2019-1542

EISSN

1990-9772

ISSN

2308-457X

Publication Date

January 1, 2019

Volume

2019-September

Start / End Page

4045 / 4049

Citation

APA

Qin, X., Cai, D., & Li, M. (2019). Far-field end-to-end text-dependent speaker verification based on mixed training data with transfer learning and enrollment data augmentation. In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech (Vol. 2019-September, pp. 4045–4049). https://doi.org/10.21437/Interspeech.2019-1542

Chicago

Qin, X., D. Cai, and M. Li. “Far-field end-to-end text-dependent speaker verification based on mixed training data with transfer learning and enrollment data augmentation.” In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 2019-September:4045–49, 2019. https://doi.org/10.21437/Interspeech.2019-1542.

ICMJE

Qin X, Cai D, Li M. Far-field end-to-end text-dependent speaker verification based on mixed training data with transfer learning and enrollment data augmentation. In: Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. 2019. p. 4045–9.

MLA

Qin, X., et al. “Far-field end-to-end text-dependent speaker verification based on mixed training data with transfer learning and enrollment data augmentation.” Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, vol. 2019-September, 2019, pp. 4045–49. Scopus, doi:10.21437/Interspeech.2019-1542.

NLM

Qin X, Cai D, Li M. Far-field end-to-end text-dependent speaker verification based on mixed training data with transfer learning and enrollment data augmentation. Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. 2019. p. 4045–4049.
