Far-field end-to-end text-dependent speaker verification based on mixed training data with transfer learning and enrollment data augmentation

Publication, Conference
Qin, X; Cai, D; Li, M
Published in: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
January 1, 2019

In this paper, we focus on the far-field end-to-end text-dependent speaker verification task, using a small-scale far-field text-dependent dataset and a large-scale close-talking text-independent database for training. First, we show that simulating far-field text-independent data from the existing large-scale clean database for data augmentation can reduce the mismatch between the close-talking training data and far-field conditions. Second, fine-tuning the deep speaker embedding model, pre-trained on both the simulated far-field and the original clean text-independent data, with the small far-field text-dependent dataset significantly improves system performance. Third, in applications where close-talking clean utterances are used for enrollment and real far-field noisy utterances are used for testing, adding reverberant noise to the clean enrollment data further improves system performance. We evaluate our methods on the AISHELL ASR0009 and AISHELL 2019B-eval databases and achieve an equal error rate (EER) of 5.75% for far-field text-dependent speaker verification in noisy environments.
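The abstract does not describe exactly how the far-field data were simulated. A common approach, and the assumption behind this minimal sketch, is to convolve clean speech with a room impulse response (RIR) and then add noise at a chosen signal-to-noise ratio (SNR). The same routine could in principle be applied to clean enrollment utterances. The function names (`add_reverb`, `add_noise`, `simulate_far_field`), the truncation to the original length and the SNR scaling rule are illustrative choices, not the authors' pipeline.

```python
import numpy as np


def add_reverb(clean: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve a clean waveform with a room impulse response (RIR).

    Hypothetical helper: the output is truncated to the input length so the
    utterance duration is unchanged.
    """
    return np.convolve(clean, rir, mode="full")[: len(clean)]


def add_noise(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add noise scaled so that 10*log10(P_signal / P_noise) equals snr_db.

    Hypothetical helper: the noise is tiled (if too short) and cropped to the
    signal length before scaling.
    """
    if len(noise) < len(signal):
        reps = int(np.ceil(len(signal) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(signal)]

    signal_power = np.mean(signal ** 2)
    noise_power = max(np.mean(noise ** 2), 1e-12)  # guard against silent noise
    # Solve 10*log10(Ps / (g^2 * Pn)) = snr_db for the noise gain g.
    gain = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + gain * noise


def simulate_far_field(clean: np.ndarray, rir: np.ndarray,
                       noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Assumed far-field simulation: reverberation first, then additive noise."""
    return add_noise(add_reverb(clean, rir), noise, snr_db)
```

Applying reverberation before noise reflects the assumption that the noise source is recorded at the far-field microphone rather than passed through the same room response as the speech; real RIRs and noise recordings would replace any synthetic signals used for testing.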

Published In

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

DOI

10.21437/Interspeech.2019-1542

EISSN

1990-9772

ISSN

2308-457X

Publication Date

January 1, 2019

Volume

2019-September

Start / End Page

4045 / 4049
 

Citation

APA

Qin, X., Cai, D., & Li, M. (2019). Far-field end-to-end text-dependent speaker verification based on mixed training data with transfer learning and enrollment data augmentation. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (Vol. 2019-September, pp. 4045–4049). https://doi.org/10.21437/Interspeech.2019-1542

Chicago

Qin, X., D. Cai, and M. Li. “Far-field end-to-end text-dependent speaker verification based on mixed training data with transfer learning and enrollment data augmentation.” In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019-September:4045–49, 2019. https://doi.org/10.21437/Interspeech.2019-1542.

ICMJE

Qin X, Cai D, Li M. Far-field end-to-end text-dependent speaker verification based on mixed training data with transfer learning and enrollment data augmentation. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2019. p. 4045–9.

MLA

Qin, X., et al. “Far-field end-to-end text-dependent speaker verification based on mixed training data with transfer learning and enrollment data augmentation.” Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2019-September, 2019, pp. 4045–49. Scopus, doi:10.21437/Interspeech.2019-1542.

NLM

Qin X, Cai D, Li M. Far-field end-to-end text-dependent speaker verification based on mixed training data with transfer learning and enrollment data augmentation. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2019. p. 4045–4049.