Scholars@Duke publication: Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings


Publication, Conference

Qin, X; Li, N; Weng, C; Su, D; Li, M

Published in: Proceedings of the Annual Conference of the International Speech Communication Association Interspeech

January 1, 2022

Automatic speaker verification has achieved remarkable progress in recent years. However, there is little research on cross-age speaker verification (CASV) because relevant data are scarce. In this paper, we mine cross-age test sets from the VoxCeleb dataset and propose an age-invariant speaker representation (AISR) learning method. Since VoxCeleb is collected from the YouTube platform, the dataset inherently contains cross-age data. However, the metadata does not include speaker age labels. Therefore, we adopt a face age estimation method to predict the speaker's age from the associated visual data, then label each audio recording with the estimated age. We construct multiple cross-age test sets on VoxCeleb (Vox-CA), which deliberately select positive trials with a large age gap. Nationality and gender are also considered when selecting negative pairs, to align with the Vox-H cases. The baseline system's performance drops from 1.939% EER on the Vox-H test set to 10.419% EER on the Vox-CA20 test set, which indicates how difficult the cross-age scenario is. Consequently, we propose an age-decoupling adversarial learning (ADAL) method to alleviate the negative effect of the age gap and reduce intra-class variance. Our method outperforms the baseline system by over 10% relative EER reduction on the Vox-CA20 test set. The source code and trial resources are available at https://github.com/qinxiaoyi/Cross-Age Speaker Verification.
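The selection of positive trials with a large age gap, as described in the abstract, can be illustrated with a minimal sketch. This is not the authors' released code: the function name `cross_age_positive_trials`, the `(utt_id, speaker_id, estimated_age)` tuple layout, and the toy data are hypothetical, and the threshold of 20 years is only an assumed reading of the "20" in Vox-CA20.

```python
from collections import defaultdict
from itertools import combinations


def cross_age_positive_trials(utterances, min_age_gap):
    """Return same-speaker utterance pairs whose estimated ages differ
    by at least min_age_gap.

    utterances: iterable of (utt_id, speaker_id, estimated_age) tuples.
    This layout is a hypothetical one chosen for illustration.
    """
    # Group utterances by speaker so that every pair is a positive trial.
    by_speaker = defaultdict(list)
    for utt_id, speaker_id, age in utterances:
        by_speaker[speaker_id].append((utt_id, age))

    trials = []
    for items in by_speaker.values():
        # Enumerate every unordered pair of utterances from one speaker.
        for (utt_a, age_a), (utt_b, age_b) in combinations(items, 2):
            if abs(age_a - age_b) >= min_age_gap:
                trials.append((utt_a, utt_b))
    return trials


# Toy data (made up); ages stand in for face-based age estimates.
utterances = [
    ("spk1_a", "spk1", 25.0),
    ("spk1_b", "spk1", 47.5),
    ("spk1_c", "spk1", 30.0),
    ("spk2_a", "spk2", 40.0),
    ("spk2_b", "spk2", 42.0),
]

# Assumed threshold of 20 years for illustration only.
trials = cross_age_positive_trials(utterances, min_age_gap=20)
print(trials)
```

Negative pairs would be drawn across speakers instead; the abstract notes that nationality and gender are taken into account for those, which this sketch does not attempt to model.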

Duke Scholars

Author Ming Li DKU Faculty

Published In

Proceedings of the Annual Conference of the International Speech Communication Association Interspeech

DOI

10.21437/Interspeech.2022-648

EISSN

2958-1796

ISSN

2308-457X

Publication Date

January 1, 2022

Volume

2022-September

Start / End Page

1436 / 1440

Citation

APA

Qin, X., Li, N., Weng, C., Su, D., & Li, M. (2022). Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings. In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech (Vol. 2022-September, pp. 1436–1440). https://doi.org/10.21437/Interspeech.2022-648

Chicago

Qin, X., N. Li, C. Weng, D. Su, and M. Li. “Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings.” In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 2022-September:1436–40, 2022. https://doi.org/10.21437/Interspeech.2022-648.

ICMJE

Qin X, Li N, Weng C, Su D, Li M. Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings. In: Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. 2022. p. 1436–40.

MLA

Qin, X., et al. “Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings.” Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, vol. 2022-September, 2022, pp. 1436–40. Scopus, doi:10.21437/Interspeech.2022-648.

NLM

Qin X, Li N, Weng C, Su D, Li M. Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings. Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. 2022. p. 1436–1440.
