VC-AUG: Voice Conversion Based Data Augmentation for Text-Dependent Speaker Verification

Publication, Conference

Qin, X; Yang, Y; Shi, Y; Yang, L; Wang, X; Wang, J; Li, M

Published in: Communications in Computer and Information Science

January 1, 2023

In this paper, we focus on improving the performance of text-dependent speaker verification systems when training data is limited. Deep learning based text-dependent speaker verification systems generally need a large-scale text-dependent training set, which is costly and labor-intensive to collect, especially for customized new wake-up words. Recent studies have proposed voice conversion systems that can generate high-quality synthesized speech for both seen and unseen speakers. Inspired by those works, we adopt two different voice conversion methods, together with a very simple re-sampling approach, to generate new text-dependent speech samples for data augmentation. Experimental results show that the proposed method significantly reduces the Equal Error Rate (EER) from 6.51% to 4.48% in the limited training data scenario. In addition, we explore data augmentation based on out-of-set and unseen speaker voice conversion.
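As a rough illustration of the metric quoted in the abstract, the sketch below computes an approximate Equal Error Rate from verification scores and the relative reduction implied by the two reported figures. It is not taken from the paper: the function names `equal_error_rate` and `relative_reduction` are hypothetical, the toy scores are made up, and the threshold sweep is just one simple way to approximate the EER.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: the operating point where the false-accept
    rate and the false-reject rate are closest to each other."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # A trial is accepted when its score is >= t.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Toy scores (illustrative only, not data from the paper).
toy_eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])

# EER figures reported in the abstract: 6.51% -> 4.48%,
# which is a relative reduction of about 31.2%.
rel = relative_reduction(6.51, 4.48)
```

Read this way, the drop from 6.51% to 4.48% is an absolute reduction of 2.03 percentage points, or roughly 31% relative to the baseline.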

Duke Scholars

Author Ming Li DKU Faculty

Published In: Communications in Computer and Information Science
DOI: 10.1007/978-981-99-2401-1_21
EISSN: 1865-0937
ISSN: 1865-0929
Publication Date: January 1, 2023
Volume: 1765 CCIS
Start / End Page: 227 / 237

Citation

APA: Qin, X., Yang, Y., Shi, Y., Yang, L., Wang, X., Wang, J., & Li, M. (2023). VC-AUG: Voice Conversion Based Data Augmentation for Text-Dependent Speaker Verification. In Communications in Computer and Information Science (Vol. 1765 CCIS, pp. 227–237). https://doi.org/10.1007/978-981-99-2401-1_21

Chicago: Qin, X., Y. Yang, Y. Shi, L. Yang, X. Wang, J. Wang, and M. Li. “VC-AUG: Voice Conversion Based Data Augmentation for Text-Dependent Speaker Verification.” In Communications in Computer and Information Science, 1765 CCIS:227–37, 2023. https://doi.org/10.1007/978-981-99-2401-1_21.

ICMJE: Qin X, Yang Y, Shi Y, Yang L, Wang X, Wang J, et al. VC-AUG: Voice Conversion Based Data Augmentation for Text-Dependent Speaker Verification. In: Communications in Computer and Information Science. 2023. p. 227–37.

MLA: Qin, X., et al. “VC-AUG: Voice Conversion Based Data Augmentation for Text-Dependent Speaker Verification.” Communications in Computer and Information Science, vol. 1765 CCIS, 2023, pp. 227–37. Scopus, doi:10.1007/978-981-99-2401-1_21.

NLM: Qin X, Yang Y, Shi Y, Yang L, Wang X, Wang J, Li M. VC-AUG: Voice Conversion Based Data Augmentation for Text-Dependent Speaker Verification. Communications in Computer and Information Science. 2023. p. 227–237.
