Investigating Long-Term and Short-Term Time-Varying Speaker Verification

Publication, Journal Article

Qin, X; Li, N; Duan, S; Li, M

Published in: IEEE ACM Transactions on Audio Speech and Language Processing

January 1, 2024

The performance of speaker verification systems can be adversely affected by time-domain variations. However, limited research has been conducted on time-varying speaker verification due to the absence of appropriate datasets. This paper investigates the impact of long-term and short-term time variation on speaker verification and proposes solutions to mitigate these effects. For long-term speaker verification (i.e., cross-age speaker verification), we introduce an age-decoupling adversarial learning method that learns age-invariant speaker representations by mining age information from the VoxCeleb dataset. For short-term speaker verification, we collect the SMIIP-TimeVarying (SMIIP-TV) Dataset, which includes recordings at multiple time slots every day from 373 speakers for 90 consecutive days, along with other relevant meta information. Using this dataset, we analyze how speaker embeddings vary over time and propose a novel but realistic time-varying speaker verification task, termed incremental sequence-pair speaker verification. This task involves continuous interaction between enrollment audios and a sequence of testing audios, with the aim of improving performance over time. We introduce a template updating method to counter the negative effects that accumulate over time. We then formulate the template updating process as a Markov Decision Process and propose a template updating method based on deep reinforcement learning (DRL). The policy network of the DRL model is treated as an agent that determines whether, and by how much, the template should be updated. In summary, this paper releases our collected database, investigates both the long-term and short-term time-varying scenarios, and offers insights into and solutions for time-varying speaker verification.
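As a rough illustration of the incremental sequence-pair setting described in the abstract, the sketch below scores each test embedding against a running template and then lets a policy choose an update weight. The abstract does not give the update rule, so the linear blend, the cosine scoring, and the names `update_template`, `threshold_policy`, and `verify_sequence` are all assumptions made for illustration. In particular, the fixed-threshold policy is only a stand-in for the paper's DRL policy network, not a description of it.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def update_template(template, embedding, alpha):
    """Assumed update rule: (1 - alpha) * template + alpha * embedding."""
    return [(1.0 - alpha) * t + alpha * e for t, e in zip(template, embedding)]


def threshold_policy(score, accept_threshold=0.5, rate=0.2):
    """Hypothetical stand-in for a learned policy.

    Returns the update weight: `rate` when the score clears the
    threshold, otherwise 0.0 (no update).
    """
    return rate if score >= accept_threshold else 0.0


def verify_sequence(enrollment, test_sequence, policy=threshold_policy):
    """Score each test embedding, then optionally update the template."""
    template = list(enrollment)
    scores = []
    for embedding in test_sequence:
        score = cosine(template, embedding)
        scores.append(score)
        alpha = policy(score)  # "if and how much" to update
        if alpha > 0.0:
            template = update_template(template, embedding, alpha)
    return scores, template


if __name__ == "__main__":
    enrollment = [1.0, 0.0]
    tests = [[0.9, 0.1], [0.8, 0.3], [0.0, 1.0]]
    scores, template = verify_sequence(enrollment, tests)
    print(scores)
    print(template)
```

Any callable that maps a score to a weight in [0, 1] can be passed as `policy`, which is where a learned agent would plug in under these assumptions.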

Duke Scholars

Ming Li, Author, DKU Faculty

Published In

IEEE ACM Transactions on Audio Speech and Language Processing

DOI

10.1109/TASLP.2024.3428910

EISSN

2329-9304

ISSN

2329-9290

Publication Date

January 1, 2024

Volume

32

Start / End Page

3408 / 3423

Citation

APA

Qin, X., Li, N., Duan, S., & Li, M. (2024). Investigating Long-Term and Short-Term Time-Varying Speaker Verification. IEEE ACM Transactions on Audio Speech and Language Processing, 32, 3408–3423. https://doi.org/10.1109/TASLP.2024.3428910

Chicago

Qin, X., N. Li, S. Duan, and M. Li. “Investigating Long-Term and Short-Term Time-Varying Speaker Verification.” IEEE ACM Transactions on Audio Speech and Language Processing 32 (January 1, 2024): 3408–23. https://doi.org/10.1109/TASLP.2024.3428910.

ICMJE

Qin X, Li N, Duan S, Li M. Investigating Long-Term and Short-Term Time-Varying Speaker Verification. IEEE ACM Transactions on Audio Speech and Language Processing. 2024 Jan 1;32:3408–23.

MLA

Qin, X., et al. “Investigating Long-Term and Short-Term Time-Varying Speaker Verification.” IEEE ACM Transactions on Audio Speech and Language Processing, vol. 32, Jan. 2024, pp. 3408–23. Scopus, doi:10.1109/TASLP.2024.3428910.

NLM

Qin X, Li N, Duan S, Li M. Investigating Long-Term and Short-Term Time-Varying Speaker Verification. IEEE ACM Transactions on Audio Speech and Language Processing. 2024 Jan 1;32:3408–3423.
