
VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark

Publication, Conference
Lin, Y; Cheng, M; Zhang, F; Gao, Y; Zhang, S; Li, M
Published in: Proceedings of the Annual Conference of the International Speech Communication Association Interspeech
January 1, 2024

In this paper, we present VoxBlink2, a large audio-visual speaker recognition dataset comprising approximately 10M utterances with videos from 110K+ speakers in the wild. The dataset significantly expands the VoxBlink dataset, covering a broader diversity of speakers and scenarios thanks to an optimized data collection pipeline. We then explore the impact of training strategies, data scale, and model complexity on speaker verification, and establish a new single-model state-of-the-art EER of 0.170% and minDCF of 0.006% on the VoxCeleb1-O test set. These results motivate us to explore speaker recognition from a new, challenging perspective. We propose the Open-Set Speaker-Identification task, which is designed to either match a probe utterance with a known gallery speaker or categorize it as an unknown query. For this task, we design a concrete benchmark and evaluation protocols. The data and model resources can be found at http://voxblink2.github.io.
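
The open-set decision described in the abstract (match a probe to a known gallery speaker, or reject it as unknown) can be illustrated with a minimal sketch. The abstract does not specify how probes are scored, so everything below is an assumption made for illustration: one embedding per gallery speaker, cosine similarity as the score, a single fixed rejection threshold, and the hypothetical names `cosine` and `identify`. It is not the paper's benchmark protocol.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def identify(
    probe: List[float],
    gallery: Dict[str, List[float]],
    threshold: float,
) -> Tuple[Optional[str], float]:
    """Return (speaker_id, score) for the best gallery match,
    or (None, score) when the best score falls below the threshold."""
    best_id: Optional[str] = None
    best_score = -1.0
    for speaker_id, embedding in gallery.items():
        score = cosine(probe, embedding)
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_score < threshold:
        # Assumed rejection rule: treat the probe as an unknown query.
        return None, best_score
    return best_id, best_score


# Toy 3-dimensional embeddings, made up for illustration only.
gallery = {"spk_a": [1.0, 0.0, 0.0], "spk_b": [0.0, 1.0, 0.0]}
known = identify([0.9, 0.1, 0.0], gallery, threshold=0.5)
unknown = identify([0.0, 0.0, 1.0], gallery, threshold=0.5)
```

In this toy setup the first probe is assigned to `spk_a`, while the second is orthogonal to every gallery embedding and is rejected. In practice the threshold would be tuned on held-out data, since it trades false acceptances of unknown speakers against false rejections of known ones.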

Published In

Proceedings of the Annual Conference of the International Speech Communication Association Interspeech

DOI

10.21437/Interspeech.2024-1490

EISSN

1990-9772

ISSN

2308-457X

Publication Date

January 1, 2024

Start / End Page

4263 / 4267
 

Citation

APA: Lin, Y., Cheng, M., Zhang, F., Gao, Y., Zhang, S., & Li, M. (2024). VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark. In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech (pp. 4263–4267). https://doi.org/10.21437/Interspeech.2024-1490

Chicago: Lin, Y., M. Cheng, F. Zhang, Y. Gao, S. Zhang, and M. Li. “VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark.” In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 4263–67, 2024. https://doi.org/10.21437/Interspeech.2024-1490.

ICMJE: Lin Y, Cheng M, Zhang F, Gao Y, Zhang S, Li M. VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark. In: Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. 2024. p. 4263–7.

MLA: Lin, Y., et al. “VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark.” Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 2024, pp. 4263–67. Scopus, doi:10.21437/Interspeech.2024-1490.

NLM: Lin Y, Cheng M, Zhang F, Gao Y, Zhang S, Li M. VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark. Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. 2024. p. 4263–4267.
