The 2020 personalized voice trigger challenge: Open datasets, evaluation metrics, baseline system and results

Publication, Conference

Jia, Y; Wang, X; Qin, X; Zhang, Y; Wang, J; Zhang, D; Li, M

Published in: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

January 1, 2021

The 2020 Personalized Voice Trigger Challenge (PVTC2020) addresses two research problems in a unified setup: joint wake-up word detection with speaker verification on close-talking single-microphone data, and on far-field multi-channel microphone array data. Specifically, the second task poses an additional cross-channel matching challenge on top of the far-field condition. To simulate a real-life application scenario, the enrollment utterances are recorded with a close-talking cell phone only, while the test utterances are recorded with both the close-talking cell phone and the far-field microphone arrays. This paper introduces our challenge setup and the released database, as well as the evaluation metrics. In addition, we present a sequential two-stage end-to-end neural network baseline system trained on the proposed database for speaker-dependent wake-up word detection. Results show that state-of-the-art personalized voice trigger methods are still based on the two-stage design; however, this benchmark database could also be used to evaluate multi-task joint learning methods. The official website, the open-source baseline system, and the results of submitted systems have been released.
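As a rough illustration of the sequential two-stage idea described in the abstract, the sketch below gates a speaker-verification check behind a wake-up word check. It is not the released baseline: the function names, the threshold values, and the use of cosine similarity between speaker embeddings are all assumptions made for this example, and the wake-up word score and embeddings are taken as given inputs rather than computed by neural networks.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector carries no speaker information; treat as no match.
        return 0.0
    return dot / (norm_a * norm_b)


def personalized_trigger(
    wake_word_score: float,
    enrollment_embedding: Sequence[float],
    test_embedding: Sequence[float],
    wake_word_threshold: float = 0.5,  # hypothetical value, not from the paper
    speaker_threshold: float = 0.7,    # hypothetical value, not from the paper
) -> bool:
    """Return True only if both stages accept the test utterance."""
    # Stage 1: wake-up word detection. Reject early if the keyword
    # score is too low, so stage 2 never runs on non-trigger audio.
    if wake_word_score < wake_word_threshold:
        return False
    # Stage 2: speaker verification. Compare the test utterance's
    # embedding against the enrolled speaker's embedding (assumed scoring).
    score = cosine_similarity(enrollment_embedding, test_embedding)
    return score >= speaker_threshold
```

For example, `personalized_trigger(0.9, [1.0, 0.0], [1.0, 0.0])` accepts, while the same call with a wake-up word score of 0.2, or with an orthogonal test embedding such as `[0.0, 1.0]`, rejects. A multi-task joint learning system, which the abstract mentions as an alternative, would instead produce both decisions from one shared model.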

Duke Scholars

Author Ming Li DKU Faculty

Published In

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

DOI

10.21437/Interspeech.2021-602

EISSN

1990-9772

ISSN

2308-457X

Publication Date

January 1, 2021

Volume

6

Start / End Page

4066 / 4070

Citation

APA

Jia, Y., Wang, X., Qin, X., Zhang, Y., Wang, J., Zhang, D., & Li, M. (2021). The 2020 personalized voice trigger challenge: Open datasets, evaluation metrics, baseline system and results. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (Vol. 6, pp. 4066–4070). https://doi.org/10.21437/Interspeech.2021-602

Chicago

Jia, Y., X. Wang, X. Qin, Y. Zhang, J. Wang, D. Zhang, and M. Li. “The 2020 personalized voice trigger challenge: Open datasets, evaluation metrics, baseline system and results.” In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 6:4066–70, 2021. https://doi.org/10.21437/Interspeech.2021-602.

ICMJE

Jia Y, Wang X, Qin X, Zhang Y, Wang J, Zhang D, et al. The 2020 personalized voice trigger challenge: Open datasets, evaluation metrics, baseline system and results. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2021. p. 4066–70.

MLA

Jia, Y., et al. “The 2020 personalized voice trigger challenge: Open datasets, evaluation metrics, baseline system and results.” Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 6, 2021, pp. 4066–70. Scopus, doi:10.21437/Interspeech.2021-602.

NLM

Jia Y, Wang X, Qin X, Zhang Y, Wang J, Zhang D, Li M. The 2020 personalized voice trigger challenge: Open datasets, evaluation metrics, baseline system and results. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2021. p. 4066–4070.
