Scholars@Duke publication: EFFICIENT PERSONAL VOICE ACTIVITY DETECTION WITH WAKE WORD REFERENCE SPEECH

Publication, Conference

Zeng, B; Cheng, M; Tian, Y; Liu, H; Li, M

Published in: ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings

January 1, 2024

Personal voice activity detection (PVAD) is increasingly used in speech assistants. Traditional PVAD schemes extract the target speaker's embedding from existing query reference speech with a pre-trained speaker verification model. As a result, PVAD performance may suffer when the extracted speaker embedding is of poor quality, for example when only wake word speech is available as the reference. In this work, we introduce a novel and efficient PVAD model. Instead of relying on speaker embeddings from a pre-trained speaker verification model, our method directly uses the raw frame-level features of the reference speech as the target speaker's attributes. This allows the model to achieve an ultra-high recall rate, which is vital for speech assistant applications. Experimental results demonstrate the effectiveness of the proposed method whether existing query speech or wake word speech is used as the reference.
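The abstract contrasts two ways of conditioning a PVAD model on the target speaker but does not describe the architecture. The following Python sketch is a hypothetical illustration, not the authors' model: it uses a simple mean-pooled vector as a stand-in for the pre-trained speaker verification embedding, and it assumes scaled dot-product attention as one possible way to consume frame-level reference features. All function names are invented for illustration.

```python
import math
from typing import List

Frame = List[float]


def mean_pool(frames: List[Frame]) -> Frame:
    """Stand-in (assumption) for a speaker verification embedding: average the reference frames."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Frame, ref_frames: List[Frame]) -> Frame:
    """Scaled dot-product attention of one test frame over all reference frames."""
    dim = len(query)
    scores = [sum(q * r for q, r in zip(query, ref)) / math.sqrt(dim) for ref in ref_frames]
    weights = softmax(scores)
    # Weighted sum of reference frames, one output value per feature dimension.
    return [sum(w * ref[i] for w, ref in zip(weights, ref_frames)) for i in range(dim)]


def condition_with_embedding(test_frames: List[Frame], ref_frames: List[Frame]) -> List[Frame]:
    """Utterance-level conditioning: append the same pooled vector to every test frame."""
    embedding = mean_pool(ref_frames)
    return [frame + embedding for frame in test_frames]


def condition_with_frames(test_frames: List[Frame], ref_frames: List[Frame]) -> List[Frame]:
    """Frame-level conditioning (hypothetical): append a per-frame attention summary of the reference."""
    return [frame + attend(frame, ref_frames) for frame in test_frames]


if __name__ == "__main__":
    reference = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-D "features" for two reference frames
    test = [[1.0, 0.0], [0.5, 0.5], [0.0, 0.0]]  # toy features for three test frames
    for row in condition_with_embedding(test, reference):
        print(row)
    for row in condition_with_frames(test, reference):
        print(row)
```

With the pooled embedding, every test frame receives the same appended vector, so a short reference such as a single wake word is compressed into one summary. With frame-level conditioning, each test frame draws more heavily on the reference frames most similar to it. A downstream frame classifier (not shown) would consume either conditioned sequence; how the paper actually combines the reference features may differ.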

Duke Scholars

Ming Li, Author (DKU Faculty)

Published In

ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings

DOI

10.1109/ICASSP48485.2024.10446042

ISSN

1520-6149

Publication Date

January 1, 2024

Start / End Page

12241 / 12245

Citation

APA

Zeng, B., Cheng, M., Tian, Y., Liu, H., & Li, M. (2024). EFFICIENT PERSONAL VOICE ACTIVITY DETECTION WITH WAKE WORD REFERENCE SPEECH. In ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings (pp. 12241–12245). https://doi.org/10.1109/ICASSP48485.2024.10446042

Chicago

Zeng, B., M. Cheng, Y. Tian, H. Liu, and M. Li. “EFFICIENT PERSONAL VOICE ACTIVITY DETECTION WITH WAKE WORD REFERENCE SPEECH.” In ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 12241–45, 2024. https://doi.org/10.1109/ICASSP48485.2024.10446042.

ICMJE

Zeng B, Cheng M, Tian Y, Liu H, Li M. EFFICIENT PERSONAL VOICE ACTIVITY DETECTION WITH WAKE WORD REFERENCE SPEECH. In: ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings. 2024. p. 12241–5.

MLA

Zeng, B., et al. “EFFICIENT PERSONAL VOICE ACTIVITY DETECTION WITH WAKE WORD REFERENCE SPEECH.” ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 2024, pp. 12241–45. Scopus, doi:10.1109/ICASSP48485.2024.10446042.

NLM

Zeng B, Cheng M, Tian Y, Liu H, Li M. EFFICIENT PERSONAL VOICE ACTIVITY DETECTION WITH WAKE WORD REFERENCE SPEECH. ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings. 2024. p. 12241–12245.