
EFFICIENT PERSONAL VOICE ACTIVITY DETECTION WITH WAKE WORD REFERENCE SPEECH

Publication, Conference
Zeng, B; Cheng, M; Tian, Y; Liu, H; Li, M
Published in: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
January 1, 2024

Personal voice activity detection (PVAD) is increasingly used in speech assistants. Traditional PVAD schemes extract the target speaker's embedding from existing query reference speech through a pre-trained speaker verification model. Consequently, the performance of the PVAD model may suffer when the extracted speaker embedding is of poor quality, for example when only wake word speech is available as the reference. In this work, we introduce a novel and efficient PVAD model. In contrast to conventional approaches that rely on speaker embeddings extracted from a pre-trained speaker verification model, our proposed method directly uses the raw frame-level features of the reference speech as the target speaker's attributes. In this way, our proposed model achieves an ultra-high recall rate, which is vital for speech assistant applications. Experimental results show the effectiveness of the proposed method both when existing query speech and when wake word speech is used as the reference.
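The abstract does not say how the frame-level reference features are fed into the model. As a rough, hypothetical illustration of the distinction it draws, the Python sketch below contrasts two ways of conditioning input frames on a reference utterance: appending one pooled vector (standing in for a speaker embedding) to every input frame, versus letting each input frame attend over all reference frames. Mean pooling, scaled dot-product attention, concatenation, and every function name here are assumptions chosen for illustration; this is not the authors' architecture.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(frames: List[Vector]) -> Vector:
    """Average reference frames into one utterance-level vector.

    Hypothetical stand-in for a speaker embedding: all frame-level
    detail is collapsed into a single fixed vector.
    """
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, ref_frames: List[Vector]) -> Vector:
    """Scaled dot-product attention of one input frame over all reference frames."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * r for q, r in zip(query, ref)) for ref in ref_frames]
    weights = softmax(scores)
    dim = len(ref_frames[0])
    return [sum(w * ref[d] for w, ref in zip(weights, ref_frames)) for d in range(dim)]


def condition_on_embedding(inputs: List[Vector], ref_frames: List[Vector]) -> List[Vector]:
    """Embedding-style conditioning (hypothetical): append the same pooled
    vector to every input frame."""
    emb = mean_pool(ref_frames)
    return [frame + emb for frame in inputs]


def condition_on_frames(inputs: List[Vector], ref_frames: List[Vector]) -> List[Vector]:
    """Frame-level conditioning (assumed here to use attention): each input
    frame gets its own summary of the reference frames, weighted by how
    similar each reference frame is to that input frame."""
    return [frame + attend(frame, ref_frames) for frame in inputs]


if __name__ == "__main__":
    ref = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-frame reference (e.g. a wake word)
    inp = [[1.0, 0.0], [0.0, 0.0]]  # toy 2-frame input
    print(condition_on_embedding(inp, ref))
    print(condition_on_frames(inp, ref))
```

Because the pooled vector is identical for every input frame, the embedding-style variant discards the frame-level structure of a short reference such as a wake word; the attention variant keeps all reference frames available, at the cost of scoring every reference frame for every input frame.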


Published In

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

DOI

10.1109/ICASSP48485.2024.10446042

ISSN

1520-6149

Publication Date

January 1, 2024

Start / End Page

12241 / 12245

Citation

APA: Zeng, B., Cheng, M., Tian, Y., Liu, H., & Li, M. (2024). EFFICIENT PERSONAL VOICE ACTIVITY DETECTION WITH WAKE WORD REFERENCE SPEECH. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (pp. 12241–12245). https://doi.org/10.1109/ICASSP48485.2024.10446042

Chicago: Zeng, B., M. Cheng, Y. Tian, H. Liu, and M. Li. “EFFICIENT PERSONAL VOICE ACTIVITY DETECTION WITH WAKE WORD REFERENCE SPEECH.” In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 12241–45, 2024. https://doi.org/10.1109/ICASSP48485.2024.10446042.

ICMJE: Zeng B, Cheng M, Tian Y, Liu H, Li M. EFFICIENT PERSONAL VOICE ACTIVITY DETECTION WITH WAKE WORD REFERENCE SPEECH. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2024. p. 12241–5.

MLA: Zeng, B., et al. “EFFICIENT PERSONAL VOICE ACTIVITY DETECTION WITH WAKE WORD REFERENCE SPEECH.” ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2024, pp. 12241–45. Scopus, doi:10.1109/ICASSP48485.2024.10446042.

NLM: Zeng B, Cheng M, Tian Y, Liu H, Li M. EFFICIENT PERSONAL VOICE ACTIVITY DETECTION WITH WAKE WORD REFERENCE SPEECH. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2024. p. 12241–12245.