Selective Channel Attention based Target Speaker Voice Activity Detection for Speaker Diarization under AD-HOC Microphone Array Settings

Publication, Conference

Zhang, H; Cheng, M; Feng, J; Li, M

Published in: Proceedings of the Annual Conference of the International Speech Communication Association Interspeech

January 1, 2025

Speaker diarization benefits from multi-channel microphone arrays, yet current systems struggle with diverse array configurations. We address this by simulating a dataset with various microphone topologies and proposing Selective Channel Attention-based Target Speaker Voice Activity Detection (SCA-TSVAD). We use cross-channel self-attention with masking mechanisms to attend selectively to specific channels, which allows effective processing of audio with variable multi-channel configurations. SCA-TSVAD builds on single-channel TSVAD. It achieves superior performance on our simulated dataset, showing its robustness across diverse array configurations. To further validate its effectiveness on real data, we evaluate SCA-TSVAD on the real-world Ali-Meeting database, where it successfully handles multi-channel audio inputs even when some channels are unavailable or malfunctioning, demonstrating its practical applicability.
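The idea of cross-channel self-attention with a channel mask can be illustrated with a small sketch. This is not the authors' implementation: the function name `masked_channel_attention`, the single-frame (channels x features) input shape, and the use of identity projections in place of learned query/key/value weights are all assumptions made for illustration. It shows only the general technique of setting the attention scores of unavailable channels to negative infinity so that they receive zero weight after the softmax.

```python
import numpy as np


def masked_channel_attention(x, channel_mask):
    """Self-attention across channels that ignores masked-out channels.

    x            : (C, D) array, one D-dimensional embedding per channel
                   (a single time frame; hypothetical layout).
    channel_mask : (C,) boolean array, True where the channel is usable.
    Returns (out, weights): out is (C, D), weights is (C, C).
    """
    x = np.asarray(x, dtype=np.float64)
    channel_mask = np.asarray(channel_mask, dtype=bool)
    if not channel_mask.any():
        raise ValueError("at least one channel must be available")

    # Zero the embeddings of unavailable channels so garbage values
    # cannot leak into the result.
    x = np.where(channel_mask[:, None], x, 0.0)

    # Identity projections keep the sketch short; a trained model would
    # apply learned query/key/value projections here (assumption).
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)  # (C, C): query channel x key channel

    # Masked key channels get -inf, so softmax assigns them zero weight.
    scores = np.where(channel_mask[None, :], scores, -np.inf)

    # Numerically stable softmax over the key-channel axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Weighted sum of channel embeddings; zero the rows of masked queries.
    out = (weights @ x) * channel_mask[:, None]
    return out, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.normal(size=(4, 8))              # 4 channels, 8 features
    mask = np.array([True, True, False, True])   # channel 2 unavailable
    out, weights = masked_channel_attention(frame, mask)
    print(weights.round(3))
```

Because the mask is an input rather than part of the model's shape, the same function accepts any number of channels and any subset of available ones, which is the property the abstract attributes to selective channel attention.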

Duke Scholars

Author Ming Li DKU Faculty

Published In

Proceedings of the Annual Conference of the International Speech Communication Association Interspeech

DOI

10.21437/Interspeech.2025-1749

EISSN

2958-1796

ISSN

2308-457X

Publication Date

January 1, 2025

Start / End Page

5228 / 5232

Citation

APA

Zhang, H., Cheng, M., Feng, J., & Li, M. (2025). Selective Channel Attention based Target Speaker Voice Activity Detection for Speaker Diarization under AD-HOC Microphone Array Settings. In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech (pp. 5228–5232). https://doi.org/10.21437/Interspeech.2025-1749

Chicago

Zhang, H., M. Cheng, J. Feng, and M. Li. “Selective Channel Attention based Target Speaker Voice Activity Detection for Speaker Diarization under AD-HOC Microphone Array Settings.” In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 5228–32, 2025. https://doi.org/10.21437/Interspeech.2025-1749.

ICMJE

Zhang H, Cheng M, Feng J, Li M. Selective Channel Attention based Target Speaker Voice Activity Detection for Speaker Diarization under AD-HOC Microphone Array Settings. In: Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. 2025. p. 5228–32.

MLA

Zhang, H., et al. “Selective Channel Attention based Target Speaker Voice Activity Detection for Speaker Diarization under AD-HOC Microphone Array Settings.” Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 2025, pp. 5228–32. Scopus, doi:10.21437/Interspeech.2025-1749.

NLM

Zhang H, Cheng M, Feng J, Li M. Selective Channel Attention based Target Speaker Voice Activity Detection for Speaker Diarization under AD-HOC Microphone Array Settings. Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. 2025. p. 5228–5232.
