Selective Channel Attention based Target Speaker Voice Activity Detection for Speaker Diarization under AD-HOC Microphone Array Settings

Publication, Conference
Zhang, H; Cheng, M; Feng, J; Li, M
Published in: Proceedings of the Annual Conference of the International Speech Communication Association Interspeech
January 1, 2025

Speaker diarization benefits from multi-channel microphone arrays, yet current systems struggle with diverse array configurations. We address this by simulating a dataset with various microphone topologies and proposing Selective Channel Attention-based Target Speaker Voice Activity Detection (SCA-TSVAD). We use cross-channel self-attention with masking mechanisms to attend selectively to specific channels, which allows the model to process audio with a variable number and arrangement of channels. SCA-TSVAD is built on single-channel TSVAD. It achieves superior performance on our simulated dataset, showing robustness across diverse array configurations. To further validate its effectiveness on real data, we evaluate SCA-TSVAD on the real-world Ali-Meeting database, where it successfully handles multi-channel audio inputs even when some channels are unavailable or malfunctioning, demonstrating its practical applicability.
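The masking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single time frame, identity query/key/value projections instead of learned ones, and mean pooling over channels, and the function name `masked_channel_attention` and its parameters are hypothetical. It only shows how masking the attention scores lets the same code accept any number of channels and ignore those flagged as unavailable.

```python
import math


def masked_channel_attention(features, available):
    """Self-attention across channels for one time frame (illustrative only).

    features:  list of C channel embeddings, each a list of D floats
    available: list of C booleans; False marks a missing or faulty channel
    Returns one D-dim vector: attention outputs averaged over available channels.
    """
    if not any(available):
        raise ValueError("at least one channel must be available")
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q_idx, query in enumerate(features):
        if not available[q_idx]:
            continue  # a masked channel issues no query
        # Scaled dot-product scores; masked keys get -inf so their weight is 0
        scores = [
            scale * sum(q * k for q, k in zip(query, key))
            if available[k_idx] else -math.inf
            for k_idx, key in enumerate(features)
        ]
        top = max(scores)  # finite, because the query channel itself is available
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Weighted sum of values, skipping masked channels entirely
        outputs.append([
            sum(w / total * value[d]
                for w, value, ok in zip(weights, features, available) if ok)
            for d in range(dim)
        ])
    # Pool the per-channel outputs into a single frame-level vector
    return [sum(out[d] for out in outputs) / len(outputs) for d in range(dim)]


# A masked third channel with extreme values does not change the result
frames = [[1.0, 0.0], [0.0, 1.0], [100.0, -100.0]]
with_mask = masked_channel_attention(frames, [True, True, False])
two_only = masked_channel_attention(frames[:2], [True, True])
```

In this sketch the mask acts in two places: masked channels never act as queries, and they receive zero weight as keys. That is why the output for three channels with one masked matches the output for the two healthy channels alone.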

Published In

Proceedings of the Annual Conference of the International Speech Communication Association Interspeech

DOI

10.21437/Interspeech.2025-1749

EISSN

2958-1796

ISSN

2308-457X

Publication Date

January 1, 2025

Start / End Page

5228 / 5232
 

Citation

APA: Zhang, H., Cheng, M., Feng, J., & Li, M. (2025). Selective Channel Attention based Target Speaker Voice Activity Detection for Speaker Diarization under AD-HOC Microphone Array Settings. In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech (pp. 5228–5232). https://doi.org/10.21437/Interspeech.2025-1749
Chicago: Zhang, H., M. Cheng, J. Feng, and M. Li. “Selective Channel Attention based Target Speaker Voice Activity Detection for Speaker Diarization under AD-HOC Microphone Array Settings.” In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 5228–32, 2025. https://doi.org/10.21437/Interspeech.2025-1749.
ICMJE: Zhang H, Cheng M, Feng J, Li M. Selective Channel Attention based Target Speaker Voice Activity Detection for Speaker Diarization under AD-HOC Microphone Array Settings. In: Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. 2025. p. 5228–32.
MLA: Zhang, H., et al. “Selective Channel Attention based Target Speaker Voice Activity Detection for Speaker Diarization under AD-HOC Microphone Array Settings.” Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 2025, pp. 5228–32. Scopus, doi:10.21437/Interspeech.2025-1749.
NLM: Zhang H, Cheng M, Feng J, Li M. Selective Channel Attention based Target Speaker Voice Activity Detection for Speaker Diarization under AD-HOC Microphone Array Settings. Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. 2025. p. 5228–5232.
