The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP 2022 Challenge

Publication, Conference
Cheng, M; Wang, H; Wang, Z; Fu, Q; Li, M
Published in: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
January 1, 2023

This paper describes the system developed by the WHU-Alibaba team for the Multimodal Information Based Speech Processing (MISP) 2022 Challenge. We extend the Sequence-to-Sequence Target-Speaker Voice Activity Detection framework to detect the voice activities of multiple speakers simultaneously from audio-visual signals. The final system achieves a diarization error rate (DER) of 8.82% on the evaluation set of the competition database, ranking 1st in the speaker diarization track of the MISP 2022 Challenge, an ICASSP Signal Processing Grand Challenge.

Published In

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

DOI

10.1109/ICASSP49357.2023.10095802

ISSN

1520-6149

Publication Date

January 1, 2023

Volume

2023-June

Citation

APA: Cheng, M., Wang, H., Wang, Z., Fu, Q., & Li, M. (2023). The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP 2022 Challenge. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 2023-June). https://doi.org/10.1109/ICASSP49357.2023.10095802

Chicago: Cheng, M., H. Wang, Z. Wang, Q. Fu, and M. Li. “The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP 2022 Challenge.” In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, Vol. 2023-June, 2023. https://doi.org/10.1109/ICASSP49357.2023.10095802.

ICMJE: Cheng M, Wang H, Wang Z, Fu Q, Li M. The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP 2022 Challenge. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2023.

MLA: Cheng, M., et al. “The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP 2022 Challenge.” ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2023-June, 2023. Scopus, doi:10.1109/ICASSP49357.2023.10095802.

NLM: Cheng M, Wang H, Wang Z, Fu Q, Li M. The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP 2022 Challenge. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2023.
