The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP 2022 Challenge
Conference Publication
Cheng, M; Wang, H; Wang, Z; Fu, Q; Li, M
Published in: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
January 1, 2023
This paper describes the system developed by the WHU-Alibaba team for the Multimodal Information Based Speech Processing (MISP) 2022 Challenge. We extend the Sequence-to-Sequence Target-Speaker Voice Activity Detection framework to simultaneously detect multiple speakers' voice activities from audio-visual signals. The final system achieves a diarization error rate (DER) of 8.82% on the evaluation set of the competition database, which ranks 1st in the speaker diarization track of the MISP 2022, ICASSP Signal Processing Grand Challenge.
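For readers unfamiliar with the metric quoted above, diarization error rate is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The following is a minimal frame-level sketch of that definition, not the challenge's official scoring tool. It assumes hypothesis speaker labels have already been mapped to reference labels and applies no forgiveness collar; the function name `frame_level_der` and the toy labels are hypothetical.

```python
def frame_level_der(reference, hypothesis):
    """Return a frame-level diarization error rate.

    reference, hypothesis: equal-length lists with one entry per frame;
    each entry is the set of speaker labels active in that frame.
    Assumes hypothesis labels are already mapped to reference labels.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must cover the same frames")

    missed = false_alarm = confusion = total_ref = 0
    for ref, hyp in zip(reference, hypothesis):
        n_ref, n_hyp = len(ref), len(hyp)
        n_correct = len(ref & hyp)
        # Reference speakers with no hypothesis slot at all: missed speech.
        missed += max(0, n_ref - n_hyp)
        # Hypothesis speakers beyond the reference count: false alarm.
        false_alarm += max(0, n_hyp - n_ref)
        # Remaining matched slots that carry the wrong label: confusion.
        confusion += min(n_ref, n_hyp) - n_correct
        total_ref += n_ref

    if total_ref == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confusion) / total_ref


# Toy example: five frames, two speakers "A" and "B".
reference = [{"A"}, {"A"}, {"A", "B"}, {"B"}, set()]
hypothesis = [{"A"}, {"B"}, {"A"}, {"B"}, {"B"}]
der = frame_level_der(reference, hypothesis)
print(f"DER = {der:.2%}")
```

In the toy example one frame is confused, one speaker is missed in the overlapped frame, and one frame is a false alarm, so three error units are counted against five reference speaker-frames.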
Published In
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
DOI
10.1109/ICASSP49357.2023.10095802
ISSN
1520-6149
Publication Date
January 1, 2023
Volume
2023-June
Citation
APA: Cheng, M., Wang, H., Wang, Z., Fu, Q., & Li, M. (2023). The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP 2022 Challenge. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 2023-June). https://doi.org/10.1109/ICASSP49357.2023.10095802
Chicago: Cheng, M., H. Wang, Z. Wang, Q. Fu, and M. Li. “The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP 2022 Challenge.” In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, Vol. 2023-June, 2023. https://doi.org/10.1109/ICASSP49357.2023.10095802.
ICMJE: Cheng M, Wang H, Wang Z, Fu Q, Li M. The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP 2022 Challenge. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2023.
MLA: Cheng, M., et al. “The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP 2022 Challenge.” ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2023-June, 2023. Scopus, doi:10.1109/ICASSP49357.2023.10095802.
NLM: Cheng M, Wang H, Wang Z, Fu Q, Li M. The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP 2022 Challenge. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2023.