Multi-Input Multi-Output Target-Speaker Voice Activity Detection for Unified, Flexible, and Robust Audio-Visual Speaker Diarization
Publication, Journal Article
Cheng, M; Li, M
Published in: IEEE Transactions on Audio, Speech and Language Processing
2025
Published In
IEEE Transactions on Audio, Speech and Language Processing
DOI
10.1109/taslpro.2025.3597450
EISSN
2998-4173
Publication Date
2025
Volume
33
Start / End Page
3522 / 3536
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Related Subject Headings
- Speech-Language Pathology & Audiology
- 4603 Computer vision and multimedia computation
- 4602 Artificial intelligence
- 4006 Communications engineering
- 0906 Electrical and Electronic Engineering
- 0801 Artificial Intelligence and Image Processing
Citation
APA
Cheng, M., & Li, M. (2025). Multi-Input Multi-Output Target-Speaker Voice Activity Detection for Unified, Flexible, and Robust Audio-Visual Speaker Diarization. IEEE Transactions on Audio, Speech and Language Processing, 33, 3522–3536. https://doi.org/10.1109/taslpro.2025.3597450
Chicago
Cheng, Ming, and Ming Li. “Multi-Input Multi-Output Target-Speaker Voice Activity Detection for Unified, Flexible, and Robust Audio-Visual Speaker Diarization.” IEEE Transactions on Audio, Speech and Language Processing 33 (2025): 3522–36. https://doi.org/10.1109/taslpro.2025.3597450.
ICMJE
Cheng M, Li M. Multi-Input Multi-Output Target-Speaker Voice Activity Detection for Unified, Flexible, and Robust Audio-Visual Speaker Diarization. IEEE Transactions on Audio, Speech and Language Processing. 2025;33:3522–36.
MLA
Cheng, Ming, and Ming Li. “Multi-Input Multi-Output Target-Speaker Voice Activity Detection for Unified, Flexible, and Robust Audio-Visual Speaker Diarization.” IEEE Transactions on Audio, Speech and Language Processing, vol. 33, Institute of Electrical and Electronics Engineers (IEEE), 2025, pp. 3522–36. Crossref, doi:10.1109/taslpro.2025.3597450.
NLM
Cheng M, Li M. Multi-Input Multi-Output Target-Speaker Voice Activity Detection for Unified, Flexible, and Robust Audio-Visual Speaker Diarization. IEEE Transactions on Audio, Speech and Language Processing. Institute of Electrical and Electronics Engineers (IEEE); 2025;33:3522–3536.