Multi-Input Multi-Output Target-Speaker Voice Activity Detection for Unified, Flexible, and Robust Audio-Visual Speaker Diarization

Publication, Journal Article
Cheng, M; Li, M
Published in: IEEE Transactions on Audio, Speech and Language Processing
2025

Published In

IEEE Transactions on Audio, Speech and Language Processing

DOI

10.1109/taslpro.2025.3597450

EISSN

2998-4173

Publication Date

2025

Volume

33

Start / End Page

3522 / 3536

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Related Subject Headings

  • Speech-Language Pathology & Audiology
  • 4603 Computer vision and multimedia computation
  • 4602 Artificial intelligence
  • 4006 Communications engineering
  • 0906 Electrical and Electronic Engineering
  • 0801 Artificial Intelligence and Image Processing
 

Citation

APA
Cheng, M., & Li, M. (2025). Multi-Input Multi-Output Target-Speaker Voice Activity Detection for Unified, Flexible, and Robust Audio-Visual Speaker Diarization. IEEE Transactions on Audio, Speech and Language Processing, 33, 3522–3536. https://doi.org/10.1109/taslpro.2025.3597450

Chicago
Cheng, Ming, and Ming Li. “Multi-Input Multi-Output Target-Speaker Voice Activity Detection for Unified, Flexible, and Robust Audio-Visual Speaker Diarization.” IEEE Transactions on Audio, Speech and Language Processing 33 (2025): 3522–36. https://doi.org/10.1109/taslpro.2025.3597450.

ICMJE
Cheng M, Li M. Multi-Input Multi-Output Target-Speaker Voice Activity Detection for Unified, Flexible, and Robust Audio-Visual Speaker Diarization. IEEE Transactions on Audio, Speech and Language Processing. 2025;33:3522–36.

MLA
Cheng, Ming, and Ming Li. “Multi-Input Multi-Output Target-Speaker Voice Activity Detection for Unified, Flexible, and Robust Audio-Visual Speaker Diarization.” IEEE Transactions on Audio, Speech and Language Processing, vol. 33, Institute of Electrical and Electronics Engineers (IEEE), 2025, pp. 3522–36. Crossref, doi:10.1109/taslpro.2025.3597450.

NLM
Cheng M, Li M. Multi-Input Multi-Output Target-Speaker Voice Activity Detection for Unified, Flexible, and Robust Audio-Visual Speaker Diarization. IEEE Transactions on Audio, Speech and Language Processing. Institute of Electrical and Electronics Engineers (IEEE); 2025;33:3522–3536.
