
JOINT INFERENCE OF SPEAKER DIARIZATION AND ASR WITH MULTI-STAGE INFORMATION SHARING

Publication, Conference
Wang, W; Cai, D; Cheng, M; Li, M
Published in: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
January 1, 2024

In this paper, we introduce a novel approach that unifies Automatic Speech Recognition (ASR) and speaker diarization in a cohesive framework. Exploiting the synergies between the two tasks, our method extracts speaker-specific information from the lower layers of a pretrained Conformer-based ASR model while leveraging the higher layers for enhanced diarization performance. In particular, integrating ASR contextual details into the diarization process proves effective. Results on the DIHARD III dataset indicate that our approach achieves a Diarization Error Rate (DER) of 10.52%, which can be further reduced to 10.39% when integrating ASR features into the diarization model. These findings highlight the potential of our approach, suggesting competitive performance against other state-of-the-art systems. Additionally, the framework simultaneously generates a text transcript for each speaker, a distinct advantage that can further enhance ASR capabilities and support a transition towards an end-to-end multitask framework encompassing both ASR and speaker diarization.
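For readers unfamiliar with the metric, the following is a minimal sketch of how a DER figure is commonly computed (false alarm, missed speech, and speaker confusion time divided by total reference speech time) and of how the two reported figures compare in relative terms. The function names and the example durations are hypothetical and chosen only for illustration; the paper's own scoring setup (for example collar and overlap handling) is not described in this abstract and is not reproduced here.

```python
def diarization_error_rate(false_alarm, missed, confusion, total_speech):
    """DER = (false alarm + missed speech + speaker confusion) / total reference speech.

    All arguments are durations in the same unit (for example seconds).
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed + confusion) / total_speech


def relative_reduction(baseline, improved):
    """Reduction from baseline to improved, as a fraction of baseline."""
    return (baseline - improved) / baseline


# Hypothetical durations in seconds, not taken from the paper.
der = diarization_error_rate(
    false_alarm=12.0, missed=30.0, confusion=18.0, total_speech=600.0
)
print(f"DER = {der:.2%}")

# The two DER percentages reported in the abstract.
rel = relative_reduction(10.52, 10.39)
print(f"relative reduction = {rel:.2%}")
```

Under this definition, the step from 10.52% to 10.39% is an absolute reduction of 0.13 percentage points, or roughly 1.2% relative to the first figure.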


Published In

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

DOI

10.1109/ICASSP48485.2024.10446724

ISSN

1520-6149

Publication Date

January 1, 2024

Start / End Page

11011 / 11015
 

Citation

APA: Wang, W., Cai, D., Cheng, M., & Li, M. (2024). JOINT INFERENCE OF SPEAKER DIARIZATION AND ASR WITH MULTI-STAGE INFORMATION SHARING. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (pp. 11011–11015). https://doi.org/10.1109/ICASSP48485.2024.10446724
Chicago: Wang, W., D. Cai, M. Cheng, and M. Li. “JOINT INFERENCE OF SPEAKER DIARIZATION AND ASR WITH MULTI-STAGE INFORMATION SHARING.” In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 11011–15, 2024. https://doi.org/10.1109/ICASSP48485.2024.10446724.
ICMJE: Wang W, Cai D, Cheng M, Li M. JOINT INFERENCE OF SPEAKER DIARIZATION AND ASR WITH MULTI-STAGE INFORMATION SHARING. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2024. p. 11011–5.
MLA: Wang, W., et al. “JOINT INFERENCE OF SPEAKER DIARIZATION AND ASR WITH MULTI-STAGE INFORMATION SHARING.” ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2024, pp. 11011–15. Scopus, doi:10.1109/ICASSP48485.2024.10446724.
NLM: Wang W, Cai D, Cheng M, Li M. JOINT INFERENCE OF SPEAKER DIARIZATION AND ASR WITH MULTI-STAGE INFORMATION SHARING. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2024. p. 11011–11015.
