Scholars@Duke publication: Role-aware Speaker Diarization in Autism Interview Scenarios

Publication, Journal Article

Wang, K; Cheng, M; Xie, Y; Zou, X; Li, M

Published in: Computer Science

February 15, 2025

Speaker diarization plays a pivotal role in intelligent speech transcription. Its core task is to segment and cluster multi-speaker audio by speaker identity, which makes both the audio content and the transcribed text easier to organize. In medical interview scenarios, speaker diarization is a prerequisite for subsequent automated assessment. Role information is naturally present in medical dialogue. Taking autism as an example, a typical session involves three well-defined roles: doctor, parent, and the child undergoing diagnosis. In actual conversations, however, the correspondence between roles and speakers is not always one-to-one. For instance, during an autism diagnosis each conversation may involve only one child, while the number of doctors or parents may vary. We believe that the role information and the speaker information embedded in each speech segment can complement each other and thereby reduce the diarization error rate. In this study, we propose a method that integrates role information into the sequence-to-sequence target speaker voice activity detection (Seq2Seq-TSVAD) framework, achieving a diarization error rate (DER) of 20.61% on the CPEP-3 dataset. This error rate is 9.8% lower than that of the Seq2Seq-TSVAD baseline and 19.3% lower than that of the conventional modular speaker diarization method, underscoring the significant effect of role information in improving speaker diarization performance in autism interview scenarios.
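For readers unfamiliar with the metric reported above, DER is commonly defined as the sum of missed speech, false-alarm speech, and speaker-confusion time divided by the total reference speech time. The following is a minimal sketch of that definition, not the evaluation code used in the paper. It assumes fixed-length frames with at most one speaker per frame (no overlap, no collar), and the function name `frame_der` and the speaker labels are hypothetical, chosen only for illustration.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level DER sketch; None marks a non-speech frame.

    Assumes one label per frame (no overlapping speech) and a small
    number of speakers, since the label mapping is found by brute force.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    total_speech = sum(r is not None for r in reference)
    if total_speech == 0:
        raise ValueError("reference contains no speech frames")

    pairs = list(zip(reference, hypothesis))
    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)
    both = [(r, h) for r, h in pairs if r is not None and h is not None]

    ref_speakers = sorted({r for r in reference if r is not None})
    hyp_speakers = sorted({h for h in hypothesis if h is not None})

    # Hypothesis labels are arbitrary, so search for the one-to-one mapping
    # onto reference speakers that maximizes correctly attributed frames.
    # Padding with None lets surplus hypothesis speakers stay unmatched.
    padding = [None] * max(0, len(hyp_speakers) - len(ref_speakers))
    best_correct = 0
    for perm in permutations(ref_speakers + padding, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / total_speech


# Hypothetical 10-frame session: one doctor, one parent, one child.
reference = ["doctor_1", "doctor_1", "child", "child", None,
             "parent_1", "parent_1", "parent_1", None, "child"]
hypothesis = ["A", "A", "B", "A", None,
              "C", None, "C", "C", "B"]

print(frame_der(reference, hypothesis))  # 0.375
```

In this toy session there are 8 reference speech frames, with one missed frame, one false-alarm frame, and one confused frame, giving 3 / 8 = 0.375. Standard scoring tools work on time-stamped segments rather than frames and handle overlapping speech, so their numbers are not directly comparable to this sketch.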

Duke Scholars

Author Ming Li DKU Faculty

Published In

Computer Science

DOI

10.11896/jsjkx.240100059

ISSN

1002-137X

Publication Date

February 15, 2025

Volume

52

Issue

2

Start / End Page

231 / 241

Citation

APA

Wang, K., Cheng, M., Xie, Y., Zou, X., & Li, M. (2025). Role-aware Speaker Diarization in Autism Interview Scenarios. Computer Science, 52(2), 231–241. https://doi.org/10.11896/jsjkx.240100059

Chicago

Wang, K., M. Cheng, Y. Xie, X. Zou, and M. Li. “Role-aware Speaker Diarization in Autism Interview Scenarios.” Computer Science 52, no. 2 (February 15, 2025): 231–41. https://doi.org/10.11896/jsjkx.240100059.

ICMJE

Wang K, Cheng M, Xie Y, Zou X, Li M. Role-aware Speaker Diarization in Autism Interview Scenarios. Computer Science. 2025 Feb 15;52(2):231–41.

MLA

Wang, K., et al. “Role-aware Speaker Diarization in Autism Interview Scenarios.” Computer Science, vol. 52, no. 2, Feb. 2025, pp. 231–41. Scopus, doi:10.11896/jsjkx.240100059.

NLM

Wang K, Cheng M, Xie Y, Zou X, Li M. Role-aware Speaker Diarization in Autism Interview Scenarios. Computer Science. 2025 Feb 15;52(2):231–241.
