Role-aware Speaker Diarization in Autism Interview Scenarios

Publication, Journal Article
Wang, K; Cheng, M; Xie, Y; Zou, X; Li, M
Published in: Computer Science
February 15, 2025

Speaker diarization plays a pivotal role in intelligent speech transcription. Its core task is to segment and cluster multi-speaker audio by speaker identity, which makes both the audio content and the transcribed text easier to organize. In medical interviews, speaker diarization is a prerequisite for subsequent automated assessment. Role information is naturally present in medical dialogue. Taking autism as an example, a typical session involves three well-defined roles: the doctor, the parent, and the child undergoing diagnosis. In actual conversations, however, the correspondence between roles and speakers is not always one-to-one. For instance, during autism diagnosis, each conversation may involve only one child, while the number of doctors or parents may vary. We believe that the role information and the speaker information embedded in each speech segment can complement each other and thereby reduce the diarization error rate. In this study, we propose a method that integrates role information into the sequence-to-sequence target speaker voice activity detection (Seq2Seq-TSVAD) framework, achieving a diarization error rate (DER) of 20.61% on the CPEP-3 dataset. This error rate is 9.8% lower than that of the Seq2Seq-TSVAD baseline and 19.3% lower than that of the conventional modular speaker diarization method, underscoring the significant effect of role information on speaker diarization performance in autism interview scenarios.
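
For readers unfamiliar with the metric, DER is conventionally the sum of missed speech, false-alarm speech, and speaker confusion, divided by the total reference speech time. The following is a minimal frame-level sketch of that standard definition, not the evaluation code used in the paper. The function name `frame_level_der`, the toy labels, and the frame sequences are hypothetical, and the sketch assumes hypothesis labels are already mapped to reference labels and that no forgiveness collar is applied.

```python
from typing import Sequence, Set


def frame_level_der(reference: Sequence[Set[str]],
                    hypothesis: Sequence[Set[str]]) -> float:
    """Simplified frame-level DER (illustrative assumption, not the paper's scorer).

    Each element is the set of speaker labels active in one frame.
    DER = (missed + false alarm + confusion) / total reference speech.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must cover the same frames")

    missed = false_alarm = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        n_ref, n_hyp = len(ref), len(hyp)
        n_correct = len(ref & hyp)
        total += n_ref
        missed += max(0, n_ref - n_hyp)             # reference speakers with no output
        false_alarm += max(0, n_hyp - n_ref)        # output speakers with no reference
        confusion += min(n_ref, n_hyp) - n_correct  # speech assigned to the wrong label

    if total == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confusion) / total


# Hypothetical toy session using the three roles named in the abstract.
reference = [{"doctor"}, {"doctor"}, {"child"}, {"child"},
             set(), {"parent"}, {"parent", "child"}, {"doctor"}]
hypothesis = [{"doctor"}, {"doctor"}, {"child"}, {"parent"},
              {"doctor"}, {"parent"}, {"parent"}, set()]

der = frame_level_der(reference, hypothesis)
print(f"DER = {der:.2%}")
```

In this toy example there are 8 reference speaker-frames, 2 missed, 1 false alarm, and 1 confused, so the sketch reports a DER of 50%. Standard scoring tools additionally search for the best mapping between hypothesis and reference speakers, which this sketch omits.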

Published In

Computer Science

DOI

10.11896/jsjkx.240100059

ISSN

1002-137X

Publication Date

February 15, 2025

Volume

52

Issue

2

Start / End Page

231 / 241
 

Citation

APA: Wang, K., Cheng, M., Xie, Y., Zou, X., & Li, M. (2025). Role-aware Speaker Diarization in Autism Interview Scenarios. Computer Science, 52(2), 231–241. https://doi.org/10.11896/jsjkx.240100059
Chicago: Wang, K., M. Cheng, Y. Xie, X. Zou, and M. Li. “Role-aware Speaker Diarization in Autism Interview Scenarios.” Computer Science 52, no. 2 (February 15, 2025): 231–41. https://doi.org/10.11896/jsjkx.240100059.
ICMJE: Wang K, Cheng M, Xie Y, Zou X, Li M. Role-aware Speaker Diarization in Autism Interview Scenarios. Computer Science. 2025 Feb 15;52(2):231–41.
MLA: Wang, K., et al. “Role-aware Speaker Diarization in Autism Interview Scenarios.” Computer Science, vol. 52, no. 2, Feb. 2025, pp. 231–41. Scopus, doi:10.11896/jsjkx.240100059.
NLM: Wang K, Cheng M, Xie Y, Zou X, Li M. Role-aware Speaker Diarization in Autism Interview Scenarios. Computer Science. 2025 Feb 15;52(2):231–241.
