Scholars@Duke publication: Speaker diarization system for autism children's real-life audio data

Publication, Conference

Zhou, T; Cai, W; Chen, X; Zou, X; Zhang, S; Li, M

Published in: Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016

May 2, 2017

In this paper, we introduce several methods to improve the performance of a speaker diarization system for real-life audio data from children with autism. This system serves as the front-end module for further speech analysis. Our objective is to detect the children's speech in single-channel, noisy, everyday audio recordings collected by wearable devices in real environments. First, within the conventional framework of generalized likelihood ratio (GLR) distance with agglomerative hierarchical clustering (AHC), in addition to using the line spectral pair (LSP) based GLR distance, we propose a weighted summation of multiple GLR distances that combines LSP, pitch, energy, and phoneme duration information. Second, since we only want to extract children's speech at high purity for further speech analysis, we use a 30-second enrollment utterance from each child to perform supervised child cluster selection using i-vector cosine distance. We find that performing supervised cluster selection at early stages of AHC yields higher purity. We evaluate our methods on a 120-minute subset of data collected from three children during child-therapist interactions. Experimental results show that our methods significantly outperform the GLR-AHC baseline in terms of the child cluster's recall and precision.
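The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each segment is modeled by a single full-covariance Gaussian (a common textbook form of the GLR distance), that the per-stream weights are hand-chosen placeholders, and that plain NumPy vectors stand in for i-vectors, whose extraction is not shown. The function names `glr_distance`, `combined_glr_distance`, and `select_child_cluster` are hypothetical.

```python
import numpy as np


def _as_frames(x):
    """Return features as a 2-D float array of shape (frames, dims)."""
    x = np.asarray(x, dtype=float)
    return x[:, None] if x.ndim == 1 else x


def _logdet_cov(x, floor=1e-6):
    """Log-determinant of the maximum-likelihood covariance, lightly regularized."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    cov = cov + floor * np.eye(cov.shape[0])
    return np.linalg.slogdet(cov)[1]


def glr_distance(x, y):
    """Log GLR distance between two segments under a single-Gaussian assumption.

    Compares modeling the segments separately against modeling the pooled
    frames with one Gaussian; larger values suggest different speakers.
    """
    x, y = _as_frames(x), _as_frames(y)
    pooled = np.vstack([x, y])
    n_x, n_y = len(x), len(y)
    return 0.5 * (
        (n_x + n_y) * _logdet_cov(pooled)
        - n_x * _logdet_cov(x)
        - n_y * _logdet_cov(y)
    )


def combined_glr_distance(streams, weights):
    """Weighted sum of per-stream GLR distances.

    streams: dict mapping a stream name (e.g. "lsp", "pitch") to a pair of
             feature arrays, one per segment.
    weights: dict mapping the same names to weights (assumed values, not
             taken from the paper).
    """
    return sum(weights[name] * glr_distance(x, y) for name, (x, y) in streams.items())


def select_child_cluster(cluster_vectors, enrollment_vector):
    """Pick the cluster whose vector has the highest cosine similarity to the
    enrollment vector. The vectors are stand-ins for i-vectors."""
    c = np.asarray(cluster_vectors, dtype=float)
    e = np.asarray(enrollment_vector, dtype=float)
    scores = c @ e / (np.linalg.norm(c, axis=1) * np.linalg.norm(e))
    return int(np.argmax(scores)), scores
```

In an AHC loop, `combined_glr_distance` would be evaluated for every pair of current clusters and the closest pair merged. `select_child_cluster` would then be applied to the clusters at whichever merge stage is chosen, which the abstract reports works better when done early.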

Duke Scholars

Author Ming Li DKU Faculty

Published In

Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016

DOI

10.1109/ISCSLP.2016.7918477

Publication Date

May 2, 2017

Citation

APA

Zhou, T., Cai, W., Chen, X., Zou, X., Zhang, S., & Li, M. (2017). Speaker diarization system for autism children's real-life audio data. In Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016. https://doi.org/10.1109/ISCSLP.2016.7918477

Chicago

Zhou, T., W. Cai, X. Chen, X. Zou, S. Zhang, and M. Li. “Speaker diarization system for autism children's real-life audio data.” In Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016, 2017. https://doi.org/10.1109/ISCSLP.2016.7918477.

ICMJE

Zhou T, Cai W, Chen X, Zou X, Zhang S, Li M. Speaker diarization system for autism children's real-life audio data. In: Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016. 2017.

MLA

Zhou, T., et al. “Speaker diarization system for autism children's real-life audio data.” Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016, 2017. Scopus, doi:10.1109/ISCSLP.2016.7918477.

NLM

Zhou T, Cai W, Chen X, Zou X, Zhang S, Li M. Speaker diarization system for autism children's real-life audio data. Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016. 2017.
