
LSTM based similarity measurement with spectral clustering for speaker diarization

Publication, Conference
Lin, Q; Yin, R; Li, M; Bredin, H; Barras, C
Published in: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
January 1, 2019

Neural network approaches have brought considerable improvements to several submodules of speaker diarization systems, including speaker change detection and segment-wise speaker embedding extraction. In the clustering stage, however, traditional algorithms such as probabilistic linear discriminant analysis (PLDA) are still widely used to score the similarity between two speech segments. In this paper, we propose a supervised method that measures the similarity matrix between all segments of an audio recording with sequential bidirectional long short-term memory networks (Bi-LSTM). Spectral clustering is applied on top of the similarity matrix to further improve performance. Experimental results show that our system significantly outperforms state-of-the-art methods and achieves a diarization error rate of 6.63% on the NIST SRE 2000 CALLHOME database.
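
The clustering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a generic normalized-Laplacian form of spectral clustering, a number of speakers known in advance, and a small hand-made similarity matrix standing in for the Bi-LSTM output. The function name `spectral_cluster` and its parameters are hypothetical.

```python
import numpy as np


def spectral_cluster(similarity, n_speakers, n_iter=50, seed=0):
    """Assign a cluster label to each segment from a pairwise similarity matrix.

    Assumption: `similarity` is square with non-negative entries; in the paper
    such a matrix would come from the Bi-LSTM scorer, here it is supplied by hand.
    """
    s = np.asarray(similarity, dtype=float)
    s = 0.5 * (s + s.T)  # enforce symmetry
    degree = s.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} S D^{-1/2}
    laplacian = np.eye(len(s)) - d_inv_sqrt[:, None] * s * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the eigenvectors
    # belonging to the n_speakers smallest eigenvalues as the embedding.
    _, vecs = np.linalg.eigh(laplacian)
    emb = vecs[:, :n_speakers]
    emb = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)

    # Plain k-means on the embedded rows, with farthest-point initialization.
    rng = np.random.default_rng(seed)
    centers = emb[[rng.integers(len(emb))]]
    while len(centers) < n_speakers:
        dist = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(axis=1)
        centers = np.vstack([centers, emb[dist.argmax()]])
    labels = np.zeros(len(emb), dtype=int)
    for _ in range(n_iter):
        dist = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            emb[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_speakers)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Toy example (made-up values): segments 0-2 resemble each other,
# as do segments 3-5, with low similarity across the two groups.
toy = np.full((6, 6), 0.1)
toy[:3, :3] = 0.9
toy[3:, 3:] = 0.9
np.fill_diagonal(toy, 1.0)

labels = spectral_cluster(toy, n_speakers=2)
print(labels)
```

On this toy matrix the first three segments should share one label and the last three the other; which group receives label 0 depends on the initialization.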


Published In

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

DOI

10.21437/Interspeech.2019-1388

EISSN

1990-9772

ISSN

2308-457X

Publication Date

January 1, 2019

Volume

2019-September

Start / End Page

366 / 370
 

Citation

APA

Lin, Q., Yin, R., Li, M., Bredin, H., & Barras, C. (2019). LSTM based similarity measurement with spectral clustering for speaker diarization. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (Vol. 2019-September, pp. 366–370). https://doi.org/10.21437/Interspeech.2019-1388

Chicago

Lin, Q., R. Yin, M. Li, H. Bredin, and C. Barras. “LSTM based similarity measurement with spectral clustering for speaker diarization.” In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019-September:366–70, 2019. https://doi.org/10.21437/Interspeech.2019-1388.

ICMJE

Lin Q, Yin R, Li M, Bredin H, Barras C. LSTM based similarity measurement with spectral clustering for speaker diarization. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2019. p. 366–70.

MLA

Lin, Q., et al. “LSTM based similarity measurement with spectral clustering for speaker diarization.” Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2019-September, 2019, pp. 366–70. Scopus, doi:10.21437/Interspeech.2019-1388.

NLM

Lin Q, Yin R, Li M, Bredin H, Barras C. LSTM based similarity measurement with spectral clustering for speaker diarization. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2019. p. 366–370.
