Self-attentive similarity measurement strategies in speaker diarization

Publication, Conference
Lin, Q; Hou, Y; Li, M
Published in: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
January 1, 2020

Speaker diarization can be described as the process of extracting sequential speaker embeddings from an audio stream and clustering them according to speaker identity. Deep neural network based approaches such as the x-vector are now widely adopted for speaker embedding extraction. In the clustering back-end, however, probabilistic linear discriminant analysis (PLDA) is still the dominant algorithm for similarity measurement. PLDA scores each pair of embeddings independently, which may ignore the positional correlation between adjacent speaker embeddings. To address this issue, our previous work proposed a long short-term memory (LSTM) based scoring model, followed by spectral clustering. In this paper, we further propose two enhanced methods based on the self-attention mechanism, which no longer focus on local correlation but instead search for similar speaker embeddings across the whole sequence. The first approach achieves state-of-the-art performance on the DIHARD II Eval Set (18.44% diarization error rate (DER) after resegmentation), while the second operates with higher efficiency.
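
The contrast between pairwise PLDA scoring and attention over the whole sequence can be illustrated with a small sketch. This is not the authors' model: it assumes identity query/key projections (no learned weights), cosine scores, and a hypothetical `temperature` parameter. Each row is a softmax over every embedding in the sequence, so the score between embeddings i and j depends on all other embeddings, unlike an independent pairwise score. The resulting matrix is symmetrized so it could serve as an affinity matrix for a clustering back-end such as spectral clustering.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def self_attention_similarity(
    embeddings: List[Vector], temperature: float = 0.1
) -> List[List[float]]:
    """Return an N x N affinity matrix from softmax attention over the sequence.

    Assumptions (illustrative only, not the paper's trained network):
    - queries and keys are the L2-normalized embeddings themselves;
    - scores are cosine similarities divided by a hypothetical temperature.
    """
    normed = [l2_normalize(e) for e in embeddings]
    n = len(normed)
    attn: List[List[float]] = []
    for i in range(n):
        # Score embedding i against every embedding in the whole sequence.
        scores = [
            sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
        ]
        # Numerically stable softmax: each row of attn sums to 1.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        attn.append([e / z for e in exps])
    # Symmetrize so the matrix can be used as a clustering affinity.
    return [[0.5 * (attn[i][j] + attn[j][i]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Toy 2-D "embeddings": two from one speaker, two from another.
    sim = self_attention_similarity([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    for row in sim:
        print(["%.3f" % x for x in row])
```

In this toy input, the first two embeddings receive a much higher mutual affinity than either does with the last two, which is the block structure a clustering step would then separate.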

Published In

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

DOI

10.21437/Interspeech.2020-1908

EISSN

1990-9772

ISSN

2308-457X

Publication Date

January 1, 2020

Volume

2020-October

Start / End Page

284 / 288

Citation

APA: Lin, Q., Hou, Y., & Li, M. (2020). Self-attentive similarity measurement strategies in speaker diarization. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (Vol. 2020-October, pp. 284–288). https://doi.org/10.21437/Interspeech.2020-1908
Chicago: Lin, Q., Y. Hou, and M. Li. “Self-attentive similarity measurement strategies in speaker diarization.” In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2020-October:284–88, 2020. https://doi.org/10.21437/Interspeech.2020-1908.
ICMJE: Lin Q, Hou Y, Li M. Self-attentive similarity measurement strategies in speaker diarization. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2020. p. 284–8.
MLA: Lin, Q., et al. “Self-attentive similarity measurement strategies in speaker diarization.” Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2020-October, 2020, pp. 284–88. Scopus, doi:10.21437/Interspeech.2020-1908.
NLM: Lin Q, Hou Y, Li M. Self-attentive similarity measurement strategies in speaker diarization. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2020. p. 284–288.