Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization

Publication, Journal Article

Wang, W; Lin, Q; Cai, D; Li, M

Published in: IEEE ACM Transactions on Audio Speech and Language Processing

January 1, 2022

In this paper, we propose a neural-network-based similarity measurement method that learns the similarity between any two speaker embeddings while considering both previous and future contexts. Moreover, we propose a segmental pooling strategy and jointly train the speaker embedding network along with the similarity measurement model. This joint training framework is then extended to target-speaker voice activity detection (TS-VAD) with only a slight modification to the network architecture. Experimental results on the DIHARD II, DIHARD III and VoxConverse datasets show that our clustering-based system with the neural similarity measurement outperforms recent approaches on all three datasets. In addition, the segment-level TS-VAD method further improves the clustering-based results and achieves diarization error rates (DER) of 16.48%, 11.62% and 4.39% on the DIHARD II, DIHARD III and VoxConverse datasets, respectively.

Duke Scholars

Author Ming Li DKU Faculty

Published In

IEEE ACM Transactions on Audio Speech and Language Processing

DOI

10.1109/TASLP.2022.3196178

EISSN

2329-9304

ISSN

2329-9290

Publication Date

January 1, 2022

Volume

30

Start / End Page

2645 / 2658

Citation

APA

Wang, W., Lin, Q., Cai, D., & Li, M. (2022). Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization. IEEE ACM Transactions on Audio Speech and Language Processing, 30, 2645–2658. https://doi.org/10.1109/TASLP.2022.3196178

Chicago

Wang, W., Q. Lin, D. Cai, and M. Li. “Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization.” IEEE ACM Transactions on Audio Speech and Language Processing 30 (January 1, 2022): 2645–58. https://doi.org/10.1109/TASLP.2022.3196178.

ICMJE

Wang W, Lin Q, Cai D, Li M. Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization. IEEE ACM Transactions on Audio Speech and Language Processing. 2022 Jan 1;30:2645–58.

MLA

Wang, W., et al. “Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization.” IEEE ACM Transactions on Audio Speech and Language Processing, vol. 30, Jan. 2022, pp. 2645–58. Scopus, doi:10.1109/TASLP.2022.3196178.

NLM

Wang W, Lin Q, Cai D, Li M. Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization. IEEE ACM Transactions on Audio Speech and Language Processing. 2022 Jan 1;30:2645–2658.
