Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization

Publication, Journal Article
Wang, W; Lin, Q; Cai, D; Li, M
Published in: IEEE/ACM Transactions on Audio Speech and Language Processing
January 1, 2022

In this paper, we propose a neural-network-based similarity measurement method that learns the similarity between any two speaker embeddings while taking both previous and future contexts into account. We also propose a segmental pooling strategy and jointly train the speaker embedding network with the similarity measurement model. This joint training framework is then extended to target-speaker voice activity detection (TS-VAD) with only a slight modification to the network architecture. Experimental results on the DIHARD II, DIHARD III and VoxConverse datasets show that our clustering-based system with the neural similarity measurement outperforms recent approaches on all three datasets. In addition, the segment-level TS-VAD method further improves on the clustering-based results, achieving diarization error rates (DER) of 16.48%, 11.62% and 4.39% on the DIHARD II, DIHARD III and VoxConverse datasets, respectively.
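
For readers unfamiliar with the DER figures quoted above, the sketch below shows how DER is conventionally computed: the sum of missed speech, false-alarm speech and speaker-confusion time, divided by the total reference speech time. It is an illustration only. The function name and the durations are hypothetical, and the abstract does not state the scoring configuration (for example, collar or overlap handling), so nothing here reproduces the paper's evaluation.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER as a percentage.

    All arguments are durations in seconds:
      missed        reference speech that the system labeled as non-speech
      false_alarm   non-speech that the system labeled as speech
      confusion     speech attributed to the wrong speaker
      total_speech  total reference speech time (the denominator)
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return 100.0 * (missed + false_alarm + confusion) / total_speech


# Hypothetical durations chosen for illustration, not taken from the paper.
der = diarization_error_rate(
    missed=3.0, false_alarm=2.0, confusion=5.0, total_speech=100.0
)
print(f"DER = {der:.2f}%")
```

Because the denominator is reference speech time rather than total audio time, DER can exceed 100% when a system produces a large amount of false-alarm speech.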

Published In

IEEE/ACM Transactions on Audio Speech and Language Processing

DOI

10.1109/TASLP.2022.3196178

EISSN

2329-9304

ISSN

2329-9290

Publication Date

January 1, 2022

Volume

30

Start / End Page

2645 / 2658
 

Citation

APA
Wang, W., Lin, Q., Cai, D., & Li, M. (2022). Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization. IEEE/ACM Transactions on Audio Speech and Language Processing, 30, 2645–2658. https://doi.org/10.1109/TASLP.2022.3196178

Chicago
Wang, W., Q. Lin, D. Cai, and M. Li. “Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization.” IEEE/ACM Transactions on Audio Speech and Language Processing 30 (January 1, 2022): 2645–58. https://doi.org/10.1109/TASLP.2022.3196178.

ICMJE
Wang W, Lin Q, Cai D, Li M. Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization. IEEE/ACM Transactions on Audio Speech and Language Processing. 2022 Jan 1;30:2645–58.

MLA
Wang, W., et al. “Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization.” IEEE/ACM Transactions on Audio Speech and Language Processing, vol. 30, Jan. 2022, pp. 2645–58. Scopus, doi:10.1109/TASLP.2022.3196178.

NLM
Wang W, Lin Q, Cai D, Li M. Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization. IEEE/ACM Transactions on Audio Speech and Language Processing. 2022 Jan 1;30:2645–2658.
