Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays

Publication, Conference
Cai, D; Li, M
Published in: 2021 IEEE Spoken Language Technology Workshop SLT 2021 Proceedings
January 19, 2021

With the successful application of deep speaker embedding networks, the performance of speaker verification systems has improved significantly under clean, close-talking conditions; however, performance remains unsatisfactory in noisy and far-field environments. This study aims to improve far-field speaker verification with distributed microphone arrays in the smart home scenario. The proposed learning framework consists of two modules: a deep speaker embedding module and an aggregation module. The former extracts a speaker embedding for each recording. The latter, based on either average pooling or attentive pooling, aggregates the speaker embeddings and learns a unified representation for all recordings captured by the distributed microphone arrays. The two modules are trained end to end. To evaluate this framework, we conduct experiments on the real text-dependent far-field dataset Hi Mia. Results show that our framework outperforms the naive averaged aggregation methods by 20% in terms of equal error rate (EER) with six distributed microphone arrays. We also find that the attention-based aggregation favors high-quality recordings and discounts low-quality ones.
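The difference between the two aggregation options named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how attention scores are computed, so the dot-product scoring against a `query` vector below is an assumption made for illustration only, as are the function names and the toy embeddings; in a trained system the scoring parameters would be learned jointly with the embedding network.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def average_pool(embeddings: Sequence[Vector]) -> Vector:
    """Unweighted mean of the per-recording speaker embeddings."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(
    embeddings: Sequence[Vector], query: Vector
) -> Tuple[Vector, List[float]]:
    """Weighted mean of the embeddings.

    ASSUMPTION: each recording is scored by a dot product with `query`,
    a hypothetical stand-in for learned attention parameters.
    """
    scores = [sum(q * x for q, x in zip(query, emb)) for emb in embeddings]
    weights = softmax(scores)
    dim = len(embeddings[0])
    pooled = [
        sum(w * emb[d] for w, emb in zip(weights, embeddings))
        for d in range(dim)
    ]
    return pooled, weights


# Toy data: three recordings of one utterance, one per microphone array.
# The third embedding stands in for a low-quality recording.
embeddings = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
query = [2.0, 0.0]  # hypothetical attention parameters

avg = average_pool(embeddings)
pooled, weights = attentive_pool(embeddings, query)
```

In this toy setup the third recording receives the smallest weight, so the attentive result stays closer to the first two embeddings than the plain average does. This mirrors, in a simplified way, the reported behavior of the attention-based aggregation toward low-quality recordings.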

Start / End Page

308 / 315
 

Citation

APA: Cai, D., & Li, M. (2021). Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays. In 2021 IEEE Spoken Language Technology Workshop SLT 2021 Proceedings (pp. 308–315). https://doi.org/10.1109/SLT48900.2021.9383501
Chicago: Cai, D., and M. Li. “Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays.” In 2021 IEEE Spoken Language Technology Workshop SLT 2021 Proceedings, 308–15, 2021. https://doi.org/10.1109/SLT48900.2021.9383501.
ICMJE: Cai D, Li M. Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays. In: 2021 IEEE Spoken Language Technology Workshop SLT 2021 Proceedings. 2021. p. 308–15.
MLA: Cai, D., and M. Li. “Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays.” 2021 IEEE Spoken Language Technology Workshop SLT 2021 Proceedings, 2021, pp. 308–15. Scopus, doi:10.1109/SLT48900.2021.9383501.
NLM: Cai D, Li M. Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays. 2021 IEEE Spoken Language Technology Workshop SLT 2021 Proceedings. 2021. p. 308–315.
