Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays

Publication, Conference
Cai, D; Li, M
Published in: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings
January 19, 2021

Deep speaker embedding networks have significantly improved speaker verification under clean, close-talking conditions; however, performance remains unsatisfactory in noisy and far-field environments. This study aims to improve far-field speaker verification with distributed microphone arrays in the smart home scenario. The proposed learning framework consists of two modules: a deep speaker embedding module and an aggregation module. The former extracts a speaker embedding for each recording. The latter, based on either average pooling or attentive pooling, aggregates the speaker embeddings and learns a unified representation for all recordings captured by the distributed microphone arrays. The two modules are trained end to end. To evaluate this framework, we conduct experiments on the real text-dependent far-field dataset Hi Mia. Results show that our framework outperforms the naive averaged aggregation method by 20% in terms of equal error rate (EER) with six distributed microphone arrays. We also find that the attention-based aggregation favors high-quality recordings and down-weights low-quality ones.
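The difference between the two aggregation strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how attention scores are computed, so the dot-product scoring vector `w` below is an assumption for illustration, as are the function names and the toy two-dimensional embeddings. In the paper the embedding and aggregation modules are trained jointly; here `w` is simply fixed by hand.

```python
import math

def average_pool(embeddings):
    """Uniform mean over per-recording embeddings (a list of equal-length vectors)."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]

def attentive_pool(embeddings, w):
    """Weighted mean with softmax weights over per-recording scores.

    `w` is a hypothetical scoring vector (assumption): each recording's score
    is the dot product of its embedding with `w`.
    """
    scores = [sum(e_i * w_i for e_i, w_i in zip(e, w)) for e in embeddings]
    m = max(scores)                              # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]          # non-negative, sums to 1
    dim = len(embeddings[0])
    pooled = [sum(a * e[d] for a, e in zip(weights, embeddings)) for d in range(dim)]
    return pooled, weights

# Toy input: one embedding per microphone array (values are made up).
embeddings = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
w = [2.0, 0.0]                                   # hand-picked, not learned

avg = average_pool(embeddings)
att, weights = attentive_pool(embeddings, w)
```

With a zero scoring vector every recording receives the same weight and attentive pooling reduces to average pooling; a non-uniform `w` shifts the pooled embedding toward the recordings that score higher.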

Published In

2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings

DOI

10.1109/SLT48900.2021.9383501

Publication Date

January 19, 2021

Start / End Page

308 / 315
 

Citation

APA: Cai, D., & Li, M. (2021). Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays. In 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings (pp. 308–315). https://doi.org/10.1109/SLT48900.2021.9383501
Chicago: Cai, D., and M. Li. “Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays.” In 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings, 308–15, 2021. https://doi.org/10.1109/SLT48900.2021.9383501.
ICMJE: Cai D, Li M. Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays. In: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings. 2021. p. 308–15.
MLA: Cai, D., and M. Li. “Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays.” 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings, 2021, pp. 308–15. Scopus, doi:10.1109/SLT48900.2021.9383501.
NLM: Cai D, Li M. Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays. 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings. 2021. p. 308–315.
