Multi-channel training for end-to-end speaker recognition under reverberant and noisy environment

Publication, Conference
Cai, D; Qin, X; Li, M
Published in: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
January 1, 2019

Despite the significant improvements in speaker recognition enabled by deep neural networks, performance remains unsatisfactory in far-field scenarios due to long-range fading, room reverberation, and environmental noise. In this study, we focus on far-field speaker recognition with a microphone array. We propose a multi-channel training framework for deep speaker embedding neural networks on noisy and reverberant data. The proposed framework processes time, frequency, and channel information simultaneously to learn a robust deep speaker embedding. Based on 2-dimensional and 3-dimensional convolution layers, we investigate different multi-channel training schemes. Experiments on simulated multi-channel reverberant and noisy data show that the proposed method obtains significant improvements over single-channel trained deep speaker embedding systems with front-end speech enhancement or multi-channel embedding fusion.
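The 2-dimensional versus 3-dimensional convolution distinction in the abstract can be illustrated numerically: treating the microphone channels as input planes of a 2-D convolution collapses the channel axis in one step, whereas a 3-D kernel also slides along the channel axis and preserves inter-channel structure. Below is a minimal NumPy sketch of that difference; the array shapes, kernel sizes, and the time-averaged "embedding" are illustrative assumptions, not the authors' actual configuration.

```python
import numpy as np

def conv3d_valid(x, kernel):
    """Naive 'valid'-mode 3-D convolution (cross-correlation) of a
    (channel, freq, time) feature tensor with a single kernel."""
    C, F, T = x.shape
    kc, kf, kt = kernel.shape
    out = np.zeros((C - kc + 1, F - kf + 1, T - kt + 1))
    for c in range(out.shape[0]):
        for f in range(out.shape[1]):
            for t in range(out.shape[2]):
                out[c, f, t] = np.sum(x[c:c + kc, f:f + kf, t:t + kt] * kernel)
    return out

rng = np.random.default_rng(0)
# Hypothetical input: 4 microphones, 40 mel bins, 100 frames.
feats = rng.standard_normal((4, 40, 100))

# 2-D-style scheme: the kernel spans all 4 channels at once, so the
# channel axis collapses immediately (channels act as input planes).
k2d = rng.standard_normal((4, 3, 3))
out_2d = conv3d_valid(feats, k2d)   # shape (1, 38, 98)

# 3-D-style scheme: a smaller kernel slides along the channel axis too,
# so inter-channel structure survives into the output.
k3d = rng.standard_normal((2, 3, 3))
out_3d = conv3d_valid(feats, k3d)   # shape (3, 38, 98)

# Toy stand-in for an utterance-level embedding: pool over time.
embedding = out_3d.mean(axis=2).ravel()  # length 3 * 38 = 114
```

In a real system the loops would be replaced by an optimized convolution layer (e.g. a deep-learning framework's 3-D convolution), but the shape arithmetic above is what distinguishes the two multi-channel training schemes the paper compares.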

Published In

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

DOI

10.21437/Interspeech.2019-1437

EISSN

1990-9772

ISSN

2308-457X

Publication Date

January 1, 2019

Volume

2019-September

Start / End Page

4365 / 4369
 

Citation

APA: Cai, D., Qin, X., & Li, M. (2019). Multi-channel training for end-to-end speaker recognition under reverberant and noisy environment. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (Vol. 2019-September, pp. 4365–4369). https://doi.org/10.21437/Interspeech.2019-1437
Chicago: Cai, D., X. Qin, and M. Li. “Multi-channel training for end-to-end speaker recognition under reverberant and noisy environment.” In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019-September:4365–69, 2019. https://doi.org/10.21437/Interspeech.2019-1437.
ICMJE: Cai D, Qin X, Li M. Multi-channel training for end-to-end speaker recognition under reverberant and noisy environment. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2019. p. 4365–9.
MLA: Cai, D., et al. “Multi-channel training for end-to-end speaker recognition under reverberant and noisy environment.” Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2019-September, 2019, pp. 4365–69. Scopus, doi:10.21437/Interspeech.2019-1437.
NLM: Cai D, Qin X, Li M. Multi-channel training for end-to-end speaker recognition under reverberant and noisy environment. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2019. p. 4365–4369.
