Pretraining Conformer with ASR for Speaker Verification

Publication, Conference
Cai, D; Wang, W; Li, M; Xia, R; Huang, C
Published in: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
January 1, 2023

This paper proposes pretraining Conformer on the automatic speech recognition (ASR) task for speaker verification. Conformer combines a convolutional neural network (CNN) and a Transformer to model local and global features, respectively. Recently, the multi-scale feature aggregation Conformer (MFA-Conformer) was proposed for automatic speaker verification. MFA-Conformer concatenates the frame-level outputs of all Conformer blocks before pooling. However, our experiments show that Conformer overfits easily when speaker recognition training data is limited. To avoid overfitting, we propose transferring the knowledge learned from ASR to speaker verification. Specifically, an ASR-pretrained Conformer is used to initialize the MFA-Conformer before it is trained for speaker verification. Our experiments show that pretraining Conformer with ASR leads to significant performance gains across model sizes. The best model achieves equal error rates (EER) of 0.48%, 0.71%, and 1.54% on VoxCeleb1-O, VoxCeleb1-E, and VoxCeleb1-H, respectively.
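The two mechanisms named in the abstract, concatenating the frame-level outputs of every block before pooling and initializing the speaker model from an ASR-pretrained encoder, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `encoder.` parameter prefix, and the use of mean-plus-standard-deviation pooling are assumptions made for illustration; the abstract does not specify the pooling layer or the checkpoint layout.

```python
from statistics import fmean, pstdev


def aggregate_blocks(block_outputs):
    """Concatenate per-block frame-level outputs along the feature axis.

    block_outputs: list of L blocks; each block is a list of T frames;
    each frame is a list of D floats. Returns T frames of L*D floats.
    """
    num_frames = len(block_outputs[0])
    if any(len(block) != num_frames for block in block_outputs):
        raise ValueError("all blocks must have the same number of frames")
    return [
        [value for block in block_outputs for value in block[t]]
        for t in range(num_frames)
    ]


def stats_pool(frames):
    """Pool over time: per-dimension means followed by per-dimension
    population standard deviations (an assumed pooling choice)."""
    dims = list(zip(*frames))
    return [fmean(d) for d in dims] + [pstdev(d) for d in dims]


def init_from_asr(speaker_params, asr_params, prefix="encoder."):
    """Copy ASR encoder parameters into the speaker model where names match.

    Both arguments are plain name -> value dicts standing in for model
    checkpoints (hypothetical layout). Parameters outside the prefix,
    such as an ASR decoder, are ignored. Returns the merged dict and the
    list of copied parameter names.
    """
    merged = dict(speaker_params)
    copied = []
    for name, value in asr_params.items():
        if name.startswith(prefix) and name in merged:
            merged[name] = value
            copied.append(name)
    return merged, copied
```

With two blocks, two frames, and two features per frame, `aggregate_blocks` yields two frames of four features, and `stats_pool` turns them into one eight-dimensional utterance-level vector. Parameters that exist only in the speaker model, such as a pooling or embedding layer, keep their own initialization.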

Published In

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

DOI

10.1109/ICASSP49357.2023.10096659

ISSN

1520-6149

Publication Date

January 1, 2023

Volume

2023-June
 

Citation

APA: Cai, D., Wang, W., Li, M., Xia, R., & Huang, C. (2023). Pretraining Conformer with ASR for Speaker Verification. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 2023-June). https://doi.org/10.1109/ICASSP49357.2023.10096659
Chicago: Cai, D., W. Wang, M. Li, R. Xia, and C. Huang. “Pretraining Conformer with ASR for Speaker Verification.” In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, Vol. 2023-June, 2023. https://doi.org/10.1109/ICASSP49357.2023.10096659.
ICMJE: Cai D, Wang W, Li M, Xia R, Huang C. Pretraining Conformer with ASR for Speaker Verification. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2023.
MLA: Cai, D., et al. “Pretraining Conformer with ASR for Speaker Verification.” ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2023-June, 2023. Scopus, doi:10.1109/ICASSP49357.2023.10096659.
NLM: Cai D, Wang W, Li M, Xia R, Huang C. Pretraining Conformer with ASR for Speaker Verification. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2023.
