Scholars@Duke publication: Robust Multi-Channel Far-Field Speaker Verification under Different In-Domain Data Availability Scenarios

Publication, Journal Article

Qin, X; Cai, D; Li, M

Published in: IEEE/ACM Transactions on Audio Speech and Language Processing

January 1, 2023

The popularity and widespread application of smart home devices have made far-field speaker verification an urgent need. However, despite the significant improvements enabled by deep neural networks (DNNs), speaker verification performance remains unsatisfactory in far-field environments. In this paper, we summarize our previous work and propose multiple training strategies and models for multi-channel far-field speaker verification under different in-domain data availability scenarios. The experiments are conducted on the FFSVC20 dataset, and we propose cross-device and cross-domain trials for it. We focus on both single-channel and multi-channel speaker verification training based on this dataset. For single-channel speaker verification, considering the size of the training data and the availability of labels, we introduce three training scenarios and a proposed training method for each: 1) no out-of-domain data and a small amount of labeled in-domain data; 2) large-scale labeled out-of-domain data and a small amount of labeled in-domain data; 3) large-scale labeled out-of-domain data and a small amount of unlabeled in-domain data. To this end, we propose a meta-learning approach, refined transfer learning methods, and semi-supervised learning for the three scenarios, respectively. For multi-channel speaker verification, we first introduce two types of three-dimensional convolution (3D Conv) residual network (ResNet) models proposed in our previous works: a fully 3D ResNet and a ResNet that combines 3D Conv with 2D Conv (3D2D-ResNet). In this paper, we further propose the channel-wise 3D squeeze-and-excitation ResNet (C3DSE-ResNet) and the spatial-wise 3D SE ResNet (S3DSE-ResNet) to explore channel dependencies and improve the performance of 3D ConvNets. The results show that the proposed strategies and models significantly boost performance in far-field scenarios.
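The abstract names channel-wise 3D squeeze-and-excitation (C3DSE) but does not spell out the block. As a rough illustration only, the sketch below applies the generic squeeze-and-excitation recipe (global average pooling, a two-layer bottleneck with reduction ratio r, sigmoid gating) to the channels of a 3D convolutional feature map of shape C x D x H x W. It is not the authors' C3DSE-ResNet or S3DSE-ResNet; the function name `squeeze_excite_3d`, the flat-list layout, and the random weights are hypothetical choices made for this example.

```python
import math
import random


def squeeze_excite_3d(x, w1, b1, w2, b2):
    """Generic channel-wise squeeze-and-excitation over a 3D feature map.

    Hypothetical illustration, not the paper's C3DSE block.
    x:  list of C channels; each channel is a flat list of its D*H*W values
        (flattening is fine because the squeeze step only needs the mean).
    w1: (C // r) x C bottleneck weights, b1: its C // r biases.
    w2: C x (C // r) expansion weights, b2: its C biases.
    Returns (rescaled feature map, per-channel gates).
    """
    # Squeeze: global average pooling over depth, height and width.
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation, layer 1: fully connected + ReLU.
    h = [max(0.0, sum(w * zi for w, zi in zip(row, z)) + b)
         for row, b in zip(w1, b1)]
    # Excitation, layer 2: fully connected + sigmoid gives gates in (0, 1).
    s = [1.0 / (1.0 + math.exp(-(sum(w * hi for w, hi in zip(row, h)) + b)))
         for row, b in zip(w2, b2)]
    # Scale: multiply every value of channel c by its gate s[c].
    y = [[s_c * v for v in ch] for s_c, ch in zip(s, x)]
    return y, s


if __name__ == "__main__":
    # Toy example with random (hypothetical) weights: C=4 channels, r=2.
    random.seed(0)
    C, r, D, H, W = 4, 2, 2, 3, 3
    x = [[random.gauss(0, 1) for _ in range(D * H * W)] for _ in range(C)]
    w1 = [[random.gauss(0, 0.5) for _ in range(C)] for _ in range(C // r)]
    w2 = [[random.gauss(0, 0.5) for _ in range(C // r)] for _ in range(C)]
    y, gates = squeeze_excite_3d(x, w1, [0.0] * (C // r), w2, [0.0] * C)
    print([round(g, 3) for g in gates])
```

The gates let the network emphasize some feature channels and suppress others based on global statistics; how the paper's channel-wise and spatial-wise variants choose the pooling axes is not stated in the abstract.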

Duke Scholars

Author: Ming Li, DKU Faculty

Published In

IEEE/ACM Transactions on Audio Speech and Language Processing

DOI

10.1109/TASLP.2022.3212834

EISSN

2329-9304

ISSN

2329-9290

Publication Date

January 1, 2023

Volume

31

Start / End Page

71 / 85

Citation

APA: Qin, X., Cai, D., & Li, M. (2023). Robust Multi-Channel Far-Field Speaker Verification under Different In-Domain Data Availability Scenarios. IEEE/ACM Transactions on Audio Speech and Language Processing, 31, 71–85. https://doi.org/10.1109/TASLP.2022.3212834

Chicago: Qin, X., D. Cai, and M. Li. “Robust Multi-Channel Far-Field Speaker Verification under Different In-Domain Data Availability Scenarios.” IEEE/ACM Transactions on Audio Speech and Language Processing 31 (January 1, 2023): 71–85. https://doi.org/10.1109/TASLP.2022.3212834.

ICMJE: Qin X, Cai D, Li M. Robust Multi-Channel Far-Field Speaker Verification under Different In-Domain Data Availability Scenarios. IEEE/ACM Transactions on Audio Speech and Language Processing. 2023 Jan 1;31:71–85.

MLA: Qin, X., et al. “Robust Multi-Channel Far-Field Speaker Verification under Different In-Domain Data Availability Scenarios.” IEEE/ACM Transactions on Audio Speech and Language Processing, vol. 31, Jan. 2023, pp. 71–85. Scopus, doi:10.1109/TASLP.2022.3212834.
