On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition

Publication, Journal Article
Cai, W; Chen, J; Zhang, J; Li, M
Published in: IEEE/ACM Transactions on Audio Speech and Language Processing
January 1, 2020

In this article, our recent efforts on directly modeling utterance-level aggregation for speaker and language recognition are summarized. First, an on-the-fly data loader for efficient network training is proposed. The data loader acts as a bridge between the full-length utterances and the network. It generates mini-batch samples on the fly, which allows batch-wise variable-length training and online data augmentation. Second, the traditional dictionary learning and Baum-Welch statistical accumulation mechanisms are applied to the network structure, and a learnable dictionary encoding (LDE) layer is introduced. The LDE layer accumulates discriminative statistics from the variable-length input sequence and outputs a single fixed-dimensional utterance-level representation. Experiments were conducted on four different datasets, namely NIST LRE 2007, AP17-OLR, SITW, and NIST SRE 2016. Experimental results show the effectiveness of the proposed batch-wise variable-length training with online data augmentation and the LDE layer, which significantly outperform the baseline methods.
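The two ideas in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `make_batch` and `lde_pool`, the length range, the cyclic padding of short utterances, and the softmax-over-scaled-distances weighting are all assumptions made for illustration, and online data augmentation is omitted. The first function draws one segment length per mini-batch, so lengths vary between batches but not within one; the second pools any number of frames into a vector whose size depends only on the dictionary.

```python
import math
import random


def make_batch(utterances, batch_size, min_len, max_len, rng):
    """Draw one mini-batch on the fly. A single random segment length is
    shared by the whole batch (hypothetical length range min_len..max_len)."""
    seg_len = rng.randint(min_len, max_len)
    batch = []
    for _ in range(batch_size):
        utt = rng.choice(utterances)
        if len(utt) >= seg_len:
            # Crop a random window from the full-length utterance.
            start = rng.randint(0, len(utt) - seg_len)
            seg = utt[start:start + seg_len]
        else:
            # Assumption: short utterances are extended by cyclic repetition.
            seg = [utt[i % len(utt)] for i in range(seg_len)]
        batch.append(seg)
    return batch


def lde_pool(frames, centers, scales):
    """Soft-assign each frame to the dictionary centers and average the
    weighted residuals per center. Output length is len(centers) * dim."""
    dim = len(centers[0])
    num = [[0.0] * dim for _ in centers]
    den = [0.0] * len(centers)
    for x in frames:
        resid = [[xi - ci for xi, ci in zip(x, c)] for c in centers]
        logits = [-s * sum(r * r for r in rc) for s, rc in zip(scales, resid)]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        for k, (e, rc) in enumerate(zip(exps, resid)):
            w = e / total
            den[k] += w
            for d in range(dim):
                num[k][d] += w * rc[d]
    out = []
    for k in range(len(centers)):
        out.extend(v / den[k] if den[k] > 0 else 0.0 for v in num[k])
    return out
```

In a trainable network the centers and scales would be learned parameters and the frames would be frame-level features from earlier layers; here they are plain lists so the pooling step can be read in isolation.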

Published In

IEEE/ACM Transactions on Audio Speech and Language Processing

DOI

10.1109/TASLP.2020.2980991

EISSN

2329-9304

ISSN

2329-9290

Publication Date

January 1, 2020

Volume

28

Start / End Page

1038 / 1051
 

Citation

APA: Cai, W., Chen, J., Zhang, J., & Li, M. (2020). On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition. IEEE/ACM Transactions on Audio Speech and Language Processing, 28, 1038–1051. https://doi.org/10.1109/TASLP.2020.2980991
Chicago: Cai, W., J. Chen, J. Zhang, and M. Li. “On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition.” IEEE/ACM Transactions on Audio Speech and Language Processing 28 (January 1, 2020): 1038–51. https://doi.org/10.1109/TASLP.2020.2980991.
ICMJE: Cai W, Chen J, Zhang J, Li M. On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition. IEEE/ACM Transactions on Audio Speech and Language Processing. 2020 Jan 1;28:1038–51.
MLA: Cai, W., et al. “On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition.” IEEE/ACM Transactions on Audio Speech and Language Processing, vol. 28, Jan. 2020, pp. 1038–51. Scopus, doi:10.1109/TASLP.2020.2980991.
NLM: Cai W, Chen J, Zhang J, Li M. On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition. IEEE/ACM Transactions on Audio Speech and Language Processing. 2020 Jan 1;28:1038–1051.
