
On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition

Publication, Journal Article
Cai, W; Chen, J; Zhang, J; Li, M
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing
January 1, 2020

In this article, our recent efforts on directly modeling utterance-level aggregation for speaker and language recognition are summarized. First, an on-the-fly data loader for efficient network training is proposed. The data loader acts as a bridge between the full-length utterances and the network. It generates mini-batch samples on the fly, which allows batch-wise variable-length training and online data augmentation. Second, the traditional dictionary learning and Baum-Welch statistical accumulation mechanisms are applied to the network structure, and a learnable dictionary encoding (LDE) layer is introduced. The LDE layer accumulates discriminative statistics from the variable-length input sequence and outputs a single fixed-dimensional utterance-level representation. Experiments were conducted on four datasets: NIST LRE 2007, AP17-OLR, SITW, and NIST SRE 2016. Experimental results show the effectiveness of the proposed batch-wise variable-length training with online data augmentation and of the LDE layer; the proposed methods significantly outperform the baseline methods.
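The batch-wise variable-length idea can be sketched in plain Python. This is a minimal illustration, not the authors' loader, and it rests on several assumptions: utterances are 1-D lists of floats standing in for frame-level features; one segment length is drawn per mini-batch from a hypothetical `[min_len, max_len]` range; each sampled utterance is cropped at a random offset (and tiled if it is shorter than the segment); and additive white Gaussian noise at a random SNR is a placeholder for whatever online augmentation a real system uses. The names `on_the_fly_batches` and `add_white_noise` are hypothetical.

```python
import random
from typing import Iterator, List, Sequence


def add_white_noise(segment: List[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add Gaussian white noise so that signal power / noise power == 10**(snr_db / 10).

    Hypothetical placeholder augmentation; real systems typically mix in
    recorded noise or reverberation instead.
    """
    signal_power = sum(x * x for x in segment) / len(segment)
    noise_std = (signal_power / (10 ** (snr_db / 10))) ** 0.5
    return [x + rng.gauss(0.0, noise_std) for x in segment]


def on_the_fly_batches(
    utterances: Sequence[Sequence[float]],
    batch_size: int,
    min_len: int,
    max_len: int,
    num_batches: int,
    seed: int = 0,
) -> Iterator[List[List[float]]]:
    """Yield mini-batches whose segment length is fixed within a batch but
    redrawn for every batch (hypothetical sketch of batch-wise variable-length training)."""
    if min_len < 1 or min_len > max_len:
        raise ValueError("require 1 <= min_len <= max_len")
    if any(len(u) == 0 for u in utterances):
        raise ValueError("utterances must be non-empty")
    rng = random.Random(seed)
    for _ in range(num_batches):
        seg_len = rng.randint(min_len, max_len)  # one length shared by the whole batch
        batch = []
        for _ in range(batch_size):
            utt = utterances[rng.randrange(len(utterances))]
            if len(utt) >= seg_len:
                # random crop taken directly from the full-length utterance
                start = rng.randint(0, len(utt) - seg_len)
                segment = list(utt[start:start + seg_len])
            else:
                # short utterance: repeat it until it covers seg_len
                repeats = seg_len // len(utt) + 1
                segment = (list(utt) * repeats)[:seg_len]
            # online augmentation: a fresh random SNR for every segment
            batch.append(add_white_noise(segment, rng.uniform(5.0, 20.0), rng))
        yield batch


if __name__ == "__main__":
    utts = [[float(i % 7) for i in range(n)] for n in (50, 120, 300)]
    for batch in on_the_fly_batches(utts, batch_size=4, min_len=80, max_len=200, num_batches=3):
        print(len(batch), {len(seg) for seg in batch})
```

Because every segment in a batch shares one length, the batch can be stacked into a single array without padding, while the length still changes from one batch to the next. Crops and noise are drawn anew each time a batch is produced, so no augmented copies need to be stored in advance.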
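How an LDE-style layer turns a variable-length sequence into a fixed-size vector can be sketched as a forward pass in plain Python. This is a hedged illustration rather than the paper's implementation: it assumes the commonly described LDE formulation in which frame t is softly assigned to component c with weight w_tc = softmax over c of (-s_c * ||x_t - mu_c||^2), and component c outputs the weight-normalized residual sum_t w_tc (x_t - mu_c) / sum_t w_tc. The two accumulators mirror zeroth- and first-order Baum-Welch statistics. The function and argument names are hypothetical.

```python
import math
from typing import List, Sequence


def lde_forward(
    frames: Sequence[Sequence[float]],
    centers: Sequence[Sequence[float]],
    smoothing: Sequence[float],
) -> List[float]:
    """Forward pass of a learnable-dictionary-encoding-style pooling layer (sketch).

    frames:    T x D frame-level features (T may vary between utterances)
    centers:   C x D dictionary components mu_c (learnable in a real network)
    smoothing: C smoothing factors s_c (learnable in a real network)
    Returns a fixed C * D utterance-level vector, independent of T.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_c, dim = len(centers), len(centers[0])
    zeroth = [0.0] * num_c                       # zeroth-order stats: sum_t w_tc
    first = [[0.0] * dim for _ in range(num_c)]  # first-order stats: sum_t w_tc * r_tc
    for x in frames:
        # residual r_tc of this frame with respect to every component
        residuals = [[x[d] - mu[d] for d in range(dim)] for mu in centers]
        logits = [-s * sum(r * r for r in res) for s, res in zip(smoothing, residuals)]
        peak = max(logits)                       # subtract the max for a stable softmax
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        for c in range(num_c):
            w = exps[c] / total                  # soft assignment w_tc
            zeroth[c] += w
            for d in range(dim):
                first[c][d] += w * residuals[c][d]
    encoded: List[float] = []
    for c in range(num_c):
        denom = zeroth[c] if zeroth[c] > 0.0 else 1.0  # guard against total underflow
        encoded.extend(v / denom for v in first[c])
    return encoded
```

In a trained network, `centers` and `smoothing` would be parameters updated by backpropagation inside a deep-learning framework; this version only shows why the output size (C * D) stays fixed while the number of frames T varies, and why the result does not depend on frame order.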



Published In

IEEE/ACM Transactions on Audio, Speech, and Language Processing

DOI

10.1109/TASLP.2020.2980991

EISSN

2329-9304

ISSN

2329-9290

Publication Date

January 1, 2020

Volume

28

Start / End Page

1038 / 1051
 

Citation

APA: Cai, W., Chen, J., Zhang, J., & Li, M. (2020). On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 1038–1051. https://doi.org/10.1109/TASLP.2020.2980991
Chicago: Cai, W., J. Chen, J. Zhang, and M. Li. “On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition.” IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (January 1, 2020): 1038–51. https://doi.org/10.1109/TASLP.2020.2980991.
ICMJE: Cai W, Chen J, Zhang J, Li M. On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2020 Jan 1;28:1038–51.
MLA: Cai, W., et al. “On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition.” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, Jan. 2020, pp. 1038–51. Scopus, doi:10.1109/TASLP.2020.2980991.
NLM: Cai W, Chen J, Zhang J, Li M. On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2020 Jan 1;28:1038–1051.
