Simplified supervised i-vector modeling with application to robust and efficient language identification and speaker verification

Publication, Journal Article

Li, M; Narayanan, S

Published in: Computer Speech and Language

January 1, 2014

This paper presents a simplified, supervised i-vector modeling approach, with applications to robust and efficient language identification and speaker verification. First, traditional i-vectors are extended to label-regularized supervised i-vectors by appending the label vector to the end of the mean supervector and the linear regression matrix to the end of the i-vector factor loading matrix. These supervised i-vectors are optimized not only to reconstruct the mean supervectors well but also to minimize the mean squared error between the original and reconstructed label vectors, which makes them more discriminative with respect to the label information. Second, factor analysis (FA) is performed on the pre-normalized, centered GMM first-order statistics supervector so that each Gaussian component's statistics sub-vector is treated equally in the FA, which reduces the computational cost by a factor of 25 in the simplified i-vector framework. Third, because the entire matrix inversion term in simplified i-vector extraction depends on a single variable (the total frame number), a global table of the resulting matrices is built against the log values of the frame numbers. With this lookup table, each utterance's simplified i-vector extraction is sped up by a further factor of 4 at the cost of only a small quantization error. Finally, a simplified version of the supervised i-vector modeling is proposed to improve both robustness and efficiency. The proposed methods are evaluated on the DARPA RATS dev2 task, the NIST LRE 2007 general task, and the NIST SRE 2010 female condition 5 task, covering noisy-channel language identification, clean-channel language identification, and clean-channel speaker verification, respectively.

For language identification on DARPA RATS, the simplified supervised i-vector modeling achieved 2%, 16%, and 7% relative equal error rate (EER) reductions on three different feature sets and ran more than 100 times faster than the baseline i-vector method on the 120 s task. Similar results were observed on the NIST LRE 2007 30 s task, with a 7% relative average cost reduction. Results also show that using Gammatone frequency cepstral coefficients, Mel-frequency cepstral coefficients, and spectro-temporal Gabor features in conjunction with shifted-delta-cepstral features significantly improves overall language identification performance. For speaker verification, the proposed supervised i-vector approach outperforms the i-vector baseline by 12% and 7% relative in terms of EER and norm old minDCF values, respectively. © 2014 Elsevier Ltd.
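The lookup-table step in the abstract can be illustrated with a minimal sketch. The abstract states only that the matrix inversion term depends on the total frame number and is tabulated against its log value. The specific form used here, (I + n TᵀT)⁻¹, along with the function names, grid range, and bin count, are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def build_inverse_table(T, n_min=10, n_max=100_000, num_bins=200):
    """Precompute an assumed inversion term on a grid uniform in log(n).

    ASSUMPTION: the term is modeled as (I + n * T^T T)^{-1}; the paper's
    exact expression is not given in the abstract.
    """
    k = T.shape[1]
    gram = T.T @ T
    log_grid = np.linspace(np.log(n_min), np.log(n_max), num_bins)
    table = np.stack(
        [np.linalg.inv(np.eye(k) + np.exp(g) * gram) for g in log_grid]
    )
    return log_grid, table

def lookup_inverse(log_grid, table, n_frames):
    """Return the tabulated matrix whose log frame count is nearest log(n_frames)."""
    step = log_grid[1] - log_grid[0]
    idx = int(round((np.log(n_frames) - log_grid[0]) / step))
    idx = min(max(idx, 0), len(log_grid) - 1)  # clamp to the table range
    return table[idx]

# Hypothetical toy loading matrix: 64-dim supervector, 8-dim i-vector.
rng = np.random.default_rng(0)
T = rng.standard_normal((64, 8)) * 0.1
log_grid, table = build_inverse_table(T)

n = 3000  # total frame count of one utterance (illustrative value)
exact = np.linalg.inv(np.eye(8) + n * (T.T @ T))
approx = lookup_inverse(log_grid, table, n)

# Quantization error introduced by rounding log(n) to the nearest grid point.
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
```

In this sketch each utterance replaces a matrix inversion with an index computation and a table read, and `rel_err` measures the quantization error from snapping the frame count to the nearest grid point. A finer grid reduces that error at the cost of a larger table.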

Duke Scholars

Author: Ming Li, DKU Faculty

Published In

Computer Speech and Language

DOI

10.1016/j.csl.2014.02.004

EISSN

1095-8363

ISSN

0885-2308

Publication Date

January 1, 2014

Volume

28

Issue

4

Start / End Page

940 / 958

Related Subject Headings

Speech-Language Pathology & Audiology
46 Information and computing sciences
40 Engineering
2004 Linguistics
1702 Cognitive Sciences
0801 Artificial Intelligence and Image Processing

Citation

APA

Li, M., & Narayanan, S. (2014). Simplified supervised i-vector modeling with application to robust and efficient language identification and speaker verification. Computer Speech and Language, 28(4), 940–958. https://doi.org/10.1016/j.csl.2014.02.004

Chicago

Li, M., and S. Narayanan. “Simplified supervised i-vector modeling with application to robust and efficient language identification and speaker verification.” Computer Speech and Language 28, no. 4 (January 1, 2014): 940–58. https://doi.org/10.1016/j.csl.2014.02.004.

ICMJE

Li M, Narayanan S. Simplified supervised i-vector modeling with application to robust and efficient language identification and speaker verification. Computer Speech and Language. 2014 Jan 1;28(4):940–58.

MLA

Li, M., and S. Narayanan. “Simplified supervised i-vector modeling with application to robust and efficient language identification and speaker verification.” Computer Speech and Language, vol. 28, no. 4, Jan. 2014, pp. 940–58. Scopus, doi:10.1016/j.csl.2014.02.004.
