Generalized I-vector Representation with Phonetic Tokenizations and Tandem Features for both Text Independent and Text Dependent Speaker Verification

Publication, Journal Article
Li, M; Liu, L; Cai, W; Liu, W
Published in: Journal of Signal Processing Systems
February 1, 2016

This paper presents a generalized i-vector representation framework with phonetic tokenization and tandem features for both text-independent and text-dependent speaker verification. In the conventional i-vector framework, the tokens used to calculate the zero-order and first-order Baum-Welch statistics are Gaussian Mixture Model (GMM) components trained on acoustic-level MFCC features. Beyond MFCC, we believe that phonetic information offers another direction that can benefit system performance. Our contribution lies in integrating phonetic information into the i-vector representation through several extensions, forming a more generalized i-vector framework. First, the tokens for calculating the zero-order statistics are extended from MFCC-trained GMM components to phonemes, trigrams and tandem-feature-trained GMM components, using phoneme posterior probabilities. Second, given the zero-order statistics (posterior probabilities on tokens), the feature used to calculate the first-order statistics is also extended from MFCC to tandem features, and need not be the same feature employed by the tokenizer. Third, the zero-order and first-order statistics vectors are concatenated and represented by the simplified supervised i-vector approach, followed by the standard Probabilistic Linear Discriminant Analysis (PLDA) back-end. We study different token and feature combinations, and we show that feature-level fusion of acoustic-level MFCC features and phonetic-level tandem features with a GMM-based i-vector representation achieves the best performance for text-independent speaker verification. Furthermore, we demonstrate that the phonetic-level phoneme constraints introduced by the tandem features help the text-dependent speaker verification system reject wrong-password trials and improve performance dramatically.
Experimental results are reported on the NIST SRE 2010 common condition 5 female part task and the RSR 2015 part 1 female part task for text-independent and text-dependent speaker verification, respectively. For the text-independent task, the proposed generalized i-vector representation outperforms the i-vector baseline by a relative 53% in terms of equal error rate (EER) and normalized minDCF. For the text-dependent task, the proposed approach also reduces the EER significantly, by a relative 23% to 90% depending on the type of trial.
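
The decoupling of tokenizer and feature described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `baum_welch_stats`, the list-based data layout, and the toy numbers are assumptions made for illustration only. The sketch accumulates the zero-order statistics from per-frame token posteriors (which could come from a GMM or a phoneme recognizer) and the first-order statistics from a feature stream that may differ from the one the tokenizer used. Mean centering, the supervised i-vector extraction, and the PLDA back-end are omitted.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def baum_welch_stats(posteriors: Matrix, features: Matrix) -> Tuple[List[float], Matrix]:
    """Accumulate zero-order and first-order statistics for one utterance.

    posteriors[t][c]: posterior of token c at frame t (hypothetical layout).
    features[t][d]:   feature vector at frame t; it may come from a different
                      front end than the one that produced the posteriors.
    """
    if len(posteriors) != len(features):
        raise ValueError("posteriors and features must cover the same frames")
    num_tokens = len(posteriors[0])
    dim = len(features[0])
    zero = [0.0] * num_tokens                          # N_c = sum_t gamma_t(c)
    first = [[0.0] * dim for _ in range(num_tokens)]   # F_c = sum_t gamma_t(c) * x_t
    for gamma_t, x_t in zip(posteriors, features):
        for c, g in enumerate(gamma_t):
            zero[c] += g
            for d, x in enumerate(x_t):
                first[c][d] += g * x
    return zero, first


# Toy example: 2 frames, 2 tokens, 2-dimensional features (made-up values).
posteriors = [[0.5, 0.5], [1.0, 0.0]]   # e.g. phoneme posteriors per frame
features = [[2.0, 4.0], [1.0, 1.0]]     # e.g. MFCC or tandem features per frame
zero, first = baum_welch_stats(posteriors, features)
print(zero)    # [1.5, 0.5]
print(first)   # [[2.0, 3.0], [1.0, 2.0]]
```

Because the posteriors and the features enter the function as separate arguments, any tokenizer can be paired with any feature stream as long as both are frame-aligned, which is the flexibility the generalized framework relies on.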

Duke Scholars

Published In

Journal of Signal Processing Systems

DOI

10.1007/s11265-015-1019-z

EISSN

1939-8115

ISSN

1939-8018

Publication Date

February 1, 2016

Volume

82

Issue

2

Start / End Page

207 / 215

Related Subject Headings

  • Networking & Telecommunications
  • Computer Hardware & Architecture
  • 4611 Machine learning
  • 4008 Electrical engineering
  • 4006 Communications engineering
  • 0906 Electrical and Electronic Engineering
 

Citation

Li, M., Liu, L., Cai, W., & Liu, W. (2016). Generalized I-vector Representation with Phonetic Tokenizations and Tandem Features for both Text Independent and Text Dependent Speaker Verification. Journal of Signal Processing Systems, 82(2), 207–215. https://doi.org/10.1007/s11265-015-1019-z
Li, M., L. Liu, W. Cai, and W. Liu. “Generalized I-vector Representation with Phonetic Tokenizations and Tandem Features for both Text Independent and Text Dependent Speaker Verification.” Journal of Signal Processing Systems 82, no. 2 (February 1, 2016): 207–15. https://doi.org/10.1007/s11265-015-1019-z.
Li, M., et al. “Generalized I-vector Representation with Phonetic Tokenizations and Tandem Features for both Text Independent and Text Dependent Speaker Verification.” Journal of Signal Processing Systems, vol. 82, no. 2, Feb. 2016, pp. 207–15. Scopus, doi:10.1007/s11265-015-1019-z.