Scholars@Duke publication: Speaker states recognition using latent factor analysis based Eigenchannel factor vector modeling

Speaker states recognition using latent factor analysis based Eigenchannel factor vector modeling

Publication, Conference

Li, M; Metallinou, A; Bone, D; Narayanan, S

Published in: ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings

October 23, 2012

This paper presents an automatic speaker state recognition approach that models the factor vectors in the latent factor analysis framework, improving upon the Gaussian Mixture Model (GMM) baseline performance. We investigate both intoxicated and affective speaker states. We consider the affective speech signal as the original normal average speech signal corrupted by affective channel effects. Rather than reducing the channel variability to enhance robustness, as in the speaker verification task, we directly model the speaker state on the channel factors under the factor analysis framework. In this work, the speaker state factor vectors are extracted and modeled using the latent factor analysis approach within the GMM modeling framework, followed by support vector machine classification. Experimental results show that the proposed speaker state factor vector modeling system achieved unweighted accuracy improvements of 5.34% and 1.49% over the GMM baseline on the intoxicated speech detection task (Alcohol Language Corpus) and the emotion recognition task (IEMOCAP database), respectively. © 2012 IEEE.
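The abstract reports its gains in unweighted accuracy. As a minimal sketch, assuming the term is used in its common sense of the mean of per-class recalls (the abstract does not define it), the metric can be contrasted with plain accuracy as follows. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally.

    Assumption: this is the usual meaning of "unweighted accuracy";
    the paper's own scoring script is not shown in the abstract.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def weighted_accuracy(y_true, y_pred):
    """Plain accuracy: fraction of all samples classified correctly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


# Hypothetical toy data: an imbalanced two-class task where the
# classifier always predicts the majority class.
y_true = ["sober"] * 8 + ["intoxicated"] * 2
y_pred = ["sober"] * 10

ua = unweighted_accuracy(y_true, y_pred)
wa = weighted_accuracy(y_true, y_pred)
```

On imbalanced data such as this toy set, plain accuracy rewards always predicting the majority class, while the per-class average does not, which is one reason the unweighted figure is often preferred for speaker state tasks.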

Duke Scholars

Author Ming Li DKU Faculty

Published In

ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings

DOI

10.1109/ICASSP.2012.6288284

ISSN

1520-6149

Publication Date

October 23, 2012

Start / End Page

1937 / 1940

Citation

APA

Li, M., Metallinou, A., Bone, D., & Narayanan, S. (2012). Speaker states recognition using latent factor analysis based Eigenchannel factor vector modeling. In ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings (pp. 1937–1940). https://doi.org/10.1109/ICASSP.2012.6288284

Chicago

Li, M., A. Metallinou, D. Bone, and S. Narayanan. “Speaker states recognition using latent factor analysis based Eigenchannel factor vector modeling.” In ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 1937–40, 2012. https://doi.org/10.1109/ICASSP.2012.6288284.

ICMJE

Li M, Metallinou A, Bone D, Narayanan S. Speaker states recognition using latent factor analysis based Eigenchannel factor vector modeling. In: ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings. 2012. p. 1937–40.

MLA

Li, M., et al. “Speaker states recognition using latent factor analysis based Eigenchannel factor vector modeling.” ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 2012, pp. 1937–40. Scopus, doi:10.1109/ICASSP.2012.6288284.

NLM

Li M, Metallinou A, Bone D, Narayanan S. Speaker states recognition using latent factor analysis based Eigenchannel factor vector modeling. ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings. 2012. p. 1937–1940.
