Scholars@Duke publication: Combining five acoustic level modeling methods for automatic speaker age and gender recognition

Publication, Conference

Li, M; Jung, CS; Han, KJ

Published in: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010

January 1, 2010

This paper presents a novel approach to automatic speaker age and gender identification that combines five different methods at the acoustic level to improve on the baseline performance. The five subsystems are (1) a Gaussian mixture model (GMM) system based on mel-frequency cepstral coefficient (MFCC) features, (2) a support vector machine (SVM) based on GMM mean supervectors, (3) an SVM based on GMM maximum likelihood linear regression (MLLR) matrix supervectors, (4) an SVM based on GMM 'Tandem' supervectors, and (5) the SVM baseline system based on 450-dimensional utterance-level feature vectors, including prosodic features, provided by the challenge organizing committee. To improve the overall classification performance, these five subsystems are fused at the score level. On the 2010 Interspeech Paralinguistic Challenge aGender database, the proposed fusion system achieves 52.7% unweighted accuracy for the joint age-gender classification task, outperforming the GMM-MFCC system and the SVM baseline by 9.6% and 8.2% absolute, respectively. © 2010 ISCA.
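The score-level fusion and the unweighted accuracy metric mentioned in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: fusion is shown here as a plain weighted sum of per-subsystem score matrices, which may differ from the fusion rule used in the paper, and unweighted accuracy is taken to mean the average of per-class recalls. The function names `fuse_scores` and `unweighted_accuracy`, the weights, and the toy scores are hypothetical.

```python
import numpy as np


def fuse_scores(subsystem_scores, weights=None):
    """Fuse per-subsystem scores by a weighted sum (an assumed fusion rule).

    subsystem_scores: list of arrays, each of shape (n_utterances, n_classes).
    weights: one weight per subsystem; defaults to equal weights.
    """
    stacked = np.stack(subsystem_scores)  # (n_subsystems, n_utts, n_classes)
    if weights is None:
        weights = np.full(len(subsystem_scores), 1.0 / len(subsystem_scores))
    weights = np.asarray(weights, dtype=float)
    # Contract the subsystem axis: result has shape (n_utts, n_classes).
    return np.tensordot(weights, stacked, axes=1)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


# Toy example: two hypothetical subsystems, two utterances, two classes.
scores_a = np.array([[0.9, 0.1], [0.4, 0.6]])
scores_b = np.array([[0.2, 0.8], [0.3, 0.7]])
fused = fuse_scores([scores_a, scores_b])
predictions = fused.argmax(axis=1)
```

Averaging recalls per class, rather than counting correct utterances overall, keeps a large class from dominating the metric: a classifier that always predicts the majority class in a 3-to-1 split scores 0.75 on plain accuracy but only 0.5 under this definition.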

Duke Scholars

Ming Li (Author), DKU Faculty

Published In

Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010

Publication Date

January 1, 2010

Start / End Page

2826 / 2829

Citation

APA

Li, M., Jung, C. S., & Han, K. J. (2010). Combining five acoustic level modeling methods for automatic speaker age and gender recognition. In Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010 (pp. 2826–2829).

Chicago

Li, M., C. S. Jung, and K. J. Han. “Combining five acoustic level modeling methods for automatic speaker age and gender recognition.” In Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, 2826–29, 2010.

ICMJE

Li M, Jung CS, Han KJ. Combining five acoustic level modeling methods for automatic speaker age and gender recognition. In: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010. 2010. p. 2826–9.

MLA

Li, M., et al. “Combining five acoustic level modeling methods for automatic speaker age and gender recognition.” Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, 2010, pp. 2826–29.

NLM

Li M, Jung CS, Han KJ. Combining five acoustic level modeling methods for automatic speaker age and gender recognition. Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010. 2010. p. 2826–2829.
