Automatic speaker age and gender recognition using acoustic and prosodic level information fusion

Publication, Journal Article
Li, M; Han, KJ; Narayanan, S
Published in: Computer Speech and Language
January 1, 2013

This paper presents a novel automatic speaker age and gender identification approach that combines seven methods at the acoustic and prosodic levels to improve on baseline performance. The three baseline subsystems are (1) a Gaussian mixture model (GMM) based on mel-frequency cepstral coefficient (MFCC) features, (2) a support vector machine (SVM) based on GMM mean supervectors, and (3) an SVM based on 450-dimensional utterance-level features including acoustic, prosodic and voice quality information. In addition, we propose four subsystems: (1) an SVM based on universal background model (UBM) weight posterior probability supervectors using the Bhattacharyya probability product kernel, (2) sparse representation based on UBM weight posterior probability supervectors, (3) an SVM based on GMM maximum likelihood linear regression (MLLR) matrix supervectors, and (4) an SVM based on the polynomial expansion coefficients of the syllable-level prosodic feature contours in voiced speech segments. Subsystem (4) analyzes the contours of pitch, time-domain energy, frequency-domain harmonic structure energy and formant for each syllable, where syllables are segmented using energy information in the voiced speech segment. The four proposed subsystems are shown to be effective and to achieve competitive results in classifying different age and gender groups. To further improve overall classification performance, the seven subsystems are fused at the score level by weighted summation. Experimental results are reported on the development and test sets of the 2010 Interspeech Paralinguistic Challenge aGender database. Compared to the SVM baseline system (3), which is the baseline suggested by the challenge committee, the proposed fusion system achieves a 5.6% absolute improvement in unweighted accuracy for the age task and 4.2% for the gender task on the development set. On the final test set, we obtain 3.1% and 3.8% absolute improvement, respectively. © 2012 Elsevier Ltd. All rights reserved.
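
The score-level fusion and the evaluation metric mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example scores and the equal weights are hypothetical, and it assumes that "unweighted accuracy" means the mean of the per-class recalls, so that every class counts equally regardless of how many utterances it has. How the paper tunes the fusion weights is not stated in the abstract and is not modeled here.

```python
from typing import List, Sequence


def fuse_scores(subsystem_scores: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Weighted sum of per-class scores from several subsystems (one utterance).

    subsystem_scores[k][c] is the score subsystem k assigns to class c.
    The weights are assumed to be given; choosing them is out of scope here.
    """
    if len(subsystem_scores) != len(weights):
        raise ValueError("need exactly one weight per subsystem")
    n_classes = len(subsystem_scores[0])
    fused = [0.0] * n_classes
    for scores, weight in zip(subsystem_scores, weights):
        if len(scores) != n_classes:
            raise ValueError("all subsystems must score the same classes")
        for c, score in enumerate(scores):
            fused[c] += weight * score
    return fused


def predict_class(fused: Sequence[float]) -> int:
    """Index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda c: fused[c])


def unweighted_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recalls (assumed definition of unweighted accuracy)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    recalls = []
    for c in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in indices if y_pred[i] == c)
        recalls.append(correct / len(indices))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Two hypothetical subsystems scoring two classes, fused with equal weights.
    fused = fuse_scores([[0.2, 0.8], [0.6, 0.4]], [0.5, 0.5])
    print(fused, predict_class(fused))
    # A classifier that always predicts class 0 on an imbalanced toy set.
    print(unweighted_accuracy([0, 0, 0, 1], [0, 0, 0, 0]))
```

In the toy evaluation, plain accuracy would be 3 of 4 correct, while the per-class average penalizes the missed minority class. That is why a class-balanced measure is a natural choice when class sizes differ.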

Published In

Computer Speech and Language

DOI

10.1016/j.csl.2012.01.008

EISSN

1095-8363

ISSN

0885-2308

Publication Date

January 1, 2013

Volume

27

Issue

1

Start / End Page

151 / 167

Related Subject Headings

  • Speech-Language Pathology & Audiology
  • 46 Information and computing sciences
  • 40 Engineering
  • 2004 Linguistics
  • 1702 Cognitive Sciences
  • 0801 Artificial Intelligence and Image Processing
 

Citation

APA
Li, M., Han, K. J., & Narayanan, S. (2013). Automatic speaker age and gender recognition using acoustic and prosodic level information fusion. Computer Speech and Language, 27(1), 151–167. https://doi.org/10.1016/j.csl.2012.01.008

Chicago
Li, M., K. J. Han, and S. Narayanan. “Automatic speaker age and gender recognition using acoustic and prosodic level information fusion.” Computer Speech and Language 27, no. 1 (January 1, 2013): 151–67. https://doi.org/10.1016/j.csl.2012.01.008.

ICMJE
Li M, Han KJ, Narayanan S. Automatic speaker age and gender recognition using acoustic and prosodic level information fusion. Computer Speech and Language. 2013 Jan 1;27(1):151–67.

MLA
Li, M., et al. “Automatic speaker age and gender recognition using acoustic and prosodic level information fusion.” Computer Speech and Language, vol. 27, no. 1, Jan. 2013, pp. 151–67. Scopus, doi:10.1016/j.csl.2012.01.008.

NLM
Li M, Han KJ, Narayanan S. Automatic speaker age and gender recognition using acoustic and prosodic level information fusion. Computer Speech and Language. 2013 Jan 1;27(1):151–167.