Scholars@Duke publication: Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens

Publication, Conference

Li, M

Published in: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

January 1, 2014

This paper presents an approach to automatic recognition of speaker physical load that uses posterior-probability-based features from acoustic and phonetic tokens. In this method, the tokens used to calculate the posterior probabilities, or zero-order statistics, are extended from the components of conventional MFCC-trained Gaussian mixture models (GMMs) to parallel phonetic phonemes and the components of GMMs trained on tandem features. Phoneme recognizers for five different languages are employed to extract the phoneme posterior probabilities. We show that these histogram-style features at both the acoustic and phonetic levels are effective and complementary for capturing speaker physical load information from short utterances. A support vector machine is adopted as the supervised classifier. Combining the proposed methods with the OpenSMILE baseline, which covers acoustic and prosodic information, further improves the final performance. The proposed fusion system achieves 70.18% and 72.81% unweighted accuracy on the validation and test sets, respectively, of the Munich Bio-voice Corpus for the binary physical load level recognition task in the INTERSPEECH 2014 Computational Paralinguistics Challenge.
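As a rough illustration of the histogram-style features the abstract describes, the sketch below computes per-frame posterior probabilities over the components of a small diagonal-covariance GMM and averages them into a single utterance-level vector. It is not the paper's implementation. The function names, the toy GMM parameters, and the normalization by frame count are all assumptions made for illustration. The last helper computes unweighted accuracy as the mean of per-class recalls, which is the usual definition in the Computational Paralinguistics Challenge series.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def frame_posteriors(x, weights, means, variances):
    """Posterior probability of each GMM component given one frame."""
    log_p = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(log_p)  # subtract the max for numerical stability
    unnorm = [math.exp(lp - peak) for lp in log_p]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def zero_order_feature(frames, weights, means, variances):
    """Average per-frame posteriors into one histogram-style vector.

    Dividing by the frame count is an assumption of this sketch.
    """
    acc = [0.0] * len(weights)
    for x in frames:
        for c, p in enumerate(frame_posteriors(x, weights, means, variances)):
            acc[c] += p
    return [a / len(frames) for a in acc]

def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical two-component GMM over 2-D frames (all values are made up).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
frames = [[0.1, -0.2], [2.9, 3.1], [3.2, 2.8]]
feature = zero_order_feature(frames, weights, means, variances)
```

In a full system, vectors like `feature` would be computed for each token set (acoustic GMM components, phoneme posteriors, and so on) and passed to a supervised classifier such as a support vector machine.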

Duke Scholars

Author: Ming Li (DKU Faculty)

Published In

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

DOI

10.21437/interspeech.2014-106

EISSN

1990-9772

ISSN

2308-457X

Publication Date

January 1, 2014

Start / End Page

437 / 441

Citation

APA

Li, M. (2014). Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 437–441). https://doi.org/10.21437/interspeech.2014-106

Chicago

Li, M. “Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens.” In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 437–41, 2014. https://doi.org/10.21437/interspeech.2014-106.

ICMJE

Li M. Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2014. p. 437–41.

MLA

Li, M. “Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens.” Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2014, pp. 437–41. Scopus, doi:10.21437/interspeech.2014-106.

NLM

Li M. Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2014. p. 437–441.
