
Intoxicated speech detection: A fusion framework with speaker-normalized hierarchical functionals and GMM supervectors

Publication, Journal Article
Bone, D; Li, M; Black, MP; Narayanan, SS
Published in: Computer Speech and Language
March 1, 2014

Segmental and suprasegmental speech signal modulations offer information about paralinguistic content such as affect, age and gender, pathology, and speaker state. Speaker state encompasses medium-term, temporary physiological phenomena influenced by internal or external bio-chemical actions (e.g., sleepiness, alcohol intoxication). Perceptual and computational research indicates that detecting speaker state from speech is a challenging task. In this paper, we present a system constructed with multiple representations of prosodic and spectral features that provided the best result at the Intoxication Subchallenge of Interspeech 2011 on the Alcohol Language Corpus. We discuss the details of each classifier and show that fusion improves performance. We additionally address the question of how best to construct a speaker state detection system in terms of robust and practical marginalization of associated variability, such as through modeling speakers, utterance type, gender, and utterance length. As is the case in human perception, speaker normalization provides significant improvements to our system. We show that a held-out set of baseline (sober) data can be used to achieve gains comparable to other speaker normalization techniques. Our fused frame-level statistic-functional systems, fused GMM systems, and final combined system achieve unweighted average recalls (UARs) of 69.7%, 65.1%, and 68.8%, respectively, on the test set. Results more consistent with the development set occur with matched-prompt training, where the UARs are 70.4%, 66.2%, and 71.4%, respectively. The combined system improves over the Challenge baseline by 5.5% absolute (8.4% relative), also improving upon our previously best result. © 2013 Elsevier Inc. All rights reserved.
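The abstract makes two claims that are concrete enough to sketch. First, speaker normalization can use a held-out set of each speaker's sober recordings. Second, results are reported as unweighted average recall (UAR), the mean of per-class recalls, which is not distorted by class imbalance.

The Python sketch below illustrates both ideas under stated assumptions. It is not the authors' implementation:

- Normalization is assumed to be per-speaker z-normalization of fixed-length utterance feature vectors, using statistics estimated only from that speaker's held-out sober utterances. The paper's exact scheme may differ.
- The function names are hypothetical.
- The class labels "AL" (intoxicated) and "NAL" (sober) are assumed for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def sober_reference_stats(ref_features, ref_speakers):
    """Hypothetical helper: per-speaker, per-dimension mean and std
    estimated ONLY from a held-out set of sober (baseline) utterances."""
    groups = defaultdict(list)
    for x, spk in zip(ref_features, ref_speakers):
        groups[spk].append(x)
    stats = {}
    for spk, rows in groups.items():
        cols = list(zip(*rows))  # transpose: one tuple per feature dimension
        mu = [mean(c) for c in cols]
        # Guard against zero variance (e.g. a single reference utterance).
        sd = [pstdev(c) or 1.0 for c in cols]
        stats[spk] = (mu, sd)
    return stats


def normalize(features, speakers, stats):
    """Z-normalize each utterance vector with its own speaker's sober stats
    (assumption: every test speaker has held-out sober reference data)."""
    out = []
    for x, spk in zip(features, speakers):
        mu, sd = stats[spk]
        out.append([(v - m) / s for v, m, s in zip(x, mu, sd)])
    return out


def unweighted_average_recall(y_true, y_pred):
    """UAR: recall computed separately for each class, then averaged
    with equal weight per class, regardless of class frequency."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return mean(recalls)


if __name__ == "__main__":
    stats = sober_reference_stats([[1.0, 10.0], [3.0, 14.0]], ["s1", "s1"])
    print(normalize([[4.0, 16.0]], ["s1"], stats))  # [[2.0, 2.0]]
    y_true = ["AL", "AL", "NAL", "NAL", "NAL", "NAL"]
    y_pred = ["AL", "NAL", "NAL", "NAL", "NAL", "AL"]
    print(unweighted_average_recall(y_true, y_pred))  # 0.625
```

In the toy example, plain accuracy is 4/6 (about 0.667). UAR is 0.625, because the minority "AL" class is recalled only half the time. This sensitivity to per-class performance is why challenge results are usually stated as UAR.

Estimating the normalization statistics from sober data alone keeps the reference independent of the intoxication label being predicted. This matches the abstract's point that a held-out baseline set can stand in for other speaker normalization techniques.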


Published In

Computer Speech and Language

DOI

10.1016/j.csl.2012.09.004

EISSN

1095-8363

ISSN

0885-2308

Publication Date

March 1, 2014

Volume

28

Issue

2

Start / End Page

375 / 391

Related Subject Headings

  • Speech-Language Pathology & Audiology
  • 46 Information and computing sciences
  • 40 Engineering
  • 2004 Linguistics
  • 1702 Cognitive Sciences
  • 0801 Artificial Intelligence and Image Processing
 

Citation

APA: Bone, D., Li, M., Black, M. P., & Narayanan, S. S. (2014). Intoxicated speech detection: A fusion framework with speaker-normalized hierarchical functionals and GMM supervectors. Computer Speech and Language, 28(2), 375–391. https://doi.org/10.1016/j.csl.2012.09.004
Chicago: Bone, D., M. Li, M. P. Black, and S. S. Narayanan. “Intoxicated speech detection: A fusion framework with speaker-normalized hierarchical functionals and GMM supervectors.” Computer Speech and Language 28, no. 2 (March 1, 2014): 375–91. https://doi.org/10.1016/j.csl.2012.09.004.
ICMJE: Bone D, Li M, Black MP, Narayanan SS. Intoxicated speech detection: A fusion framework with speaker-normalized hierarchical functionals and GMM supervectors. Computer Speech and Language. 2014 Mar 1;28(2):375–91.
MLA: Bone, D., et al. “Intoxicated speech detection: A fusion framework with speaker-normalized hierarchical functionals and GMM supervectors.” Computer Speech and Language, vol. 28, no. 2, Mar. 2014, pp. 375–91. Scopus, doi:10.1016/j.csl.2012.09.004.
NLM: Bone D, Li M, Black MP, Narayanan SS. Intoxicated speech detection: A fusion framework with speaker-normalized hierarchical functionals and GMM supervectors. Computer Speech and Language. 2014 Mar 1;28(2):375–391.