Scholars@Duke publication: Speech emotion recognition with dual-sequence LSTM architecture

Speech emotion recognition with dual-sequence LSTM architecture

Publication, Journal Article

Wang, J; Xue, M; Culhane, R; Diao, E; Ding, J; Tarokh, V

Published in: ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings

May 1, 2020

Speech Emotion Recognition (SER) has emerged as a critical component of the next generation of human-machine interface technologies. In this work, we propose a new dual-level model that predicts emotions from both MFCC features and mel-spectrograms computed from raw audio signals. Each utterance is preprocessed into MFCC features and two mel-spectrograms at different time-frequency resolutions. A standard LSTM processes the MFCC features, while a novel LSTM architecture, denoted Dual-Sequence LSTM (DSLSTM), processes the two mel-spectrograms simultaneously. The outputs of the two models are then averaged to produce the final classification of the utterance. Our proposed model achieves, on average, a weighted accuracy of 72.7% and an unweighted accuracy of 73.3%, a 6% improvement over current state-of-the-art unimodal models, and is comparable with multimodal models that use textual information in addition to audio signals.
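The phrase "two mel-spectrograms at different time-frequency resolutions" reflects the usual short-time Fourier transform trade-off: a longer analysis window gives finer frequency spacing but coarser time steps, and vice versa. The sketch below computes the frequency-bin spacing, frame step, and frame count for two settings. The 16 kHz sample rate, 3 s utterance length, and both (n_fft, hop) pairs are hypothetical values chosen for illustration, not the settings used in the paper, and the frame count assumes no edge padding (libraries that center frames by padding will report slightly more frames).

```python
def stft_resolution(num_samples, sample_rate, n_fft, hop):
    """Return (bin spacing in Hz, frame step in ms, frame count) for an unpadded STFT."""
    bin_hz = sample_rate / n_fft               # spacing between adjacent frequency bins
    step_ms = 1000.0 * hop / sample_rate       # time between successive frames
    frames = 1 + (num_samples - n_fft) // hop  # frames that fit entirely inside the signal
    return bin_hz, step_ms, frames


sr = 16_000          # hypothetical sample rate
n = 3 * sr           # hypothetical 3-second utterance
settings = {
    "fine-time": (512, 128),        # short window: good time resolution
    "fine-frequency": (2048, 512),  # long window: good frequency resolution
}
for name, (n_fft, hop) in settings.items():
    bin_hz, step_ms, frames = stft_resolution(n, sr, n_fft, hop)
    print(f"{name}: {bin_hz:.2f} Hz/bin, {step_ms:.1f} ms/frame, {frames} frames")
```

With these assumed settings the two representations of the same utterance have different sequence lengths (372 versus 90 frames), so any model that consumes both must handle two sequences of unequal length.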
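The final step, averaging the outputs of the MFCC LSTM and the DSLSTM, is a form of late fusion. A minimal sketch follows, assuming each branch emits a probability vector over the same set of emotion classes and that the two branches are weighted equally. The abstract does not say whether probabilities or raw scores are averaged, and the label set and example numbers below are hypothetical.

```python
from typing import Sequence

EMOTIONS = ("angry", "happy", "neutral", "sad")  # hypothetical label set


def average_fusion(branch_outputs: Sequence[Sequence[float]]) -> list[float]:
    """Element-wise mean of per-class scores from several model branches."""
    if not branch_outputs:
        raise ValueError("need at least one branch")
    num_classes = len(branch_outputs[0])
    if any(len(o) != num_classes for o in branch_outputs):
        raise ValueError("all branches must score the same classes")
    return [sum(o[c] for o in branch_outputs) / len(branch_outputs)
            for c in range(num_classes)]


def predict(branch_outputs: Sequence[Sequence[float]]) -> tuple[str, list[float]]:
    """Fuse branch outputs by averaging, then pick the highest-scoring class."""
    fused = average_fusion(branch_outputs)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best], fused


mfcc_branch = [0.10, 0.50, 0.30, 0.10]    # made-up output of the MFCC LSTM
dslstm_branch = [0.05, 0.25, 0.60, 0.10]  # made-up output of the DSLSTM
label, fused = predict([mfcc_branch, dslstm_branch])
print(label, fused)
```

In this made-up case the MFCC branch alone would pick "happy", but the averaged scores favor "neutral" because the DSLSTM branch is more confident. Equal-weight averaging needs no extra parameters; a learned weighting would be an alternative design, not something the abstract describes.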
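The abstract reports both a weighted and an unweighted accuracy without defining them. In much of the SER literature, weighted accuracy (WA) is the overall fraction of correctly classified utterances, so frequent classes dominate, while unweighted accuracy (UA) is the mean of per-class recalls, so every class counts equally. The sketch below follows that common convention as an assumption, using a small hypothetical, imbalanced label set to show how the two numbers diverge.

```python
from collections import Counter


def _check(y_true, y_pred):
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")


def weighted_accuracy(y_true, y_pred):
    """Fraction of utterances classified correctly; frequent classes dominate."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recall; every class contributes equally."""
    _check(y_true, y_pred)
    totals = Counter(y_true)                                    # utterances per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per true class
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical labels: six "neutral" utterances, two "sad" ones.
y_true = ["neutral"] * 6 + ["sad"] * 2
y_pred = ["neutral"] * 6 + ["neutral", "sad"]
print(weighted_accuracy(y_true, y_pred))    # 0.875
print(unweighted_accuracy(y_true, y_pred))  # 0.75
```

Missing one of the two "sad" utterances costs only 1/8 of WA but halves that class's recall, which is why UA is the stricter measure on imbalanced emotion corpora.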

Duke Scholars

Author: Vahid Tarokh, Pierre R. Lamond Department of Electrical and Computer Engin ...

Published In

ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings

DOI

10.1109/ICASSP40776.2020.9054629

ISSN

1520-6149

Publication Date

May 1, 2020

Volume

2020-May

Start / End Page

6474 / 6478

Citation

APA

Wang, J., Xue, M., Culhane, R., Diao, E., Ding, J., & Tarokh, V. (2020). Speech emotion recognition with dual-sequence LSTM architecture. ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 2020-May, 6474–6478. https://doi.org/10.1109/ICASSP40776.2020.9054629

Chicago

Wang, J., M. Xue, R. Culhane, E. Diao, J. Ding, and V. Tarokh. “Speech emotion recognition with dual-sequence LSTM architecture.” ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings 2020-May (May 1, 2020): 6474–78. https://doi.org/10.1109/ICASSP40776.2020.9054629.

ICMJE

Wang J, Xue M, Culhane R, Diao E, Ding J, Tarokh V. Speech emotion recognition with dual-sequence LSTM architecture. ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings. 2020 May 1;2020-May:6474–8.

MLA

Wang, J., et al. “Speech emotion recognition with dual-sequence LSTM architecture.” ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, vol. 2020-May, May 2020, pp. 6474–78. Scopus, doi:10.1109/ICASSP40776.2020.9054629.
