Speech emotion recognition with dual-sequence LSTM architecture

Publication, Journal Article
Wang, J; Xue, M; Culhane, R; Diao, E; Ding, J; Tarokh, V
Published in: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
May 1, 2020

Speech Emotion Recognition (SER) has emerged as a critical component of the next generation of human-machine interfacing technologies. In this work, we propose a new dual-level model that predicts emotions based on both MFCC features and mel-spectrograms produced from raw audio signals. Each utterance is preprocessed into MFCC features and two mel-spectrograms at different time-frequency resolutions. A standard LSTM processes the MFCC features, while a novel LSTM architecture, denoted as Dual-Sequence LSTM (DSLSTM), processes the two mel-spectrograms simultaneously. The outputs are later averaged to produce a final classification of the utterance. Our proposed model achieves, on average, a weighted accuracy of 72.7% and an unweighted accuracy of 73.3%, a 6% improvement over current state-of-the-art unimodal models, and is comparable with multimodal models that leverage textual information as well as audio signals.
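The averaging step and the two reported metrics can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: that the two branches emit per-class probabilities which are averaged element-wise, that weighted accuracy means overall accuracy, and that unweighted accuracy means the mean of per-class recalls (a common convention in SER work). All function names and the toy numbers are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def average_fusion(prob_a, prob_b):
    """Element-wise mean of two per-class probability vectors (assumed fusion rule)."""
    return [(a + b) / 2.0 for a, b in zip(prob_a, prob_b)]


def predict(prob):
    """Index of the highest-scoring class."""
    return max(range(len(prob)), key=lambda i: prob[i])


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all utterances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy data: four utterances, two emotion classes (made-up values).
y_true = [0, 0, 0, 1]
mfcc_probs = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.6, 0.4]]  # hypothetical MFCC-branch outputs
mel_probs = [[0.6, 0.4], [0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]   # hypothetical mel-branch outputs

y_pred = [predict(average_fusion(a, b)) for a, b in zip(mfcc_probs, mel_probs)]
wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(y_pred, round(wa, 4), round(ua, 4))
```

In the toy data the classes are imbalanced, so the two metrics differ: the overall accuracy is dominated by the majority class, while the per-class mean gives the minority class equal weight.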

Published In

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

DOI

10.1109/ICASSP40776.2020.9054629

ISSN

1520-6149

Publication Date

May 1, 2020

Volume

2020-May

Start / End Page

6474 / 6478
 

Citation

APA

Wang, J., Xue, M., Culhane, R., Diao, E., Ding, J., & Tarokh, V. (2020). Speech emotion recognition with dual-sequence LSTM architecture. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2020-May, 6474–6478. https://doi.org/10.1109/ICASSP40776.2020.9054629

Chicago

Wang, J., M. Xue, R. Culhane, E. Diao, J. Ding, and V. Tarokh. “Speech emotion recognition with dual-sequence LSTM architecture.” ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 2020-May (May 1, 2020): 6474–78. https://doi.org/10.1109/ICASSP40776.2020.9054629.

ICMJE

Wang J, Xue M, Culhane R, Diao E, Ding J, Tarokh V. Speech emotion recognition with dual-sequence LSTM architecture. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2020 May 1;2020-May:6474–8.

MLA

Wang, J., et al. “Speech emotion recognition with dual-sequence LSTM architecture.” ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2020-May, May 2020, pp. 6474–78. Scopus, doi:10.1109/ICASSP40776.2020.9054629.

NLM

Wang J, Xue M, Culhane R, Diao E, Ding J, Tarokh V. Speech emotion recognition with dual-sequence LSTM architecture. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2020 May 1;2020-May:6474–6478.
