Detecting Escalation Level from Speech with Transfer Learning and Acoustic-Linguistic Information Fusion
Textual escalation detection has been widely applied in e-commerce customer service systems to pre-alert agents and prevent potential conflicts. Similarly, acoustic-based escalation detection systems can help enhance passenger safety and maintain public order in places such as airports and train stations, where impersonal conversations frequently occur. To this end, we introduce a multimodal system based on acoustic-linguistic features to detect escalation levels from human speech. Voice Activity Detection (VAD) and label smoothing are adopted to further enhance performance on this task. Given the difficulty and high cost of data collection in open scenarios, the datasets used in this task are subject to severe low-resource constraints. To address this problem, we apply transfer learning within a multi-corpus framework involving emotion detection datasets such as RAVDESS and CREMA-D, integrating emotion features into escalation signal representation learning. On the development set, our proposed system achieves 81.5% unweighted average recall (UAR), significantly outperforming the baseline of 72.2%.
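The evaluation metric used above, unweighted average recall (UAR), is the mean of per-class recall with every class weighted equally, which makes it robust to class imbalance. A minimal sketch of its computation follows; the three escalation levels and the label values are illustrative, not taken from the paper's datasets:

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """UAR: average the recall of each class, each class counting equally."""
    correct = defaultdict(int)  # per-class count of correct predictions
    total = defaultdict(int)    # per-class count of true instances
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Hypothetical 3-level escalation labels (0 = low, 1 = mid, 2 = high)
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2, 2]
# Per-class recalls: 3/4, 1/2, 2/2 -> UAR = 0.75
print(unweighted_average_recall(y_true, y_pred))  # → 0.75
```

Unlike plain accuracy, UAR does not reward a classifier for favoring the majority class, which matters here because low-resource escalation corpora are typically imbalanced.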