
Simultaneous Speech Extraction for Multiple Target Speakers Under Meeting Scenarios

Publication, Journal Article
Zeng, B; Suo, H; Wan, Y; Li, M
Published in: Journal of Shanghai Jiaotong University (Science)
January 1, 2024

Conventional target speech separation directly estimates the target source and ignores the interrelationships between different speakers at each frame. We propose a multiple-target speech separation (MTSS) model that simultaneously extracts each speaker's voice from the mixed speech, rather than optimally estimating only the target source. Moreover, we propose a speaker diarization (SD) aware MTSS system (SD-MTSS). By exploiting target speaker voice activity detection (TSVAD) and the estimated mask, our SD-MTSS model can extract the speech signal of each speaker concurrently in a conversational recording without requiring enrollment audio in advance. Experimental results show that, on the WSJ0-2mix-extr dataset, our MTSS model improves over the baseline by 1.38 dB in signal-to-distortion ratio (SDR), 1.34 dB in scale-invariant signal-to-distortion ratio (SISDR), and 0.13 in perceptual evaluation of speech quality (PESQ), respectively. On the AliMeeting dataset, the SD-MTSS system achieves a 19.2% relative reduction in speaker-dependent character error rate.
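The SISDR gain quoted above is measured with the scale-invariant SDR metric. As a reference for what that number means, here is a minimal sketch of the textbook SI-SDR definition in plain Python. It is not the authors' evaluation code; the zero-mean step and the `eps` guard against division by zero are assumptions that follow common practice.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-distortion ratio in dB (textbook definition, illustrative only)."""
    if len(estimate) != len(reference) or not reference:
        raise ValueError("estimate and reference must be non-empty and equally long")
    # Assumption: remove the mean so a DC offset counts as neither signal nor distortion.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference; alpha * ref is the "target" component.
    dot = sum(e * r for e, r in zip(est, ref))
    alpha = dot / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    # Whatever the scaled reference does not explain is treated as distortion.
    distortion = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    return 10.0 * math.log10((target_energy + eps) / (distortion_energy + eps))
```

Because the reference is rescaled by the optimal `alpha`, multiplying the estimate by any nonzero constant leaves the score (almost exactly) unchanged, which is what distinguishes SISDR from plain SDR. A reported improvement such as 1.34 dB is the difference between the average scores of the two systems on the same test set.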


Published In

Journal of Shanghai Jiaotong University (Science)

DOI

10.1007/s12204-024-2739-7

EISSN

1995-8188

ISSN

1007-1172

Publication Date

January 1, 2024

Related Subject Headings

  • General Science & Technology
  • 4015 Maritime engineering
  • 0911 Maritime Engineering
  • 0906 Electrical and Electronic Engineering
 

Citation

APA: Zeng, B., Suo, H., Wan, Y., & Li, M. (2024). Simultaneous Speech Extraction for Multiple Target Speakers Under Meeting Scenarios. Journal of Shanghai Jiaotong University (Science). https://doi.org/10.1007/s12204-024-2739-7
Chicago: Zeng, B., H. Suo, Y. Wan, and M. Li. “Simultaneous Speech Extraction for Multiple Target Speakers Under Meeting Scenarios.” Journal of Shanghai Jiaotong University (Science), January 1, 2024. https://doi.org/10.1007/s12204-024-2739-7.
ICMJE: Zeng B, Suo H, Wan Y, Li M. Simultaneous Speech Extraction for Multiple Target Speakers Under Meeting Scenarios. Journal of Shanghai Jiaotong University (Science). 2024 Jan 1;
MLA: Zeng, B., et al. “Simultaneous Speech Extraction for Multiple Target Speakers Under Meeting Scenarios.” Journal of Shanghai Jiaotong University (Science), Jan. 2024. Scopus, doi:10.1007/s12204-024-2739-7.
NLM: Zeng B, Suo H, Wan Y, Li M. Simultaneous Speech Extraction for Multiple Target Speakers Under Meeting Scenarios. Journal of Shanghai Jiaotong University (Science). 2024 Jan 1;