Low-complexity Multi-Channel Speaker Extraction with Pure Speech Cues

Publication, Conference

Zeng, B; Suo, H; Wan, Y; Li, M

Published in: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023

January 1, 2023

Most multi-channel speaker extraction schemes use the target speaker's location as a reference, which must be known in advance or derived from visual cues. In addition, memory and computation costs are very high when the model processes the fused input. In this paper, we propose the Speaker-extraction-and-filter Network (SeafNet), a low-complexity multi-channel speaker extraction network that relies on speech cues only. Specifically, SeafNet separates the mixture by exploiting the correlation between an estimate of the target speaker on the reference channel and the mixed input on the remaining channels. Experimental results show that, compared with the baseline, SeafNet achieves a 6.4% relative SISNRi improvement on the fixed-geometry array and an 8.9% average relative SISNRi improvement on the ad-hoc array. Meanwhile, SeafNet achieves a 60% relative reduction in the number of parameters and a 42% relative reduction in computational cost.
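The abstract reports its gains as relative SISNRi improvements. As a reading aid, the following is a minimal sketch of how the scale-invariant signal-to-noise ratio (SI-SNR), its improvement over the unprocessed mixture (SI-SNRi), and a relative gain over a baseline are commonly computed. It is not taken from the paper: the function names, the `eps` stabilizer, and all signals and scores below are hypothetical and chosen for illustration only.

```python
import math


def _dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def _zero_mean(x):
    """Remove the mean so the measure ignores DC offsets."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB of `estimate` with respect to `target`."""
    est = _zero_mean(estimate)
    tgt = _zero_mean(target)
    # Project the estimate onto the target; the residual counts as noise.
    scale = _dot(est, tgt) / (_dot(tgt, tgt) + eps)
    proj = [scale * t for t in tgt]
    noise = [e - p for e, p in zip(est, proj)]
    return 10 * math.log10((_dot(proj, proj) + eps) / (_dot(noise, noise) + eps))


def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: gain of the extracted signal over the unprocessed mixture."""
    return si_snr(estimate, target) - si_snr(mixture, target)


def relative_gain_percent(new_score, baseline_score):
    """Relative improvement of one score over a baseline, in percent."""
    return (new_score - baseline_score) / baseline_score * 100


# Hypothetical toy signals: a target tone, an interfering tone, their
# mixture, and an "extracted" signal with the interferer attenuated.
target = [math.sin(0.1 * n) for n in range(400)]
interferer = [math.sin(0.37 * n + 1.0) for n in range(400)]
mixture = [t + i for t, i in zip(target, interferer)]
estimate = [t + 0.1 * i for t, i in zip(target, interferer)]

improvement_db = si_snr_improvement(estimate, mixture, target)
# Hypothetical scores in dB, not results from the paper.
gain = relative_gain_percent(12.0, 10.0)
```

Under this definition, rescaling the estimate leaves the score essentially unchanged, which is why the measure is called scale-invariant.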

Duke Scholars

Author Ming Li DKU Faculty

Published In

2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023

DOI

10.1109/APSIPAASC58517.2023.10317330

Publication Date

January 1, 2023

Start / End Page

114 / 118

Citation

APA: Zeng, B., Suo, H., Wan, Y., & Li, M. (2023). Low-complexity Multi-Channel Speaker Extraction with Pure Speech Cues. In 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023 (pp. 114–118). https://doi.org/10.1109/APSIPAASC58517.2023.10317330

Chicago: Zeng, B., H. Suo, Y. Wan, and M. Li. “Low-complexity Multi-Channel Speaker Extraction with Pure Speech Cues.” In 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023, 114–18, 2023. https://doi.org/10.1109/APSIPAASC58517.2023.10317330.

ICMJE: Zeng B, Suo H, Wan Y, Li M. Low-complexity Multi-Channel Speaker Extraction with Pure Speech Cues. In: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023. 2023. p. 114–8.

MLA: Zeng, B., et al. “Low-complexity Multi-Channel Speaker Extraction with Pure Speech Cues.” 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023, 2023, pp. 114–18. Scopus, doi:10.1109/APSIPAASC58517.2023.10317330.

NLM: Zeng B, Suo H, Wan Y, Li M. Low-complexity Multi-Channel Speaker Extraction with Pure Speech Cues. 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023. 2023. p. 114–118.
