Low-complexity Multi-Channel Speaker Extraction with Pure Speech Cues

Publication, Conference
Zeng, B; Suo, H; Wan, Y; Li, M
Published in: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
January 1, 2023

Most multi-channel speaker extraction schemes use the target speaker's location as a reference, which must be known in advance or derived from visual cues. In addition, memory and computation costs are enormous when the model processes the fused input. In this paper, we propose the Speaker-extraction-and-filter Network (SeafNet), a low-complexity multi-channel speaker extraction network that relies only on speech cues. Specifically, SeafNet separates the mixture by exploiting the correlation between an estimate of the target speaker on the reference channel and the mixed input on the remaining channels. Experimental results show that, compared with the baseline, SeafNet achieves a 6.4% relative SISNRi improvement on the fixed-geometry array and an 8.9% average relative SISNRi improvement on the ad-hoc array. At the same time, SeafNet achieves a 60% relative reduction in the number of parameters and a 42% relative reduction in computational cost.
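The abstract reports its gains as relative SISNRi improvements but does not define the metric. The sketch below uses the commonly used definition of scale-invariant signal-to-noise ratio (SI-SNR) and its improvement over the unprocessed mixture; it is an illustration under that assumption, not the authors' evaluation code. The function names (`si_snr`, `si_snr_improvement`, `relative_change_percent`) are hypothetical and do not come from the paper.

```python
import math


def _zero_mean(signal):
    """Return a copy of the signal with its mean removed."""
    mean = sum(signal) / len(signal)
    return [v - mean for v in signal]


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals.

    Assumed (standard) definition: project the zero-mean estimate onto the
    zero-mean target, then compare projection energy to residual energy.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    est = _zero_mean(estimate)
    ref = _zero_mean(target)
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    projection = [scale * r for r in ref]           # target component of the estimate
    residual = [e - p for e, p in zip(est, projection)]  # everything else
    signal_energy = sum(p * p for p in projection)
    noise_energy = sum(n * n for n in residual) + eps
    return 10.0 * math.log10(signal_energy / noise_energy + eps)


def si_snr_improvement(estimate, mixture, target):
    """SI-SNR of the estimate minus SI-SNR of the unprocessed mixture, in dB."""
    return si_snr(estimate, target) - si_snr(mixture, target)


def relative_change_percent(new, baseline):
    """Relative change of `new` with respect to `baseline`, in percent."""
    return 100.0 * (new - baseline) / abs(baseline)
```

Under this reading, a "relative SISNRi improvement" would be `relative_change_percent` applied to the SISNRi of the proposed model and of the baseline, and the parameter and computation reductions would be the same calculation with a negative sign. The abstract does not state the absolute values, so none are assumed here.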

Published In

2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023

DOI

10.1109/APSIPAASC58517.2023.10317330

Publication Date

January 1, 2023

Start / End Page

114 / 118
 

Citation

APA: Zeng, B., Suo, H., Wan, Y., & Li, M. (2023). Low-complexity Multi-Channel Speaker Extraction with Pure Speech Cues. In 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023 (pp. 114–118). https://doi.org/10.1109/APSIPAASC58517.2023.10317330

Chicago: Zeng, B., H. Suo, Y. Wan, and M. Li. “Low-complexity Multi-Channel Speaker Extraction with Pure Speech Cues.” In 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023, 114–18, 2023. https://doi.org/10.1109/APSIPAASC58517.2023.10317330.

ICMJE: Zeng B, Suo H, Wan Y, Li M. Low-complexity Multi-Channel Speaker Extraction with Pure Speech Cues. In: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023. 2023. p. 114–8.

MLA: Zeng, B., et al. “Low-complexity Multi-Channel Speaker Extraction with Pure Speech Cues.” 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023, 2023, pp. 114–18. Scopus, doi:10.1109/APSIPAASC58517.2023.10317330.

NLM: Zeng B, Suo H, Wan Y, Li M. Low-complexity Multi-Channel Speaker Extraction with Pure Speech Cues. 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023. 2023. p. 114–118.
