SEF-Net: Speaker Embedding Free Target Speaker Extraction Network

Publication, Conference
Zeng, B; Suo, H; Wan, Y; Li, M
Published in: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
January 1, 2023

Most target speaker extraction methods use a target speaker embedding as reference information. However, a speaker embedding extracted by a speaker recognition module may not be optimal for the target speaker extraction task. In this paper, we propose the Speaker Embedding Free target speaker extraction Network (SEF-Net), a novel target speaker extraction model that does not rely on speaker embeddings. SEF-Net uses cross multi-head attention in the transformer decoder to implicitly exploit the speaker information in the conformer encoder outputs of the reference speech. Experimental results show that the proposed model achieves performance comparable to other target speaker extraction models. SEF-Net thus offers a feasible way to perform target speaker extraction without a speaker embedding extractor or a speaker recognition loss function.
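
The cross multi-head attention idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say which stream supplies queries and which supplies keys and values, so the sketch assumes queries come from the mixture encoding and keys/values from the reference speech encoding. The function names, parameter dictionary, dimensions, and random inputs are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(d_model, seed=0):
    """Random projection matrices (hypothetical stand-ins for learned weights)."""
    rng = np.random.default_rng(seed)
    names = ("w_q", "w_k", "w_v", "w_o")
    return {n: rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for n in names}


def cross_multi_head_attention(mixture_enc, reference_enc, params, num_heads):
    """Cross-attention: mixture frames attend over reference frames.

    mixture_enc:   (t_mix, d_model) encoding of the mixed speech (assumed query side)
    reference_enc: (t_ref, d_model) encoding of the reference speech (assumed key/value side)
    Returns the attended output (t_mix, d_model) and the attention
    weights (num_heads, t_mix, t_ref).
    """
    t_mix, d_model = mixture_enc.shape
    t_ref = reference_enc.shape[0]
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    # Linear projections, then split the feature axis into heads.
    q = (mixture_enc @ params["w_q"]).reshape(t_mix, num_heads, d_head).transpose(1, 0, 2)
    k = (reference_enc @ params["w_k"]).reshape(t_ref, num_heads, d_head).transpose(1, 0, 2)
    v = (reference_enc @ params["w_v"]).reshape(t_ref, num_heads, d_head).transpose(1, 0, 2)

    # Scaled dot-product attention per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, t_mix, t_ref)
    weights = softmax(scores, axis=-1)
    context = weights @ v                                # (heads, t_mix, d_head)

    # Merge heads and apply the output projection.
    context = context.transpose(1, 0, 2).reshape(t_mix, d_model)
    return context @ params["w_o"], weights


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    mixture = rng.standard_normal((50, 64))    # 50 mixture frames
    reference = rng.standard_normal((30, 64))  # 30 reference frames
    out, attn = cross_multi_head_attention(mixture, reference, init_params(64), num_heads=4)
    print(out.shape, attn.shape)
```

In this arrangement no fixed-length speaker vector is ever computed: each mixture frame draws speaker cues directly from the frame-level reference encoding, which is the sense in which the reference information is used implicitly.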

Published In

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

DOI

10.21437/Interspeech.2023-1749

EISSN

1990-9772

ISSN

2308-457X

Publication Date

January 1, 2023

Volume

2023-August

Start / End Page

3452 / 3456
 

Citation

APA: Zeng, B., Suo, H., Wan, Y., & Li, M. (2023). SEF-Net: Speaker Embedding Free Target Speaker Extraction Network. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (Vol. 2023-August, pp. 3452–3456). https://doi.org/10.21437/Interspeech.2023-1749
Chicago: Zeng, B., H. Suo, Y. Wan, and M. Li. “SEF-Net: Speaker Embedding Free Target Speaker Extraction Network.” In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2023-August:3452–56, 2023. https://doi.org/10.21437/Interspeech.2023-1749.
ICMJE: Zeng B, Suo H, Wan Y, Li M. SEF-Net: Speaker Embedding Free Target Speaker Extraction Network. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2023. p. 3452–6.
MLA: Zeng, B., et al. “SEF-Net: Speaker Embedding Free Target Speaker Extraction Network.” Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2023-August, 2023, pp. 3452–56. Scopus, doi:10.21437/Interspeech.2023-1749.
NLM: Zeng B, Suo H, Wan Y, Li M. SEF-Net: Speaker Embedding Free Target Speaker Extraction Network. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2023. p. 3452–3456.
