Scholars@Duke publication: Electrolaryngeal speech enhancement based on a two stage framework with bottleneck feature refinement and voice conversion

Publication, Journal Article

Yang, Y; Zhang, H; Cai, Z; Shi, Y; Li, M; Zhang, D; Ding, X; Deng, J; Wang, J

Published in: Biomedical Signal Processing and Control

February 1, 2023

An electrolarynx (EL) is a medical device that generates speech for people who have lost their larynx. However, EL speech sounds unnatural and is hard to understand because of its monotonous pitch and the mechanical excitation of the EL device. This paper proposes an end-to-end voice conversion method to enhance EL speech. A speaker-independent automatic speech recognition model extracts bottleneck features, which serve as the intermediate phonetic features for enhancement. The system has two stages. In stage one, a parallel non-autoregressive model maps the bottleneck feature vectors of EL speech to the corresponding feature vectors of normal speech. In stage two, another voice conversion model maps the bottleneck feature vectors of normal speech directly to the Mel-spectrogram of normal speech, and a MelGAN-based vocoder then converts the Mel-spectrogram into a waveform. In addition, data augmentation and transfer learning are incorporated to improve conversion performance. Experimental results show that the proposed method outperforms the authors' baseline methods and performs well in terms of naturalness and intelligibility. The audio samples are available online.
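The data flow described in the abstract can be sketched as a chain of four components. This is only an illustrative outline, not the authors' implementation: the class name `TwoStageEnhancer`, the `toy_*` functions, and the constants `HOP`, `BNF_DIM`, and `MEL_DIM` are all hypothetical, and the toy functions are trivial stand-ins for the trained ASR extractor, the two conversion models, and the MelGAN-based vocoder.

```python
from dataclasses import dataclass
from typing import Callable, List

Waveform = List[float]
Frames = List[List[float]]  # one feature vector per frame

# Toy sizes chosen for readability; the abstract does not state real values.
HOP = 4      # samples per frame
BNF_DIM = 3  # bottleneck feature dimension
MEL_DIM = 2  # number of Mel bins


@dataclass
class TwoStageEnhancer:
    """Chains the four components named in the abstract (hypothetical wrapper)."""
    extract_bnf: Callable[[Waveform], Frames]  # speaker-independent ASR bottleneck extractor
    refine_bnf: Callable[[Frames], Frames]     # stage one: EL features -> normal-speech features
    bnf_to_mel: Callable[[Frames], Frames]     # stage two: features -> Mel-spectrogram
    vocoder: Callable[[Frames], Waveform]      # Mel-spectrogram -> waveform

    def enhance(self, el_wave: Waveform) -> Waveform:
        el_bnf = self.extract_bnf(el_wave)
        normal_bnf = self.refine_bnf(el_bnf)
        mel = self.bnf_to_mel(normal_bnf)
        return self.vocoder(mel)


# Stand-ins below are NOT real models; they only make the sketch runnable.
def toy_extract_bnf(wave: Waveform) -> Frames:
    # Cut into non-overlapping frames and repeat each frame mean BNF_DIM times.
    frames = [wave[i:i + HOP] for i in range(0, len(wave) - HOP + 1, HOP)]
    return [[sum(f) / HOP] * BNF_DIM for f in frames]


def toy_refine_bnf(bnf: Frames) -> Frames:
    # Identity copy; a trained model would map EL features to normal-speech features.
    return [list(v) for v in bnf]


def toy_bnf_to_mel(bnf: Frames) -> Frames:
    # Repeat the mean of each feature vector MEL_DIM times.
    return [[sum(v) / len(v)] * MEL_DIM for v in bnf]


def toy_vocoder(mel: Frames) -> Waveform:
    # Hold the first Mel value of each frame for HOP samples.
    return [v[0] for v in mel for _ in range(HOP)]


enhancer = TwoStageEnhancer(toy_extract_bnf, toy_refine_bnf, toy_bnf_to_mel, toy_vocoder)
enhanced = enhancer.enhance([0.0] * 8 + [1.0] * 8)
print(len(enhanced))
```

Keeping each stage behind a plain callable mirrors the separation in the abstract: the feature-refinement stage and the feature-to-Mel stage can be trained or swapped independently, as long as they agree on the bottleneck feature format.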

Duke Scholars

Author Ming Li DKU Faculty

Published In

Biomedical Signal Processing and Control

DOI

10.1016/j.bspc.2022.104279

EISSN

1746-8108

ISSN

1746-8094

Publication Date

February 1, 2023

Volume

80

Related Subject Headings

Biomedical Engineering
4003 Biomedical engineering
3006 Food sciences
1004 Medical Biotechnology
0906 Electrical and Electronic Engineering
0903 Biomedical Engineering

Citation

APA

Yang, Y., Zhang, H., Cai, Z., Shi, Y., Li, M., Zhang, D., … Wang, J. (2023). Electrolaryngeal speech enhancement based on a two stage framework with bottleneck feature refinement and voice conversion. Biomedical Signal Processing and Control, 80. https://doi.org/10.1016/j.bspc.2022.104279

Chicago

Yang, Y., H. Zhang, Z. Cai, Y. Shi, M. Li, D. Zhang, X. Ding, J. Deng, and J. Wang. “Electrolaryngeal speech enhancement based on a two stage framework with bottleneck feature refinement and voice conversion.” Biomedical Signal Processing and Control 80 (February 1, 2023). https://doi.org/10.1016/j.bspc.2022.104279.

ICMJE

Yang Y, Zhang H, Cai Z, Shi Y, Li M, Zhang D, et al. Electrolaryngeal speech enhancement based on a two stage framework with bottleneck feature refinement and voice conversion. Biomedical Signal Processing and Control. 2023 Feb 1;80.

MLA

Yang, Y., et al. “Electrolaryngeal speech enhancement based on a two stage framework with bottleneck feature refinement and voice conversion.” Biomedical Signal Processing and Control, vol. 80, Feb. 2023. Scopus, doi:10.1016/j.bspc.2022.104279.

NLM

Yang Y, Zhang H, Cai Z, Shi Y, Li M, Zhang D, Ding X, Deng J, Wang J. Electrolaryngeal speech enhancement based on a two stage framework with bottleneck feature refinement and voice conversion. Biomedical Signal Processing and Control. 2023 Feb 1;80.
