Scholars@Duke publication: Mandarin electrolaryngeal voice conversion with combination of Gaussian mixture model and non-negative matrix factorization

Mandarin electrolaryngeal voice conversion with combination of Gaussian mixture model and non-negative matrix factorization

Publication, Conference

Li, M; Wang, L; Xu, Z; Cai, D

Published in: Proceedings 9th Asia Pacific Signal and Information Processing Association Annual Summit and Conference Apsipa ASC 2017

July 2, 2017

An electrolarynx (EL) is a speaking aid that helps laryngectomees, people whose larynx has been removed, to produce voice. However, EL speech sounds unnatural and is difficult to understand because of its flat pitch and strong vibration noise. To address these problems, previous work has shown that electrolaryngeal speech can be enhanced with Gaussian mixture model (GMM) based voice conversion (VC). Although this approach improves naturalness, it degrades the intelligibility of the converted speech. To address this issue, we propose a hybrid approach that uses both non-negative matrix factorization (NMF) and GMM methods. For better intelligibility, we apply NMF to estimate high-quality spectral features. For better naturalness, we use a GMM with a dynamic trajectory constraint to recover a smoothed F0 contour. Additionally, to suppress the EL vibration noise, we include the 0th mel-cepstral coefficient (MCC) in the GMM-based VC. The proposed method significantly increases the F0 dynamic range, reduces vibration noise, and improves both speech naturalness and intelligibility. One hundred pairs of normal and electrolaryngeal speech recordings in everyday Mandarin serve as our evaluation data. Experimental results show that our proposed hybrid method reduces the mel-cepstral distortion by 7.1 dB and increases the F0 correlation coefficient to 0.54.
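The two objective measures quoted in the abstract, mel-cepstral distortion and F0 correlation, can be sketched in a few lines of Python. This is a minimal illustration based on their commonly used definitions, not the authors' evaluation code. The function names, the default of skipping the 0th coefficient when computing distortion, and the rule that only frames voiced in both contours count toward the correlation are all assumptions, and the inputs are assumed to be time-aligned already.

```python
import math


def mel_cepstral_distortion(ref, conv, skip_c0=True):
    """Mean mel-cepstral distortion (dB) between two aligned MCC sequences.

    ref, conv: lists of frames, each frame a list of mel-cepstral coefficients.
    skip_c0:   assumption; the energy-like 0th coefficient is often left out.
    """
    if not ref or len(ref) != len(conv):
        raise ValueError("sequences must be non-empty and time-aligned")
    start = 1 if skip_c0 else 0
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for r, c in zip(ref, conv):
        squared = sum((a - b) ** 2 for a, b in zip(r[start:], c[start:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref)


def f0_correlation(ref_f0, conv_f0):
    """Pearson correlation of two F0 contours over frames voiced in both.

    A frame is treated as voiced when its F0 value is greater than zero
    (an assumption; unvoiced frames are commonly encoded as 0).
    """
    pairs = [(a, b) for a, b in zip(ref_f0, conv_f0) if a > 0 and b > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two frames voiced in both contours")
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant contour")
    return cov / math.sqrt(var_a * var_b)
```

Under these definitions, a lower distortion means the converted spectrum is closer to the target, and a correlation nearer to 1 means the converted F0 contour follows the target contour more closely. A perfectly flat contour, as EL speech tends to produce, has no defined correlation, which is why the sketch raises an error in that case.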

Duke Scholars

Author Ming Li DKU Faculty

Published In

Proceedings 9th Asia Pacific Signal and Information Processing Association Annual Summit and Conference Apsipa ASC 2017

DOI

10.1109/APSIPA.2017.8282244

Publication Date

July 2, 2017

Volume

2018-February

Start / End Page

1360 / 1363

Citation

APA

Li, M., Wang, L., Xu, Z., & Cai, D. (2017). Mandarin electrolaryngeal voice conversion with combination of Gaussian mixture model and non-negative matrix factorization. In Proceedings 9th Asia Pacific Signal and Information Processing Association Annual Summit and Conference Apsipa ASC 2017 (Vol. 2018-February, pp. 1360–1363). https://doi.org/10.1109/APSIPA.2017.8282244

Chicago

Li, M., L. Wang, Z. Xu, and D. Cai. “Mandarin electrolaryngeal voice conversion with combination of Gaussian mixture model and non-negative matrix factorization.” In Proceedings 9th Asia Pacific Signal and Information Processing Association Annual Summit and Conference Apsipa ASC 2017, 2018-February:1360–63, 2017. https://doi.org/10.1109/APSIPA.2017.8282244.

ICMJE

Li M, Wang L, Xu Z, Cai D. Mandarin electrolaryngeal voice conversion with combination of Gaussian mixture model and non-negative matrix factorization. In: Proceedings 9th Asia Pacific Signal and Information Processing Association Annual Summit and Conference Apsipa ASC 2017. 2017. p. 1360–3.

MLA

Li, M., et al. “Mandarin electrolaryngeal voice conversion with combination of Gaussian mixture model and non-negative matrix factorization.” Proceedings 9th Asia Pacific Signal and Information Processing Association Annual Summit and Conference Apsipa ASC 2017, vol. 2018-February, 2017, pp. 1360–63. Scopus, doi:10.1109/APSIPA.2017.8282244.

NLM

Li M, Wang L, Xu Z, Cai D. Mandarin electrolaryngeal voice conversion with combination of Gaussian mixture model and non-negative matrix factorization. Proceedings 9th Asia Pacific Signal and Information Processing Association Annual Summit and Conference Apsipa ASC 2017. 2017. p. 1360–1363.
