Mandarin electrolaryngeal voice conversion with combination of Gaussian mixture model and non-negative matrix factorization

Publication, Conference
Li, M; Wang, L; Xu, Z; Cai, D
Published in: Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
July 2, 2017

An electrolarynx (EL) is a speaking-aid device that helps laryngectomees, whose larynx has been removed, to generate voice. However, the voice generated by an EL is unnatural and unintelligible because of its flat pitch and strong vibration noise. To address these challenges, previous work has shown that electrolaryngeal speech can be enhanced using Gaussian Mixture Model (GMM) based voice conversion (VC). Although effective in improving naturalness, this approach degrades the intelligibility of the converted speech. To address this issue, we propose a hybrid approach that uses both Non-negative Matrix Factorization (NMF) and GMM methods. For better intelligibility, we apply NMF to estimate high-quality spectral features. For better naturalness, we use the GMM with a dynamic trajectory constraint to recover a smoothed F0. Additionally, to suppress the EL vibration noise, we include the 0th mel-cepstral coefficient (MCC) in the GMM-based VC. The proposed method significantly increases the F0 dynamic range, reduces vibration noise, and improves both speech naturalness and intelligibility. One hundred pairs of normal and electrolaryngeal speech in daily Mandarin were recorded as our evaluation data. Experimental results show that the proposed hybrid method reduces the mel-cepstral distortion by 7.1 dB and increases the F0 correlation coefficient to 0.54.
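The two objective measures quoted in the abstract, mel-cepstral distortion and the F0 correlation coefficient, can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes the commonly used definitions: mel-cepstral distortion as (10 / ln 10) * sqrt(2 * sum of squared coefficient differences), averaged over time-aligned frames and by default excluding the 0th coefficient, and F0 correlation as the Pearson correlation over frames that are voiced (F0 > 0) in both contours. The function names, the `skip_c0` parameter, and the voicing rule are hypothetical choices made for illustration; the paper's exact settings may differ.

```python
import math


def mel_cepstral_distortion(ref_frames, conv_frames, skip_c0=True):
    """Mean mel-cepstral distortion in dB over time-aligned frames.

    Each frame is a sequence of mel-cepstral coefficients. Skipping the
    0th coefficient is an assumption, not something the abstract states.
    """
    if len(ref_frames) != len(conv_frames) or not ref_frames:
        raise ValueError("need equal, non-zero numbers of aligned frames")
    start = 1 if skip_c0 else 0
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        squared = sum((r - c) ** 2 for r, c in zip(ref[start:], conv[start:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref_frames)


def f0_correlation(ref_f0, conv_f0):
    """Pearson correlation over frames voiced (F0 > 0) in both contours."""
    pairs = [(r, c) for r, c in zip(ref_f0, conv_f0) if r > 0 and c > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two frames voiced in both contours")
    n = len(pairs)
    mean_r = sum(r for r, _ in pairs) / n
    mean_c = sum(c for _, c in pairs) / n
    cov = sum((r - mean_r) * (c - mean_c) for r, c in pairs)
    var_r = sum((r - mean_r) ** 2 for r, _ in pairs)
    var_c = sum((c - mean_c) ** 2 for _, c in pairs)
    if var_r == 0 or var_c == 0:
        raise ValueError("correlation is undefined for a constant contour")
    return cov / math.sqrt(var_r * var_c)
```

Both functions expect the reference and converted sequences to be time-aligned already (for example by dynamic time warping), which this sketch does not perform.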

Published In

Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017

DOI

10.1109/APSIPA.2017.8282244

Publication Date

July 2, 2017

Volume

2018-February

Start / End Page

1360 / 1363
 

Citation

APA
Li, M., Wang, L., Xu, Z., & Cai, D. (2017). Mandarin electrolaryngeal voice conversion with combination of Gaussian mixture model and non-negative matrix factorization. In Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017 (Vol. 2018-February, pp. 1360–1363). https://doi.org/10.1109/APSIPA.2017.8282244

Chicago
Li, M., L. Wang, Z. Xu, and D. Cai. “Mandarin electrolaryngeal voice conversion with combination of Gaussian mixture model and non-negative matrix factorization.” In Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017, 2018-February:1360–63, 2017. https://doi.org/10.1109/APSIPA.2017.8282244.

ICMJE
Li M, Wang L, Xu Z, Cai D. Mandarin electrolaryngeal voice conversion with combination of Gaussian mixture model and non-negative matrix factorization. In: Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017. 2017. p. 1360–3.

MLA
Li, M., et al. “Mandarin electrolaryngeal voice conversion with combination of Gaussian mixture model and non-negative matrix factorization.” Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017, vol. 2018-February, 2017, pp. 1360–63. Scopus, doi:10.1109/APSIPA.2017.8282244.

NLM
Li M, Wang L, Xu Z, Cai D. Mandarin electrolaryngeal voice conversion with combination of Gaussian mixture model and non-negative matrix factorization. Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017. 2017. p. 1360–1363.
