Mandarin electrolaryngeal voice conversion with combination of Gaussian mixture model and non-negative matrix factorization
An electrolarynx (EL) is a speaking-aid device that helps laryngectomees, whose larynx has been removed, to produce voice. However, the voice generated by an EL is unnatural and unintelligible because of its flat pitch and strong vibration noise. To address these challenges, previous work has shown that electrolaryngeal speech can be enhanced using Gaussian Mixture Model (GMM) based voice conversion (VC). Although this approach improves naturalness, it degrades the intelligibility of the converted speech. To address this issue, we propose a hybrid approach that uses both Non-negative Matrix Factorization (NMF) and GMM methods. For better intelligibility, we apply NMF to estimate high-quality spectral features. For better naturalness, we use the GMM with a dynamic trajectory constraint to recover a smoothed F