Unsupervised learning of vowels from continuous speech based on self-organized phoneme acquisition model
Humans acquire their native phoneme systems simply by living in their native-language environment. However, it is unclear how infants learn the acoustic expression of each phoneme of their native language. Recent studies have examined phoneme acquisition using computational models, but they used read speech with a limited vocabulary as input and did not handle continuous speech comparable to that of a natural environment. Therefore, in this study, we use natural continuous speech and build a self-organization model that simulates human cognitive ability, and we analyze the quality and quantity of speech information necessary for acquiring the native vowel system. Our model is designed to learn the values of the acoustic features of natural continuous speech and to estimate the number and boundaries of the vowel categories without explicit instruction. In the simulation trial, we investigate the relationship between the amount of learning and vowel accuracy for a single Japanese speaker's natural speech. We find that the vowel recognition accuracy of our model is comparable to that of an adult. © 2010 ISCA.
Miyazawa, K; Kikuchi, H; Mazuka, R
Proceedings of the 11th Annual Conference of the International Speech Communication Association, Interspeech 2010