Vocal development in a large-scale crosslinguistic corpus.

Journal Article

This study evaluates whether early vocalizations develop in similar ways in children across diverse cultural contexts. We analyze data from daylong audio recordings of 49 children (1-36 months) from five different language/cultural backgrounds. Citizen scientists annotated these recordings to determine if child vocalizations contained canonical transitions or not (e.g., "ba" vs. "ee"). Results revealed that the proportion of clips reported to contain canonical transitions increased with age. Furthermore, this proportion exceeded 0.15 by around 7 months, replicating and extending previous findings on canonical vocalization development but using data from the natural environments of a culturally and linguistically diverse sample. This work explores how crowdsourcing can be used to annotate corpora, helping establish developmental milestones relevant to multiple languages and cultures. Lower inter-annotator reliability on the crowdsourcing platform, relative to more traditional in-lab expert annotators, means that a larger number of unique annotators and/or annotations are required, and that crowdsourcing may not be a suitable method for more fine-grained annotation decisions. Audio clips used for this project are compiled into a large-scale infant vocalization corpus that is available for other researchers to use in future work.

Cited Authors

  • Cychosz, M; Cristia, A; Bergelson, E; Casillas, M; Baudet, G; Warlaumont, AS; Scaff, C; Yankowitz, L; Seidl, A

Published Date

  • September 2021

Published In

  • Developmental Science

Volume / Issue

  • 24 / 5

Start / End Page

  • e13090

PubMed ID

  • 33497512

PubMed Central ID

  • PMC8310893

Electronic International Standard Serial Number (EISSN)

  • 1467-7687

International Standard Serial Number (ISSN)

  • 1363-755X

Digital Object Identifier (DOI)

  • 10.1111/desc.13090


Language

  • eng