Unsupervised Temporal Analysis of Mouse Vocalizations
Mice communicate using ultrasonic vocalizations (USVs) that vary according to parameters such as sex, genetic background, and environmental stimuli. The study of USV production provides useful models of the neurobiological mechanisms underlying human speech, and many methods exist to detect USVs in mouse recordings. To analyze the temporal structure of these vocalizations, one must first group them into categories, a demanding task given the high volume of USVs even in short recordings. Most existing tools recognize only a predefined number of categories and offer no temporal analysis capabilities. In this work, we use the open-source software Analysis of Mouse VOcal Communication (AMVOC) for USV detection and propose an unsupervised learning approach based on features extracted from a Convolutional Autoencoder (CAE). To evaluate the CAE approach, we built a benchmark dataset. Using USV transition matrices, we propose three metrics that quantify differences in the temporal structure of different recordings. We evaluate these metrics on a dataset from mice carrying a mutation in FoxP2, a gene involved in speech function. In this way, a researcher can perform batch comparisons of the temporal structure of recordings, extract insights, and identify differences in syntax composition prior to more thorough analysis.
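As a concrete illustration of the transition-matrix idea (a minimal sketch, not the paper's implementation; the function name and example label sequence are hypothetical), a recording can be reduced to a sequence of USV cluster labels from which a row-normalized matrix of transition probabilities is computed:

```python
import numpy as np

def transition_matrix(labels, n_clusters):
    """Row-normalized transition probabilities between consecutive
    USV cluster labels in one recording."""
    counts = np.zeros((n_clusters, n_clusters))
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Guard against division by zero for clusters that never
    # precede another USV in this recording.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical example: one recording's USVs mapped to 3 clusters.
labels = [0, 1, 1, 2, 0, 2, 2, 1]
M = transition_matrix(labels, n_clusters=3)
print(M)  # M[i, j] = P(next USV is in cluster j | current is in cluster i)
```

Matrices of this form can then be compared across recordings, which is the role the proposed metrics play in the analysis described above.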