Bilevel sparse models for polyphonic music transcription

Conference Paper

In this work, we propose a trainable sparse model for automatic polyphonic music transcription, which incorporates several successful approaches into a unified optimization framework. Our model combines unsupervised synthesis models similar to latent component analysis and nonnegative factorization with metric learning techniques that allow supervised discriminative learning. We develop efficient stochastic gradient training schemes allowing unsupervised, semi-supervised, and fully supervised training of the model, as well as its adaptation to test data. We present an efficient approximation with fixed complexity and latency that can replace iterative minimization algorithms in time-critical applications. Experimental evaluation on synthetic and real data shows promising initial results.

Cited Authors

  • Yakar, TB; Litman, R; Sprechmann, P; Bronstein, A; Sapiro, G

Published Date

  • January 1, 2013

Published In

  • Proceedings of the 14th International Society for Music Information Retrieval Conference, ISMIR 2013

Start / End Page

  • 65 - 70

International Standard Book Number 13 (ISBN-13)

  • 9780615900650

Citation Source

  • Scopus