Real-time online singing voice separation from monaural recordings using robust low-rank modeling


Conference Paper

Separating the leading vocals from the musical accompaniment is a challenging task that arises naturally in several music processing applications. Robust principal component analysis (RPCA) has recently been applied to this problem with very successful results. The method decomposes the signal into a low-rank component, corresponding to the accompaniment with its repetitive structure, and a sparse component, corresponding to the voice with its quasi-harmonic structure. In this paper we first introduce a non-negative variant of RPCA, termed robust low-rank non-negative matrix factorization (RNMF). This new framework better suits audio applications. We then propose two efficient feed-forward architectures that approximate RPCA and RNMF with low latency and a fraction of the complexity of the original optimization method. These approximants allow incorporating elements of unsupervised, semi-supervised and fully supervised learning into the RPCA and RNMF frameworks. Our basic implementation shows a speedup of several orders of magnitude over the exact solvers with no performance degradation, and allows online and faster-than-real-time processing. Evaluation on the MIR-1K dataset demonstrates state-of-the-art performance. © 2012 International Society for Music Information Retrieval.
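The low-rank plus sparse decomposition described in the abstract can be illustrated with a minimal sketch of a generic iterative RPCA solver. This is not the feed-forward architecture the paper proposes, and it is not taken from the paper; it is a textbook-style alternating scheme (singular value thresholding for the low-rank part, entrywise soft thresholding for the sparse part) of the kind the abstract refers to as an exact solver. The function names, the default weight `lam = 1 / sqrt(max(shape))`, and the step size `mu` are assumptions chosen for illustration. In the setting of the abstract, the input matrix would be a magnitude spectrogram of the mixture.

```python
import numpy as np


def soft_threshold(x, tau):
    """Entrywise shrinkage: the proximal operator of tau * L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svd_threshold(x, tau):
    """Shrink singular values by tau: the proximal operator of tau * nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt


def rpca(m, lam=None, n_iter=500, tol=1e-7):
    """Split m into low-rank + sparse parts (illustrative sketch only).

    Approximately solves: min ||L||_* + lam * ||S||_1  subject to  L + S = m,
    using an augmented Lagrangian scheme with a fixed penalty mu.
    lam and mu below are common default choices, assumed here for illustration.
    """
    m = np.asarray(m, dtype=float)
    if lam is None:
        lam = 1.0 / np.sqrt(max(m.shape))
    mu = m.size / (4.0 * np.abs(m).sum() + 1e-12)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    low = np.zeros_like(m)
    for _ in range(n_iter):
        # Low-rank update (accompaniment-like part in the abstract's setting).
        low = svd_threshold(m - sparse + dual / mu, 1.0 / mu)
        # Sparse update (voice-like part in the abstract's setting).
        sparse = soft_threshold(m - low + dual / mu, lam / mu)
        residual = m - low - sparse
        dual = dual + mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(m):
            break
    return low, sparse
```

Each iteration of this sketch requires a full singular value decomposition, which hints at why an iterative solver of this kind is costly for online use and why the abstract emphasizes low-latency approximations.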

Cited Authors

  • Sprechmann, P; Bronstein, A; Sapiro, G

Published Date

  • December 1, 2012

Published In

  • Proceedings of the 13th International Society for Music Information Retrieval Conference, ISMIR 2012

Start / End Page

  • 67 - 72

International Standard Book Number 13 (ISBN-13)

  • 9789727521449

Citation Source

  • Scopus