Rate-distortion bounds on Bayes risk in supervised learning

Conference Paper

An information-theoretic framework is presented for estimating the number of labeled samples needed to train a classifier in a parametric Bayesian setting. Ideas from rate-distortion theory are used to derive bounds for the average L1 or L∞ distance between the learned classifier and the true maximum a posteriori classifier in terms of familiar information-theoretic quantities and the number of training samples available. The maximum a posteriori classifier is viewed as a random source, labeled training data are viewed as a finite-rate encoding of the source, and the L1 or L∞ Bayes risk is viewed as the average distortion. The result is a framework dual to the well-known probably approximately correct (PAC) framework. PAC bounds characterize worst-case learning performance of a family of classifiers whose complexity is captured by the Vapnik-Chervonenkis (VC) dimension. The rate-distortion framework, on the other hand, characterizes the average-case performance of a family of data distributions in terms of a quantity called the interpolation dimension, which represents the complexity of the family of data distributions. The resulting bounds do not suffer from the pessimism typical of the PAC framework, particularly when the training set is small.

Full Text

Duke Authors

Cited Authors

  • Nokleby, M; Beirami, A; Calderbank, R

Published Date

  • August 10, 2016

Published In

Volume / Issue

  • 2016-August /

Start / End Page

  • 2099 - 2103

International Standard Serial Number (ISSN)

  • 2157-8095

International Standard Book Number 13 (ISBN-13)

  • 9781509018062

Digital Object Identifier (DOI)

  • 10.1109/ISIT.2016.7541669

Citation Source

  • Scopus