Logistic regression with an auxiliary data source


Journal Article

To achieve good generalization in supervised learning, the training and testing examples are usually required to be drawn from the same source distribution. In this paper we propose a method to relax this requirement in the context of logistic regression. Assuming D_p and D_a are two sets of examples drawn from two mismatched distributions, where D_a is fully labeled and D_p is partially labeled, our objective is to complete the labels of D_p. We introduce an auxiliary variable μ for each example in D_a to reflect its mismatch with D_p. Under an appropriate constraint, the μ's are estimated as a byproduct, along with the classifier. We also present an active learning approach for selecting the labeled examples in D_p. The proposed algorithm, called "Migratory-Logit" or M-Logit, is demonstrated successfully on simulated as well as real data sets.
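The abstract does not spell out the objective or the constraint, so the following is only an illustrative sketch of the general idea, not the authors' M-Logit algorithm. It assumes labels in {-1, +1}, assumes that each auxiliary example's μ enters as an additive offset in the logit, and assumes a constraint of the form y_i μ_i ≥ 0 with mean(y_i μ_i) ≤ C. The function names (`fit_offset_logit`, `project_budget`), the budget C, and the projected-gradient solver are all hypothetical choices made for this example.

```python
import numpy as np

def one_minus_sigmoid(m):
    # Numerically stable 1 / (1 + exp(m)), i.e. d/dm log(sigmoid(m)).
    return np.exp(-np.logaddexp(0.0, m))

def project_budget(s, budget):
    """Euclidean projection onto {s >= 0, sum(s) <= budget}."""
    clipped = np.maximum(s, 0.0)
    if clipped.sum() <= budget:
        return clipped
    # Budget exceeded: project onto the simplex {s >= 0, sum(s) = budget}.
    u = np.sort(s)[::-1]
    css = np.cumsum(u) - budget
    k = np.arange(1, len(s) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(s - theta, 0.0)

def fit_offset_logit(Xp, yp, Xa, ya, C=1.0, lr=0.1, n_iter=2000):
    """Jointly fit weights w and per-example offsets for the auxiliary set.

    Xp, yp: labeled examples from the partially labeled set.
    Xa, ya: examples from the fully labeled auxiliary set.
    Labels are assumed to be in {-1, +1}.
    """
    w = np.zeros(Xp.shape[1])
    s = np.zeros(len(ya))  # s_i = y_i * mu_i, kept non-negative (assumption)
    n = len(yp) + len(ya)
    for _ in range(n_iter):
        margin_p = yp * (Xp @ w)
        margin_a = ya * (Xa @ w) + s  # offset only on auxiliary examples
        gp = one_minus_sigmoid(margin_p)
        ga = one_minus_sigmoid(margin_a)
        # Gradient ascent on the average log-likelihood.
        grad_w = (Xp.T @ (gp * yp) + Xa.T @ (ga * ya)) / n
        w = w + lr * grad_w
        # Ascent step on the offsets, then projection onto the assumed budget.
        s = project_budget(s + lr * ga, C * len(ya))
    mu = ya * s  # since y_i is +/-1, mu_i = y_i * s_i
    return w, mu

# Toy usage: auxiliary data drawn from a shifted distribution.
rng = np.random.default_rng(0)
Xp = rng.normal(size=(20, 2))
yp = np.where(Xp[:, 0] > 0, 1.0, -1.0)
Xa = rng.normal(size=(50, 2)) + 0.5
ya = np.where(Xa[:, 0] > 0.5, 1.0, -1.0)
w, mu = fit_offset_logit(Xp, yp, Xa, ya, C=1.0)
```

Under these assumptions, a large y_i μ_i means the classifier needed a large correction to explain that auxiliary label, which can be read as that example being more mismatched with the target set.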

Cited Authors

  • Liao, X; Xue, Y; Carin, L

Published Date

  • December 1, 2005

Published In

  • ICML 2005: Proceedings of the 22nd International Conference on Machine Learning

Start / End Page

  • 505 - 512

Citation Source

  • Scopus