Post-Selection Inference for Generalized Linear Models With Many Controls

Published

Journal Article

© 2016 American Statistical Association. This article considers generalized linear models in the presence of many controls. We lay out a general methodology to estimate an effect of interest based on the construction of an instrument that immunizes against model selection mistakes, and we apply it to the logistic binary choice model. More specifically, we propose new methods for estimating and constructing confidence regions for a regression parameter of primary interest α₀, the coefficient on the regressor of interest, such as a treatment or policy variable. These methods allow α₀ to be estimated at the root-n rate when the total number p of other regressors, called controls, potentially exceeds the sample size n, provided sparsity assumptions hold. The sparsity assumption means that there is a subset of s < n controls that suffices to accurately approximate the nuisance part of the regression function. Importantly, the estimators and the resulting confidence regions are valid uniformly over s-sparse models satisfying s² log² p = o(n) and other technical conditions. These procedures do not rely on traditional consistent model selection arguments for their validity; in fact, they are robust to moderate model selection mistakes. Under suitable conditions, the estimators are semiparametrically efficient in the sense of attaining the semiparametric efficiency bounds for the class of models in this article.
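
The rate condition in the abstract, that s squared times the square of log p be of smaller order than n, is asymptotic, but a rough numeric illustration may help fix ideas. The sketch below is not taken from the article: the function name `sparsity_ratio` and the growth paths (s growing like the fourth root of n, and p = 2n so that controls outnumber observations) are assumptions chosen purely for illustration. It only shows that the ratio shrinks along such a path even though p exceeds n throughout.

```python
import math


def sparsity_ratio(s: int, p: int, n: int) -> float:
    """Return s^2 * (log p)^2 / n, the quantity the rate condition requires to vanish."""
    return (s ** 2) * (math.log(p) ** 2) / n


# Hypothetical design (an assumption, not from the article):
# s grows like n^(1/4) and p = 2n, so there are always more controls than observations.
sample_sizes = [10**3, 10**4, 10**5, 10**6]
ratios = []
for n in sample_sizes:
    s = max(1, round(n ** 0.25))  # number of relevant controls
    p = 2 * n                     # total number of controls
    ratio = sparsity_ratio(s, p, n)
    ratios.append(ratio)
    print(f"n={n:>8}  s={s:>3}  p={p:>8}  s^2 log^2 p / n = {ratio:.4f}")
```

A finite-sample value of this ratio does not by itself certify that the condition holds; the point is only that p may grow faster than n while the ratio still tends to zero, as long as s grows slowly enough.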

Duke Authors

Cited Authors

  • Belloni, A; Chernozhukov, V; Wei, Y

Published Date

  • October 1, 2016

Published In

Volume / Issue

  • 34 / 4

Start / End Page

  • 606 - 619

Electronic International Standard Serial Number (EISSN)

  • 1537-2707

International Standard Serial Number (ISSN)

  • 0735-0015

Digital Object Identifier (DOI)

  • 10.1080/07350015.2016.1166116

Citation Source

  • Scopus