The all-or-nothing phenomenon in sparse linear regression

Publication: Journal Article
Reeves, G; Xu, J; Zadik, I
Published in: Mathematical Statistics and Learning
January 1, 2020

We study the problem of recovering a hidden binary k-sparse p-dimensional vector β from n noisy linear observations Y = Xβ + W, where the X_ij are i.i.d. N(0, 1) and the W_i are i.i.d. N(0, σ²). A closely related hypothesis testing problem is to distinguish the pair (X, Y) generated from this structured model from a corresponding null model in which (X, Y) consists of purely independent Gaussian entries. In the low-sparsity regime k = o(√p) with high signal-to-noise ratio k/σ² → ∞, we establish an “all-or-nothing” information-theoretic phase transition at a critical sample size n* = 2k log(p/k) / log(1 + k/σ²), resolving a conjecture of Gamarnik and Zadik (2017). Specifically, we show that if lim inf_{p→∞} n/n* > 1, then the maximum likelihood estimator almost perfectly recovers the hidden vector with high probability and, moreover, the true hypothesis can be detected with vanishing error probability. Conversely, if lim sup_{p→∞} n/n* < 1, then it becomes information-theoretically impossible even to recover an arbitrarily small but fixed fraction of the hidden vector's support, or to test hypotheses strictly better than random guessing. Our proof of the impossibility result builds on two key techniques, which may be of independent interest. First, we use a conditional second moment method to upper bound the Kullback-Leibler (KL) divergence between the structured and null models. Second, inspired by the celebrated area theorem, we establish a lower bound on the minimum mean squared estimation error of the hidden vector in terms of the KL divergence between the two models.
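As a rough illustration of the setup described in the abstract, the following minimal Python sketch (not from the paper) samples (X, Y, β) from the structured model and evaluates the critical sample size n* = 2k log(p/k) / log(1 + k/σ²). The null model's marginal variance for Y is an assumption chosen here to match the structured model; the abstract only states that the null entries are purely independent Gaussians.

```python
# Hypothetical sketch illustrating the structured model Y = X beta + W and
# the critical sample size n* = 2k log(p/k) / log(1 + k/sigma^2).
import numpy as np


def critical_sample_size(k, p, sigma2):
    """Evaluate n* = 2k log(p/k) / log(1 + k/sigma^2) from the abstract."""
    return 2 * k * np.log(p / k) / np.log(1 + k / sigma2)


def sample_structured_model(n, p, k, sigma2, rng):
    """Structured model: X_ij ~ N(0, 1), W_i ~ N(0, sigma2),
    beta a uniformly random binary k-sparse vector."""
    beta = np.zeros(p)
    beta[rng.choice(p, size=k, replace=False)] = 1.0
    X = rng.standard_normal((n, p))
    W = np.sqrt(sigma2) * rng.standard_normal(n)
    return X, X @ beta + W, beta


def sample_null_model(n, p, k, sigma2, rng):
    """Null model: (X, Y) with independent Gaussian entries.
    (Assumption: Y_i ~ N(0, k + sigma2), matching the structured model's
    marginal variance; the abstract does not specify this scaling.)"""
    X = rng.standard_normal((n, p))
    Y = np.sqrt(k + sigma2) * rng.standard_normal(n)
    return X, Y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, k, sigma2 = 10_000, 20, 1.0  # k = o(sqrt(p)), k/sigma^2 large
    n_star = critical_sample_size(k, p, sigma2)
    print(f"critical sample size n* ≈ {n_star:.1f}")
    # Sampling well above n* corresponds to the "all" side of the transition.
    n = int(1.5 * n_star)
    X, Y, beta = sample_structured_model(n, p, k, sigma2, rng)
    X0, Y0 = sample_null_model(n, p, k, sigma2, rng)
    print("structured sample:", X.shape, Y.shape, int(beta.sum()), "nonzeros")
    print("null sample:      ", X0.shape, Y0.shape)
```

With these illustrative parameters (p = 10,000, k = 20, σ² = 1), n* works out to roughly 80 observations, so the sketch draws 1.5·n* samples to sit on the recoverable side of the phase transition.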


Published In

Mathematical Statistics and Learning

DOI

10.4171/MSL/22

EISSN

2520-2324

ISSN

2520-2316

Publication Date

January 1, 2020

Volume

3

Issue

3-4

Start / End Page

259 / 313
 

Citation

APA: Reeves, G., Xu, J., & Zadik, I. (2020). The all-or-nothing phenomenon in sparse linear regression. Mathematical Statistics and Learning, 3(3–4), 259–313. https://doi.org/10.4171/MSL/22
Chicago: Reeves, G., J. Xu, and I. Zadik. “The all-or-nothing phenomenon in sparse linear regression.” Mathematical Statistics and Learning 3, no. 3–4 (January 1, 2020): 259–313. https://doi.org/10.4171/MSL/22.
ICMJE: Reeves G, Xu J, Zadik I. The all-or-nothing phenomenon in sparse linear regression. Mathematical Statistics and Learning. 2020 Jan 1;3(3–4):259–313.
MLA: Reeves, G., et al. “The all-or-nothing phenomenon in sparse linear regression.” Mathematical Statistics and Learning, vol. 3, no. 3–4, Jan. 2020, pp. 259–313. Scopus, doi:10.4171/MSL/22.
NLM: Reeves G, Xu J, Zadik I. The all-or-nothing phenomenon in sparse linear regression. Mathematical Statistics and Learning. 2020 Jan 1;3(3–4):259–313.
