Median selection subset aggregation for parallel inference

Publication, Conference
Wang, X; Peng, P; Dunson, DB
Published in: Advances in Neural Information Processing Systems
January 1, 2014

For massive data sets, efficient computation commonly relies on distributed algorithms that store and process subsets of the data on different machines, minimizing communication costs. Our focus is on regression and classification problems involving many features. A variety of distributed algorithms have been proposed in this context, but challenges arise in defining an algorithm with low communication, theoretical guarantees and excellent practical performance in general settings. We propose a MEdian Selection Subset AGgregation Estimator (message) algorithm, which attempts to solve these problems. The algorithm applies feature selection in parallel for each subset using Lasso or another method, calculates the 'median' feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves very minimal communication, scales efficiently in both sample and feature size, and has theoretical guarantees. In particular, we show model selection consistency and coefficient estimation efficiency. Extensive experiments show excellent performance in variable selection, estimation, prediction, and computation time relative to usual competitors.
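The four steps described in the abstract (per-subset feature selection, median inclusion, per-subset coefficient estimation, averaging) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `lasso_cd` and `message`, the penalty `lam`, the number of subsets `m`, and the simulated data are all hypothetical; the selector is a plain coordinate-descent Lasso written here for self-containment; coefficients are re-estimated by ordinary least squares; the subsets are processed sequentially rather than on separate machines; and the median of the 0/1 inclusion indicators is taken as a strict majority vote.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent Lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()  # running residual y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual (feature j removed).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def message(X, y, m, lam):
    """Sketch of median selection subset aggregation (assumptions in the lead-in)."""
    n, p = X.shape
    subsets = np.array_split(np.arange(n), m)

    # Step 1: feature selection on each subset (sequential here, parallel in practice).
    inclusion = np.array([lasso_cd(X[idx], y[idx], lam) != 0 for idx in subsets])

    # Step 2: 'median' inclusion index; for 0/1 indicators this is a majority vote.
    # Treating an exact tie as "not selected" is an assumption of this sketch.
    selected = np.median(inclusion.astype(float), axis=0) > 0.5

    # Step 3: least-squares fit on the selected features within each subset.
    # Step 4: average the per-subset estimates.
    coefs = np.zeros(p)
    if selected.any():
        fits = [np.linalg.lstsq(X[idx][:, selected], y[idx], rcond=None)[0]
                for idx in subsets]
        coefs[selected] = np.mean(fits, axis=0)
    return selected, coefs

# Hypothetical simulated data: 3 active features out of 20.
rng = np.random.default_rng(0)
n, p = 600, 20
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -3.0, 1.5]
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.5 * rng.standard_normal(n)

selected, coefs = message(X, y, m=5, lam=0.2)
print("selected features:", np.flatnonzero(selected))
print("averaged coefficients:", np.round(coefs[selected], 3))
```

In this sketch only the p-length inclusion vector and the coefficient vector for the selected features would need to leave each machine, which is consistent with the abstract's claim of minimal communication.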

Published In

Advances in Neural Information Processing Systems

ISSN

1049-5258

Publication Date

January 1, 2014

Volume

3

Issue

January

Start / End Page

2195 / 2203

Related Subject Headings

  • 4611 Machine learning
  • 1702 Cognitive Sciences
  • 1701 Psychology
 

Citation

APA: Wang, X., Peng, P., & Dunson, D. B. (2014). Median selection subset aggregation for parallel inference. In Advances in Neural Information Processing Systems (Vol. 3, pp. 2195–2203).

Chicago: Wang, X., P. Peng, and D. B. Dunson. “Median selection subset aggregation for parallel inference.” In Advances in Neural Information Processing Systems, 3:2195–2203, 2014.

ICMJE: Wang X, Peng P, Dunson DB. Median selection subset aggregation for parallel inference. In: Advances in Neural Information Processing Systems. 2014. p. 2195–203.

MLA: Wang, X., et al. “Median selection subset aggregation for parallel inference.” Advances in Neural Information Processing Systems, vol. 3, no. January, 2014, pp. 2195–203.

NLM: Wang X, Peng P, Dunson DB. Median selection subset aggregation for parallel inference. Advances in Neural Information Processing Systems. 2014. p. 2195–2203.
