Towards Scalable and Accurate Online Feature Selection for Big Data

Publication, Conference
Yu, K; Wu, X; Ding, W; Pei, J
Published in: Proceedings - IEEE International Conference on Data Mining, ICDM
January 1, 2014

Feature selection is important in many big data applications, but it faces at least two critical challenges. First, in many applications the dimensionality is extremely high, in the millions, and keeps growing. Second, feature selection has to be highly scalable, preferably in an online manner such that each feature can be processed in a sequential scan. In this paper, we develop SAOLA, a Scalable and Accurate OnLine Approach for feature selection. Using a theoretical analysis of a lower bound on the pairwise correlations between features in the currently selected feature subset, SAOLA employs novel online pairwise comparison techniques to address the two challenges and maintain a parsimonious model over time. An empirical study using a series of benchmark real data sets shows that SAOLA is scalable on data sets of extremely high dimensionality and achieves superior performance over state-of-the-art feature selection methods.
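
The abstract does not spell out the comparison rule, so the following is only an illustrative sketch of what online feature selection with pairwise comparisons can look like, not the authors' implementation. It assumes features arrive one at a time, uses absolute Pearson correlation as a stand-in correlation measure, and treats a feature as redundant when its correlation with a more relevant feature is at least as large as its own correlation with the target. The names `abs_pearson`, `online_select`, and `min_relevance` are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple


def abs_pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy / math.sqrt(sxx * syy))


def online_select(
    stream: Iterable[Tuple[str, Sequence[float]]],
    target: Sequence[float],
    min_relevance: float = 0.1,  # assumed threshold, not taken from the paper
) -> List[str]:
    """Process features in a single sequential scan, keeping a small subset."""
    selected: Dict[str, Tuple[List[float], float]] = {}  # name -> (values, relevance)
    for name, values in stream:
        rel = abs_pearson(values, target)
        if rel < min_relevance:
            continue  # weakly related to the target: discard immediately
        keep_new = True
        for other, (other_vals, other_rel) in list(selected.items()):
            pair = abs_pearson(values, other_vals)
            if other_rel >= rel and pair >= rel:
                keep_new = False  # an existing feature makes the new one redundant
                break
            if rel > other_rel and pair >= other_rel:
                del selected[other]  # the new feature makes an existing one redundant
        if keep_new:
            selected[name] = (list(values), rel)
    return list(selected)


if __name__ == "__main__":
    y = [1, 2, 3, 4, 5, 6]
    features = [
        ("noise", [1, -1, 0, 0, -1, 1]),
        ("weak", [1, 2, 3, 4, 6, 5]),
        ("strong", [2, 4, 6, 8, 10, 12]),
    ]
    print(online_select(features, y))
```

Each incoming feature is compared only against the currently selected subset, so the cost per feature depends on the size of that subset rather than on the total number of features seen so far, which is the property that makes a sequential scan feasible in this sketch.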

Published In

Proceedings - IEEE International Conference on Data Mining, ICDM

DOI

10.1109/ICDM.2014.63

ISSN

1550-4786

Publication Date

January 1, 2014

Volume

2015-January

Issue

January

Start / End Page

660 / 669
 

Citation

APA: Yu, K., Wu, X., Ding, W., & Pei, J. (2014). Towards Scalable and Accurate Online Feature Selection for Big Data. In Proceedings - IEEE International Conference on Data Mining, ICDM (Vol. 2015-January, pp. 660–669). https://doi.org/10.1109/ICDM.2014.63

Chicago: Yu, K., X. Wu, W. Ding, and J. Pei. “Towards Scalable and Accurate Online Feature Selection for Big Data.” In Proceedings - IEEE International Conference on Data Mining, ICDM, 2015-January:660–69, 2014. https://doi.org/10.1109/ICDM.2014.63.

ICMJE: Yu K, Wu X, Ding W, Pei J. Towards Scalable and Accurate Online Feature Selection for Big Data. In: Proceedings - IEEE International Conference on Data Mining, ICDM. 2014. p. 660–9.

MLA: Yu, K., et al. “Towards Scalable and Accurate Online Feature Selection for Big Data.” Proceedings - IEEE International Conference on Data Mining, ICDM, vol. 2015-January, no. January, 2014, pp. 660–69. Scopus, doi:10.1109/ICDM.2014.63.

NLM: Yu K, Wu X, Ding W, Pei J. Towards Scalable and Accurate Online Feature Selection for Big Data. Proceedings - IEEE International Conference on Data Mining, ICDM. 2014. p. 660–669.
