Nonparametric IPSS: fast, flexible feature selection with false discovery control.

Publication, Journal Article
Melikechi, O; Dunson, DB; Miller, JW
Published in: Bioinformatics (Oxford, England)
May 2025

Feature selection is a critical task in machine learning and statistics. However, existing feature selection methods either (i) rely on parametric models such as linear or generalized linear models, (ii) lack theoretical false discovery control, or (iii) identify few true positives.

We introduce a general feature selection method with finite-sample false discovery control based on applying integrated path stability selection (IPSS) to arbitrary feature importance scores. The method is nonparametric whenever the importance scores are nonparametric, and it estimates q-values, which are better suited to high-dimensional data than P-values. We focus on two special cases using importance scores from gradient boosting (IPSSGB) and random forests (IPSSRF). Extensive nonlinear simulations with RNA sequencing data show that both methods accurately control the false discovery rate and detect more true positives than existing methods. Both methods are also efficient, running in under 20 s when there are 500 samples and 5000 features. We apply IPSSGB and IPSSRF to detect microRNAs and genes related to cancer, finding that they yield better predictions with fewer features than existing approaches.

All code and data used in this work are available on GitHub (https://github.com/omelikechi/ipss_bioinformatics) and permanently archived on Zenodo (https://doi.org/10.5281/zenodo.15335289). A Python package for implementing IPSS is available on GitHub (https://github.com/omelikechi/ipss) and PyPI (https://pypi.org/project/ipss/). An R implementation of IPSS is also available on GitHub (https://github.com/omelikechi/ipssR).
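IPSS builds on stability selection, in which a base selector is rerun on many random subsamples and only features chosen consistently are kept. The sketch below is a simplified illustration of that underlying idea, not the authors' method: it swaps in classic stability selection (Meinshausen and Bühlmann, 2010) with a fixed top-`n_top` rule on scikit-learn random-forest importance scores, and it reports their bound on the expected number of false positives rather than IPSS q-values. IPSS itself, as its name suggests, integrates over a selection path and estimates q-values; neither refinement is reproduced here. The function names `stability_selection_rf` and `false_discovery_proportion`, the toy data, and all parameter values are hypothetical. For real analyses, use the authors' `ipss` package linked above.

```python
# Simplified sketch: classic Meinshausen-Bühlmann stability selection applied to
# random-forest importance scores. This is NOT the IPSS algorithm; function names
# and parameter values are hypothetical and chosen only for illustration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def stability_selection_rf(X, y, n_top=5, n_subsamples=20, pi_thr=0.8, seed=0):
    """Keep features that rank in the top `n_top` importances in at least a
    fraction `pi_thr` of half-size subsamples.

    Returns (selected indices, selection frequencies, bound on E[false positives]).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for b in range(n_subsamples):
        # Draw a subsample of half the rows without replacement.
        idx = rng.choice(n, size=n // 2, replace=False)
        model = RandomForestRegressor(n_estimators=50, random_state=b)
        model.fit(X[idx], y[idx])
        # Record the n_top features with the largest impurity-based importances.
        top = np.argsort(model.feature_importances_)[-n_top:]
        counts[top] += 1
    freq = counts / n_subsamples
    selected = np.flatnonzero(freq >= pi_thr)
    # Meinshausen-Bühlmann bound E[V] <= q^2 / ((2 * pi_thr - 1) * p), where q is
    # the number of features selected per subsample (here exactly n_top).
    # It assumes exchangeable noise features and requires pi_thr in (0.5, 1].
    efp_bound = n_top**2 / ((2 * pi_thr - 1) * p)
    return selected, freq, efp_bound


def false_discovery_proportion(selected, true_features):
    """Fraction of selected features that are not truly relevant (0 if none selected)."""
    selected = set(int(j) for j in selected)
    if not selected:
        return 0.0
    return len(selected - set(true_features)) / len(selected)


# Toy nonlinear data (hypothetical): only features 0 and 1 affect y.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
y = 3 * np.sin(X[:, 0]) + 2 * X[:, 1] ** 2 + 0.5 * rng.normal(size=200)

selected, freq, efp_bound = stability_selection_rf(X, y)
print("selected:", selected)
print("bound on E[false positives]:", round(efp_bound, 3))
print("false discovery proportion:", false_discovery_proportion(selected, {0, 1}))
```

With these strong toy signals, features 0 and 1 should be selected. The `false_discovery_proportion` helper mirrors how an empirical false discovery proportion is commonly computed in simulations where the truly relevant features are known; averaging it over repeated simulated datasets estimates the false discovery rate.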

Published In

Bioinformatics (Oxford, England)

DOI

10.1093/bioinformatics/btaf299

EISSN

1367-4811

ISSN

1367-4803

Publication Date

May 2025

Volume

41

Issue

5

Start / End Page

btaf299

Related Subject Headings

  • Software
  • Sequence Analysis, RNA
  • Neoplasms
  • MicroRNAs
  • Machine Learning
  • Humans
  • Computational Biology
  • Bioinformatics
  • Algorithms
  • 49 Mathematical sciences
 

Citation

APA
Melikechi, O., Dunson, D. B., & Miller, J. W. (2025). Nonparametric IPSS: fast, flexible feature selection with false discovery control. Bioinformatics (Oxford, England), 41(5), btaf299. https://doi.org/10.1093/bioinformatics/btaf299

Chicago
Melikechi, Omar, David B. Dunson, and Jeffrey W. Miller. “Nonparametric IPSS: fast, flexible feature selection with false discovery control.” Bioinformatics (Oxford, England) 41, no. 5 (May 2025): btaf299. https://doi.org/10.1093/bioinformatics/btaf299.

ICMJE
Melikechi O, Dunson DB, Miller JW. Nonparametric IPSS: fast, flexible feature selection with false discovery control. Bioinformatics (Oxford, England). 2025 May;41(5):btaf299.

MLA
Melikechi, Omar, et al. “Nonparametric IPSS: fast, flexible feature selection with false discovery control.” Bioinformatics (Oxford, England), vol. 41, no. 5, May 2025, p. btaf299. Epmc, doi:10.1093/bioinformatics/btaf299.

NLM
Melikechi O, Dunson DB, Miller JW. Nonparametric IPSS: fast, flexible feature selection with false discovery control. Bioinformatics (Oxford, England). 2025 May;41(5):btaf299.