Tackling the small imbalanced horizontal dataset regressions by Stability Selection and SMOGN: a case study of ventilation-free days prediction in the pediatric intensive care unit and the importance of PRISM.

Publication ,  Journal Article
Rad, M; Rafiei, A; Grunwell, J; Kamaleswaran, R
Published in: Int J Med Inform
April 2025

OBJECTIVE: Regression on small, imbalanced, horizontal datasets is an important problem in bioinformatics because rare but vital data points strongly affect model performance. Most clinical studies have imbalanced distributions, which impairs the learning ability of regression and classification models. Combined with a small number of samples, this imbalance reduces prediction performance. Improving the trainability of small imbalanced datasets would substantially strengthen current prediction models that rely on a small set of valuable, expensive samples.

MATERIALS AND METHODS: Stability Selection was used to overcome the high-dimensionality problem, which arises when the sample size is small relative to the number of features. The method was used to improve the performance of the Synthetic Minority Over-Sampling Technique for Regression with Gaussian Noise (SMOGN), an imbalance-removal algorithm. To test the new pipeline, a small imbalanced cohort of pediatric ICU patients was used to predict the number of Ventilator-Free Days (VFD) a patient may experience over a 28-day admission period due to respiratory illness.

RESULTS: By applying Stability Selection before SMOGN, our model overcame label imbalance and predicted almost all the non-surviving patients in the test dataset. Our study also highlighted the importance of Pediatric Risk of Mortality (PRISM) as a powerful VFD predictor when combined with other clinical features.

CONCLUSION: This paper shows how a hybrid strategy of Stability Selection, SMOGN, and regression can improve outcomes on highly imbalanced datasets and reduce the probability of highly expensive false-negative detections in severe acute respiratory disease syndrome cases. The proposed modeling pipeline can reduce the overall VFD regression error and is also expandable to other regressable features. We also showed the importance of PRISM as a strong VFD predictor.
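The two-step idea in the abstract (select stable features first, then rebalance rare targets) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the data are synthetic, the base selector is a simple top-k correlation ranking standing in for whatever selector the paper used, and `jitter_rare` is a hypothetical helper that reproduces only the Gaussian-noise part of SMOGN. The full SMOGN algorithm also interpolates between nearby rare cases and under-samples common ones.

```python
import random


def abs_corr(a, b):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return abs(cov) / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def stability_selection(X, y, n_rounds=100, top_k=2, threshold=0.7, seed=0):
    """Keep features that a base selector picks in most random half-samples.

    Assumption: the base selector is "top_k features by |correlation|".
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_rounds):
        rows = rng.sample(range(n), n // 2)  # half the rows, without replacement
        y_sub = [y[i] for i in rows]
        scores = [abs_corr([X[i][j] for i in rows], y_sub) for j in range(p)]
        ranked = sorted(range(p), key=scores.__getitem__, reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1
    freqs = [c / n_rounds for c in counts]
    return [j for j, f in enumerate(freqs) if f >= threshold], freqs


def jitter_rare(X, y, is_rare, n_new, noise_scale=0.05, seed=0):
    """Hypothetical helper: append n_new noisy copies of rows with rare targets.

    Features get Gaussian noise scaled by each feature's standard deviation;
    the target is copied unchanged for simplicity.
    """
    rng = random.Random(seed)
    X_out, y_out = [list(r) for r in X], list(y)
    rare = [i for i, t in enumerate(y) if is_rare(t)]
    if not rare:
        return X_out, y_out
    p = len(X[0])
    sds = []
    for j in range(p):
        col = [r[j] for r in X]
        m = sum(col) / len(col)
        sds.append((sum((v - m) ** 2 for v in col) / len(col)) ** 0.5)
    for _ in range(n_new):
        i = rng.choice(rare)
        X_out.append([X[i][j] + rng.gauss(0.0, noise_scale * sds[j]) for j in range(p)])
        y_out.append(y[i])
    return X_out, y_out


# Synthetic demo: 120 rows, 6 features, only features 0 and 1 drive the target.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(120)]
y = [3 * r[0] + 3 * r[1] + rng.gauss(0, 0.1) for r in X]

selected, freqs = stability_selection(X, y)        # step 1: stable features
X_sel = [[r[j] for j in selected] for r in X]
cutoff = sorted(y)[9]                              # treat the 10 lowest targets as rare
X_bal, y_bal = jitter_rare(X_sel, y, lambda t: t <= cutoff, n_new=30)  # step 2
print(selected, len(X_bal))
```

Running selection before oversampling means the synthetic rows are generated in the reduced feature space, so noise is not added along dimensions the selector has already discarded. A regressor of choice would then be fitted on `X_bal` and `y_bal`.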

Published In

Int J Med Inform

DOI

10.1016/j.ijmedinf.2025.105809

EISSN

1872-8243

Publication Date

April 2025

Volume

196

Start / End Page

105809

Location

Ireland

Related Subject Headings

  • Respiration, Artificial
  • Regression Analysis
  • Normal Distribution
  • Medical Informatics
  • Intensive Care Units, Pediatric
  • Infant
  • Humans
  • Child, Preschool
  • Child
  • Algorithms
 

Citation

APA
Rad, M., Rafiei, A., Grunwell, J., & Kamaleswaran, R. (2025). Tackling the small imbalanced horizontal dataset regressions by Stability Selection and SMOGN: a case study of ventilation-free days prediction in the pediatric intensive care unit and the importance of PRISM. Int J Med Inform, 196, 105809. https://doi.org/10.1016/j.ijmedinf.2025.105809

Chicago
Rad, Milad, Alireza Rafiei, Jocelyn Grunwell, and Rishikesan Kamaleswaran. “Tackling the small imbalanced horizontal dataset regressions by Stability Selection and SMOGN: a case study of ventilation-free days prediction in the pediatric intensive care unit and the importance of PRISM.” Int J Med Inform 196 (April 2025): 105809. https://doi.org/10.1016/j.ijmedinf.2025.105809.