A genetic algorithm for variable selection in logistic regression analysis of radiotherapy treatment outcomes.


Journal Article

A given outcome of radiotherapy treatment can be modeled by analyzing its correlation with a combination of dosimetric, physiological, biological, and clinical factors, through a logistic regression fit of a large patient population. The quality of the fit is measured by combining the predictive power of a particular set of factors with the statistical significance of the individual factors in the model. We developed a genetic algorithm (GA) in which a small sample of all the possible combinations of variables is fitted to the patient data. New models are derived from the best models through crossover and mutation operations, and are fitted in turn. The process is repeated until the sample converges to the combination of factors that best predicts the outcome. The GA was tested on a data set from a study of the incidence of lung injury in non-small cell lung cancer (NSCLC) patients treated with three-dimensional conformal radiotherapy (3DCRT). The GA identified a model with two variables as the best predictor of radiation pneumonitis: the V30, the lung volume receiving at least 30 Gy (p=0.048), and the ongoing use of tobacco at the time of referral (p=0.074). An exhaustive analysis of all possible combinations of factors confirmed this two-variable model as the best model. In conclusion, genetic algorithms provide a reliable and fast way to select significant factors in logistic regression analysis of large clinical studies.
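
The selection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify how predictive power and significance are combined, so the sketch assumes the Akaike information criterion (AIC) as a stand-in fitness. The function names (`fit_logistic`, `fitness`, `genetic_select`), the population size, the mutation rate, and the single-point crossover are all hypothetical choices made for the example.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Fit logistic regression by Newton-Raphson and return the log-likelihood."""
    Xb = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ beta, -30, 30)))
        grad = Xb.T @ (y - p)
        # Tiny ridge term on the Hessian, for numerical stability only
        hess = (Xb * (p * (1 - p))[:, None]).T @ Xb + ridge * np.eye(Xb.shape[1])
        beta += np.linalg.solve(hess, grad)
    p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ beta, -30, 30)))
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

def fitness(mask, X, y):
    """Assumed fitness (not from the paper): negative AIC, higher is better."""
    k = int(mask.sum())
    loglik = fit_logistic(X[:, mask], y)
    return -(2 * (k + 1) - 2 * loglik)

def genetic_select(X, y, pop_size=20, n_gen=30, p_mut=0.1, seed=0):
    """Search over boolean variable masks with crossover and mutation."""
    rng = np.random.default_rng(seed)
    n_var = X.shape[1]
    pop = rng.random((pop_size, n_var)) < 0.5  # each row is one candidate model
    cache = {}

    def score(mask):
        key = mask.tobytes()  # avoid refitting a model already evaluated
        if key not in cache:
            cache[key] = fitness(mask, X, y)
        return cache[key]

    for _ in range(n_gen):
        ranked = sorted(pop, key=score, reverse=True)
        parents = ranked[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            i, j = rng.choice(len(parents), size=2, replace=False)
            cut = rng.integers(1, n_var)  # single-point crossover position
            child = np.concatenate([parents[i][:cut], parents[j][cut:]])
            child ^= rng.random(n_var) < p_mut  # mutation: flip random bits
            children.append(child)
        pop = np.array(parents + children)
    best = max(pop, key=score)
    return best, score(best)
```

Calling `genetic_select(X, y)` with a patients-by-factors array `X` and a 0/1 outcome vector `y` returns a boolean mask of the selected factors and its fitness. For a small number of candidate factors, the result can be checked against an exhaustive fit of every mask, which mirrors the confirmation step reported in the abstract.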

Cited Authors

  • Gayou, O; Das, SK; Zhou, S-M; Marks, LB; Parda, DS; Miften, M

Published Date

  • December 2008

Published In

  • Medical Physics

Volume / Issue

  • 35 / 12

Start / End Page

  • 5426 - 5433

PubMed ID

  • 19175102

International Standard Serial Number (ISSN)

  • 0094-2405

Digital Object Identifier (DOI)

  • 10.1118/1.3005974


Language

  • eng

Conference Location

  • United States