Sample size considerations of prediction-validation methods in high-dimensional data for survival outcomes.

Publication, Journal Article
Pang, H; Jung, S-H
Published in: Genet Epidemiol
April 2013

A variety of prediction methods are used to relate high-dimensional genome data with a clinical outcome using a prediction model. Once a prediction model is developed from a data set, it should be validated using a resampling method or an independent data set. Although the existing prediction methods have been intensively evaluated by many investigators, there has not been a comprehensive study investigating the performance of the validation methods, especially with a survival clinical outcome. Understanding the properties of the various validation methods can allow researchers to perform more powerful validations while controlling for type I error. In addition, sample size calculation strategy based on these validation methods is lacking. We conduct extensive simulations to examine the statistical properties of these validation strategies. In both simulations and a real data example, we have found that 10-fold cross-validation with permutation gave the best power while controlling type I error close to the nominal level. Based on this, we have also developed a sample size calculation method that will be used to design a validation study with a user-chosen combination of prediction. Microarray and genome-wide association studies data are used as illustrations. The power calculation method in this presentation can be used for the design of any biomedical studies involving high-dimensional data and survival outcomes.

Published In

Genet Epidemiol

DOI

10.1002/gepi.21721

EISSN

1098-2272

Publication Date

April 2013

Volume

37

Issue

3

Start / End Page

276 / 282

Location

United States

Related Subject Headings

  • Validation Studies as Topic
  • Sample Size
  • Research Design
  • Proportional Hazards Models
  • Multiple Myeloma
  • Mortality
  • Microarray Analysis
  • Lung Neoplasms
  • Humans
  • Human Genome Project
 

Citation

APA: Pang, H., & Jung, S.-H. (2013). Sample size considerations of prediction-validation methods in high-dimensional data for survival outcomes. Genet Epidemiol, 37(3), 276–282. https://doi.org/10.1002/gepi.21721
Chicago: Pang, Herbert, and Sin-Ho Jung. “Sample size considerations of prediction-validation methods in high-dimensional data for survival outcomes.” Genet Epidemiol 37, no. 3 (April 2013): 276–82. https://doi.org/10.1002/gepi.21721.
MLA: Pang, Herbert, and Sin-Ho Jung. “Sample size considerations of prediction-validation methods in high-dimensional data for survival outcomes.” Genet Epidemiol, vol. 37, no. 3, Apr. 2013, pp. 276–82. Pubmed, doi:10.1002/gepi.21721.