
The more you test, the more you find: The smallest P-values become increasingly enriched with real findings as more tests are conducted.

Publication, Journal Article
Vsevolozhskaya, OA; Kuo, C-L; Ruiz, G; Diatchenko, L; Zaykin, DV
Published in: Genet Epidemiol
December 2017

The increasing accessibility of data to researchers makes it possible to conduct massive amounts of statistical testing. Rather than following specific scientific hypotheses with statistical analysis, researchers can now test many possible relationships and let statistics generate hypotheses for them. The field of genetic epidemiology is an illustrative case: testing of candidate genetic variants for association with an outcome has been replaced by agnostic screening of the entire genome. Replication rates, which were poor for candidate gene studies, have improved dramatically with the increase in genomic coverage, owing to factors such as the adoption of better statistical practices and the availability of larger sample sizes. Here, we suggest that another important factor behind the improved replicability of genome-wide scans is the increase in the amount of statistical testing itself. We show that increasing the number of tested hypotheses increases the proportion of true associations among the variants with the smallest P-values. We develop statistical theory to quantify how the expected proportion of genuine signals (EPGS) among top hits depends on the number of tests. This enrichment of top hits with real findings holds regardless of whether genome-wide statistical significance has been reached in a study. Moreover, if we consider only those "failed" studies that produce no statistically significant results, the same enrichment phenomenon takes place: the proportion of true associations among top hits grows with the number of tests. The enrichment occurs even if true signals are encountered at a logarithmically decreasing rate as more tests are added.
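To make the enrichment effect concrete, the following is a minimal Monte Carlo sketch. It is not the authors' analytic EPGS derivation. It assumes independent one-sided z-tests, a fixed fraction of true signals whose test statistics share a common mean shift (the effect size), and null statistics that are standard normal. The function name `top_hit_true_fraction` and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def one_sided_p(z: float) -> float:
    """Upper-tail P-value of a standard normal test statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def top_hit_true_fraction(num_tests: int, prop_true: float, effect: float,
                          top_k: int, reps: int, seed: int = 1) -> float:
    """Monte Carlo estimate of the proportion of true signals among the
    top_k smallest P-values when num_tests tests are run.

    Illustrative assumptions (not from the paper): each test is a
    one-sided z-test; with probability prop_true a test is a true signal
    whose statistic is N(effect, 1), otherwise it is a null with
    statistic N(0, 1); tests are independent.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        results = []  # (P-value, is_true_signal) pairs for one simulated scan
        for _ in range(num_tests):
            is_true = rng.random() < prop_true
            z = rng.gauss(effect if is_true else 0.0, 1.0)
            results.append((one_sided_p(z), is_true))
        results.sort(key=lambda pair: pair[0])  # smallest P-values first
        top = results[:top_k]
        # Fraction of the top hits that are genuine signals in this scan
        total += sum(is_true for _, is_true in top) / top_k
    return total / reps
```

Holding `prop_true=0.01`, `effect=3.0` and `top_k=10` fixed while raising `num_tests` from 100 to 10,000 should show the fraction of true signals among the ten smallest P-values rising. The intuition in this toy setting is that a larger scan pushes the top-k cutoff further into the tail, where the likelihood ratio of a shifted versus a null statistic, exp(effect·z − effect²/2), keeps growing.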

Duke Scholars


Published In

Genet Epidemiol

DOI

10.1002/gepi.22064

EISSN

1098-2272

Publication Date

December 2017

Volume

41

Issue

8

Start / End Page

726 / 743

Location

United States

Related Subject Headings

  • Models, Statistical
  • Models, Genetic
  • Humans
  • Genome-Wide Association Study
  • Epidemiology
  • Bayes Theorem
  • 4202 Epidemiology
  • 3105 Genetics
  • 1117 Public Health and Health Services
  • 0604 Genetics
 

Citation

APA: Vsevolozhskaya, O. A., Kuo, C.-L., Ruiz, G., Diatchenko, L., & Zaykin, D. V. (2017). The more you test, the more you find: The smallest P-values become increasingly enriched with real findings as more tests are conducted. Genet Epidemiol, 41(8), 726–743. https://doi.org/10.1002/gepi.22064
Chicago: Vsevolozhskaya, Olga A., Chia-Ling Kuo, Gabriel Ruiz, Luda Diatchenko, and Dmitri V. Zaykin. “The more you test, the more you find: The smallest P-values become increasingly enriched with real findings as more tests are conducted.” Genet Epidemiol 41, no. 8 (December 2017): 726–43. https://doi.org/10.1002/gepi.22064.
ICMJE: Vsevolozhskaya OA, Kuo C-L, Ruiz G, Diatchenko L, Zaykin DV. The more you test, the more you find: The smallest P-values become increasingly enriched with real findings as more tests are conducted. Genet Epidemiol. 2017 Dec;41(8):726–43.
MLA: Vsevolozhskaya, Olga A., et al. “The more you test, the more you find: The smallest P-values become increasingly enriched with real findings as more tests are conducted.” Genet Epidemiol, vol. 41, no. 8, Dec. 2017, pp. 726–43. Pubmed, doi:10.1002/gepi.22064.
NLM: Vsevolozhskaya OA, Kuo C-L, Ruiz G, Diatchenko L, Zaykin DV. The more you test, the more you find: The smallest P-values become increasingly enriched with real findings as more tests are conducted. Genet Epidemiol. 2017 Dec;41(8):726–743.