Limitations of principal components in quantitative genetic association models for human studies.

Publication, Journal Article
Yao, Y; Ochoa, A
Published in: eLife
May 2023

Principal Component Analysis (PCA) and the Linear Mixed-effects Model (LMM), sometimes in combination, are the most common genetic association models. Previous PCA-LMM comparisons give mixed results and unclear guidance, and have several limitations, including not varying the number of principal components (PCs), simulating only simple population structures, and inconsistent use of real data and power evaluations. We evaluate PCA and LMM, both with varying numbers of PCs, in realistic genotype and complex trait simulations including admixed families and subpopulation trees, and in real multiethnic human datasets with simulated traits. We find that LMM without PCs usually performs best, with the largest effects in family simulations and real human datasets and traits without environment effects. Poor PCA performance on human datasets is driven by large numbers of distant relatives more than the smaller number of closer relatives. While PCA was known to fail on family data, we report strong effects of family relatedness in genetically diverse human datasets, which are not avoided by pruning close relatives. Environment effects driven by geography and ethnicity are better modeled with LMM including those labels instead of PCs. This work better characterizes the severe limitations of PCA compared to LMM in modeling the complex relatedness structures of multiethnic human data for association studies.
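For orientation, the two model families compared in the abstract are often written in the following generic form. This is a sketch in assumed notation, not taken from this record: the symbols (y, x_i, U_r, s, Φ and the variance parameters) are illustrative choices, and the paper itself should be consulted for its exact definitions.

```latex
% PCA association model: the top r principal components enter as fixed covariates
\mathbf{y} = \mathbf{1}\alpha + \mathbf{x}_i \beta_i + \mathbf{U}_r \boldsymbol{\gamma}_r + \boldsymbol{\epsilon},
\qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})

% LMM association model: relatedness enters as a random effect with kinship covariance
\mathbf{y} = \mathbf{1}\alpha + \mathbf{x}_i \beta_i + \mathbf{s} + \boldsymbol{\epsilon},
\qquad \mathbf{s} \sim \mathcal{N}(\mathbf{0}, 2\sigma_s^2 \boldsymbol{\Phi})
```

Here y is the trait vector for n individuals, x_i is the genotype vector at locus i, U_r is the n-by-r matrix of the top r eigenvectors of a kinship estimate, and Φ is the n-by-n kinship matrix; in both models the association test is of the null hypothesis β_i = 0. An LMM with PCs would add the U_r γ_r term to the second model. Under this notation, PCA can be read as a low-dimensional, fixed-effects stand-in for the LMM's random effect, which is one way to interpret the abstract's finding that many distant relatives, whose relatedness a few PCs may not capture, degrade PCA more than LMM.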

Published In

eLife

DOI

10.7554/elife.79238

EISSN

2050-084X

ISSN

2050-084X

Publication Date

May 2023

Volume

12

Start / End Page

e79238

Related Subject Headings

  • Principal Component Analysis
  • Phenotype
  • Multifactorial Inheritance
  • Models, Genetic
  • Linear Models
  • Humans
  • Genotype
  • Genome-Wide Association Study
  • 42 Health sciences
  • 32 Biomedical and clinical sciences
 

Citation

APA
Yao, Y., & Ochoa, A. (2023). Limitations of principal components in quantitative genetic association models for human studies. eLife, 12, e79238. https://doi.org/10.7554/elife.79238

Chicago
Yao, Yiqi, and Alejandro Ochoa. “Limitations of principal components in quantitative genetic association models for human studies.” eLife 12 (May 2023): e79238. https://doi.org/10.7554/elife.79238.

MLA
Yao, Yiqi, and Alejandro Ochoa. “Limitations of principal components in quantitative genetic association models for human studies.” eLife, vol. 12, May 2023, p. e79238. Epmc, doi:10.7554/elife.79238.
