Methods for interaction analyses using family-based case-control data: conditional logistic regression versus generalized estimating equations.
A complex web of gene-gene and gene-environment interactions likely underlies late-onset disease development. We compared conditional logistic regression (CLR) and generalized estimating equations (GEE) in modeling such interactions in pedigrees with missing parents. Using the simulation of linkage and association (SIMLA) program, disease genes, an environmental risk factor, gene-gene interaction, and gene-environment interaction were generated in family-based data sets. Four scenarios for the relationship between the marker and disease loci were examined: linkage and association, linkage without association, association without linkage, and absence of both linkage and association. Models for CLR and GEE (with exchangeable and independence correlation matrices) were built, and type I error, power, average odds ratio (OR), standard deviation, and 95% confidence intervals were estimated. CLR and GEE were valid tests of association in the presence of linkage, but type I error was inflated for association without linkage, particularly with GEE. CLR generated estimates of the OR with lower bias but often more variability than the OR estimates observed for GEE. Further, GEE was more powerful than CLR in detecting main and interactive effects. Although GEE with both matrices had similar power, use of the independence matrix resulted in lower type I error and less biased OR estimation as compared to the exchangeable matrix. Our findings support the use of GEE in maximizing power to detect gene-gene and gene-environment interactions but caution its use under potential association without linkage (e.g., population stratification) and the interpretation of its OR estimates.
Hancock, DB; Martin, ER; Li, Y-J; Scott, WK
Volume / Issue
Start / End Page
Pubmed Central ID
International Standard Serial Number (ISSN)
Digital Object Identifier (DOI)