Integrating genetic and gene expression evidence into genome-wide association analysis of gene sets.

Journal Article

Single variant or single gene analyses generally account for only a small proportion of the phenotypic variation in complex traits. Alternatively, gene set or pathway association analyses are playing an increasingly important role in uncovering genetic architectures of complex traits through the identification of systematic genetic interactions. Two dominant paradigms for gene set analyses are association analyses based on SNP genotypes and those based on gene expression profiles. However, gene-disease association can manifest in many ways, such as alterations of gene expression, genotype, and copy number; thus, an integrative approach combining multiple forms of evidence can more accurately and comprehensively capture pathway associations. We have developed a single statistical framework, Gene Set Association Analysis (GSAA), that simultaneously measures genome-wide patterns of genetic variation and gene expression variation to identify sets of genes enriched for differential expression and/or trait-associated genetic markers. Simulation studies illustrate that joint analyses of genomic data increase the power to detect real associations when compared with gene set methods that use only one genomic data type. The analysis of two human diseases, glioblastoma and Crohn's disease, detected abnormalities in previously identified disease-associated pathways, such as pathways related to PI3K signaling, DNA damage response, and the activation of NFKB. In addition, GSAA predicted novel pathway associations, for example, differential genetic and expression characteristics in genes from the ABC transporter family in glioblastoma and from the HLA system in Crohn's disease. These results demonstrate that GSAA can help uncover biological pathways underlying human diseases and complex traits.

Cited Authors

  • Xiong, Q; Ancona, N; Hauser, ER; Mukherjee, S; Furey, TS

Published Date

  • February 2012

Published In

  • Genome Research

Volume / Issue

  • 22 / 2

Start / End Page

  • 386 - 397

PubMed ID

  • 21940837

Electronic International Standard Serial Number (EISSN)

  • 1549-5469

Digital Object Identifier (DOI)

  • 10.1101/gr.124370.111

Language

  • eng

Conference Location

  • United States