Improved Pathogenic Variant Localization via a Hierarchical Model of Sub-regional Intolerance.

Published

Journal Article

Different parts of a gene can differ in their importance to development and health. This regional heterogeneity is also apparent in the distribution of disease-associated mutations, which often cluster in particular regions of disease-associated genes. The ability to precisely estimate functionally important sub-regions of genes will be key in correctly deciphering relationships between genetic variation and disease. Previous methods have had some success using standing human variation to characterize this variability in importance by measuring sub-regional intolerance, i.e., the depletion in functional variation from expectation within a given region of a gene. However, their ability to precisely estimate local intolerance has been limited because they use only information within a given sub-region, leading to instability in local estimates, especially for small regions. We show that borrowing information across regions using a Bayesian hierarchical model stabilizes estimates, leading to lower variability and improved predictive utility. Specifically, our approach more effectively identifies regions enriched for ClinVar pathogenic variants. We also identify significant correlations between sub-region intolerance and the distribution of pathogenic variation in disease-associated genes, with AUCs for classifying de novo missense variants in Online Mendelian Inheritance in Man (OMIM) genes of up to 0.86 using exonic sub-regions and 0.91 using sub-regions defined by protein domains. This result suggests that considering the intolerance of the regions in which variants are found may improve diagnostic interpretation. We also illustrate the utility of integrating regional intolerance into gene-level disease association tests with a study of known disease-associated genes for epileptic encephalopathy.
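
The idea of borrowing information across regions can be illustrated with a minimal sketch. This is not the model used in the article; it is a simple conjugate Gamma-Poisson shrinkage, assuming each sub-region has an observed and an expected count of functional variants. The function name `shrink_ratios` and the fixed `prior_strength` parameter are hypothetical choices made for illustration; a full hierarchical model would estimate the prior from the data rather than fix it.

```python
def shrink_ratios(observed, expected, prior_strength=10.0):
    """Shrink per-region observed/expected ratios toward the gene-level ratio.

    Assumed model (illustrative only, not the article's model):
      O_i ~ Poisson(lambda_i * E_i),  lambda_i ~ Gamma(shape=a, rate=b)
    with b = prior_strength and a chosen so the prior mean a / b equals
    the gene-level observed/expected ratio.
    The posterior mean of lambda_i is (a + O_i) / (b + E_i).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")

    # Gene-level ratio pools all sub-regions together.
    gene_ratio = sum(observed) / sum(expected)

    # Prior pseudo-counts: b acts like extra "expected" variants,
    # a like extra "observed" variants at the gene-level rate.
    b = prior_strength
    a = gene_ratio * b

    # Posterior mean per region: a weighted average of the raw ratio
    # O_i / E_i and the gene-level ratio, with weight E_i / (E_i + b)
    # on the raw ratio, so small regions are pulled harder toward the gene.
    return [(a + o) / (b + e) for o, e in zip(observed, expected)]


if __name__ == "__main__":
    obs = [0, 12, 30]          # hypothetical observed functional variants
    exp = [2.0, 20.0, 30.0]    # hypothetical expected counts
    for o, e, s in zip(obs, exp, shrink_ratios(obs, exp)):
        print(f"raw={o / e:.3f}  shrunk={s:.3f}")
```

In this sketch the smallest region, whose raw ratio is the least stable, moves furthest toward the gene-level value, while large regions stay close to their own raw ratios. That is the stabilizing behavior the abstract attributes to pooling information across regions.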

Cited Authors

  • Hayeck, TJ; Stong, N; Wolock, CJ; Copeland, B; Kamalakaran, S; Goldstein, DB; Allen, AS

Published Date

  • February 7, 2019

Published In

  • American Journal of Human Genetics

Volume / Issue

  • 104 / 2

Start / End Page

  • 299 - 309

PubMed ID

  • 30686509

Pubmed Central ID

  • 30686509

Electronic International Standard Serial Number (EISSN)

  • 1537-6605

Digital Object Identifier (DOI)

  • 10.1016/j.ajhg.2018.12.020

Language

  • eng

Conference Location

  • United States