Bioinformatics pipeline to advance the identification of transcription regulatory variants in LOAD noncoding regions
BACKGROUND: As new late-onset Alzheimer's disease (LOAD) genetic risk loci are identified and more brain cell-type-specific omics data become available, there is an unmet need for a bioinformatics framework to prioritize genes and variants for testing in single-cell molecular profiling experiments and for validation using disease models and gene editing technologies. We developed a new bioinformatics pipeline to characterize and prioritize SNPs in enhancers located in LOAD-GWAS regions based on their predicted potential to alter transcription factor (TF) binding. The pipeline progresses from all SNPs located in LOAD-GWAS regions to a filtered set of regulatory SNPs that are predicted to have a strong effect on TF binding.
METHOD: We used publicly available bioinformatics software and data sources. Software: motifbreakR, UCSC Table Browser, GTEx portal and JMP. Databases: dbSNP v150; chromatin state segmentation data from the Roadmap Consortium; expression data (brain tissue from GTEx v8, monocytes from Cardiogenics); TF ChIP-seq data from ENCODE; and TF binding motifs from MotifDb.
RESULT: We catalogued 61 strong enhancers in LOAD-GWAS regions that encompass 326 SNPs and 104 TF binding sites. Of the TFs, 77 and 78 were expressed in brain and monocytes, respectively; among these, 19 TF-binding sites showed ChIP-seq signals. Next, we evaluated the effect of SNPs mapped within this set of TF-binding sites and found that 11 SNPs disrupt TF binding. We then determined the LD relationships between the LOAD-risk SNP and the TF 'interrupter' SNP to interpret whether the LOAD association is driven by up- or down-regulation of transcription mediated by the corresponding TF. For example, a SNP within an active enhancer adjacent to PICALM disrupts the SPI1 TF binding site. The enhancer SNP and the LOAD-GWAS SNP are in high LD, and we found that the GWAS-risk allele is linked to the enhancer allele that causes loss of SPI1 binding.
CONCLUSION: This study provides an analytical framework to catalogue noncoding variants in enhancers located in LOAD-GWAS loci and to assess their likelihood of perturbing TF binding. The approach integrates multiple data types to characterize and prioritize SNPs for further exploration of their putative regulatory function using single-cell multi-omics assays and gene editing.
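
The stepwise filtering described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, which relied on motifbreakR, the UCSC Table Browser, the GTEx portal and JMP. The record fields, the `prioritize` function, the stage order and the toy SNP identifiers are all hypothetical. The sketch assumes that each upstream annotation (enhancer state, motif overlap, TF expression, ChIP-seq support, predicted effect) has already been computed and attached to every SNP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CandidateSnp:
    """One SNP with pre-computed annotations (all fields are illustrative)."""
    snp_id: str                  # placeholder identifier, not a real rsID
    in_strong_enhancer: bool     # e.g. from chromatin state segmentation
    tf: Optional[str]            # TF whose motif overlaps the SNP, if any
    tf_expressed: bool           # TF expressed in brain or monocytes
    has_chipseq_signal: bool     # TF ChIP-seq signal at the binding site
    predicted_effect: str        # assumed labels: "strong", "weak", "neutral"


def prioritize(snps):
    """Apply the filters in sequence and record how many SNPs survive each."""
    stages = [
        ("in strong enhancer", lambda s: s.in_strong_enhancer),
        ("overlaps TF motif", lambda s: s.tf is not None),
        ("TF expressed", lambda s: s.tf_expressed),
        ("ChIP-seq support", lambda s: s.has_chipseq_signal),
        ("strong predicted effect", lambda s: s.predicted_effect == "strong"),
    ]
    remaining = list(snps)
    counts = []
    for name, keep in stages:
        remaining = [s for s in remaining if keep(s)]
        counts.append((name, len(remaining)))
    return remaining, counts


# Toy records made up for illustration; they are not data from the study.
TOY_SNPS = [
    CandidateSnp("snp_a", True, "TF1", True, True, "strong"),
    CandidateSnp("snp_b", False, "TF1", True, True, "strong"),
    CandidateSnp("snp_c", True, None, False, False, "neutral"),
    CandidateSnp("snp_d", True, "TF2", True, True, "weak"),
    CandidateSnp("snp_e", True, "TF3", True, False, "strong"),
]

if __name__ == "__main__":
    kept, funnel = prioritize(TOY_SNPS)
    for stage_name, n in funnel:
        print(f"{stage_name}: {n} SNPs remain")
    print("prioritized:", [s.snp_id for s in kept])
```

Recording the surviving count after each stage mirrors the way the results are reported as a funnel, from enhancers and SNPs down to the final set of SNPs predicted to disrupt TF binding.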