Identifying branch-specific positive selection throughout the regulatory genome using an appropriate proxy neutral.

Published

Journal Article

BACKGROUND: Adaptive changes in cis-regulatory elements are an essential component of evolution by natural selection. Identifying adaptive and functional noncoding DNA elements throughout the genome is therefore crucial for understanding the relationship between phenotype and genotype.

RESULTS: We used ENCODE annotations to identify appropriate proxy neutral sequences and demonstrate that the conservativeness of the test can be modulated during the filtration of reference alignments. We applied the method to noncoding Human Accelerated Elements as well as open chromatin elements previously identified in 125 human tissues and cell lines to demonstrate its utility. Then, we evaluated the impact of query region length, proxy neutral sequence length, and branch count on test sensitivity and specificity. We found that the length of the query alignment can vary between 150 bp and 1 kb without affecting the estimation of selection, while for the reference alignment, we found that a length of 3 kb is adequate for proper testing. We also simulated sequence alignments under different classes of evolution and validated our ability to distinguish positive selection from relaxation of constraint and neutral evolution. Finally, we re-confirmed that a quarter of all noncoding Human Accelerated Elements are evolving by positive selection.

CONCLUSION: Here, we introduce a method we call adaptiPhy, which adds significant improvements to our earlier method that tests for branch-specific directional selection in noncoding sequences. The motivation for these improvements is to provide a more sensitive and better targeted characterization of directional selection and neutral evolution across the genome.
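The alignment lengths reported in the abstract can be turned into a simple pre-check before running a test of this kind. The sketch below is illustrative only and is not part of adaptiPhy: the function name and constants are hypothetical, and treating 3 kb as a lower bound for the reference alignment is an assumption, since the abstract says only that 3 kb is adequate.

```python
# Hypothetical pre-check; thresholds are taken from the abstract, while the
# names and the "3 kb as a minimum" reading are assumptions.
QUERY_MIN_BP = 150            # shortest query alignment length reported
QUERY_MAX_BP = 1000           # longest query alignment length reported (1 kb)
REFERENCE_ADEQUATE_BP = 3000  # reference (proxy neutral) length reported as adequate


def alignment_lengths_ok(query_bp: int, reference_bp: int) -> bool:
    """Return True if both lengths fall within the ranges the abstract reports."""
    query_ok = QUERY_MIN_BP <= query_bp <= QUERY_MAX_BP
    reference_ok = reference_bp >= REFERENCE_ADEQUATE_BP
    return query_ok and reference_ok
```

For example, `alignment_lengths_ok(300, 3000)` passes, whereas a 100 bp query or a 2 kb reference would be flagged for review.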

Cited Authors

  • Berrio, A; Haygood, R; Wray, GA

Published Date

  • May 13, 2020

Published In

Volume / Issue

  • 21 / 1

Start / End Page

  • 359 -

PubMed ID

  • 32404186

PubMed Central ID

  • 32404186

Electronic International Standard Serial Number (EISSN)

  • 1471-2164

International Standard Serial Number (ISSN)

  • 1471-2164

Digital Object Identifier (DOI)

  • 10.1186/s12864-020-6752-4

Language

  • eng