Variant Calling in the Goldilocks Zone: How Reference Genome Choice and Read Mapping Stringency Impact Heterozygosity Estimates and Phylogenetic Analyses.
The increasing number of published reference genomes and the affordability of whole genome resequencing have enabled multispecies population genomic and phylogenomic studies on non-model organisms, but they raise a new question for comparative genomics: which combination of reference genome and mapping method yields the most data with the least bias? We mapped short-read resequencing data from seven eastern North American white oak (Quercus sect. Quercus) samples and two related samples to four Quercus reference genomes (Q. alba, Q. lobata, Q. mongolica, and Q. rubra), which represent different degrees of phylogenetic relatedness to the samples. We used three mapping methods: one global alignment approach (Bowtie 2 --end-to-end) and two local alignment approaches (Bowtie 2 --local and BWA-MEM). For the twelve resulting datasets, we analysed read mapping accuracy and efficiency, missing data, heterozygosity, and inferred phylogenies to evaluate the impact of reference genome and read-mapping method. We found that the genetic distance of the reference genome to the samples and the mapping method together impacted heterozygosity and phylogenetic tree estimation. There were two notable effects. First, when using a global alignment method (Bowtie 2 --end-to-end), estimated heterozygosity decreased with increased genetic distance between the reference and the sample. Second, the most distantly related reference genome had significantly reduced base pair recovery and led to under- or overestimation of heterozygosity, depending on the method, and to a more unbalanced phylogeny. We conclude that using a closely related but not conspecific reference is ideal to minimise reference bias, and that using Bowtie 2 --end-to-end minimises mismapping, resulting in the most accurate variant calls.
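The four-reference by three-method design described above can be sketched as a small enumeration. This is only an illustration of how the twelve datasets arise, not the authors' pipeline: the sample name, file names, and index prefixes are hypothetical, and only the mode flags named in the abstract (Bowtie 2 --end-to-end, Bowtie 2 --local) plus the standard `bwa mem` invocation are used.

```python
from itertools import product

# Reference genomes and mapping methods named in the abstract.
REFERENCES = ["Q_alba", "Q_lobata", "Q_mongolica", "Q_rubra"]
METHODS = ["bowtie2-end-to-end", "bowtie2-local", "bwa-mem"]


def mapping_command(reference, method, sample):
    """Build an argument list for one mapping run.

    All file names and index prefixes here are hypothetical placeholders.
    """
    r1 = f"{sample}_R1.fastq.gz"
    r2 = f"{sample}_R2.fastq.gz"
    if method == "bwa-mem":
        # bwa mem writes SAM to stdout; the caller would redirect it to a file.
        return ["bwa", "mem", f"{reference}.fa", r1, r2]
    # Bowtie 2 switches between global and local alignment with a mode flag.
    mode = "--end-to-end" if method == "bowtie2-end-to-end" else "--local"
    out = f"{sample}.{reference}.{method}.sam"
    return ["bowtie2", mode, "-x", reference, "-1", r1, "-2", r2, "-S", out]


def design(sample):
    """Return one command per (reference, method) combination."""
    return {
        (ref, method): mapping_command(ref, method, sample)
        for ref, method in product(REFERENCES, METHODS)
    }


runs = design("sample01")
print(len(runs))  # 4 references x 3 methods = 12 datasets
```

Each entry of `runs` could be passed to `subprocess.run` once the hypothetical paths are replaced with real ones; the commands are deliberately built rather than executed here.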
Related Subject Headings
- Quercus
- Phylogeny
- Heterozygote
- Genomics
- Genome, Plant
- Evolutionary Biology
- 06 Biological Sciences