Variant Calling in the Goldilocks Zone: How Reference Genome Choice and Read Mapping Stringency Impact Heterozygosity Estimates and Phylogenetic Analyses.

Publication, Journal Article
Mohn, RA; Garner, M; Manos, PS; Hipp, AL
Published in: Molecular Ecology Resources
January 2026

The increasing numbers of published reference genomes and affordability of whole genome resequencing have enabled multispecies population genomic and phylogenomic studies on non-model organisms, but they raise a new question for comparative genomics: what reference genome and mapping method combination results in the most data with the least bias? We mapped short-read resequencing data from seven eastern North American white oak (Quercus sect. Quercus) and two related samples to four Quercus reference genomes (Q. alba, Q. lobata, Q. mongolica, and Q. rubra) which represent different degrees of phylogenetic relatedness to the samples. We used three different mapping methods: a global (Bowtie 2 --end-to-end) and two local (Bowtie 2 --local and BWA-MEM) alignment approaches. For the twelve resulting datasets, we analysed read mapping accuracy and efficiency, missing data, heterozygosity, and inferred phylogenies to evaluate the impact of reference genome and read-mapping method. We found that the genetic distance of the reference genome to the samples and mapping method together impacted heterozygosity and phylogenetic tree estimation. There were two notable effects. First, when using a global alignment method (Bowtie 2 --end-to-end), estimated heterozygosity negligibly decreased with increased genetic distance between the reference and sample. Second, the most distantly related reference genome had significantly reduced base pair recovery and resulted in under- or overestimating heterozygosity depending on the method, and a more unbalanced phylogeny. We conclude that using a closely related but not conspecific reference is ideal to minimise reference bias and using Bowtie 2 --end-to-end minimises mismapping, resulting in the most accurate variant calls.
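As a rough illustration of the design described in the abstract, the twelve reference-by-method datasets can be enumerated as in the sketch below. The file paths, index prefixes, and function names are hypothetical, and the authors' actual pipeline and parameter settings are not stated here. Only the four reference species, the Bowtie 2 `--end-to-end` and `--local` modes, and the use of BWA-MEM come from the abstract.

```python
from itertools import product

# Reference genomes and mapping methods named in the abstract.
REFERENCES = ["Q_alba", "Q_lobata", "Q_mongolica", "Q_rubra"]
METHODS = ["bowtie2_end_to_end", "bowtie2_local", "bwa_mem"]


def mapping_command(reference, method, reads_1, reads_2):
    """Return (argument list, SAM path) for one reference/method pair.

    All paths are assumptions made for illustration only.
    """
    index = f"refs/{reference}"        # assumed Bowtie 2 index prefix
    fasta = f"refs/{reference}.fa"     # assumed FASTA already indexed for BWA
    sam = f"out/{reference}.{method}.sam"

    if method == "bowtie2_end_to_end":
        # Global alignment: the whole read must align.
        cmd = ["bowtie2", "--end-to-end", "-x", index,
               "-1", reads_1, "-2", reads_2, "-S", sam]
    elif method == "bowtie2_local":
        # Local alignment: read ends may be soft-clipped.
        cmd = ["bowtie2", "--local", "-x", index,
               "-1", reads_1, "-2", reads_2, "-S", sam]
    elif method == "bwa_mem":
        # BWA-MEM writes SAM to stdout; the caller would redirect it to `sam`.
        cmd = ["bwa", "mem", fasta, reads_1, reads_2]
    else:
        raise ValueError(f"unknown mapping method: {method}")
    return cmd, sam


def build_grid(reads_1, reads_2):
    """Build one command per (reference, method) combination."""
    return {
        (ref, method): mapping_command(ref, method, reads_1, reads_2)
        for ref, method in product(REFERENCES, METHODS)
    }


if __name__ == "__main__":
    grid = build_grid("sample_R1.fastq.gz", "sample_R2.fastq.gz")
    for (ref, method), (cmd, sam) in grid.items():
        print(ref, method, " ".join(cmd), "->", sam)
```

The sketch only builds the command lines rather than running them, so it can be inspected without Bowtie 2 or BWA installed. Four references times three methods gives the twelve datasets per sample that the abstract mentions.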

Published In

Molecular Ecology Resources

DOI

10.1111/1755-0998.70079

EISSN

1755-0998

ISSN

1755-098X

Publication Date

January 2026

Volume

26

Issue

1

Start / End Page

e70079

Related Subject Headings

  • Quercus
  • Phylogeny
  • Heterozygote
  • Genomics
  • Genome, Plant
  • Evolutionary Biology
  • 06 Biological Sciences
 

Citation

APA

Mohn, R. A., Garner, M., Manos, P. S., & Hipp, A. L. (2026). Variant Calling in the Goldilocks Zone: How Reference Genome Choice and Read Mapping Stringency Impact Heterozygosity Estimates and Phylogenetic Analyses. Molecular Ecology Resources, 26(1), e70079. https://doi.org/10.1111/1755-0998.70079

Chicago

Mohn, Rebekah A., Mira Garner, Paul S. Manos, and Andrew L. Hipp. “Variant Calling in the Goldilocks Zone: How Reference Genome Choice and Read Mapping Stringency Impact Heterozygosity Estimates and Phylogenetic Analyses.” Molecular Ecology Resources 26, no. 1 (January 2026): e70079. https://doi.org/10.1111/1755-0998.70079.

MLA

Mohn, Rebekah A., et al. “Variant Calling in the Goldilocks Zone: How Reference Genome Choice and Read Mapping Stringency Impact Heterozygosity Estimates and Phylogenetic Analyses.” Molecular Ecology Resources, vol. 26, no. 1, Jan. 2026, p. e70079. Epmc, doi:10.1111/1755-0998.70079.