De novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate and short reads.

Publication, Journal Article
Korlach, J; Gedman, G; Kingan, SB; Chin, C-S; Howard, JT; Audet, J-N; Cantin, L; Jarvis, ED
Published in: Gigascience
October 1, 2017

Reference-quality genomes are expected to provide a resource for studying gene structure, function, and evolution. However, often genes of interest are not completely or accurately assembled, leading to unknown errors in analyses or additional cloning efforts for the correct sequences. A promising solution is long-read sequencing. Here we tested PacBio-based long-read sequencing and diploid assembly for potential improvements to the Sanger-based intermediate-read zebra finch reference and Illumina-based short-read Anna's hummingbird reference, 2 vocal learning avian species widely studied in neuroscience and genomics. With DNA of the same individuals used to generate the reference genomes, we generated diploid assemblies with the FALCON-Unzip assembler, resulting in contigs with no gaps in the megabase range, representing 150-fold and 200-fold improvements over the current zebra finch and hummingbird references, respectively. These long-read and phased assemblies corrected and resolved what we discovered to be numerous misassemblies in the references, including missing sequences in gaps, erroneous sequences flanking gaps, base call errors in difficult-to-sequence regions, complex repeat structure errors, and allelic differences between the 2 haplotypes. These improvements were validated by single long-genome and transcriptome reads and resulted for the first time in completely resolved protein-coding genes widely studied in neuroscience and specialized in vocal learning species. These findings demonstrate the impact of long reads, sequencing of previously difficult-to-sequence regions, and phasing of haplotypes on generating the high-quality assemblies necessary for understanding gene structure, function, and evolution.

Published In

Gigascience

DOI

10.1093/gigascience/gix085

EISSN

2047-217X

Publication Date

October 1, 2017

Volume

6

Issue

10

Start / End Page

1 / 16

Location

United States

Related Subject Headings

  • Sequence Analysis, DNA
  • Nerve Tissue Proteins
  • Male
  • Genome
  • Forkhead Transcription Factors
  • Female
  • Early Growth Response Protein 1
  • Dual Specificity Phosphatase 1
  • Birds
  • Avian Proteins
 

Citation

APA: Korlach, J., Gedman, G., Kingan, S. B., Chin, C.-S., Howard, J. T., Audet, J.-N., … Jarvis, E. D. (2017). De novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate and short reads. Gigascience, 6(10), 1–16. https://doi.org/10.1093/gigascience/gix085
Chicago: Korlach, Jonas, Gregory Gedman, Sarah B. Kingan, Chen-Shan Chin, Jason T. Howard, Jean-Nicolas Audet, Lindsey Cantin, and Erich D. Jarvis. “De novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate and short reads.” Gigascience 6, no. 10 (October 1, 2017): 1–16. https://doi.org/10.1093/gigascience/gix085.
ICMJE: Korlach J, Gedman G, Kingan SB, Chin C-S, Howard JT, Audet J-N, et al. De novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate and short reads. Gigascience. 2017 Oct 1;6(10):1–16.
MLA: Korlach, Jonas, et al. “De novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate and short reads.” Gigascience, vol. 6, no. 10, Oct. 2017, pp. 1–16. Pubmed, doi:10.1093/gigascience/gix085.
NLM: Korlach J, Gedman G, Kingan SB, Chin C-S, Howard JT, Audet J-N, Cantin L, Jarvis ED. De novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate and short reads. Gigascience. 2017 Oct 1;6(10):1–16.