Semi-automated assembly of high-quality diploid human reference genomes.

Publication, Journal Article
Jarvis, ED; Formenti, G; Rhie, A; Guarracino, A; Yang, C; Wood, J; Tracey, A; Thibaud-Nissen, F; Vollger, MR; Porubsky, D; Cheng, H; Asri, M ...
Published in: Nature
November 2022

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.

Published In

Nature

DOI

10.1038/s41586-022-05325-5

EISSN

1476-4687

Publication Date

November 2022

Volume

611

Issue

7936

Start / End Page

519 / 531

Location

England

Related Subject Headings

  • Sequence Analysis, DNA
  • Reference Standards
  • Humans
  • High-Throughput Nucleotide Sequencing
  • Haplotypes
  • Genomics
  • Genome, Human
  • Genetic Variation
  • General Science & Technology
  • Diploidy
 

Citation

APA
Jarvis, E. D., Formenti, G., Rhie, A., Guarracino, A., Yang, C., Wood, J., … Human Pangenome Reference Consortium. (2022). Semi-automated assembly of high-quality diploid human reference genomes. Nature, 611(7936), 519–531. https://doi.org/10.1038/s41586-022-05325-5

Chicago
Jarvis, Erich D., Giulio Formenti, Arang Rhie, Andrea Guarracino, Chentao Yang, Jonathan Wood, Alan Tracey, et al. “Semi-automated assembly of high-quality diploid human reference genomes.” Nature 611, no. 7936 (November 2022): 519–31. https://doi.org/10.1038/s41586-022-05325-5.

ICMJE
Jarvis ED, Formenti G, Rhie A, Guarracino A, Yang C, Wood J, et al. Semi-automated assembly of high-quality diploid human reference genomes. Nature. 2022 Nov;611(7936):519–31.

MLA
Jarvis, Erich D., et al. “Semi-automated assembly of high-quality diploid human reference genomes.” Nature, vol. 611, no. 7936, Nov. 2022, pp. 519–31. PubMed, doi:10.1038/s41586-022-05325-5.

NLM
Jarvis ED, Formenti G, Rhie A, Guarracino A, Yang C, Wood J, Tracey A, Thibaud-Nissen F, Vollger MR, Porubsky D, Cheng H, Asri M, Logsdon GA, Carnevali P, Chaisson MJP, Chin C-S, Cody S, Collins J, Ebert P, Escalona M, Fedrigo O, Fulton RS, Fulton LL, Garg S, Gerton JL, Ghurye J, Granat A, Green RE, Harvey W, Hasenfeld P, Hastie A, Haukness M, Jaeger EB, Jain M, Kirsche M, Kolmogorov M, Korbel JO, Koren S, Korlach J, Lee J, Li D, Lindsay T, Lucas J, Luo F, Marschall T, Mitchell MW, McDaniel J, Nie F, Olsen HE, Olson ND, Pesout T, Potapova T, Puiu D, Regier A, Ruan J, Salzberg SL, Sanders AD, Schatz MC, Schmitt A, Schneider VA, Selvaraj S, Shafin K, Shumate A, Stitziel NO, Stober C, Torrance J, Wagner J, Wang J, Wenger A, Xiao C, Zimin AV, Zhang G, Wang T, Li H, Garrison E, Haussler D, Hall I, Zook JM, Eichler EE, Phillippy AM, Paten B, Howe K, Miga KH, Human Pangenome Reference Consortium. Semi-automated assembly of high-quality diploid human reference genomes. Nature. 2022 Nov;611(7936):519–531.