Long-read sequencing and de novo assembly of a Chinese genome.

Published online

Journal Article

Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arrays and generate a de novo assembly of 2.93 Gb (contig N50: 8.3 Mb, scaffold N50: 22.0 Mb, including 39.3 Mb N-bases), together with 206 Mb of alternative haplotypes. The assembly fully or partially fills 274 (28.4%) N-gaps in the reference genome GRCh38. Comparison to GRCh38 reveals 12.8 Mb of HX1-specific sequences, including 4.1 Mb that are not present in previously reported Asian genomes. Furthermore, long-read sequencing of the transcriptome reveals novel spliced genes that are not annotated in GENCODE and are missed by short-read RNA-Seq. Our results imply that improved characterization of genome functional variation may require the use of a range of genomic technologies on diverse human populations.

Cited Authors

  • Shi, L; Guo, Y; Dong, C; Huddleston, J; Yang, H; Han, X; Fu, A; Li, Q; Li, N; Gong, S; Lintner, KE; Ding, Q; Wang, Z; Hu, J; Wang, D; Wang, F; Wang, L; Lyon, GJ; Guan, Y; Shen, Y; Evgrafov, OV; Knowles, JA; Thibaud-Nissen, F; Schneider, V; Yu, C-Y; Zhou, L; Eichler, EE; So, K-F; Wang, K

Published Date

  • June 30, 2016

Published In

  • Nature Communications

Volume / Issue

  • 7

Start / End Page

  • 12065

PubMed ID

  • 27356984

Pubmed Central ID

  • 27356984

Electronic International Standard Serial Number (EISSN)

  • 2041-1723

Digital Object Identifier (DOI)

  • 10.1038/ncomms12065

Language

  • eng

Conference Location

  • England