Comprehensive discovery of CRISPR-targeted terminally redundant sequences in the human gut metagenome: Viruses, plasmids, and more.

Publication, Journal Article
Sugimoto, R; Nishimura, L; Nguyen, PT; Ito, J; Parrish, NF; Mori, H; Kurokawa, K; Nakaoka, H; Inoue, I
Published in: PLoS computational biology
October 2021

Viruses are the most numerous biological entity, existing in all environments and infecting all cellular organisms. Compared with cellular life, the evolution and origin of viruses are poorly understood; viruses are enormously diverse, and most lack sequence similarity to cellular genes. To uncover viral sequences without relying on either reference viral sequences from databases or marker genes that characterize specific viral taxa, we developed an analysis pipeline for virus inference based on clustered regularly interspaced short palindromic repeats (CRISPR). CRISPR is a prokaryotic nucleic acid restriction system that stores the memory of previous exposure. Our protocol can infer CRISPR-targeted sequences, including viruses, plasmids, and previously uncharacterized elements, and predict their hosts using unassembled short-read metagenomic sequencing data. By analyzing human gut metagenomic data, we extracted 11,391 terminally redundant CRISPR-targeted sequences, which are likely complete circular genomes. The sequences included 2,154 tailed-phage genomes, together with 257 complete crAssphage genomes, 11 genomes larger than 200 kilobases, 766 genomes of Microviridae species, 56 genomes of Inoviridae species, and 95 previously uncharacterized circular small genomes that have no reliably predicted protein-coding gene. We predicted the host(s) of approximately 70% of the discovered genomes at the taxonomic level of phylum by linking protospacers to taxonomically assigned CRISPR direct repeats. These results demonstrate that our protocol is efficient for de novo inference of CRISPR-targeted sequences and their host prediction.
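The two ideas the abstract combines, terminal redundancy as a sign of a complete circular genome and CRISPR spacers as evidence that a sequence is targeted, can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the exact-match criteria, and the `min_len` threshold are assumptions chosen for illustration, and a real analysis would tolerate mismatches and work from assembled metagenomic contigs.

```python
# Illustrative sketch only; names and thresholds are hypothetical,
# not taken from the published protocol.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_overlap(seq: str, min_len: int = 10) -> int:
    """Length of the longest prefix of seq that is also its suffix.

    A contig whose two ends repeat the same sequence is "terminally
    redundant", which is consistent with a circular genome that was
    linearized during assembly. Returns 0 if no overlap of at least
    min_len bases exists. Overlaps longer than half the contig are
    ignored.
    """
    seq = seq.upper()
    for k in range(len(seq) // 2, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def spacer_hits(contig: str, spacers: list[str]) -> list[str]:
    """Return the spacers that match the contig exactly on either strand.

    Each matching site on the contig plays the role of a protospacer.
    """
    contig = contig.upper()
    hits = []
    for spacer in spacers:
        s = spacer.upper()
        if s in contig or reverse_complement(s) in contig:
            hits.append(s)
    return hits


# Toy data: a 24-base body flanked by the same 14-base repeat.
repeat = "ATGCGTACGTTAGC"
body = "GGGTTTAAACCCGGGTTTCCCAAA"
contig = repeat + body + repeat

overlap = terminal_overlap(contig)
hits = spacer_hits(contig, ["TTTAAACCCGGG", "TTTGGGAAACCC", "ACACACACACAC"])
```

In this toy example the contig carries a 14-base terminal repeat, and two of the three spacers match it (one on the forward strand, one on the reverse strand). Linking each matching spacer back to the direct repeat of the CRISPR array it came from is what would allow a host to be proposed, as the abstract describes.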

Duke Scholars

Published In

PLoS computational biology

DOI

10.1371/journal.pcbi.1009428

EISSN

1553-7358

ISSN

1553-734X

Publication Date

October 2021

Volume

17

Issue

10

Start / End Page

e1009428

Related Subject Headings

  • Viruses
  • Sequence Analysis, DNA
  • Plasmids
  • Metagenomics
  • Metagenome
  • Humans
  • Gastrointestinal Microbiome
  • Clustered Regularly Interspaced Short Palindromic Repeats
  • Bioinformatics
  • Archaea
 

Citation

APA: Sugimoto, R., Nishimura, L., Nguyen, P. T., Ito, J., Parrish, N. F., Mori, H., … Inoue, I. (2021). Comprehensive discovery of CRISPR-targeted terminally redundant sequences in the human gut metagenome: Viruses, plasmids, and more. PLoS Computational Biology, 17(10), e1009428. https://doi.org/10.1371/journal.pcbi.1009428
Chicago: Sugimoto, Ryota, Luca Nishimura, Phuong Thanh Nguyen, Jumpei Ito, Nicholas F. Parrish, Hiroshi Mori, Ken Kurokawa, Hirofumi Nakaoka, and Ituro Inoue. “Comprehensive discovery of CRISPR-targeted terminally redundant sequences in the human gut metagenome: Viruses, plasmids, and more.” PLoS Computational Biology 17, no. 10 (October 2021): e1009428. https://doi.org/10.1371/journal.pcbi.1009428.
ICMJE: Sugimoto R, Nishimura L, Nguyen PT, Ito J, Parrish NF, Mori H, et al. Comprehensive discovery of CRISPR-targeted terminally redundant sequences in the human gut metagenome: Viruses, plasmids, and more. PLoS computational biology. 2021 Oct;17(10):e1009428.
MLA: Sugimoto, Ryota, et al. “Comprehensive discovery of CRISPR-targeted terminally redundant sequences in the human gut metagenome: Viruses, plasmids, and more.” PLoS Computational Biology, vol. 17, no. 10, Oct. 2021, p. e1009428. Epmc, doi:10.1371/journal.pcbi.1009428.
NLM: Sugimoto R, Nishimura L, Nguyen PT, Ito J, Parrish NF, Mori H, Kurokawa K, Nakaoka H, Inoue I. Comprehensive discovery of CRISPR-targeted terminally redundant sequences in the human gut metagenome: Viruses, plasmids, and more. PLoS computational biology. 2021 Oct;17(10):e1009428.
