
A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples.

Publication, Journal Article
Naccache, SN; Federman, S; Veeraraghavan, N; Zaharia, M; Lee, D; Samayoa, E; Bouquet, J; Greninger, AL; Luk, K-C; Enge, B; Wadford, DA; Isa, P ...
Published in: Genome Res
July 2014

Unbiased next-generation sequencing (NGS) approaches enable comprehensive pathogen detection in the clinical microbiology laboratory and have numerous applications for public health surveillance, outbreak investigation, and the diagnosis of infectious diseases. However, practical deployment of the technology is hindered by the bioinformatics challenge of analyzing results accurately and in a clinically relevant timeframe. Here we describe SURPI ("sequence-based ultrarapid pathogen identification"), a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples, and demonstrate use of the pipeline in the analysis of 237 clinical samples comprising more than 1.1 billion sequences. Deployable on both cloud-based and standalone servers, SURPI leverages two state-of-the-art aligners for accelerated analyses, SNAP and RAPSearch, which are as accurate as existing bioinformatics tools but orders of magnitude faster in performance. In fast mode, SURPI detects viruses and bacteria by scanning data sets of 7-500 million reads in 11 min to 5 h, while in comprehensive mode, all known microorganisms are identified, followed by de novo assembly and protein homology searches for divergent viruses in 50 min to 16 h. SURPI has also directly contributed to real-time microbial diagnosis in acutely ill patients, underscoring its potential key role in the development of unbiased NGS-based clinical assays in infectious diseases that demand rapid turnaround times.
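The run-time ranges quoted above can be turned into approximate average throughputs. The sketch below assumes that the smallest data set (7 million reads) corresponds to the shortest run time and the largest (500 million reads) to the longest. The abstract does not state this pairing explicitly. The function name `reads_per_second` and the `RANGES` table are hypothetical and for illustration only; they are not part of SURPI.

```python
def reads_per_second(n_reads: float, minutes: float) -> float:
    """Average throughput implied by processing n_reads in `minutes` of wall-clock time."""
    return n_reads / (minutes * 60.0)


# Assumption (not stated in the abstract): 7 million reads pairs with the
# shortest run time and 500 million reads with the longest, for each mode.
RANGES = {
    "fast": [(7e6, 11), (500e6, 5 * 60)],              # 11 min to 5 h
    "comprehensive": [(7e6, 50), (500e6, 16 * 60)],    # 50 min to 16 h
}

if __name__ == "__main__":
    for mode, points in RANGES.items():
        for n_reads, minutes in points:
            rate = reads_per_second(n_reads, minutes)
            print(f"{mode:>13}: {n_reads / 1e6:>5.0f} M reads in {minutes:>4.0f} min "
                  f"-> {rate:,.0f} reads/s")
```

Under that pairing, fast mode averages roughly 10,600 to 27,800 reads per second, and comprehensive mode roughly 2,300 to 8,700. These are whole-run averages only; they say nothing about how time is split among the pipeline's individual stages.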


Published In

Genome Res

DOI

10.1101/gr.171934.113

EISSN

1549-5469

Publication Date

July 2014

Volume

24

Issue

7

Start / End Page

1180 / 1192

Location

United States

Related Subject Headings

  • Software
  • Reproducibility of Results
  • ROC Curve
  • Metagenomics
  • Humans
  • High-Throughput Nucleotide Sequencing
  • Databases, Nucleic Acid
  • Computational Biology
  • Bioinformatics
  • 3105 Genetics
 

Citation

APA: Naccache, S. N., Federman, S., Veeraraghavan, N., Zaharia, M., Lee, D., Samayoa, E., … Chiu, C. Y. (2014). A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res, 24(7), 1180–1192. https://doi.org/10.1101/gr.171934.113

Chicago: Naccache, Samia N., Scot Federman, Narayanan Veeraraghavan, Matei Zaharia, Deanna Lee, Erik Samayoa, Jerome Bouquet, et al. “A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples.” Genome Res 24, no. 7 (July 2014): 1180–92. https://doi.org/10.1101/gr.171934.113.

ICMJE: Naccache SN, Federman S, Veeraraghavan N, Zaharia M, Lee D, Samayoa E, et al. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res. 2014 Jul;24(7):1180–92.

MLA: Naccache, Samia N., et al. “A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples.” Genome Res, vol. 24, no. 7, July 2014, pp. 1180–92. Pubmed, doi:10.1101/gr.171934.113.

NLM: Naccache SN, Federman S, Veeraraghavan N, Zaharia M, Lee D, Samayoa E, Bouquet J, Greninger AL, Luk K-C, Enge B, Wadford DA, Messenger SL, Genrich GL, Pellegrino K, Grard G, Leroy E, Schneider BS, Fair JN, Martínez MA, Isa P, Crump JA, DeRisi JL, Sittler T, Hackett J, Miller S, Chiu CY. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res. 2014 Jul;24(7):1180–1192.
