Nubeam-dedup: a fast and RAM-efficient tool to de-duplicate sequencing reads without mapping.

Journal Article

SUMMARY: We present Nubeam-dedup, a fast and RAM-efficient tool to de-duplicate sequencing reads without a reference genome. Nubeam-dedup represents nucleotides by matrices, transforms each read into a product of matrices, and, based on that product, assigns a unique number to the read. Duplicate reads can therefore be removed efficiently using a collisionless hash function. Compared with other state-of-the-art reference-free tools, Nubeam-dedup uses 50-70% of their CPU time and 10-15% of their RAM.

AVAILABILITY AND IMPLEMENTATION: The C++ source code and manual are available at https://github.com/daihang16/nubeamdedup and https://haplotype.org.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Cited Authors

  • Dai, H; Guan, Y

Published Date

  • May 1, 2020

Published In

  • Bioinformatics

Volume / Issue

  • 36 / 10

Start / End Page

  • 3254 - 3256

PubMed ID

  • 32091581

Electronic International Standard Serial Number (EISSN)

  • 1367-4811

Digital Object Identifier (DOI)

  • 10.1093/bioinformatics/btaa112

Language

  • English

Conference Location

  • England