Next generation distributed computing for cancer research.

Published online

Journal Article (Review)

Advances in next generation sequencing (NGS) and mass spectrometry (MS) technologies have provided many new opportunities and angles for extending the scope of translational cancer research while creating tremendous challenges in data management and analysis. The resulting informatics challenge is invariably not amenable to the use of traditional computing models. Recent advances in scalable computing and associated infrastructure, particularly distributed computing for Big Data, can provide solutions for addressing these challenges. In this review, the next generation of distributed computing technologies that can address these informatics problems is described from the perspective of three key components of a computational platform, namely computing, data storage and management, and networking. A broad overview of scalable computing is provided to set the context for a detailed description of Hadoop, a technology that is being rapidly adopted for large-scale distributed computing. A proof-of-concept Hadoop cluster, set up for performance benchmarking of NGS read alignment, is described as an example of how to work with Hadoop. Finally, Hadoop is compared with a number of other current technologies for distributed computing.

Duke Authors

Cited Authors

  • Agarwal, P; Owzar, K

Published Date

  • 2014

Published In

Volume / Issue

  • 13 / Suppl 7

Start / End Page

  • 97 - 109

PubMed ID

  • 25983539

PubMed Central ID

  • 25983539

International Standard Serial Number (ISSN)

  • 1176-9351

Digital Object Identifier (DOI)

  • 10.4137/CIN.S16344

Language

  • eng

Conference Location

  • United States