An efficient strategy for the collection and storage of large volumes of data for computation

Publication, Journal Article

Suthakar, U; Magnoni, L; Smith, DR; Khan, A; Andreeva, J

Published in: Journal of Big Data

December 1, 2016

In recent years, an increasing amount of data has been produced and stored; this is known as Big Data. Social networks, the Internet of Things, scientific experiments and commercial services play a significant role in generating these vast amounts of data. Three main factors characterise Big Data: Volume, Velocity and Variety, and all three need to be considered when designing a platform to support it. The Large Hadron Collider (LHC) particle accelerator at CERN hosts a number of data-intensive experiments, which are estimated to produce about 30 PB of data annually, and these data will be propagated at extremely high velocity. Traditional methods of collecting, storing and analysing data have become insufficient for managing such a rapidly growing volume of data. It is therefore essential to have an efficient strategy for capturing data as they are produced. In this paper, a number of models are explored to determine the best approach for collecting and storing Big Data for analytics. An evaluation is presented of the performance of these approaches over full execution cycles of collecting, storing and analysing data, applied to the monitoring of the Worldwide LHC Computing Grid (WLCG) infrastructure. Moreover, the models discussed are applied to a community-driven software solution, Apache Flume, to show how they can be integrated seamlessly.
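As a rough illustration of the velocity implied by the figure of about 30 PB per year, the following Python sketch converts that annual volume into an average sustained rate. It is a back-of-the-envelope calculation, not part of the paper: it assumes decimal petabytes (10^15 bytes), a 365-day year and data spread evenly over time, and the helper name average_rate_bytes_per_s is hypothetical. Real data-taking is bursty, so peak rates would be higher than this average.

```python
# Back-of-the-envelope average rate implied by "about 30 PB of data, annually".
# Assumptions (NOT from the paper): decimal petabytes (1 PB = 10**15 bytes),
# a 365-day year, and data arriving evenly over the whole year.

BYTES_PER_PB = 10**15
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def average_rate_bytes_per_s(petabytes_per_year: float) -> float:
    """Hypothetical helper: mean ingestion rate in bytes per second."""
    return petabytes_per_year * BYTES_PER_PB / SECONDS_PER_YEAR


rate = average_rate_bytes_per_s(30)
# Report in decimal gigabytes and gigabits per second.
print(f"{rate / 1e9:.2f} GB/s  (~{rate * 8 / 1e9:.1f} Gbit/s)")
```

Under these assumptions the script prints 0.95 GB/s (~7.6 Gbit/s), i.e. close to a gigabyte of new data every second, around the clock.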

Duke Scholars

Author David R. Smith Pierre R. Lamond Department of Electrical and Computer Engin ...

Published In

Journal of Big Data

DOI

10.1186/s40537-016-0056-1

EISSN

2196-1115

Publication Date

December 1, 2016

Volume

3

Issue

1

Related Subject Headings

46 Information and computing sciences
08 Information and Computing Sciences

Citation

APA: Suthakar, U., Magnoni, L., Smith, D. R., Khan, A., & Andreeva, J. (2016). An efficient strategy for the collection and storage of large volumes of data for computation. Journal of Big Data, 3(1). https://doi.org/10.1186/s40537-016-0056-1

Chicago: Suthakar, U., L. Magnoni, D. R. Smith, A. Khan, and J. Andreeva. “An efficient strategy for the collection and storage of large volumes of data for computation.” Journal of Big Data 3, no. 1 (December 1, 2016). https://doi.org/10.1186/s40537-016-0056-1.

ICMJE and NLM: Suthakar U, Magnoni L, Smith DR, Khan A, Andreeva J. An efficient strategy for the collection and storage of large volumes of data for computation. Journal of Big Data. 2016 Dec 1;3(1).

MLA: Suthakar, U., et al. “An efficient strategy for the collection and storage of large volumes of data for computation.” Journal of Big Data, vol. 3, no. 1, Dec. 2016. Scopus, doi:10.1186/s40537-016-0056-1.