Optimised Lambda Architecture for Monitoring Scientific Infrastructure

Publication, Journal Article
Suthakar, U; Magnoni, L; Smith, DR; Khan, A
Published in: IEEE Transactions on Parallel and Distributed Systems
June 1, 2021

Within scientific infrastructure, scientists execute millions of computational jobs daily, resulting in the movement of petabytes of data over the heterogeneous infrastructure. Monitoring the computing and user activities over such a complex infrastructure is incredibly demanding. Whereas present solutions are traditionally based on a Relational Database Management System (RDBMS) for data storage and processing, recent developments evaluate the Lambda Architecture (LA). In particular, these studies have evaluated data storage and batch processing of large-scale monitoring datasets using Hadoop and its MapReduce framework. Although LA performed better than the RDBMS in these evaluations, it was fairly complex to implement and maintain. This paper presents an Optimised Lambda Architecture (OLA) using the Apache Spark ecosystem, which involves modelling an efficient way of joining batch computation and real-time computation transparently without the need to add complexity. A few models were explored: pure streaming, pure batch computation, and the combination of both batch and streaming. An evaluation of the OLA on the CERN IT on-premises Hadoop cluster and on the public Amazon cloud infrastructure for the monitoring WLCG Data acTivities (WDT) use case is presented, demonstrating how the new architecture can offer benefits by combining both batch and real-time processing to compensate for batch-processing latency.
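The general idea of combining a batch view with a real-time view can be illustrated with a minimal sketch. This is not the authors' implementation and does not use Spark: the `TransferEvent` record, its fields, the `cutoff` timestamp, and the function names are all hypothetical, chosen only to show how a query might merge an older batch aggregate with a fresher streaming aggregate so that results are not delayed by batch-processing latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferEvent:
    """Hypothetical monitoring record; field names are assumptions."""
    timestamp: int    # seconds since some epoch
    site: str         # site that moved the data
    bytes_moved: int  # volume transferred


def aggregate(events, keep):
    """Sum bytes per site over the events for which keep(event) is true."""
    totals = {}
    for event in events:
        if keep(event):
            totals[event.site] = totals.get(event.site, 0) + event.bytes_moved
    return totals


def batch_view(events, cutoff):
    """Batch layer: covers everything strictly before the last batch run."""
    return aggregate(events, lambda e: e.timestamp < cutoff)


def realtime_view(events, cutoff):
    """Speed layer: covers only events that arrived since the last batch run."""
    return aggregate(events, lambda e: e.timestamp >= cutoff)


def merge_views(batch, realtime):
    """Serving step: add the fresh totals on top of the batch totals."""
    merged = dict(batch)
    for site, total in realtime.items():
        merged[site] = merged.get(site, 0) + total
    return merged


events = [
    TransferEvent(100, "CERN", 10),
    TransferEvent(200, "CERN", 5),
    TransferEvent(250, "RAL", 7),
]
cutoff = 200  # hypothetical time of the last completed batch run
combined = merge_views(batch_view(events, cutoff), realtime_view(events, cutoff))
print(combined)
```

Splitting on a single cutoff means each event is counted in exactly one of the two views, so the merged totals match a full recomputation over all events.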

Published In

IEEE Transactions on Parallel and Distributed Systems

DOI

10.1109/TPDS.2017.2772241

EISSN

1558-2183

ISSN

1045-9219

Publication Date

June 1, 2021

Volume

32

Issue

6

Start / End Page

1395 / 1408

Related Subject Headings

  • Distributed Computing
  • 4606 Distributed computing and systems software
  • 1005 Communications Technologies
  • 0805 Distributed Computing
  • 0803 Computer Software
 

Citation

APA
Suthakar, U., Magnoni, L., Smith, D. R., & Khan, A. (2021). Optimised Lambda Architecture for Monitoring Scientific Infrastructure. IEEE Transactions on Parallel and Distributed Systems, 32(6), 1395–1408. https://doi.org/10.1109/TPDS.2017.2772241

Chicago
Suthakar, U., L. Magnoni, D. R. Smith, and A. Khan. “Optimised Lambda Architecture for Monitoring Scientific Infrastructure.” IEEE Transactions on Parallel and Distributed Systems 32, no. 6 (June 1, 2021): 1395–1408. https://doi.org/10.1109/TPDS.2017.2772241.

ICMJE
Suthakar U, Magnoni L, Smith DR, Khan A. Optimised Lambda Architecture for Monitoring Scientific Infrastructure. IEEE Transactions on Parallel and Distributed Systems. 2021 Jun 1;32(6):1395–408.

MLA
Suthakar, U., et al. “Optimised Lambda Architecture for Monitoring Scientific Infrastructure.” IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 6, June 2021, pp. 1395–408. Scopus, doi:10.1109/TPDS.2017.2772241.

NLM
Suthakar U, Magnoni L, Smith DR, Khan A. Optimised Lambda Architecture for Monitoring Scientific Infrastructure. IEEE Transactions on Parallel and Distributed Systems. 2021 Jun 1;32(6):1395–1408.