Active and accelerated learning of cost models for optimizing scientific applications

Published

Conference Paper

We present the NIMO system, which automatically learns cost models for predicting the execution time of computational-science applications running on large-scale networked utilities such as computational grids. Accurate cost models are important for selecting efficient plans for executing these applications on the utility. Computational-science applications are often scripts (written in languages such as Perl or Matlab) connected using a workflow-description language, and therefore pose different challenges from those of modeling the execution of plans for declarative queries with well-understood semantics. NIMO generates appropriate training samples for these applications so that it can learn fairly accurate cost models quickly using statistical learning techniques. NIMO's approach is active and noninvasive: it actively deploys and monitors the application under varying conditions, and obtains its training data from passive instrumentation streams that require no changes to the operating system or applications. Our experiments with real scientific applications demonstrate that NIMO significantly reduces the number of training samples and the time needed to learn fairly accurate cost models. Copyright 2006 VLDB Endowment, ACM.
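The abstract describes learning a cost model from training runs that the system chooses for itself. The Python sketch below illustrates that general idea under stated assumptions; it is not NIMO's algorithm. The "application" is a hypothetical timing formula (`run_application`), the predictors (inverse CPU speed and network latency) are assumed, and the sample-selection rule, which runs the candidate whose prediction has the largest regression predictive variance (a standard uncertainty-sampling heuristic), is swapped in for illustration. All function names are hypothetical.

```python
import itertools

# Hypothetical stand-in for deploying the application on one resource
# assignment and timing it. The formula is invented for illustration only.
def run_application(cpu_ghz, net_latency_ms):
    compute_s = 120.0 / cpu_ghz        # compute phase shrinks with faster CPUs
    transfer_s = 0.5 * net_latency_ms  # data staging grows with network latency
    return compute_s + transfer_s

def features(x):
    """Map a resource assignment to regression predictors (assumed form)."""
    cpu_ghz, net_latency_ms = x
    return [1.0, 1.0 / cpu_ghz, net_latency_ms]

def solve(a, b):
    """Solve the square system a w = b by Gauss-Jordan with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def gram(samples, lam):
    """X^T X plus a small ridge term so it stays invertible with few samples."""
    rows = [features(x) for x, _ in samples]
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
             for j in range(k)] for i in range(k)]

def fit(samples, lam=1e-8):
    """Ridge-regularized least-squares fit of execution time."""
    rows = [features(x) for x, _ in samples]
    xty = [sum(r[i] * y for r, (_, y) in zip(rows, samples))
           for i in range(len(rows[0]))]
    return solve(gram(samples, lam), xty)

def predict(w, x):
    return sum(wi * fi for wi, fi in zip(w, features(x)))

def uncertainty(samples, x, lam=1e-8):
    """phi^T (X^T X + lam I)^-1 phi, proportional to the predictive variance."""
    phi = features(x)
    v = solve(gram(samples, lam), phi)
    return sum(p * q for p, q in zip(phi, v))

def active_learn(candidates, budget):
    """Greedily run the candidate whose prediction is currently least certain."""
    samples = [(candidates[0], run_application(*candidates[0]))]
    while len(samples) < budget:
        seen = {x for x, _ in samples}
        remaining = [c for c in candidates if c not in seen]
        nxt = max(remaining, key=lambda c: uncertainty(samples, c))
        samples.append((nxt, run_application(*nxt)))
    return fit(samples), samples

if __name__ == "__main__":
    grid = list(itertools.product([1.0, 1.5, 2.0, 3.0], [1.0, 10.0, 25.0, 50.0]))
    w, used = active_learn(grid, budget=5)
    for x in grid[:3]:
        print(x, round(predict(w, x), 2), run_application(*x))
```

Each new run is placed where the current model is least constrained, so a handful of well-spread runs can stand in for an exhaustive sweep of the 16-point grid. A realistic version would also have to handle measurement noise, more predictors, and a stopping rule, for example one based on error on held-out runs.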

Cited Authors

  • Shivam, P; Babu, S; Chase, J

Published Date

  • December 1, 2006

Published In

  • VLDB 2006: Proceedings of the 32nd International Conference on Very Large Data Bases

Start / End Page

  • 535-546

International Standard Book Number 10 (ISBN-10)

  • 1595933859

International Standard Book Number 13 (ISBN-13)

  • 9781595933850

Citation Source

  • Scopus