Digital Sherpa
Currently, users of high-performance computers are overwhelmed with non-scalable tasks such as job submission and monitoring. Many users are also limited in the number of jobs they can submit to a single High Performance Computing (HPC) resource at a time, which results in very long queue times. Digital Sherpa is a grid application that executes jobs on many separate HPC resources at once, which can reduce total queue time. It automates non-scalable tasks such as job submission and monitoring, and it includes recovery features such as resubmission of failed jobs. Digital Sherpa has been implemented for MGAC, a parallel distributed application that uses genetic algorithms to predict atomic clusters and crystal structures. It has also been used successfully in a prototype of an HPC-oriented combustion simulation application, as well as on the TeraGrid. The high-level goal is for Digital Sherpa to interoperate with any HPC application. © 2006 IEEE.