XChange: Coupling parallel applications in a dynamic environment
Modern computational science applications are becoming increasingly multi-disciplinary, involving widely distributed research teams and their underlying computational platforms. A common problem for the grid applications used in these environments is the need to couple multiple parallel subsystems, with examples ranging from data exchanges between cooperating, linked parallel programs to concurrent data streaming to distributed storage engines. This paper presents the XChange mxn middleware infrastructure for coupling componentized distributed applications. XChange mxn implements the basic functionality of well-known services like the CCA Forum's MxN project by providing efficient data redistribution across parallel application components. Beyond such basic functionality, however, XChange mxn also addresses two of the problems faced by wide area scientific collaborations: (1) the need to deal with dynamic application/component behaviors, such as dynamic arrivals and departures due to the availability of additional resources, and (2) the need to 'match' data formats across disparate application components and research teams. In response to these needs, XChange mxn uses an anonymous publish/subscribe model for linking interacting components, and the data being exchanged is dynamically specialized and transformed to match end point requirements. The pub/sub paradigm makes it easy to deal with dynamic component arrivals and departures. Dynamic data transformation enables the 'in flight' correction of mismatches in data or needs between cooperating components. This paper describes the design and implementation of XChange mxn and evaluates that implementation against less flexible transports like MPI. It also highlights the utility of XChange mxn's 'in flight' data specialization by applying it to the SmartPointer parallel data visualization environment developed at our institution. Interestingly, using XChange mxn did not significantly affect performance but led to a reduction in the size of the code base. © 2004 IEEE.
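The two ideas summarized above, M-to-N data redistribution and anonymous publish/subscribe with per-subscriber data transformation, can be illustrated with a minimal Python sketch. It is not the XChange mxn API: the names `block_bounds`, `redistribution_schedule`, and `Channel` are hypothetical, the data is assumed to be a 1-D array in a simple block distribution, and delivery is an in-process callback rather than a network transport.

```python
from typing import Callable, Dict, List, Tuple


def block_bounds(n: int, parts: int, rank: int) -> Tuple[int, int]:
    """Half-open index range [lo, hi) owned by `rank` when n items
    are split into `parts` contiguous blocks (assumed distribution)."""
    base, extra = divmod(n, parts)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def redistribution_schedule(n: int, m: int, k: int) -> List[Tuple[int, int, int, int]]:
    """Transfers (src_rank, dst_rank, lo, hi) needed to move an n-element
    array from m source processes to k destination processes."""
    schedule = []
    for src in range(m):
        s_lo, s_hi = block_bounds(n, m, src)
        for dst in range(k):
            d_lo, d_hi = block_bounds(n, k, dst)
            lo, hi = max(s_lo, d_lo), min(s_hi, d_hi)
            if lo < hi:  # the two blocks overlap, so data must be sent
                schedule.append((src, dst, lo, hi))
    return schedule


class Channel:
    """Hypothetical anonymous pub/sub channel: publishers never name
    subscribers, and each subscriber supplies its own transform."""

    def __init__(self) -> None:
        self._subs: Dict[int, Tuple[Callable, Callable]] = {}
        self._next_id = 0

    def subscribe(self, transform: Callable, deliver: Callable) -> int:
        """Register a subscriber (a dynamic arrival); returns its id."""
        sub_id = self._next_id
        self._next_id += 1
        self._subs[sub_id] = (transform, deliver)
        return sub_id

    def unsubscribe(self, sub_id: int) -> None:
        """Remove a subscriber (a dynamic departure)."""
        self._subs.pop(sub_id, None)

    def publish(self, data) -> None:
        """Specialize the data for each current subscriber, then deliver."""
        for transform, deliver in list(self._subs.values()):
            deliver(transform(data))
```

For example, `redistribution_schedule(10, 2, 3)` lists the four transfers needed to move ten elements from two producers to three consumers, and a subscriber that registers a down-sampling `transform` receives only the reduced data without the publisher changing.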