Space and time efficient implementations of parallel nested dissection
Nested dissection is a method for solving large sparse systems of linear equations. The method is inherently parallelizable and efficient algorithms for parallel nested dissection (PND) exist for various parallel architectures. Birkhoff and George showed that for a sparse grid graph, a parallel algorithm with n processors takes O(√n) parallel time. The algorithm in works on a PRAM model, using O(n3/2) processors and taking O((log n)3) time for a general class of matrices which have have O(√n) size separators. Both these algorithms use a logarithmic factor of space larger than the input matrix. Our results are as follows: (1) We describe a FAST PND algorithm for the PRAM model, which reduces the time bound from O((log n)3) to O((log n)2), without significantly increasing the processor bound. (2) We implement a PND algorithm that works on a mesh-connected processor array, using O(n) processors and taking O(√n) time for the important and more general case where the graph of the matrix is of constant degree and separator size √n (as required for many 2D PDE applications). (3) We describe a COMPACT PND algorithm which reduces the space bounds to a constant factor of the input matrix, without significant increase in time or processor bounds. In addition to these practical results, we show that it is theoretically possible to achieve tighter bounds on the amount of work performed (and thus on time and processor bounds), using known theoretical results about parallel matrix multiplication. Finally, we show that our algorithms generalize to solve all pairs minimal cost path problems within the same complexity.