O(log2n) time efficient parallel factorization of dense, sparse separable, and banded matrices
Known polylog parallel algorithms for the solution of linear systems and related problems require computation of the characteristic polynomial or related forms, which are known to be highly unstable in practice. However, matrix factorizations of various types, bypassing computation of the characteristic polynomial, are used extensively in sequential numerical computations and are essential in many applications. This paper gives new parallel methods for various exact factorizations of several classes of matrices. We assume the input matrices are n × n with either integer entries of size ≤ 2no(t)(1) or rational entries expressible as a ratio of integers of size ≤ 2no(1). We make no other assumption on the input. We assume the arithmetic PRAM model of parallel computation. Our main result is a reduction of the known parallel time bounds for these factorizations from O(log3n) to O(log2n). Our results are work efficient; we match the best known work bounds of parallel algorithms with polylog time bounds, and are within a log n factor of the work bounds for the best known sequential algorithms for the same problems. The exact factorizations we compute for symmetric positive definite matrices include: 1. recursive factorization sequences and trees, 2. LU factorizations, 3. QR factorizations, and 4. reduction into upper Hessenberg form. The classes of matrices for which we can efficiently compute these factorizations include: 1. dense matrices, in time O(log2 n) with processor bound P(n) (the number of processors needed to multiply two n x n matrices in O(logn) time). 2. block diagonal matrices, in time O(log2 6) with P(b)n/b processors, 3. sparse matrices which are s(n)-separable (recursive factorizations only), in time O(log2n) with P(s(n)) processors where s(n) is of the form nγ for 0 < 7 < 1, and 4. banded matrices, in parallel time O((log n) log b) with P(b)n/b processors. Our factorizations also provide us similarly efficient algorithms for exact computation (given arbitrary rational matrices that need not be symmetric positive definite) of the following: 1. solution of the corresponding linear systems, 2. The determinant, 3. The inverse. Thus our results provide the first known efficient parallel algorithms for exact solution of these matrix problems, that avoids computation of the characteristic polynomial or related forms. Instead we use a construction which modifies the input matrix, which may initially have arbitrary condition, so as to have condition nearly 1, and then applies a multilevel, pipelined Newton iteration, followed by a similar multilevel, pipelined Hensel Lifting.