Fast fourier transform accelerated fast multipole algorithm
This paper describes an O(p2 log2(p)N) implementation of the fast multipole algorithm (FMA) for N-body simulations. This method of computing the FMA is faster than the original, which is O(p4N), where p is the number of terms retained in the truncated multipole expansion representation of the potential field of a collection of charged particles. The p term determines the accuracy of the calculation. The limiting O(p4) computation in the original FMA is a convolution-like operation on a matrix of multipole coefficients. This paper describes the implementation details of a conversion of this limiting computation to linear convolution, which is then computed in the Fourier domain using the fast Fourier transform (FFT), based on a method originally outlined by Greengard and Rokhlin. In addition, this paper describes a new block decomposition of the multipole expansion data that provides numerical stability and efficient computation. The resulting O(p2 log2(p)) subroutine has a speedup of 2 on a sequential processor over the original method for p = 8, and a speedup of 4 for p = 16. The new subroutine vectorizes well and has a speedup of 3 on a vector processor at p = 8 and a speedup of 6 at p = 16.
Volume / Issue
Start / End Page
International Standard Serial Number (ISSN)
Digital Object Identifier (DOI)