Skip to main content

Recursive array layouts and fast parallel matrix multiplication

Publication ,  Conference
Chatterjee, S; Lebeck, AR; Patnala, PK; Thottethodi, M
Published in: Annual ACM Symposium on Parallel Algorithms and Architectures
January 1, 1999

Matrix multiplication is an important kernel in linear algebra algorithms, and the performance of both serial and parallel implementations is highly dependent on the memory system behavior. Unfortunately, due to false sharing and cache conflicts, traditional column-major or row-major array layouts incur high variability in memory system performance as matrix size varies. This paper investigates the use of recursive array layouts for improving the performance of parallel recursive matrix multiplication algorithms. We extend previous work by Frens and Wise on recursive matrix multiplication to examine several recursive array layouts and three recursive algorithms: standard matrix multiplication, and the more complex algorithms of Strassen and Winograd. We show that while recursive array layouts significantly outperform traditional layouts (reducing execution times by a factor of 1.2-2.5) for the standard algorithm, they offer little improvement for Strassen's and Winograd's algorithms; we provide an algorithmic explanation of this phenomenon. We demonstrate that carrying the recursive layout down to the level of individual matrix elements is counter-productive, and that a combination of recursive layouts down to canonically ordered matrix tiles instead yields higher performance. We evaluate five recursive layouts with successively increasing complexity of address computation, and show that addressing overheads can be kept in control even for the most computationally demanding of these layouts. Finally, we provide a critique of the Cilk system that we used to parallelize our code.

Duke Scholars

Published In

Annual ACM Symposium on Parallel Algorithms and Architectures

DOI

Publication Date

January 1, 1999

Start / End Page

222 / 231
 

Citation

APA
Chicago
ICMJE
MLA
NLM
Chatterjee, S., Lebeck, A. R., Patnala, P. K., & Thottethodi, M. (1999). Recursive array layouts and fast parallel matrix multiplication. In Annual ACM Symposium on Parallel Algorithms and Architectures (pp. 222–231). https://doi.org/10.1145/305619.305645
Chatterjee, S., A. R. Lebeck, P. K. Patnala, and M. Thottethodi. “Recursive array layouts and fast parallel matrix multiplication.” In Annual ACM Symposium on Parallel Algorithms and Architectures, 222–31, 1999. https://doi.org/10.1145/305619.305645.
Chatterjee S, Lebeck AR, Patnala PK, Thottethodi M. Recursive array layouts and fast parallel matrix multiplication. In: Annual ACM Symposium on Parallel Algorithms and Architectures. 1999. p. 222–31.
Chatterjee, S., et al. “Recursive array layouts and fast parallel matrix multiplication.” Annual ACM Symposium on Parallel Algorithms and Architectures, 1999, pp. 222–31. Scopus, doi:10.1145/305619.305645.
Chatterjee S, Lebeck AR, Patnala PK, Thottethodi M. Recursive array layouts and fast parallel matrix multiplication. Annual ACM Symposium on Parallel Algorithms and Architectures. 1999. p. 222–231.

Published In

Annual ACM Symposium on Parallel Algorithms and Architectures

DOI

Publication Date

January 1, 1999

Start / End Page

222 / 231