Skip to main content

Recursive array layouts and fast matrix multiplication

Publication ,  Journal Article
Chatterjee, S; Lebeck, AR; Patnala, PK; Thottethodi, M
Published in: IEEE Transactions on Parallel and Distributed Systems
November 1, 2002

The performance of both serial and parallel implementations of matrix multiplication is highly sensitive to memory system behavior. False sharing and cache conflicts cause traditional column-major or row-major array layouts to incur high variability in memory system performance as matrix size varies. This paper investigates the use of recursive array layouts to improve performance and reduce variability. Previous work on recursive matrix multiplication is extended to examine several recursive array layouts and three recursive algorithms: standard matrix multiplication and the more complex algorithms of Strassen and Winograd. While recursive layouts significantly outperform traditional layouts (reducing execution times by a factor of 1.2-2.5) for the standard algorithm, they offer little improvement for Strassen's and Winograd's algorithms. For a purely sequential implementation, it is possible to reorder computation to conserve memory space and improve performance between 10 percent and 20 percent. Carrying the recursive layout down to the level of individual matrix elements is shown to be counterproductive; a combination of recursive layouts down to canonically ordered matrix tiles instead yields higher performance. Five recursive layouts with successively increasing complexity of address computation are evaluated and it is shown that addressing overheads can be kept in control even for the most computationally demanding of these layouts.

Duke Scholars

Altmetric Attention Stats
Dimensions Citation Stats

Published In

IEEE Transactions on Parallel and Distributed Systems

DOI

ISSN

1045-9219

Publication Date

November 1, 2002

Volume

13

Issue

11

Start / End Page

1105 / 1123

Related Subject Headings

  • Distributed Computing
  • 4606 Distributed computing and systems software
  • 1005 Communications Technologies
  • 0805 Distributed Computing
  • 0803 Computer Software
 

Citation

APA
Chicago
ICMJE
MLA
NLM
Chatterjee, S., Lebeck, A. R., Patnala, P. K., & Thottethodi, M. (2002). Recursive array layouts and fast matrix multiplication. IEEE Transactions on Parallel and Distributed Systems, 13(11), 1105–1123. https://doi.org/10.1109/TPDS.2002.1058095
Chatterjee, S., A. R. Lebeck, P. K. Patnala, and M. Thottethodi. “Recursive array layouts and fast matrix multiplication.” IEEE Transactions on Parallel and Distributed Systems 13, no. 11 (November 1, 2002): 1105–23. https://doi.org/10.1109/TPDS.2002.1058095.
Chatterjee S, Lebeck AR, Patnala PK, Thottethodi M. Recursive array layouts and fast matrix multiplication. IEEE Transactions on Parallel and Distributed Systems. 2002 Nov 1;13(11):1105–23.
Chatterjee, S., et al. “Recursive array layouts and fast matrix multiplication.” IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 11, Nov. 2002, pp. 1105–23. Scopus, doi:10.1109/TPDS.2002.1058095.
Chatterjee S, Lebeck AR, Patnala PK, Thottethodi M. Recursive array layouts and fast matrix multiplication. IEEE Transactions on Parallel and Distributed Systems. 2002 Nov 1;13(11):1105–1123.

Published In

IEEE Transactions on Parallel and Distributed Systems

DOI

ISSN

1045-9219

Publication Date

November 1, 2002

Volume

13

Issue

11

Start / End Page

1105 / 1123

Related Subject Headings

  • Distributed Computing
  • 4606 Distributed computing and systems software
  • 1005 Communications Technologies
  • 0805 Distributed Computing
  • 0803 Computer Software