Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms

Khalid Hasanov, Jean-Noël Quintin, Alexey Lastovetsky

Research output: Contribution to journal › Article › peer-review

19 Scopus citations


© 2014, Springer Science+Business Media New York. Many state-of-the-art parallel algorithms, which are widely used in scientific applications executed on high-end computing systems, were designed in the twentieth century with relatively small-scale parallelism in mind. Indeed, while in the 1990s a system with a few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to the optimization of message-passing parallel algorithms for execution on large-scale distributed-memory systems. The idea is to reduce the communication cost by introducing hierarchy, and hence more parallelism, into the communication scheme. We apply this approach to SUMMA, the state-of-the-art parallel algorithm for matrix–matrix multiplication, and demonstrate both theoretically and experimentally that the modified Hierarchical SUMMA significantly reduces the communication cost and improves the overall performance on large-scale platforms.
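The arithmetic behind a panel-based algorithm such as SUMMA can be illustrated with a small serial sketch. It is not the authors' implementation and contains no message passing: it only shows that the product C = AB can be accumulated panel by panel, and that nesting the panels in two levels leaves the result unchanged. The function name `matmul_panels` and the parameters `outer` and `inner` are hypothetical, and the mapping of the two loop levels to communication between groups and within a group is an assumption made for illustration, not something stated in the abstract.

```python
def matmul_panels(A, B, outer, inner):
    """Serial sketch: accumulate C = A @ B from nested panels.

    `outer` and `inner` are hypothetical panel widths. In a
    message-passing setting the outer loop could correspond to a
    coarse communication phase and the inner loop to a finer one;
    that mapping is an assumption, not taken from the source.
    """
    n, k_total, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    steps = 0  # number of inner-panel updates performed
    for K in range(0, k_total, outer):
        outer_end = min(K + outer, k_total)
        for k in range(K, outer_end, inner):
            hi = min(k + inner, outer_end)
            # Rank-(hi - k) update using columns k..hi-1 of A
            # and rows k..hi-1 of B.
            for i in range(n):
                for j in range(m):
                    C[i][j] += sum(A[i][t] * B[t][j] for t in range(k, hi))
            steps += 1
    return C, steps


# Small illustrative inputs (made up for this sketch).
A = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
B = [[1, 0],
     [0, 1],
     [2, 0],
     [0, 2]]

flat, flat_steps = matmul_panels(A, B, outer=4, inner=4)
nested, nested_steps = matmul_panels(A, B, outer=2, inner=1)
assert flat == nested  # the panel sizes do not change the product
```

Only the number and width of the updates change with the panel sizes; in a distributed setting those choices would determine how many messages are sent and how large each one is, which is the kind of trade-off the hierarchical scheme targets.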
Original language: English (US)
Pages (from-to): 3991-4014
Number of pages: 24
Journal: The Journal of Supercomputing
Issue number: 11
State: Published - Mar 4 2014
Externally published: Yes

Bibliographical note

KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: The research in this paper was supported by IRCSET (Irish Research Council for Science, Engineering and Technology) and IBM, grant numbers EPSG/2011/188 and EPSPD/2011/207. Some of the experiments presented in this paper were carried out using the Grid’5000 experimental testbed, being developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several universities as well as other funding bodies (see https://www.grid5000.fr). Another part of the experiments was carried out using the resources of the Supercomputing Laboratory at King Abdullah University of Science & Technology (KAUST) in Thuwal, Saudi Arabia. The authors would like to thank Ashley DeFlumere for her useful comments and corrections.
This publication acknowledges KAUST support, but has no KAUST affiliated authors.

