Abstract
AI hardware technologies have revolutionized computational science. While they have mostly been used to accelerate deep learning training and inference, HPC scientific applications do not directly benefit from these hardware features unless AI-based components are introduced into their simulation workflows, for instance, as a replacement for their numerical solvers. This paper takes another direction in an attempt to democratize customized AI architectures for HPC scientific computing. The main idea is to demonstrate how legacy applications can leverage these AI engines after a necessary algorithmic redesign. It is critical that the resulting software implementations map onto the underlying memory-austere hardware architectures to extract the expected performance. To facilitate this process, we promote the matricization technique for restructuring codes (1) by exploiting data sparsity via algebraic compression and (2) by expressing the critical computational phases in terms of tile low-rank matrix-vector multiplications (TLR-MVM) and batch matrix-matrix multiplications (batch GEMM). Algebraic compression reduces the memory footprint so that data fits into small local caches/memories, while batch execution ensures high occupancy. We highlight how we can steer the Graphcore AI-focused Wafer-on-Wafer Intelligence Processing Units (IPUs) to deliver high performance for both operations. We conduct a performance benchmarking campaign of these two matrix operations, which account for most of the elapsed time of four real applications in computational astronomy, seismic imaging, wireless communications, and climate/weather predictions. We report bandwidth and execution rates with speedup factors up to 150X/14X/25X/40X, respectively, on IPUs compared to other systems.
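To make the TLR-MVM idea concrete, below is a minimal NumPy sketch of the arithmetic structure only, assuming each tile A[i][j] of size nb x nb has been pre-compressed as U[i][j] @ V[i][j].T with a small rank k. The names (tlr_mvm, U, V, nb) are illustrative placeholders, not the paper's actual data structures or IPU implementation.

```python
import numpy as np

def tlr_mvm(U, V, x, nb):
    """Approximate y = A @ x using per-tile low-rank factors.

    U[i][j]: (nb, k_ij) left factor of tile (i, j)
    V[i][j]: (nb, k_ij) right factor of tile (i, j)
    x      : input vector of length nt * nb (nt = number of tile columns)
    """
    nt = len(U)                       # number of tile rows/columns
    y = np.zeros(nt * nb)
    for i in range(nt):
        for j in range(nt):
            xj = x[j * nb:(j + 1) * nb]
            # Phase 1: project the input slice onto the tile's compressed basis
            w = V[i][j].T @ xj        # small product of size k_ij
            # Phase 2: expand back to the output slice and accumulate
            y[i * nb:(i + 1) * nb] += U[i][j] @ w
    return y

# Toy usage with random factors (fixed rank k for simplicity)
nt, nb, k = 4, 256, 8
U = [[np.random.randn(nb, k) for _ in range(nt)] for _ in range(nt)]
V = [[np.random.randn(nb, k) for _ in range(nt)] for _ in range(nt)]
x = np.random.randn(nt * nb)
y = tlr_mvm(U, V, x, nb)
```

In the approach described in the abstract, the many small per-tile products of each phase are grouped into batch GEMM calls so that the compressed factors fit in the IPUs' local memories and the hardware stays highly occupied; the loop above only illustrates the arithmetic, not that batched scheduling.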
| Original language | English (US) |
| --- | --- |
| Title of host publication | High Performance Computing - 38th International Conference, ISC High Performance 2023, Proceedings |
| Editors | Abhinav Bhatele, Jeff Hammond, Marc Baboulin, Carola Kruse |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 125-143 |
| Number of pages | 19 |
| ISBN (Print) | 9783031320408 |
| DOIs | |
| State | Published - 2023 |
| Event | 38th International Conference on High Performance Computing, ISC High Performance 2023 - Hamburg, Germany. Duration: May 21 2023 → May 25 2023 |
Publication series
| Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
| --- | --- |
| Volume | 13948 LNCS |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 38th International Conference on High Performance Computing, ISC High Performance 2023 |
| --- | --- |
| Country/Territory | Germany |
| City | Hamburg |
| Period | 05/21/23 → 05/25/23 |
Bibliographical note
Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Keywords
- Batch matrix operations
- BLAS for Graphcore IPU
- HPC scientific applications
- Low-rank matrix computations
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science