Steering Customized AI Architectures for HPC Scientific Applications

Hatem Ltaief*, Yuxi Hong, Adel Dabah, Rabab Alomairy, Sameh Abdulah, Chris Goreczny, Pawel Gepner, Matteo Ravasi, Damien Gratadour, David Keyes

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations


AI hardware technologies have revolutionized computational science. While they have mostly been used to accelerate training and inference of deep learning models, HPC scientific applications do not seem to benefit directly from these specific hardware features unless AI-based components are introduced into their simulation workflows, for instance, as a replacement for their numerical solvers. This paper proposes to take another direction in an attempt to democratize customized AI architectures for HPC scientific computing. The main idea is to demonstrate how legacy applications can leverage these AI engines after a necessary algorithmic redesign. It is critical that the resulting software implementations map onto the underlying memory-austere hardware architectures to extract the expected performance. To facilitate this process, we promote the matricization technique for restructuring codes (1) by exploiting data sparsity via algebraic compression and (2) by expressing the critical computational phases in terms of tile low-rank matrix-vector multiplications (TLR-MVM) and batch matrix-matrix multiplications (batch GEMM). Algebraic compression reduces the memory footprint so that the data fit into small local cache/memory, while batch execution ensures high occupancy. We highlight how we can steer the Graphcore AI-focused Wafer-on-Wafer Intelligence Processing Units (IPUs) to deliver high performance for both operations. We conduct a performance benchmarking campaign of these two matrix operations, which account for most of the elapsed time of four real applications in computational astronomy, seismic imaging, wireless communications, and climate/weather predictions. We report bandwidth and execution rates with speedup factors of up to 150X/14X/25X/40X, respectively, on IPUs compared to other systems.
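To make the TLR-MVM idea in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation and does not target the IPU. It assumes a square matrix whose size is a multiple of the tile size, uses a truncated SVD per tile as the algebraic compression, and uses a smooth kernel matrix as a stand-in for application data. The function names `compress_tlr` and `tlr_mvm`, the tolerance, and the test matrix are all illustrative choices.

```python
import numpy as np


def compress_tlr(A, tile, tol):
    """Split A into tile x tile blocks and keep a truncated SVD of each block.

    Returns a dict mapping (block_row, block_col) -> (U, Vt) such that
    U @ Vt approximates the block. Assumes A's dimensions are multiples of tile.
    """
    m, n = A.shape
    assert m % tile == 0 and n % tile == 0
    factors = {}
    for i in range(m // tile):
        for j in range(n // tile):
            block = A[i * tile:(i + 1) * tile, j * tile:(j + 1) * tile]
            U, s, Vt = np.linalg.svd(block, full_matrices=False)
            # Keep singular values above tol relative to the largest one
            # (always keep at least rank 1 so every tile has factors).
            k = max(1, int(np.sum(s > tol * s[0])))
            # Fold the singular values into U so the tile is just U @ Vt.
            factors[(i, j)] = (U[:, :k] * s[:k], Vt[:k, :])
    return factors


def tlr_mvm(factors, x, m, tile):
    """Compute y ~= A @ x using only the compressed tiles."""
    y = np.zeros(m)
    for (i, j), (U, Vt) in factors.items():
        xj = x[j * tile:(j + 1) * tile]
        # Two thin products replace one dense tile-vector product.
        y[i * tile:(i + 1) * tile] += U @ (Vt @ xj)
    return y


# Illustrative data-sparse matrix: a smooth kernel evaluated on a 1D grid.
n, tile = 512, 64
pts = np.linspace(0.0, 1.0, n)
A = 1.0 / (1.0 + np.abs(pts[:, None] - pts[None, :]))
x = np.random.default_rng(0).standard_normal(n)

factors = compress_tlr(A, tile, tol=1e-8)
y_tlr = tlr_mvm(factors, x, n, tile)
y_ref = A @ x

rel_err = np.linalg.norm(y_tlr - y_ref) / np.linalg.norm(y_ref)
stored = sum(U.size + Vt.size for U, Vt in factors.values())
print(f"relative error: {rel_err:.2e}")
print(f"stored entries: {stored} (dense: {A.size})")
```

In this sketch the off-diagonal tiles compress to low rank while the diagonal tiles stay close to full rank, so the memory saving depends on how data-sparse the operator is. The per-tile loop is also what a batched kernel would replace: the many small, independent products are the kind of work the abstract describes as batch execution.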

Original language: English (US)
Title of host publication: High Performance Computing - 38th International Conference, ISC High Performance 2023, Proceedings
Editors: Abhinav Bhatele, Jeff Hammond, Marc Baboulin, Carola Kruse
Publisher: Springer Science and Business Media Deutschland GmbH
Number of pages: 19
ISBN (Print): 9783031320408
State: Published - 2023
Event: 38th International Conference on High Performance Computing, ISC High Performance 2023 - Hamburg, Germany
Duration: May 21 2023 → May 25 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13948 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 38th International Conference on High Performance Computing, ISC High Performance 2023

Bibliographical note

Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.


  • Batch matrix operations
  • BLAS for Graphcore IPU
  • HPC scientific applications
  • Low-rank matrix computations

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

