High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications

Ahmad Abdelfattah, Hatem Ltaief, David E. Keyes

Research output: Chapter in Book/Report/Conference proceedingConference contribution

11 Scopus citations

Abstract

Leveraging optimization techniques (e.g., register blocking and double buffering) introduced in the context of KBLAS, a Level 2 BLAS high performance library on GPUs, the authors implement dense matrix-vector multiplications within a sparse-block structure. While these optimizations are important for high performance dense kernel executions, they are even more critical when dealing with sparse linear algebra operations. The most time-consuming phase of many multicomponent applications, such as models of reacting flows or petroleum reservoirs, is the solution at each implicit time step of large, sparse spatially structured or unstructured linear systems. The standard method is a preconditioned Krylov solver. The Sparse Matrix-Vector multiplication (SpMV) is, in turn, one of the most time-consuming operations in such solvers. Because there is no data reuse of the elements of the matrix within a single SpMV, kernel performance is limited by the speed at which data can be transferred from memory to registers, making the bus bandwidth the major bottleneck. On the other hand, in case of a multi-species model, the resulting Jacobian has a dense block structure. For contemporary petroleum reservoir simulations, the block size typically ranges from three to a few dozen among different models, and still larger blocks are relevant within adaptively model-refined regions of the domain, though generally the size of the blocks, related to the number of conserved species, is constant over large regions within a given model. This structure can be exploited beyond the convenience of a block compressed row data format, because it offers opportunities to hide the data motion with useful computations. The new SpMV kernel outperforms existing state-of-the-art implementations on single and multi-GPUs using matrices with dense block structure representative of porous media applications with both structured and unstructured multi-component grids.
Original languageEnglish (US)
Title of host publicationEuro-Par 2015: Parallel Processing
PublisherSpringer Nature
Pages601-612
Number of pages12
ISBN (Print)9783662480953
DOIs
StatePublished - Jul 25 2015

Bibliographical note

KAUST Repository Item: Exported on 2020-10-01

Fingerprint

Dive into the research topics of 'High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications'. Together they form a unique fingerprint.

Cite this