Abstract
The paper presents a multi-GPU implementation of the preconditioned conjugate gradient algorithm with an algebraic multigrid preconditioner (PCG-AMG) for an elliptic model problem on a 3D unstructured grid. An efficient parallel sparse matrix-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen-node InfiniBand cluster, and that a multi-GPU configuration with eight GPUs is about 100 times faster than a typical server CPU core. © 2010 Springer-Verlag.
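For readers unfamiliar with the structure of such a solver, the following is a minimal serial sketch of a preconditioned conjugate gradient loop built on a CSR sparse matrix-vector product. It is not the paper's implementation: it runs on the CPU in plain Python, it uses a simple Jacobi (diagonal) preconditioner as a stand-in for algebraic multigrid, and it uses a 1D Poisson matrix in place of the 3D unstructured-grid problem. All function names (`csr_matvec`, `pcg`, `make_jacobi`, `poisson_1d_csr`) are hypothetical and chosen only for illustration.

```python
import math

def csr_matvec(indptr, indices, data, x):
    """Return y = A @ x for A in CSR form (row pointers, column indices, values)."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(indptr, indices, data, b, apply_prec, tol=1e-10, max_iter=200):
    """Preconditioned CG for a symmetric positive definite CSR matrix.

    apply_prec(r) returns an approximation of A^{-1} r. Here it is Jacobi;
    an AMG cycle would be plugged in at the same place (assumption).
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the zero initial guess
    z = apply_prec(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for it in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        Ap = csr_matvec(indptr, indices, data, p)   # the dominant cost per iteration
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_prec(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

def make_jacobi(indptr, indices, data):
    """Build a diagonal preconditioner: a simple stand-in, not AMG."""
    inv_diag = []
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            if indices[k] == row:
                inv_diag.append(1.0 / data[k])
    return lambda r: [d * ri for d, ri in zip(inv_diag, r)]

def poisson_1d_csr(n):
    """Tridiagonal (-1, 2, -1) matrix: a tiny stand-in for an elliptic model problem."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        if i > 0:
            indices.append(i - 1)
            data.append(-1.0)
        indices.append(i)
        data.append(2.0)
        if i < n - 1:
            indices.append(i + 1)
            data.append(-1.0)
        indptr.append(len(indices))
    return indptr, indices, data

if __name__ == "__main__":
    indptr, indices, data = poisson_1d_csr(16)
    b = csr_matvec(indptr, indices, data, [1.0] * 16)   # exact solution is all ones
    x, iters = pcg(indptr, indices, data, b, make_jacobi(indptr, indices, data))
    print(iters, max(abs(xi - 1.0) for xi in x))
```

Each iteration is dominated by one sparse matrix-vector product plus the preconditioner application, which is why the abstract singles out the matrix-vector multiplication scheme as the kernel underlying the solver.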
Original language | English (US) |
---|---|
Title of host publication | High Performance Computing and Applications |
Publisher | Springer Nature |
Pages | 38-47 |
Number of pages | 10 |
ISBN (Print) | 9783642118418 |
DOIs | |
State | Published - 2010 |
Externally published | Yes |
Bibliographical note
KAUST Repository Item: Exported on 2020-10-01
Acknowledged KAUST grant number(s): KUS-C1-016-04
Acknowledgements: This publication is based on work supported in part by NSF grants OISE-0405349, ACI-0305466, CNS-0719626, and ACI-0324876, by DOE project DE-FC26-08NT4, by FWF project SFB032, by BMWF project AustrianGrid 2, and Award No. KUS-C1-016-04, made by King Abdullah University of Science and Technology (KAUST).
This publication acknowledges KAUST support, but has no KAUST affiliated authors.