A High Performance QDWH-SVD Solver Using Hardware Accelerators

Dalal Sukkari*, Hatem Ltaief, David Keyes

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

16 Scopus citations


This article describes a new high-performance implementation of the QR-based Dynamically Weighted Halley Singular Value Decomposition (QDWH-SVD) solver on multicore architectures enhanced with multiple GPUs. The standard QDWH-SVD algorithm was introduced by Nakatsukasa and Higham (SIAM SISC, 2013) and combines three successive computational stages: (1) the polar decomposition of the original matrix using the QDWH algorithm, (2) the symmetric eigendecomposition of the resulting symmetric polar factor to obtain the singular values and the right singular vectors, and (3) a matrix-matrix multiplication of the orthogonal polar factor with those eigenvectors to obtain the associated left singular vectors. A comprehensive test suite highlights the numerical robustness of the QDWH-SVD solver. Although it performs up to two times more flops than the standard SVD algorithm when computing all singular vectors, our new high-performance implementation on a single GPU achieves up to a 4x speedup for asymptotic matrix sizes, compared to the equivalent routines from existing state-of-the-art open-source and commercial libraries. However, when only singular values are needed, QDWH-SVD is penalized by performing an order of magnitude more flops. Even so, the singular-value-only implementation of QDWH-SVD on a single GPU can still run up to 18% faster than the best existing equivalent routines.
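The three stages can be illustrated with a small dense sketch. This is not the authors' multi-GPU implementation: it runs on the CPU with NumPy's `qr` and `eigh`, and it assumes a real matrix A with m ≥ n and full column rank. The function names `qdwh_polar` and `qdwh_svd`, the Frobenius-norm scaling, the crude lower bound on the smallest singular value, and the stopping test are all illustrative choices made for this example. Each QDWH step applies X ← (b/c)X + (a − b/c)/√c · Q1 Q2ᵀ, where [√c X; I] = [Q1; Q2] R, so no explicit matrix inverse is formed.

```python
import numpy as np


def qdwh_polar(A, max_iter=20):
    """Orthogonal polar factor Up of A = Up @ H via the QDWH iteration (sketch).

    Assumes A is real, m >= n, and of full column rank.
    """
    m, n = A.shape
    eps = np.finfo(float).eps
    # Scale so that ||X||_2 <= 1 (Frobenius norm is a cheap upper bound on ||A||_2).
    X = A / np.linalg.norm(A, "fro")
    # Crude lower bound ell <= sigma_min(X) = sigma_min(R), using ||R^-1||_2 <= sqrt(n) ||R^-1||_1.
    _, R = np.linalg.qr(X)
    ell = min(1.0 / (np.sqrt(n) * np.linalg.norm(np.linalg.inv(R), 1)), 1.0)
    for _ in range(max_iter):
        # Dynamically chosen Halley weights a, b, c from the current bound ell.
        ell2 = ell * ell
        gamma = np.cbrt(4.0 * (1.0 - ell2) / ell2**2)
        a = np.sqrt(1.0 + gamma) + 0.5 * np.sqrt(
            8.0 - 4.0 * gamma + 8.0 * (2.0 - ell2) / (ell2 * np.sqrt(1.0 + gamma))
        )
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0
        # QR of the stacked matrix [sqrt(c) X; I] replaces the inverse in Halley's iteration.
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, np.eye(n)]))
        Q1, Q2 = Q[:m], Q[m:]
        X_new = (b / c) * X + (a - b / c) / np.sqrt(c) * (Q1 @ Q2.T)
        # Update the lower bound on the smallest singular value of the new iterate.
        ell = min(ell * (a + b * ell2) / (1.0 + c * ell2), 1.0)
        diff = np.linalg.norm(X_new - X, "fro")
        X = X_new
        # Stop once the bound has reached 1 and the iterate has stalled (cubic convergence).
        if abs(1.0 - ell) <= 10 * eps and diff <= (5 * eps) ** (1 / 3):
            break
    return X


def qdwh_svd(A):
    """SVD A = U @ diag(s) @ Vt via polar decomposition + symmetric eigensolver (sketch)."""
    # Stage 1: polar decomposition A = Up @ H.
    Up = qdwh_polar(A)
    H = Up.T @ A
    H = 0.5 * (H + H.T)  # symmetrize to remove rounding asymmetry
    # Stage 2: symmetric eigendecomposition H = V diag(s) V^T gives s and right vectors.
    s, V = np.linalg.eigh(H)
    s, V = s[::-1], V[:, ::-1]  # eigh is ascending; reorder to the usual descending SVD order
    # Stage 3: left singular vectors by one matrix-matrix multiplication.
    U = Up @ V
    return U, s, V.T
```

For a tall full-rank matrix, `U, s, Vt = qdwh_svd(A)` should reproduce `A` as `U @ np.diag(s) @ Vt` and match `np.linalg.svd(A, compute_uv=False)` to rounding error. Computing only `s` still requires the full polar iteration of stage 1, which is why the singular-value-only case carries a much larger flop overhead relative to a bidiagonalization-based solver.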

Original language: English
Article number: 6
Number of pages: 25
Journal: ACM Transactions on Mathematical Software
Issue number: 1
State: Published, Aug 2016


  • Design
  • Algorithms
  • Singular value decomposition
  • polar decomposition
  • symmetric eigensolver
  • mixed precision algorithms
  • GPU-based scientific computing

