Enhancing parallelism of tile bidiagonal transformation on multicore architectures using tree reduction

Hatem Ltaief, Piotr R. Luszczek, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

13 Scopus citations


The objective of this paper is to enhance the parallelism of the tile bidiagonal transformation using tree reduction on multicore architectures. First introduced by Ltaief et al. [LAPACK Working Note #247, 2011], the bidiagonal transformation using tile algorithms with a two-stage approach has shown very promising results on square matrices. However, for tall and skinny matrices, the inherent problem of processing the panel in a domino-like fashion generates unnecessary sequential tasks. By using tree reduction, the panel is horizontally split, which creates another dimension of parallelism and produces many concurrent tasks that can be dynamically scheduled on the available cores. The results reported in this paper are very encouraging. The new tile bidiagonal transformation, targeting tall and skinny matrices, outperforms the state-of-the-art numerical linear algebra libraries LAPACK V3.2 and Intel MKL ver. 10.3 by up to 29-fold, and the standard two-stage PLASMA BRD by up to 20-fold, on an eight-socket hexa-core AMD Opteron multicore shared-memory system. © 2012 Springer-Verlag.
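The contrast between domino-like panel processing and tree reduction can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's algorithm or the PLASMA API: it uses a plain QR factorization of a tall and skinny panel as a stand-in for the panel step, and the function names `flat_reduce` and `tree_reduce` are hypothetical. In the flat variant each step depends on the previous one; in the tree variant the leaf factorizations and the merges within one level are independent of each other, so a scheduler could run them concurrently.

```python
import numpy as np


def flat_reduce(blocks):
    """Domino-style reduction: fold one row block at a time into a running R.

    Every step needs the R factor produced by the previous step, so the
    steps form a sequential chain whose length equals the number of blocks.
    """
    r = np.linalg.qr(blocks[0], mode="r")
    for block in blocks[1:]:
        r = np.linalg.qr(np.vstack([r, block]), mode="r")
    return r


def tree_reduce(blocks):
    """Binary-tree reduction: factor blocks independently, then merge in pairs.

    All factorizations within one tree level are mutually independent.
    This sketch runs them in a plain loop; it only shows the dependency
    structure, not an actual parallel schedule.
    """
    # Leaf level: one independent QR per row block.
    level = [np.linalg.qr(block, mode="r") for block in blocks]
    while len(level) > 1:
        merged = []
        # Merge neighbouring R factors pairwise.
        for i in range(0, len(level) - 1, 2):
            stacked = np.vstack([level[i], level[i + 1]])
            merged.append(np.linalg.qr(stacked, mode="r"))
        # An unpaired R factor is carried up to the next level unchanged.
        if len(level) % 2 == 1:
            merged.append(level[-1])
        level = merged
    return level[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    panel = rng.standard_normal((64, 4))   # tall and skinny panel (assumed sizes)
    row_blocks = np.split(panel, 8)        # split the panel into 8 row blocks
    r_flat = flat_reduce(row_blocks)
    r_tree = tree_reduce(row_blocks)
    # Both R factors satisfy R^T R = A^T A; they may differ in row signs.
    print(np.allclose(r_flat.T @ r_flat, panel.T @ panel))
    print(np.allclose(r_tree.T @ r_tree, panel.T @ panel))
```

With 8 row blocks, the flat variant performs 8 dependent factorizations in sequence, while the tree variant needs one leaf level plus 3 merge levels. That shorter dependency chain is what matters most for tall and skinny matrices.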
Original language: English (US)
Title of host publication: Parallel Processing and Applied Mathematics
Publisher: Springer Nature
Number of pages: 10
ISBN (Print): 9783642314636
State: Published - 2012

Bibliographical note

KAUST Repository Item: Exported on 2020-10-01

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

