Scheduling dense linear algebra operations on multicore processors

Jakub Kurzak*, Hatem Ltaief, Jack Dongarra, Rosa M. Badia

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

66 Scopus citations


State-of-the-art dense linear algebra software, such as the LAPACK and ScaLAPACK libraries, suffers performance losses on multicore processors due to its inability to fully exploit thread-level parallelism. At the same time, the coarse-grain dataflow model is gaining popularity as a paradigm for programming multicore architectures. This work looks at implementing classic dense linear algebra workloads (the Cholesky, QR, and LU factorizations) using dynamic data-driven execution. Two emerging approaches to implementing coarse-grain dataflow are examined: the model of nested parallelism, represented by the Cilk framework, and the model of parallelism expressed through an arbitrary directed acyclic graph (DAG), represented by the SMP Superscalar framework. Performance and coding effort are analyzed and compared against code manually parallelized at the thread level.
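
As a rough illustration of the data-driven idea summarized in the abstract, the sketch below lists the tasks of a tiled Cholesky factorization in sequential order, derives the task graph from which tiles each task reads and writes, and groups tasks into waves that could run concurrently. It is not the Cilk or SMP Superscalar API and performs no numerical work. The function names (`cholesky_tasks`, `build_dag`, `waves`) are hypothetical, and the right-looking tile ordering and the kernel labels POTRF, TRSM, SYRK, and GEMM (the usual LAPACK/BLAS names) are assumptions that do not appear in the abstract itself.

```python
from collections import defaultdict


def cholesky_tasks(nt):
    """Yield (name, reads, writes) for a tiled Cholesky on an nt x nt tile grid.

    Assumed right-looking ordering; tiles are (row, col) pairs in the
    lower triangle. Tiles in `writes` are updated in place (read-write).
    """
    for k in range(nt):
        yield (f"POTRF({k})", [], [(k, k)])               # factor diagonal tile
        for i in range(k + 1, nt):
            yield (f"TRSM({i},{k})", [(k, k)], [(i, k)])  # solve panel tile
        for i in range(k + 1, nt):
            yield (f"SYRK({i},{k})", [(i, k)], [(i, i)])  # update diagonal tile
            for j in range(k + 1, i):
                yield (f"GEMM({i},{j},{k})", [(i, k), (j, k)], [(i, j)])


def build_dag(tasks):
    """Map each task name to the set of tasks it must wait for."""
    last_writer = {}               # tile -> most recent task that wrote it
    readers = defaultdict(list)    # tile -> tasks that read it since that write
    deps = {}
    for name, reads, writes in tasks:
        waits = set()
        for tile in reads + writes:
            if tile in last_writer:
                waits.add(last_writer[tile])   # true (read-after-write) dependence
        for tile in writes:
            waits.update(readers[tile])        # write-after-read hazard
        deps[name] = waits
        for tile in reads:
            readers[tile].append(name)
        for tile in writes:
            last_writer[tile] = name
            readers[tile] = []
    return deps


def waves(deps):
    """Group tasks by depth in the graph; each wave depends only on earlier ones."""
    level = {}
    for name in deps:  # insertion order is already a valid topological order
        level[name] = 1 + max((level[p] for p in deps[name]), default=-1)
    grouped = defaultdict(list)
    for name, depth in level.items():
        grouped[depth].append(name)
    return [grouped[depth] for depth in sorted(grouped)]


if __name__ == "__main__":
    for step, ready in enumerate(waves(build_dag(cholesky_tasks(3)))):
        print(step, ready)
```

Deriving dependencies from declared tile accesses, rather than coding them by hand, is in the spirit of the superscalar approach named in the abstract; a real runtime would dispatch each task as soon as its predecessors finish instead of synchronizing between waves.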

Original language: English (US)
Pages (from-to): 15-44
Number of pages: 30
Journal: Concurrency Computation Practice and Experience
Issue number: 1
State: Published - Jan 1 2010


  • Cholesky
  • Directed acyclic graph
  • Dynamic scheduling
  • Factorization
  • LU
  • Linear algebra
  • Matrix factorization
  • Multicore
  • QR
  • Scheduling
  • Task graph

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Computer Science Applications
  • Computer Networks and Communications
  • Computational Theory and Mathematics

