Abstract
The objective of this paper is to analyze the dynamic scheduling of dense linear algebra algorithms on shared-memory, multicore architectures. Current numerical libraries (e.g., linear algebra package) show clear limitations on such emerging systems mainly because of their coarse granularity tasks. Thus, many numerical algorithms need to be redesigned to better fit the architectural design of the multicore platform. The parallel linear algebra for scalable multicore architectures library developed at the University of Tennessee tackles this challenge by using tile algorithms to achieve a finer task granularity. These tile algorithms can then be represented by directed acyclic graphs, where nodes are the tasks and edges are the dependencies between the tasks. The paramount key to achieve high performance is to implement a runtime environment to efficiently schedule the execution of the directed acyclic graph across the multicore platform. This paper studies the impact on the overall performance of some parameters, both at the level of the scheduler (e.g., window size and locality) and the algorithms (e.g., left-looking and right-looking variants). We conclude that some commonly accepted rules for dense linear algebra algorithms may need to be revisited.
Original language | English (US) |
---|---|
Pages (from-to) | 305-321 |
Number of pages | 17 |
Journal | Concurrency and Computation: Practice and Experience |
Volume | 24 |
Issue number | 3 |
DOIs | |
State | Published - Mar 10 2012 |
Keywords
- dense linear algebra
- dynamic scheduling
- looking variants
- multicore architectures
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Computer Science Applications
- Computer Networks and Communications
- Computational Theory and Mathematics