Abstract
The rapid growth in simulation data requires large-scale parallel implementations of scientific analysis and visualization algorithms, both to produce results within an acceptable timeframe and to enable in situ deployment. However, efficient and scalable implementations, especially of more complex analysis approaches, require not only advanced algorithms but also in-depth knowledge of the underlying runtime. Furthermore, different machine configurations and different applications may favor different runtimes, e.g., MPI, Charm++, or Legion, as well as different hardware architectures. This diversity makes developing and maintaining a broadly applicable analysis software infrastructure challenging. We address some of these problems by explicitly separating the implementation of individual tasks of an algorithm from the dataflow connecting these tasks. In particular, we present an embedded domain-specific language (EDSL) to describe algorithms using a new task graph abstraction. This task graph is then executed on top of one of several available runtimes (MPI, Charm++, Legion) using a thin layer of library calls. We demonstrate the flexibility and performance of this approach on three large-scale analysis and visualization use cases: topological analysis, a rendering and compositing dataflow, and image registration of large microscopy scans. Despite the unavoidable overheads of a generic solution, our approach demonstrates performance portability at scale and, in some cases, outperforms hand-optimized implementations.
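To make the central idea of the abstract concrete, the following is a minimal, hypothetical C++ sketch of the described separation of concerns: task implementations (callbacks) are registered independently of the dataflow graph that connects them, and a thin backend layer is responsible for executing the graph on a chosen runtime. The names and the serial "backend" below are illustrative assumptions only, not the paper's actual API; a real backend would map tasks to MPI ranks, Charm++ chares, or Legion tasks.

```cpp
// Illustrative sketch only: a tiny task-graph abstraction in the spirit of the
// paper's approach (task implementations kept separate from the dataflow).
// All identifiers here are hypothetical, not the system's real interface.
#include <cstdio>
#include <functional>
#include <map>
#include <vector>

using TaskId   = int;
using Payload  = std::vector<double>;
// A task callback consumes the outputs of its predecessors and produces one output.
using Callback = std::function<Payload(const std::vector<Payload>&)>;

struct TaskGraph {
  std::map<TaskId, std::vector<TaskId>> inputs;     // dataflow edges: task -> predecessors
  std::map<TaskId, Callback>            callbacks;  // implementations, registered separately

  void addTask(TaskId id, std::vector<TaskId> deps) { inputs[id] = std::move(deps); }
  void registerCallback(TaskId id, Callback cb)     { callbacks[id] = std::move(cb); }
};

// Stand-in for the "thin layer of library calls": here we simply execute the
// graph sequentially in dependency order (task ids assumed topologically sorted).
void executeSerial(TaskGraph& g) {
  std::map<TaskId, Payload> results;
  for (auto& [id, deps] : g.inputs) {
    std::vector<Payload> in;
    for (TaskId d : deps) in.push_back(results[d]);
    results[id] = g.callbacks[id](in);
  }
  std::printf("final value: %f\n", results.rbegin()->second[0]);
}

int main() {
  TaskGraph g;
  // Dataflow: two leaf tasks feed a combining task (e.g., a compositing step).
  g.addTask(0, {});
  g.addTask(1, {});
  g.addTask(2, {0, 1});
  // Task implementations are plugged in independently of the graph structure.
  g.registerCallback(0, [](const std::vector<Payload>&) { return Payload{1.0}; });
  g.registerCallback(1, [](const std::vector<Payload>&) { return Payload{2.0}; });
  g.registerCallback(2, [](const std::vector<Payload>& in) {
    return Payload{in[0][0] + in[1][0]};
  });
  executeSerial(g);
  return 0;
}
```

In this sketch, swapping the execution backend would not require touching the graph description or the callbacks, which is the portability property the abstract claims for the EDSL.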
Original language | English (US) |
---|---|
Title of host publication | 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS) |
Publisher | IEEE |
Pages | 463-473 |
Number of pages | 11 |
ISBN (Print) | 9781538643686 |
DOIs | |
State | Published - Aug 6 2018 |
Externally published | Yes |
Bibliographical note
KAUST Repository Item: Exported on 2022-06-23. Acknowledgements: This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. This work is also supported in part by NSF CGV award 1314896, NSF IIP award 1602127, NSF ACI award 1649923, DOE/SciDAC DE-SC0007446, CCMSC DE-NA0002375, and PIPER ER26142 DE-SC0010498, and by the Department of Energy under the guidance of Dr. Lucy Nowell and Richard Carson. This research used the resources of the Supercomputing Laboratory at KAUST, Saudi Arabia.
This publication acknowledges KAUST support but has no KAUST-affiliated authors.