TY - GEN
T1 - Performance Assessment of Hybrid Parallelism for Large-Scale Reservoir Simulation on Multi- and Many-core Architectures
AU - AlOnazi, Amani
AU - Rogowski, Marcin
AU - Al-Zawawi, Ahmed
AU - Keyes, David E.
N1 - KAUST Repository Item: Exported on 2020-10-01
PY - 2018/12/8
Y1 - 2018/12/8
AB - Two trends are reshaping the landscape of petroleum reservoir simulators, one architecture-driven and one application-driven: an increasing number of cores per node and increasing computational intensity arising from higher-fidelity physics at each cell. Because implicit algebraic solvers are the dominant kernels, we present hybrid MPI and OpenMP implementations of the linear solver of GigaPOWERS, a full-scale, real-world, massively parallel simulator for black oil and compositional models. We also evaluate the impact of explicit communication and computation overlap by including the halo exchange in the task-dependency graph. We analyze the performance of these modifications across multi- and many-core architectures, namely Intel Haswell, Skylake, and Knights Landing, using a variety of synthetic and real-world models. The hybrid approach reduces time to solution by up to 50% on a 16 million-cell SPE10-like model on Skylake, whereas on a smaller 1 million-cell model on Haswell and Knights Landing both implementations achieve very similar performance. In the real-world reservoir simulations, hybrid parallelism reduces communication volume and memory consumption and improves load balancing.
UR - http://hdl.handle.net/10754/631262
UR - https://ieeexplore.ieee.org/document/8547565
UR - http://www.scopus.com/inward/record.url?scp=85060101133&partnerID=8YFLogxK
U2 - 10.1109/HPEC.2018.8547565
DO - 10.1109/HPEC.2018.8547565
M3 - Conference contribution
SN - 9781538659892
BT - 2018 IEEE High Performance extreme Computing Conference (HPEC)
PB - Institute of Electrical and Electronics Engineers (IEEE)
ER -