TY - JOUR
T1 - Extreme scale FMM-accelerated boundary integral equation solver for wave scattering
AU - AbdulJabbar, Mustafa Abdulmajeed
AU - Al Farhan, Mohammed
AU - Al-Harthi, Noha A.
AU - Chen, Rui
AU - Yokota, Rio
AU - Bagci, Hakan
AU - Keyes, David E.
N1 - KAUST Repository Item: Exported on 2020-10-01
PY - 2019/6/20
Y1 - 2019/6/20
N2 - We present a boundary integral equation solver for wave scattering suited for manycore processors, which are expected to be the building blocks of energy-austere exascale systems, and on which algorithmic and architecture-oriented optimizations are essential for achieving worthy performance. We use the GMRES iterative method and the fast multipole method (FMM) to implement the matrix-vector product (MatVec) kernel. We implement highly optimized kernels for both shared- and distributed-memory architectures and provide various performance models for tuning the task-based tree traversal implementation of FMM. We develop optimal architecture-specific and algorithm-aware partitioning, load balancing, and communication reducing mechanisms to scale up to 6,144 compute nodes of a Cray XC40 with 196,608 hardware cores. Task-based traversal achieves roughly 77\% and 60\% of the peak single precision floating point performance of a 56-core Skylake Scalable processor and a 72-core Knights Landing processor, respectively. These represent approximately sevenfold speedups compared to the baseline scalar code. We report weak scalability in accordance with the best possible O(log P) communication complexity and the theoretical scaling complexity of FMM. These result in up to 85\% efficiency in strong scaling while computing in excess of 2 billion degrees of freedom (DoFs) on the full-scale Cray XC40. To demonstrate the applicability of the proposed implementation of FMM, we use it in the solution of the acoustic scattering problem involving soft targets. This calls for the discretization of an integral equation with an oscillatory kernel and (weak) 1/R singularity. Applications of FMM to the problems involving hard targets, which call for implementation of more complicated singularity extraction/cancellation schemes to account for 1/R2 singularity, or to Burton-Miller surface integral formulation, which results in a well-conditioned matrix system, are not considered in this work, but the efficiency and scalability results demonstrated here are expected to be observed for these cases as well. The numerical results are converged to within 1.0e-4 relative 2-norm residual accuracy of the analytical solution for the sphere, and a multiple flower-shaped scatterer displays comparable algebraic convergence in a more general setting.
AB - We present a boundary integral equation solver for wave scattering suited for manycore processors, which are expected to be the building blocks of energy-austere exascale systems, and on which algorithmic and architecture-oriented optimizations are essential for achieving worthy performance. We use the GMRES iterative method and the fast multipole method (FMM) to implement the matrix-vector product (MatVec) kernel. We implement highly optimized kernels for both shared- and distributed-memory architectures and provide various performance models for tuning the task-based tree traversal implementation of FMM. We develop optimal architecture-specific and algorithm-aware partitioning, load balancing, and communication reducing mechanisms to scale up to 6,144 compute nodes of a Cray XC40 with 196,608 hardware cores. Task-based traversal achieves roughly 77\% and 60\% of the peak single precision floating point performance of a 56-core Skylake Scalable processor and a 72-core Knights Landing processor, respectively. These represent approximately sevenfold speedups compared to the baseline scalar code. We report weak scalability in accordance with the best possible O(log P) communication complexity and the theoretical scaling complexity of FMM. These result in up to 85\% efficiency in strong scaling while computing in excess of 2 billion degrees of freedom (DoFs) on the full-scale Cray XC40. To demonstrate the applicability of the proposed implementation of FMM, we use it in the solution of the acoustic scattering problem involving soft targets. This calls for the discretization of an integral equation with an oscillatory kernel and (weak) 1/R singularity. Applications of FMM to the problems involving hard targets, which call for implementation of more complicated singularity extraction/cancellation schemes to account for 1/R2 singularity, or to Burton-Miller surface integral formulation, which results in a well-conditioned matrix system, are not considered in this work, but the efficiency and scalability results demonstrated here are expected to be observed for these cases as well. The numerical results are converged to within 1.0e-4 relative 2-norm residual accuracy of the analytical solution for the sphere, and a multiple flower-shaped scatterer displays comparable algebraic convergence in a more general setting.
UR - http://hdl.handle.net/10754/627504
UR - https://epubs.siam.org/doi/10.1137/18M1173599
UR - http://www.scopus.com/inward/record.url?scp=85071924501&partnerID=8YFLogxK
U2 - 10.1137/18M1173599
DO - 10.1137/18M1173599
M3 - Article
SN - 1064-8275
VL - 41
SP - C245-C268
JO - SIAM Journal on Scientific Computing
JF - SIAM Journal on Scientific Computing
IS - 3
ER -