Performance Evaluation of Computation and Communication Kernels of the Fast Multipole Method on Intel Manycore Architecture

Mustafa Abdulmajeed AbdulJabbar, Mohammed Al Farhan, Rio Yokota, David E. Keyes

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

7 Scopus citations


Manycore optimizations are essential for achieving performance worthy of anticipated exascale systems. Utilization of manycore chips is inevitable to attain the desired floating point performance of these energy-austere systems. In this work, we revisit ExaFMM, the open source Fast Multipole Method (FMM) library, in light of highly tuned shared-memory parallelization and detailed performance analysis on the new highly parallel Intel manycore architecture, Knights Landing (KNL). We assess scalability and performance gain using task-based parallelism of the FMM tree traversal. We also provide an in-depth analysis of the most computationally intensive part of the traversal kernel (i.e., the particle-to-particle (P2P) kernel), by comparing its performance across KNL and Broadwell architectures. We quantify different configurations that exploit the on-chip 512-bit vector units within different task-based threading paradigms. MPI communication-reducing and NUMA-aware approaches for the FMM’s global tree data exchange are examined with different cluster modes of KNL. By applying several algorithm- and architecture-aware optimizations for FMM, we show that the N-Body kernel on 256 threads of KNL achieves on average 2.8× speedup compared to the non-vectorized version, whereas on 56 threads of Broadwell, it achieves on average 2.9× speedup. In addition, the tree traversal kernel on KNL scales monotonically up to 256 threads with task-based programming models. The MPI-based communication-reducing algorithms show expected improvements of the data locality across the KNL on-chip network.
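For readers unfamiliar with the P2P kernel mentioned in the abstract, the following is a minimal sketch of a direct particle-to-particle evaluation, assuming a Laplace (1/r) potential and a structure-of-arrays layout. The `Particles` type and the `p2p_potential` function are hypothetical names chosen for illustration; this is not ExaFMM's code, and it does not reproduce the vectorization or threading strategies evaluated in the paper.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays particle container (not ExaFMM's type).
struct Particles {
  std::vector<float> x, y, z;  // coordinates
  std::vector<float> q;        // charge (or mass) of each particle
};

// Direct P2P evaluation, assuming a Laplace kernel:
//   phi[i] = sum_j q[j] / |r_i - r_j|
// Pairs at zero distance (e.g. a particle with itself) contribute nothing.
std::vector<float> p2p_potential(const Particles& targets,
                                 const Particles& sources) {
  assert(sources.x.size() == sources.q.size());
  const std::size_t nt = targets.x.size();
  const std::size_t ns = sources.x.size();
  std::vector<float> phi(nt, 0.0f);

  for (std::size_t i = 0; i < nt; ++i) {
    const float xi = targets.x[i];
    const float yi = targets.y[i];
    const float zi = targets.z[i];
    float acc = 0.0f;
    // Unit-stride loop over sources: the SoA layout keeps loads contiguous,
    // which is the access pattern wide vector units generally favor.
    for (std::size_t j = 0; j < ns; ++j) {
      const float dx = xi - sources.x[j];
      const float dy = yi - sources.y[j];
      const float dz = zi - sources.z[j];
      const float r2 = dx * dx + dy * dy + dz * dz;
      const float inv_r = (r2 > 0.0f) ? 1.0f / std::sqrt(r2) : 0.0f;
      acc += sources.q[j] * inv_r;
    }
    phi[i] = acc;
  }
  return phi;
}
```

The cost of this loop nest grows with the product of target and source counts, which is why the P2P kernel tends to dominate the traversal and is a natural focus for vectorization studies such as the one summarized above.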
Original language: English (US)
Title of host publication: Euro-Par 2017: Parallel Processing
Publisher: Springer Nature
Number of pages: 12
ISBN (Print): 9783319642024
State: Published - Aug 1 2017

Bibliographical note

KAUST Repository Item: Exported on 2020-10-01

