The compute capability of high-performance hardware has grown at an immense rate, increasing by over 130x in the last decade. Communication bandwidth, however, grew by only a factor of 6 in the same period, a more than 20-fold decrease in the byte-to-flop ratio. As a result, in many cases computation is virtually free, and the dominant cost of a parallel application is its communication. We expect this trend to continue and, hence, the wall-clock time of parallel applications to be increasingly correlated with the amount of data transferred between the nodes involved. To alleviate this communication bottleneck, we test several communication-reducing schemes based on the idea of using higher precision for the inner cells and lower precision for communication. For every approach, we report the resulting network traffic and weigh it against the decreased accuracy. We perform our experiments in a collocated discontinuous Galerkin finite element method (DG-FEM) framework applied to computational fluid dynamics (CFD). First, we present a parametric study using the method of manufactured solutions on a 3D compressible Navier-Stokes supersonic cube. This method allows us to quantify the impact of the communication-reducing schemes on the error in test cases representing a range of solution polynomial degrees and problem sizes. Finally, we verify the findings on a full-scale CFD problem, flow around a delta wing, and report on the methods' consistency as the number of processes and the number of halo elements change.
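The core idea of keeping interior data in higher precision while communicating in lower precision can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names (`pack_halo`, `unpack_halo`, `traffic_bytes`, `max_abs_error`), the choice of 64-bit interior and 32-bit communication precision, and the synthetic halo data are all assumptions made for illustration, and no actual network transfer is performed.

```python
from array import array
import math


def pack_halo(values, typecode="f"):
    """Cast halo values to the communication precision.

    'f' is a 32-bit float and 'd' a 64-bit float (hypothetical choices).
    """
    return array(typecode, list(values))


def unpack_halo(buffer):
    """Promote received values back to 64-bit for the interior computation."""
    return array("d", list(buffer))


def traffic_bytes(buffer):
    """Number of bytes that would be sent for this buffer."""
    return buffer.itemsize * len(buffer)


def max_abs_error(reference, received):
    """Largest absolute difference introduced by the reduced precision."""
    return max(abs(a - b) for a, b in zip(reference, received))


# Synthetic halo data (an assumption): a smooth field at 1000 face nodes.
halo = array("d", (math.sin(0.01 * i) for i in range(1000)))

full = pack_halo(halo, "d")       # baseline: communicate in 64-bit
reduced = pack_halo(halo, "f")    # reduced: communicate in 32-bit
received = unpack_halo(reduced)   # receiver works in 64-bit again

print("bytes sent (64-bit):", traffic_bytes(full))
print("bytes sent (32-bit):", traffic_bytes(reduced))
print("max abs error:", max_abs_error(halo, received))
```

In this sketch the reduced-precision buffer carries half the bytes of the baseline, and the printed error is the quantity that would be weighed against that saving.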
Bibliographical note: KAUST Repository Item: Exported on 2022-10-07
Acknowledgements: The research reported in this paper was funded by King Abdullah University of Science and Technology. We are thankful to the Supercomputing Laboratory and the Extreme Computing Research Center at King Abdullah University of Science and Technology for their computing resources.