A scheduling policy to save 10% of communication time in parallel fast Fourier transform

Samar A. Aseeri, Anando Gopal Chatterjee, Mahendra K. Verma, David E. Keyes

Research output: Contribution to journalArticlepeer-review

2 Scopus citations


The fast Fourier transform (FFT) has applications in almost every frequency related study, for example, in image and signal processing, and radio astronomy. It is also used as a Poisson operator inversion kernel in partial differential equations in fluid flows, in density functional theory, many-body theory, and others. The three-dimensional (Formula presented.) FFT has large time complexity (Formula presented.). Hence, parallelization is used to compute such FFTs. Popular libraries perform slab division or pencil decomposition of (Formula presented.) data. None of the existing libraries achieve perfect inverse scaling of time with (Formula presented.) cores because FFT requires all-to-all communication and clusters hitherto do not have physical all-to-all connections. Dragonfly, one of the popular topologies for the interconnect, supports hierarchical connections among the components. We show that if we align the all-to-all communication of FFT with the physical connections of Dragonfly topology we will achieve a better scaling and reduce communication time.
Original languageEnglish (US)
JournalConcurrency and Computation: Practice and Experience
StatePublished - Jul 18 2021

Bibliographical note

KAUST Repository Item: Exported on 2021-08-06
Acknowledgements: The authors thank the members of the Supercomputer Laboratory at King Abdullah University for providing the necessary resources and guidance. This research was supported by the Extreme Computing Research Center (ECRC) at KAUST and by the Simulation and Modelling Laboratory at IIT Kanpur.

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Theoretical Computer Science
  • Software
  • Computer Networks and Communications
  • Computer Science Applications


Dive into the research topics of 'A scheduling policy to save 10% of communication time in parallel fast Fourier transform'. Together they form a unique fingerprint.

Cite this