We present a high-performance tridiagonal solver library for Xilinx FPGAs, optimized for the batches of multi-dimensional systems common in real-world applications. An analytical performance model is developed and used to explore the design space and to obtain rapid performance estimates that are over 85% accurate. When solving large batches of systems, the library achieves an order of magnitude better performance than previous FPGA work. A detailed comparison with a current state-of-the-art GPU library for multi-dimensional tridiagonal systems on an Nvidia V100 GPU shows the FPGA achieving competitive or better runtime and significant energy savings of over 30%. From this design, we draw lessons about the types of applications where FPGAs can challenge the current dominance of GPUs.
Bibliographical note: KAUST Repository Item: Exported on 2022-06-20
Acknowledgements: Gihan Mudalige was supported by the Royal Society Industry Fellowship Scheme (INF/R1/180012). István Reguly was supported by the National Research, Development and Innovation Fund of Hungary (PD 124905), under the PD 17 funding scheme. We are grateful to Xilinx for their hardware and software donations, and to Jacques Du Toit and Tim Schmielau at NAG UK for their advice and for making the SLV application available for this work.