OverGen: Improving FPGA Usability through Domain-specific Overlay Generation

Sihao Liu, Jian Weng, Dylan Kupsh, Atefeh Sohrabizadeh, Zhengrong Wang, Licheng Guo, Jiuyang Liu, Maxim Zhulin, Rishabh Mani, Lucheng Zhang, Jason Cong, Tony Nowatzki

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

18 Scopus citations

Abstract

FPGAs have been proven to be powerful computational accelerators across many types of workloads. The mainstream programming approach is high-level synthesis (HLS), which maps high-level languages (e.g., C + #pragmas) to hardware. Unfortunately, HLS leaves a significant programmability gap in terms of reconfigurability, customization, and versatility: although HLS compilation is fast, the downstream physical design takes hours to days; FPGA reconfiguration time limits the time-multiplexing ability of hardware; and tools do not reason about cross-workload flexibility. Overlay architectures mitigate the above by mapping a programmable design (e.g., CPU, GPU) on top of FPGAs. However, the abstraction gap between overlay and FPGA leads to low efficiency/utilization. Our essential idea is to develop a hardware generation framework targeting a highly customizable overlay, so that the abstraction gap can be narrowed by tuning the design instance to applications of interest. We leverage and extend prior work on customizable spatial architectures, SoC generation, accelerator compilers, and design space explorers to create an end-to-end FPGA acceleration system. Our novel techniques address inefficient networks between on-chip memories and processing elements, and improve design space exploration (DSE) by reducing the amount of recompilation required. Our framework, OverGen, is highly competitive with fixed-function HLS-based designs, even though the generated designs are programmable with fast reconfiguration. We compared against a state-of-the-art DSE-based HLS framework, AutoDSE. Without kernel tuning for AutoDSE, OverGen achieves 1.2× geomean performance, and even with manual kernel tuning for the baseline, OverGen still achieves 0.55× geomean performance, all while providing runtime flexibility across workloads.

Original language: English (US)
Title of host publication: Proceedings - 2022 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022
Publisher: IEEE Computer Society
Pages: 35-56
Number of pages: 22
ISBN (Electronic): 9781665462723
State: Published - 2022
Event: 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022 - Chicago, United States
Duration: Oct 1 2022 to Oct 5 2022

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
Volume: 2022-October
ISSN (Print): 1072-4451

Conference

Conference: 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022
Country/Territory: United States
City: Chicago
Period: 10/1/22 to 10/5/22

Bibliographical note

Publisher Copyright:
© 2022 IEEE.

Keywords

  • CGRA
  • Design Automation
  • Domain-specific Accelerators
  • FPGA
  • Reconfigurable architectures

ASJC Scopus subject areas

  • Hardware and Architecture
