Efficient overlay architecture based on DSP blocks

Abhishek Kumar Jain, Suhaib A. Fahmy, Douglas L. Maskell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

36 Scopus citations


Design productivity and long compilation times are major issues preventing the mainstream adoption of FPGAs in general-purpose computing. Several overlay architectures have emerged to tackle these challenges, but at the cost of increased area and performance overheads. This paper examines a coarse-grained overlay architecture whose processing element (PE) is built around the flexible DSP48E1 primitive on Xilinx FPGAs. Using this primitive allows pipelined execution at significantly higher throughput without adding significant area overhead to the PE. We map several benchmarks using our custom mapping tool and show that the proposed overlay architecture delivers a throughput of up to 21.6 GOPS and provides an 11% to 52% improvement in throughput compared to Vivado HLS implementations.
Original language: English (US)
Title of host publication: Proceedings - 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 4
ISBN (Print): 9781479999699
State: Published - Jan 1 2015
Externally published: Yes

Bibliographical note

Generated from Scopus record by KAUST IRTS on 2021-03-16

