Abstract
Transistor scaling is shifting the role of FPGAs from accelerators toward a major role as processors. With this rapid development and the ability to implement complex systems on FPGAs, the conventional hardware-language design flow is giving way to software-like languages through High-Level Synthesis (HLS). While academic and commercial HLS tools have made huge strides, nearly all of them focus exclusively on the computation and the data path. They rarely address the memory subsystem and its impact on overall performance directly. At best, programmers can assist the tools with optimizations that indirectly affect memory subsystem performance. This has (unintentionally) exacerbated existing memory issues. The performance of DDR memory, which has been the mainstream off-chip memory, has lagged behind processor performance. This has resulted in a performance gap, and emerging memories such as Hybrid Memory Cube (HMC), High Bandwidth Memory (HBM), and others are promising prospects for reducing it. However, integrating these new memory technologies with the HLS design flow has not been trivial. To fully utilize their performance benefits, the programmer must understand the low-level details of the hardware. In this work, we conduct a systematic analysis of different HLS-generated circuits on different off-chip memories. Our analysis identifies the root cause of the problem, which is how the low-level hardware interacts with the memory subsystem. To mitigate these issues, we introduce a hardware middle layer that establishes compatibility and a software transformation that improves performance. This design is compared with the baseline system to evaluate the performance improvement from the methodology on heterogeneous systems.