Abstract
With hardware accelerators now an integral part of modern systems on chip (SoCs), the ability to model accelerators quickly and accurately within the systems in which they operate is critical. This paper presents gem5-SALAMv2, a novel system architecture for LLVM-based modeling and simulation of custom hardware accelerators integrated into the gem5 framework. It overcomes the inherent limitations of state-of-the-art trace-based pre-register-transfer level (RTL) simulators by offering a truly "execute-in-execute" LLVM-based model, enabling scalable modeling of dynamically interacting accelerators with full-system simulation support. To create a sustainable expansion compatible with the gem5 framework, gem5-SALAM offers a general-purpose and modular memory hierarchy integrated into the gem5 ecosystem, streamlining the design and modeling of accelerators for new and emerging applications. gem5-SALAMv2 expands the framework established in gem5-SALAMv1 with improved elaboration and simulation, system integration, and automation to simplify rapid prototyping and design space exploration. Validation on the MachSuite \cite{machsuite} benchmarks shows a timing estimation error of less than 1% against the Vivado HLS tool. Results also show an area and power estimation error of less than 4% against Synopsys Design Compiler. System validation against implementations on an UltraScale+ ZCU102 shows an average end-to-end timing error of less than 2%. Lastly, we demonstrate the upgraded capabilities of gem5-SALAMv2 by exploring accelerator platforms for two deep neural networks, LeNet5 and MobileNetv2. These explorations show how gem5-SALAMv2 can simulate such systems and guide architectural optimizations for accelerator-rich architectures.