Abstract

The rise in transistor cost, together with the slowdown of Moore's law, has increased the demand for scalable SoC (System-on-Chip) based frameworks. Reconfigurable, extensible full-system frameworks open up a wide space in research and academia for architecture exploration and customization. Additionally, today's enormous growth in machine learning applications, specifically deep learning at the edge, which requires flexible real-time cognitive processing, calls for efficient architectures that reconcile the conflicting goals of high performance and low power consumption. This points toward scalable, latency-aware, domain-specific architectures. However, accelerator design and optimization are often done in isolation under the best possible constraints, which makes system integration much more complicated due to non-ideal performance differences. Therefore, this work focuses on enabling a design space exploration platform for processor-accelerator co-design in order to achieve the best possible performance on the targeted systems. This work presents parameterizable system support for streaming, data-hungry accelerators using Chipyard, an open-source, extensible, RISC-V based, agile, full-system hardware design and evaluation framework developed at the University of California, Berkeley. With this work's RISC-LCAW (RISC-V Loosely-Coupled Accelerator Wrapper) contribution, we reduce the manual engineering effort behind accelerator system integration by providing an accelerator integration socket in the Chipyard framework. This wrapper template is designed with a focus on data-hungry, streaming, loosely-coupled hardware accelerators.
We also enable co-design support for the AWARE-DNN accelerator, developed in TeCSAR (Transformative Computer Systems and Architecture Research Lab) at the University of North Carolina at Charlotte, by using RISC-LCAW to integrate it with the RISC-V based Rocket Chip SoC. This integration benefits the accelerator's configurability and optimizability by providing a system-integration perspective. The AWARE-DNN accelerator offers an automated, reconfigurable workflow for generating application-specific architectures based on the inherent dataflow of the targeted application and user-specified real-time requirements. This combination fuels the ability to explore the vital dimensions of power, latency, and performance for the targeted system. Furthermore, some system-level effects elevate the accelerator's latency and throughput costs compared to optimizing the accelerator in isolation. We evaluate this integration using the Verilator RTL simulator for three sizes of convolution networks and find that the end-to-end latency of RISC-AWARE compared to the standalone AWARE-DNN accelerator is 1.7x for 64x64x3, 2.6x for 32x32x3, and 1.2x for 11x11x3 image sizes.
