Abstract
The development of accelerators for deep learning has been gaining popularity due to the sheer amount of computation performed by deep learning algorithms. Since the breakdown of Moore's law, it has become difficult to improve the performance of general-purpose processors, and computer architects are therefore inclining towards heterogeneous solutions for accelerating deep learning applications efficiently. Moreover, the computation performed in deep learning applications is repetitive and predictable, which naturally leads to three choices: \emph{ASICs}, \emph{GPUs}, and \emph{FPGAs}. \emph{FPGAs}, owing to their configurability, deep pipelining abilities, and high performance per watt, have been one of the favorite devices for accelerator architecture research. However, \emph{FPGAs} are difficult to program, and there has thus been a rise in the development of reusable accelerator templates that can be instantiated even by software developers.

Memory has always been the main bottleneck, even for architectures with the most efficient compute datapath. This problem is further compounded because \emph{FPGAs} have a low on-chip memory footprint (in the form of BRAMs), whereas most deep learning applications have a very large model size (e.g., AlexNet has a model size of over 100 MB). Thus, accelerating deep learning applications requires memory systems developed to support them. Conventional accelerators try to mitigate this issue by accelerating a single layer at a time, sequentially, which has its own implications such as bandwidth wastage and power consumption. Since the presented work serves streaming accelerators, a separate strategy has to be developed.

This work presents the development of such a memory management system, called NURO-RAM. NURO-RAM uses minimally sized prefetch buffers and a static weight scheduler to support the deep learning accelerator AWARE-DNN. This work implements three different networks, AlexNet, Shallow MobileNet, and Tiny Darknet, to show the versatility of NURO-RAM in serving streaming accelerators like AWARE-DNN.

We then compared the NURO-AWARE solution (AWARE-DNN implemented with the support of the NURO-RAM memory system) against Chai DNN, an HLS-based deep learning accelerator library, and the NVIDIA Xavier mobile \emph{GPU}. The proposed solution consumes less power: 4.5 W (NURO-AWARE) vs. 10 W (Chai DNN); against the \emph{GPU} the power consumption is comparable, with the NVIDIA Xavier consuming 5.7 W. The presented work also consumes less BRAM, both for AlexNet (75% in NURO-AWARE vs. 88% in Chai DNN) and for Tiny Darknet (48% in NURO-AWARE vs. 88% in Chai DNN). The lower BRAM utilization than the state-of-the-art architecture, together with the distinct utilization for each of the three networks, shows that the NURO-AWARE architecture is resource-aware as well as application-aware. The lower power consumption and resource utilization of the presented work can be attributed to the custom datapath used by the AWARE-DNN accelerator and to the custom per-layer memory access path developed by the presented solution, which reduces off-chip memory accesses. On the performance metric, the presented work beats Chai DNN, which can sustain 10.21 FPS with fully connected layers, whereas NURO-AWARE can sustain 30 FPS, and even the Xavier \emph{GPU}, which sustains 2 FPS; this is because the presented solution uses its architectural knobs to satisfy the real-time frame-rate requirement. Overall, the presented solution beats the \emph{GPU} as well as the \emph{FPGA}-based state-of-the-art solution in performance per watt.
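As a back-of-the-envelope check, the performance-per-watt claim follows directly from the figures quoted above (assuming the FPS and power numbers refer to the same workload):

\[
\underbrace{\frac{30\ \text{FPS}}{4.5\ \text{W}} \approx 6.7\ \tfrac{\text{FPS}}{\text{W}}}_{\text{NURO-AWARE}}
\;>\;
\underbrace{\frac{10.21\ \text{FPS}}{10\ \text{W}} \approx 1.0\ \tfrac{\text{FPS}}{\text{W}}}_{\text{Chai DNN}}
\;>\;
\underbrace{\frac{2\ \text{FPS}}{5.7\ \text{W}} \approx 0.35\ \tfrac{\text{FPS}}{\text{W}}}_{\text{NVIDIA Xavier}}.
\]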
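The abstract does not detail NURO-RAM's internals; purely as a minimal illustrative sketch of what minimally sized prefetch buffers in front of a streaming datapath typically look like, the C++ fragment below shows a generic double-buffered (ping-pong) weight prefetch loop. All identifiers (\texttt{TILE}, \texttt{fetch\_tile}, \texttt{compute\_tile}) are hypothetical and are not part of NURO-RAM or AWARE-DNN.

\begin{verbatim}
// Illustrative sketch only: a generic double-buffered ("ping-pong")
// weight prefetch; all names here are hypothetical, not NURO-RAM's API.
#include <cstddef>
#include <cstring>

constexpr std::size_t TILE = 256;            // assumed words per prefetch

// Stand-in for a burst read from off-chip DRAM into an on-chip buffer.
void fetch_tile(const float* dram, std::size_t t, float* bram) {
    std::memcpy(bram, dram + t * TILE, TILE * sizeof(float));
}

// Stand-in for the streaming datapath consuming one tile of weights.
float compute_tile(const float* bram, float acc) {
    for (std::size_t i = 0; i < TILE; ++i) acc += bram[i];
    return acc;
}

float run_layer(const float* weights_dram, std::size_t n_tiles) {
    float ping[TILE], pong[TILE];            // two minimum-sized buffers
    float acc = 0.0f;
    fetch_tile(weights_dram, 0, ping);       // prime the first buffer
    for (std::size_t t = 0; t < n_tiles; ++t) {
        float* cur = (t % 2 == 0) ? ping : pong;
        float* nxt = (t % 2 == 0) ? pong : ping;
        if (t + 1 < n_tiles)                      // prefetch next tile; on an
            fetch_tile(weights_dram, t + 1, nxt); // FPGA this overlaps compute
        acc = compute_tile(cur, acc);
    }
    return acc;
}
\end{verbatim}

On an actual \emph{FPGA}, the prefetch and compute of consecutive tiles would run concurrently, so keeping only two tile-sized buffers on chip hides off-chip latency while minimizing BRAM usage.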