Abstract
Deep learning has had a massive and revolutionary impact on the field of machine learning. Research in neural network algorithms has made them more efficient and powerful in recent years, which has created a need to enhance their performance on the edge. The focus on AI processing has attracted the hardware community and has driven the development of customizable hardware for AI. GPUs have proved efficient for processing AI workloads. Researchers today are increasingly focused on reducing the computational complexity and memory footprint of networks. This has led to sparser network architectures known as depthwise separable convolutional neural networks (DSCNNs), e.g., MobileNet and EfficientNet. GPUs are not designed to take advantage of the sparsity of such networks; FPGAs, however, can exploit their reconfigurability to implement a customizable datapath for DSCNNs. This thesis focuses on FPGA hardware design for a powerful and efficient implementation of DSCNNs on edge FPGAs. It covers the design of highly optimized convolutional operators, such as depthwise, pointwise, and standard convolution, and an architecture that supports crucial heterogeneous compute units (CUs). It also addresses the scalable development of those compute units for ease of implementation and support for future networks. The hardware is designed using Xilinx Vivado HLS 2018.3; HLS accelerates hardware development. Execution results on a Xilinx ZCU102 FPGA board demonstrate 47.4 and 233.3 FPS/Watt for MobileNet-V2 and a compact version of EfficientNet, respectively, as two state-of-the-art depthwise separable CNNs. These comparisons show that this design improves FPS/Watt by 2.2× and 1.51× over the Jetson Nano high and low power modes, respectively.
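The computational saving that motivates DSCNNs can be illustrated with a simple multiply-count comparison. The sketch below (not part of the thesis; layer dimensions are illustrative) contrasts a standard convolution with its depthwise + pointwise factorization, whose cost ratio is roughly 1/C_out + 1/K²:

```python
def conv_cost(h, w, c_in, c_out, k):
    # Multiplies in a standard KxK convolution over an HxW feature map
    return h * w * c_in * c_out * k * k

def dsc_cost(h, w, c_in, c_out, k):
    # Depthwise KxK (one filter per input channel) + pointwise 1x1
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

# Illustrative mid-network layer: 14x14 map, 256 -> 256 channels, 3x3 kernel
std = conv_cost(14, 14, 256, 256, 3)   # 115,605,504 multiplies
dsc = dsc_cost(14, 14, 256, 256, 3)    # 13,296,640 multiplies
print(f"speedup: {std / dsc:.1f}x")    # roughly 8.7x fewer multiplies
```

This roughly 9× reduction in arithmetic for a typical 3×3 layer is what networks like MobileNet and EfficientNet exploit, and it is the structure a reconfigurable FPGA datapath can be tailored to, whereas a GPU's fixed dense-compute pipeline cannot.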