An Integrative Algorithm/Architecture Co-Design Of Deep Spatial and Temporal Separable Convolutional Neural Networks
In this dissertation, I present my research on the co-design of algorithms and architectures for deep spatial and temporal separable convolutional neural networks and their applications. As a first step, I present Deep RACE, an application of Deep Neural Networks (DNNs) to the real-time reliability monitoring of transistors. Then, I introduce DeepDive, a framework for executing power-efficient spatial deep learning models on embedded FPGAs. In addition, the Agile Temporal Convolutional Network (ATCN) is proposed for fast time series prediction and classification in resource-constrained embedded systems. Finally, DeepTrack, which is based on ATCN, is introduced for vehicle trajectory prediction on highways. The significance of each contribution is briefly explained below.

First, this dissertation describes a novel approach, Deep Learning Reliability Awareness of Converters at the Edge (Deep RACE), for real-time reliability modeling and prediction of high-frequency MOSFET power electronic converters. Deep RACE offers a holistic solution that comprises algorithmic advances and full system integration (from the cloud down to the edge node) to create near real-time reliability awareness. On the algorithm side, I propose a deep learning solution based on stacked LSTMs for collective reliability training and inference across multiple MOSFET converters, based on changes in device resistance. Deep RACE also proposes an integrative edge-to-cloud solution that offers scalable, decentralized, device-specific reliability monitoring, awareness, and modeling. The MOSFET converters are IoT devices that have been empowered with real-time deep learning processing capabilities at the edge. The proposed Deep RACE solution has been prototyped and trained on the MOSFET data set provided by NASA.
Our experimental results show an average misprediction of 8.9% over five different devices, which is considerably more accurate than well-known classical approaches (Kalman filter and particle filter). Deep RACE requires only 26 ms of processing time and 1.87 W of computing power on edge IoT devices.

Then, this dissertation introduces DeepDive, a fully functional, vertical co-design framework for power-efficient implementation of Deep Separable Convolutional Neural Networks (DSCNNs) on edge FPGAs. DeepDive's architecture provides the heterogeneous Compute Units (CUs) needed to fully support DSCNNs, whose various convolutional operators are interconnected with structural sparsity. It offers FPGA-aware training and online quantization combined with modular, synthesizable C++ CUs customized for DSCNNs. The execution results on Xilinx's ZCU102 FPGA board demonstrate 47.4 and 233.3 FPS/Watt for MobileNet-V2 and a compact version of EfficientNet, respectively, two state-of-the-art depthwise separable CNNs. Compared with the Jetson Nano, DeepDive improves FPS/Watt by 2.2x and 1.51x over its high and low power modes, respectively. It also enhances FPS/Watt by about 2.27x.

Next, this dissertation presents a scalable deep learning model called ATCN for highly accurate, fast classification and time series prediction in resource-constrained embedded systems. ATCN is primarily designed for mobile embedded systems with performance and memory constraints, such as wearable biomedical devices and real-time reliability monitoring systems. It makes fundamental improvements over mainstream temporal convolutional neural networks, including the incorporation of depthwise separable convolution to reduce the computational complexity of the model, and residual connections acting as time attention machines to increase the network depth and accuracy.
As a result of this configurability, ATCN becomes a family of compact networks with formalized hyperparameters that allow application-specific adjustments to the model architecture. As part of the present work, three ATCN families, namely T0, T1, and T2, are also presented. T0 and T1 are compiled and executed on the Cortex-M7 microcontroller, and all three models are executed on the Cortex-A57 processor. An evaluation of the accuracy and execution performance of the three models against the best-in-class InceptionTime shows that ATCN not only improves accuracy but also enables time series classification on microcontrollers and reduces execution time on legacy microprocessors.

Precise trajectory prediction is vital for intelligent transportation systems; however, model complexity and memory footprint are also critical factors, as these systems are generally deployed at the edge. To this end, I present DeepTrack, a model based on ATCN that achieves better or comparable accuracy relative to existing models while being smaller and having lower computational complexity, which makes it suitable for embedded systems. In contrast to previous methods, the vehicle dynamics are encoded using ATCNs rather than LSTMs, which are synonymous with time series analysis. According to experimental results, DeepTrack outperforms state-of-the-art trajectory prediction algorithms not only in average displacement error but also in MACs and model size.
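
To make the MAC savings mentioned above concrete, the following minimal sketch counts multiply-accumulate operations (MACs) for a standard 1D convolution and for its depthwise separable counterpart. It is an illustration only, not the ATCN or DeepTrack implementation: the function names and the layer sizes (100 output steps, 64 channels, kernel size 3) are hypothetical, and bias terms, stride, and dilation are ignored.

```python
def conv1d_macs(length_out, c_in, c_out, kernel):
    """MACs of a standard 1D convolution (bias ignored)."""
    return length_out * c_in * c_out * kernel


def separable_conv1d_macs(length_out, c_in, c_out, kernel):
    """MACs of a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = length_out * c_in * kernel   # one filter per input channel
    pointwise = length_out * c_in * c_out    # 1x1 convolution mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
standard = conv1d_macs(100, 64, 64, 3)
separable = separable_conv1d_macs(100, 64, 64, 3)
ratio = separable / standard

print(standard, separable, round(ratio, 3))  # 1228800 428800 0.349
```

Under these assumptions the ratio equals 1/c_out + 1/kernel, so the saving grows with the kernel size and the number of output channels.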