Abstract
Deep learning uses stacks of multiple processing layers to learn representations of data at different levels of abstraction. It enables machines to perceive and understand their environment much as humans do, opening a path to a diverse range of applications such as autonomous control of a self-driving car or monitoring a set of devices. Convolutional Neural Networks (CNNs), arguably the most popular deep learning architecture, consist of multiple convolutional and pooling layers stacked on top of each other. The convolutional layers capture the features of an image, while the pooling layers reduce the number of parameters involved. Because deep learning relies on multiple processing layers to learn representations of data, the computation involved is intense. Graphics Processing Units (GPUs) follow a Single Instruction Multiple Thread (SIMT) execution model, and the uniform structure of each layer in a convolutional neural network maps well onto the computations GPUs perform efficiently. GPU simulators are a useful tool for making architectural modifications to GPU hardware. In this research, we explore the challenges involved in making Convolutional Neural Networks compatible with the latest versions of GPU simulators. We develop optimized GPU kernels (GPU code) and integrate them with the Darknet deep learning framework, which helps us explore the actual hardware bottlenecks affecting GPU performance for deep learning applications. We also model the architecture of embedded GPUs and compare their performance with that of actual GPUs to verify the accuracy of our architecture modeling, especially for deep learning applications.