Abstract
Convolutional Neural Network (CNN) models have become the mainstream method in Artificial Intelligence (AI) for computer vision tasks such as image classification and image segmentation. Deep CNNs contain a large volume of convolution computation, so training a CNN requires powerful GPU resources. Training a large CNN may take days or even weeks, which is time-consuming and costly. When multiple runs are needed to search for the optimal CNN hyperparameter settings, the search can take months on limited GPUs, which is unacceptable and hinders the development of CNNs. It is therefore essential to train CNNs faster. When no additional computing resources are available, there are two kinds of methods to do so. The first is model compression, which reduces training time by reducing architecture complexity, either by removing parameters or by representing the model with less storage. The second is to reduce the input data fed into the network without changing the network architecture.

Architecture complexity reduction is a popular research direction for faster CNN training. Nowadays, mobile devices such as smartphones and smart cars rely on deep CNNs to accomplish complex tasks like human body recognition and face recognition. Due to the high real-time demands and memory constraints of mobile applications, conventional large CNNs are not suitable, and model compression has become the trend for training deep CNN models at a lower computation cost. Many successful networks have been designed to address this problem, such as ResNeXt, MobileNet, ShuffleNet, and GhostNet; they replace the standard convolution with 1×1 convolution, depthwise convolution, or group convolution to reduce computation. However, there are fewer studies on the following questions. First, does the variety of convolution layers (whether the output channel number is larger or smaller than the input channel number) affect the performance of different compression strategies? Second, does the expansion ratio of a convolution layer (the output channel number over the input channel number when the output channel number is larger, or the input channel number over the output channel number when the input channel number is larger) affect the performance of different compression strategies? Third, does the compression ratio (the reduced parameter number/FLOPs over the original parameter number/FLOPs) affect the performance of different compression strategies? Current networks tend to use the same convolution strategy inside a basic network block, ignoring the variety of network layers. We have proposed a novel Conditional Reduction (CR) module to compress a single 1×1 convolution layer, a novel three-layer Conditional block (C-block) to compress CNN bottlenecks or inverted bottlenecks, and, building on the CR module and C-block, a novel Conditional Reduction Network (CRnet). We have tested CRnet on two image classification datasets, CIFAR-10 and CIFAR-100, with multiple network expansion ratios and compression ratios. The experiments verify the correctness of our methods and highlight the importance of the input-output channel pattern when selecting a compression strategy. They also show that the proposed CRnet achieves a better balance between model complexity and accuracy than state-of-the-art group convolution and Ghost module compression.
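To make the expansion and compression ratios above concrete, the following Python sketch (an illustration of the standard FLOPs arithmetic, not the thesis implementation; the layer sizes are assumed) counts the multiply-accumulate operations of a standard convolution and of a depthwise-separable replacement, then computes the two ratios as defined above.

```python
def conv_flops(h, w, c_in, c_out, k, groups=1):
    """Multiply-accumulate count of a k x k convolution on an h x w feature map."""
    return h * w * (c_in // groups) * c_out * k * k

# Assumed example layer: 56x56 feature map, 64 -> 256 channels, 3x3 kernel.
standard = conv_flops(56, 56, 64, 256, 3)

# Depthwise-separable replacement: depthwise 3x3 followed by pointwise 1x1.
depthwise_separable = (conv_flops(56, 56, 64, 64, 3, groups=64)
                       + conv_flops(56, 56, 64, 256, 1))

expansion_ratio = 256 / 64  # output channels over input channels (output is larger here)
compression_ratio = (standard - depthwise_separable) / standard  # reduced FLOPs over original FLOPs

print(f"expansion ratio:   {expansion_ratio:.1f}")
print(f"compression ratio: {compression_ratio:.3f}")
```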
Data reduction, in contrast, reduces training time in a direct and simple way by dropping part of the training data. Existing works drop data based on sample importance ranking, but the ranking process itself takes extra time when the number of training samples is large. When we tune different network settings in search of an optimal configuration, we want a way to cut a large percentage of training time with little or no accuracy loss. Again, there are fewer studies on the following questions. First, what are suitable sampling ratios? Second, should the same sampling ratio be used for every training epoch? Third, does the sampling ratio perform differently on small and large datasets? We have proposed a flat reduced random sampling training strategy and a bottleneck reduced random sampling strategy, as well as a three-stage training method based on bottleneck reduced random sampling that accounts for the distinct behavior of early-stage and end-stage training. Furthermore, we have proved the data visibility of a sample over the whole training process and the theoretical reduced training time through four theorems and two corollaries. We have tested the two sampling strategies on three image classification datasets: CIFAR-10, CIFAR-100, and ImageNet. The experiments show that both sampling strategies reduce a significant percentage of training time at a very small accuracy loss.
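As a rough illustration of how the flat and bottleneck sampling strategies could be scheduled, the sketch below is our own hedged example rather than the thesis code: the ratios, stage boundaries, and helper names such as epoch_sampling_ratio are assumptions. It resamples a random subset of the training indices at the start of every epoch, using the full dataset in the early and late stages and a reduced ratio in the middle stage.

```python
import random

def epoch_sampling_ratio(epoch, total_epochs, bottleneck_ratio=0.5):
    """Return the fraction of the training set visible in this epoch.

    Flat strategy: return the same ratio for every epoch.
    Bottleneck strategy (sketched here, three stages): full data in the
    early and late stages, a smaller ratio in the middle stage.
    Stage boundaries and ratios are assumed values for illustration.
    """
    early, late = int(0.2 * total_epochs), int(0.8 * total_epochs)
    if epoch < early or epoch >= late:
        return 1.0                # early and end stages see all samples
    return bottleneck_ratio       # middle stage trains on a random subset

def sample_epoch_indices(num_samples, ratio, seed=None):
    """Draw a fresh random subset of training indices for one epoch."""
    rng = random.Random(seed)
    k = int(num_samples * ratio)
    return rng.sample(range(num_samples), k)

# Usage: resample the visible subset at the start of every epoch.
total_epochs, n_train = 100, 50_000
for epoch in range(total_epochs):
    ratio = epoch_sampling_ratio(epoch, total_epochs)
    indices = sample_epoch_indices(n_train, ratio, seed=epoch)
    # train_one_epoch(model, subset(train_set, indices))  # training loop omitted
```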