Abstract

Field Programmable Gate Arrays (FPGAs) are becoming a preferred platform for the high-performance computing community because of their flexibility to adapt to new computing challenges. FPGAs also provide a more power-efficient alternative to GPUs thanks to their customizable data paths and deep pipelining capability. Using high-level synthesis (HLS) and optimization tools, we can achieve performance comparable to, or better than, that of CPUs and GPUs. OpenCL is the standard programming language for general-purpose parallel programming of heterogeneous systems, and its availability has enabled high-performance execution of massively parallel applications. OpenCL-HLS for FPGAs enables programmers to explore various software optimizations alongside enhanced hardware capabilities. We introduce a novel approach to study the scalability of OpenCL coarse-grain parallelism, Compute Unit (CU) replication, on cloud FPGAs. This work demonstrates that for every application there is an optimum number of CUs that achieves the maximum performance benefit with higher memory bandwidth utilization and optimal use of FPGA resources. We also provide a generic source-code template and a front-end design exploration tool to explore and identify the optimum CU count for a given application. For evaluation we used the Xilinx SDAccel 2017.4 synthesis toolchain, an integrated development environment for FPGAs; on the hardware side, the AWS cloud-based Xilinx VU9P FPGA was employed. This project was funded by the Xilinx University Program (XUP). Our experimental results on a mix of 15 applications taken from the Xilinx benchmark suite vs2017.4 and the Rodinia Benchmark Suite vs3.1 show an average speedup of 6.4× and an average bandwidth utilization improvement of 3.4× over the baseline. Further to this, a mere 8% average resource utilization and a 1.33× power overhead were reported. Our tool also yields a 31% improvement in total design synthesis time for an illustrative Histogram application. Xilinx SDAccel-based ‘DDR’ and ‘burst transfer’ optimizations were also explored to improve bandwidth and performance; these optimizations help data-hungry applications for which bandwidth is the major bottleneck. Combining CU replication with the DDR optimization, we achieved a 7.5× speedup for the Largeloop OCL application (from the SDAccel benchmark suite). In addition, we address the memory wall and hide memory latency by using OpenCL pipes: an application is split into ‘read’, ‘compute’, and ‘write back’ sub-kernels that work concurrently. Results on seven massively parallel applications show an average speedup of 5.2× with a 2.2× bandwidth improvement on cloud FPGAs.
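The CU-replication flow outlined above is driven mainly by the SDAccel link step rather than by kernel source changes. As a minimal sketch (the kernel name vadd, the chunk-based partitioning, and the CU count of 4 below are illustrative assumptions, not taken from this work), a kernel written so that each invocation processes an independent chunk can be replicated at link time:

    /* vadd.cl -- minimal OpenCL C kernel; each enqueued task processes one chunk */
    __kernel void vadd(__global const int *a,
                       __global const int *b,
                       __global int *c,
                       const int chunk_size)
    {
        for (int i = 0; i < chunk_size; i++)
            c[i] = a[i] + b[i];
    }

The number of compute units is then requested with the SDAccel linker option, e.g. xocc -l --nk vadd:4, and the host enqueues one task per chunk so the runtime can dispatch the tasks across the replicated CUs. Sweeping that CU count while re-measuring performance, bandwidth, and resource usage is essentially what the source-code template and front-end exploration tool automate.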
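The pipe-based read/compute/write-back decomposition can be sketched as below. This is a minimal illustration only: the kernel and pipe names, the pipe depth of 32, and the doubling computation are invented for the example, and it assumes the Xilinx blocking pipe calls (read_pipe_block / write_pipe_block) and the xcl_reqd_pipe_depth attribute available in SDAccel OpenCL C.

    /* Pipes are declared at program scope; the depth of 32 is an assumption. */
    pipe int p_in  __attribute__((xcl_reqd_pipe_depth(32)));
    pipe int p_out __attribute__((xcl_reqd_pipe_depth(32)));

    /* 'read' sub-kernel: streams input from global memory into p_in */
    __kernel void read_stage(__global const int *in, const int n)
    {
        for (int i = 0; i < n; i++) {
            int v = in[i];
            write_pipe_block(p_in, &v);
        }
    }

    /* 'compute' sub-kernel: consumes p_in, produces p_out */
    __kernel void compute_stage(const int n)
    {
        for (int i = 0; i < n; i++) {
            int v;
            read_pipe_block(p_in, &v);
            v = 2 * v;                     /* placeholder computation */
            write_pipe_block(p_out, &v);
        }
    }

    /* 'write back' sub-kernel: drains p_out to global memory */
    __kernel void write_stage(__global int *out, const int n)
    {
        for (int i = 0; i < n; i++) {
            int v;
            read_pipe_block(p_out, &v);
            out[i] = v;
        }
    }

Because the three sub-kernels run concurrently and exchange data only through on-chip pipes, global-memory reads, computation, and write-backs overlap, which is how the approach hides memory latency.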
