Enhancing HPC job scheduling with synthetic data generation for reinforcement learning-based schedulers

Soundar Raj, Monish; Dai, Dong

2024

In high-performance computing (HPC), efficient job scheduling is essential for improving system performance and making the best use of resources. Conventional scheduling methods, which depend on heuristic priority rules, often struggle to adapt to changing job loads, objectives, and configurations. Reinforcement learning (RL)-based schedulers have been proposed to overcome these obstacles. However, training and evaluating these schedulers requires extensive job traces, which are often unavailable due to privacy concerns. To bridge this gap, we employ advanced synthetic data generation techniques, including Tabular Variational Autoencoder (TVAE), Conditional Tabular GAN (CTGAN), and Copula GAN, to generate high-fidelity synthetic job traces. These models produce diverse and realistic synthetic data that accurately capture various job patterns and system states. We evaluate the quality and applicability of the synthetic data using cumulative distribution functions (CDFs), correlation heatmaps, statistical analysis, and scheduling simulations. Our findings show that the synthetic data, created with minimal human intervention, closely resembles real-world scenarios, which makes it useful for training and assessing RL-based scheduling methods in dynamic HPC settings. Our method thus addresses the lack of available real-world job traces, and integrating advanced synthetic data generation with RL-based scheduling represents a significant step forward in optimizing HPC job scheduling.
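
As an illustration of the CDF-based comparison mentioned in the abstract, the following minimal sketch computes the largest gap between the empirical CDFs of a real and a synthetic sample (the two-sample Kolmogorov-Smirnov distance). It is not taken from the paper: the function names and the runtime values are hypothetical, and the paper's actual evaluation procedure may differ.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function F(x) giving the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def ks_distance(real, synthetic):
    """Largest vertical gap between the two empirical CDFs.

    Both CDFs are step functions that only change at observed values,
    so it is enough to check the union of the two samples.
    """
    f_real, f_syn = ecdf(real), ecdf(synthetic)
    points = sorted(set(real) | set(synthetic))
    return max(abs(f_real(x) - f_syn(x)) for x in points)


# Hypothetical job runtimes in seconds (illustrative values, not from the paper).
real_runtimes = [60, 120, 300, 600, 3600]
synthetic_runtimes = [50, 130, 310, 590, 3500]

gap = ks_distance(real_runtimes, synthetic_runtimes)
print(f"Max CDF gap between real and synthetic runtimes: {gap:.3f}")
```

A gap near 0 suggests the synthetic feature follows the real distribution closely, while a gap near 1 suggests the two samples barely overlap. In practice the same check would be repeated per feature (for example runtime, requested processors, and wait time), assuming those columns exist in the trace.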

Title

Enhancing HPC job scheduling with synthetic data generation for reinforcement learning-based schedulers

Author

Soundar Raj, Monish (Department of Computer Science)
Dai, Dong (Department of Computer Science)

Date

2024

Subjects

High performance computing
Computer science

Link to This Page

Handle: http://hdl.handle.net/20.500.13093/work:1528

Publication Type

conference proceedings

Pagination

1 online resource

File Format

application/pdf

Language

English

Usage Statement

This item may be protected by copyright and other related rights. Atkins Library provides access to this item for educational and research purposes only; other uses require the permission of the copyright holder.

Record Appears in

Departments and Institutes > Department of Computer Science
Types > Conference Proceedings
Student Works
Works
