Optimizing the Performance of In-memory Computing with a Hybrid Memory System
The development of in-memory technologies has fueled the emergence of in-memory computing systems. Simultaneously, novel memory technologies such as high-bandwidth memory (HBM) and non-volatile memory (NVM) are making hybrid memory systems increasingly common in Cloud Computing platforms, opening a new field of memory management for both the academic and industrial communities. Memory capacity remains a critical bottleneck for applications running on Cloud Computing platforms. Data explosion is also posing unprecedented demands on computing capacity to handle the ever-growing data volume, velocity, variety, and veracity. Thus, in-memory computing systems increasingly look inward at large caches of under-processed or discarded data as resources to be mined. The key purpose of managing data in any memory system is to keep more useful data resident at high memory utilization without compromising applications' performance, especially for machine learning and deep learning applications running in clouds. However, realizing this goal faces numerous challenges, including sharing memory among applications, managing cached data, migrating data within hybrid memory systems, and devising a control strategy for a unified hybrid memory pool. In this context, we concentrate on developing an efficient hybrid memory system and memory management strategies for in-memory computing on Cloud Computing platforms. To achieve this, we propose a hybrid memory system that combines fast and relatively slow memory hardware with memory management strategies for applications running in cloud environments, based on optimization formulations, feedback control, and machine/deep learning methods.
To realize a runtime system that automatically optimizes data management on hybrid memory, we will (1) propose a new shared in-memory cache layer among parallel executors co-hosted on the same computing node, which aims to improve the overall hit rate of data blocks by caching and evicting blocks uniformly across multiple executors; (2) develop a middleware layer on top of existing deep learning frameworks that streamlines the support and implementation of online learning applications; and (3) design a unified in-memory computing architecture with efficient data sharing and communication strategies to optimize data migration and placement, as well as memory allocation and reclamation, for machine learning applications. The management of shared cache memory, including memory allocation and reclamation, will be formulated as online optimization problems and solved with feedback control and machine learning algorithms with respect to memory utilization. To improve memory utilization for in-memory computing, we will (1) design an algorithm that predicts the likelihood of a cached data block being referenced again and uses this prediction to prevent blocks with longer re-reference distances from occupying the limited cache space for too long; (2) design a novel model-updating strategy that builds training data samples according to the contributions of different data life stages to model training, and accounts for the training cost of model updating, so that a model that better describes data trends in dynamic environments can be achieved; and (3) design a memory management strategy for the hybrid memory system that automatically optimizes data migration among the memory layers, achieving performance comparable to a pure fast-memory system while using a relatively limited amount of fast memory.
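As a concrete illustration of the re-reference prediction idea above, eviction could follow an RRIP-style policy, where each cached block carries a predicted re-reference distance that is reset on a hit and aged under eviction pressure. The sketch below is a toy Python model; the class and parameter names are our own hypothetical choices, not part of any existing framework:

```python
class RRDCache:
    """Toy cache that evicts the block predicted to have the longest
    re-reference distance (an RRIP-style approximation)."""

    def __init__(self, capacity, max_rrd=3):
        self.capacity = capacity
        self.max_rrd = max_rrd   # distance meaning "re-referenced far in the future"
        self.blocks = {}         # block_id -> predicted re-reference distance

    def access(self, block_id):
        """Return True on a cache hit, False on a miss (the block is then inserted)."""
        if block_id in self.blocks:
            self.blocks[block_id] = 0            # recent hit: predict a near re-reference
            return True
        if len(self.blocks) >= self.capacity:
            self._evict()
        # New blocks start with a long predicted distance, so one-shot
        # (scan) blocks cannot displace hot blocks for long.
        self.blocks[block_id] = self.max_rrd - 1
        return False

    def _evict(self):
        while True:
            # Evict any block whose predicted distance has reached the maximum.
            victim = next((b for b, d in self.blocks.items() if d >= self.max_rrd), None)
            if victim is not None:
                del self.blocks[victim]
                return
            # Otherwise age every block and retry.
            for b in self.blocks:
                self.blocks[b] += 1
```

A production variant would replace the static insertion distance with the learned per-block prediction described above, but the eviction mechanics would be the same.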
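For the feedback-control formulation of shared-cache management, a minimal sketch is a proportional controller that resizes an executor's fast-memory share from its observed hit rate; the function name, gain, and bounds below are illustrative assumptions, not a finished design:

```python
def adjust_fast_memory_share(current_share, hit_rate, target_hit_rate,
                             gain=0.5, min_share=0.1, max_share=1.0):
    """One step of a proportional feedback controller: grow the executor's
    fast-memory share when its hit rate is below target, shrink it otherwise.

    Shares are fractions of the fast-memory budget, clamped to
    [min_share, max_share].
    """
    error = target_hit_rate - hit_rate           # positive when under-performing
    new_share = current_share + gain * error     # proportional correction
    return max(min_share, min(max_share, new_share))
```

In a full design the gain would be tuned (or replaced by a PID controller or a learned policy), and the per-executor shares would be normalized so that they sum to the total fast-memory budget.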
We will implement these methods on top of existing Cloud Computing platforms, aiming to maximize memory utilization, host more applications, and reduce the requirement for fast memory hardware. We will also conduct experiments with the proposed technologies on a testbed in a local cluster environment and evaluate their performance with typical benchmark applications.