Abstract
Integrated Circuits (ICs) for logic (computation) have increased dramatically in both capacity and speed since their introduction in 1958. Memory technology, however, has seen only modest improvements in speed. Moreover, to increase capacity and lower the cost per bit, main memory is implemented as a separate, external IC. Thus, relative to logic speeds, memory latency is increasing, and physical constraints on external pins limit memory bandwidth. The traditional on-chip cache hierarchy has evolved with the sole goal of hiding external memory latency for a single-core, sequential processor, essentially giving the illusion of lower latency. Unfortunately, it does so at the expense of IC resources (transistors), energy, power, and memory bandwidth. As the world moves quickly toward multi/many-core architectures, current cache architectures are not aligned with, and are in fact hostile to, future priorities.

This dissertation questions the allocation of on-chip resources for logic ICs and proposes a novel memory architecture that (a) actively manages the movement of on-chip and off-chip data, (b) creates a flatter memory hierarchy, and (c) emphasizes efficient bandwidth utilization over latency. The results demonstrate that, when combined with a suitable programming environment, the proposed memory subsystem enables a larger fraction of chip resources to be dedicated to computation. This yields a higher degree of parallelism and ultimately reduces an application's time-to-completion, independent of how fast individual tasks execute.