Computer Architecture
The Art of Designing Micro-Processors
Heterogeneous Many-Core Architecture
My research focuses on designing faster and more efficient computers using specialized architectures that are highly optimized for specific tasks. Making systems more efficient reduces their cost, increases their availability, and broadens their potential to help solve challenging problems in other domains. In the past, advances in computer efficiency have enabled streaming video on handheld devices, visual computing and 3D gaming on consoles and desktops, and molecular dynamics simulations that aid the development of new healthcare products.
Motivation
From the 1960s until recently, it was possible to improve computer performance by treating all applications the same and optimizing the common case. Around 2005, hardware designers in industry began to hit limits on the amount of power that a general purpose processor could draw while running at full speed.
To continue improving performance under these new power constraints, it has become necessary to optimize hardware for specific rather than general purpose operations. GPU architectures from NVIDIA, AMD, and Intel are commercial examples of processors specialized for graphics. However, because not every application runs efficiently on every processor architecture, application developers are forced to deal with additional complexity.
My research interests are twofold: 1) determining the sets of architecture features that are highly efficient for domain-specific applications, and 2) building dynamic systems that automatically map software to the most efficient architecture when it is available, or reoptimize the software when it is not.
Individual Projects
Harmony - An Execution Model and Runtime for Heterogeneous Many-Core Systems
Harmony is a programming and execution model for systems with at least one CPU and possibly many accelerators. The goal is to hide inter-accelerator parallelism and architecture heterogeneity from the programmer without sacrificing performance. This is done via automatic parallelization of sequential applications using speculative threading and dynamic mapping of work to accelerators. We also explore techniques such as performance prediction, variable renaming, and kernel fusion/fission to further optimize this model.
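As a rough illustration of how performance prediction can drive dynamic mapping of work to accelerators, the sketch below greedily places each kernel on the device with the earliest predicted finish time. It is not Harmony's actual scheduler: the `Device` class, the `predict_runtime` and `map_kernels` functions, and the throughput numbers are all hypothetical, and the sketch ignores data dependences, data transfer costs, and speculation.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    # Hypothetical performance model: kernel kind -> work units per second.
    throughput: dict
    busy_until: float = 0.0  # time at which the device becomes free


def predict_runtime(device, kind, work):
    """Predicted seconds for `work` units of `kind`; inf if unsupported."""
    rate = device.throughput.get(kind)
    return work / rate if rate else float("inf")


def map_kernels(kernels, devices):
    """Place each (name, kind, work) kernel on the device that is
    predicted to finish it earliest, in program order."""
    schedule = []
    for name, kind, work in kernels:
        best = min(
            devices,
            key=lambda d: d.busy_until + predict_runtime(d, kind, work),
        )
        finish = best.busy_until + predict_runtime(best, kind, work)
        if finish == float("inf"):
            raise ValueError(f"no device can run kernel {name!r}")
        best.busy_until = finish
        schedule.append((name, best.name, finish))
    return schedule


if __name__ == "__main__":
    devices = [
        Device("cpu", {"dense": 10.0, "branchy": 50.0}),
        Device("gpu", {"dense": 100.0}),
    ]
    kernels = [
        ("matmul", "dense", 1000.0),
        ("parse", "branchy", 100.0),
        ("fft", "dense", 500.0),
    ]
    for name, device, finish in map_kernels(kernels, devices):
        print(f"{name} -> {device} (predicted finish at {finish:.1f}s)")
```

In this toy setup the dense kernels land on the GPU and the branch-heavy kernel on the CPU, because the predictor accounts for both per-device speed and how long each device is already busy. A real runtime would replace the fixed throughput table with a learned or measured performance model.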
Heterogeneous Virtual Machines
The relentless progress of Moore’s Law has periodically inspired major innovations, in both hardware and software, to keep performance growth on pace with transistor density. Industry has reached another such point as it encounters intellectual and engineering challenges in the form of power dissipation, the processor-memory performance gap, limits to instruction-level parallelism, slower frequency growth, and rising non-recurring engineering costs. As a consequence, when we consider how the large number of transistors supplied at future technology nodes will be used to sustain performance growth, some trends appear inevitable: i) replication of cores, ii) the use of high-volume custom accelerators, which have a small footprint and consume dramatically less power for the performance gains they offer, and iii) innovations in memory hierarchies. Together, these trends inspire the development of Hybrid Virtual Machines (HVM) for heterogeneous many-core platforms: large-scale, heterogeneous systems composed of single or shared ISA general purpose cores intermingled with customized heterogeneous cores (accelerators), and using diverse memory, cache, and interconnect hierarchies. Such platforms will appear both in individual user systems and in rack-scale and multi-rack-scale systems in order to keep up with growing application demands.
These Hybrid Virtual Machines will run a wide spectrum of applications, from high performance applications such as scientific computing and biological simulations, through enterprise applications such as financial processing and data processing, to client applications such as gaming and image processing. Achieving performance guarantees for these applications under the constraints imposed by workload variability, power, cost, and heterogeneity requires rethinking the current software stack. To this end, we have undertaken a wide-reaching effort at Georgia Tech that evaluates the challenges that emerging hardware and applications impose on the entire software stack. We are designing and implementing suitable programming models, runtimes, operating system changes, and hypervisor changes to prepare software for such future heterogeneous many-core platforms. This effort is a collaboration between different research groups at Georgia Tech and research labs across the United States.