Co-Designed Systems Software For Multi-core Architectures
Traditional approaches to resource management based solely on hardware or solely on software are ill-suited for on-chip resource supervision in the multi-core era. Hardware-only solutions may not be practical if the ideal resource optimization is complex, specific to a chip instance, time-dependent, or influenced by external factors. On the other hand, complete solutions through conventional system software require intimate knowledge of the processor implementation, necessitate ISA augmentation, and may incur prohibitive overhead.
This research examines a broad solution space where cooperative designs consisting of firmware and adaptive/programmable microarchitecture structures can be integrated into processor design to overcome the limitations of separate hardware/software solutions. Specifically, we target the following issues:
The above example shows how virtualization of CMP cores can help to hide hard failures from software by remapping threads to cores.
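The remapping idea can be sketched as a small table kept by the firmware layer. This is a minimal illustration under stated assumptions, not the project's implementation: the `CoreMap` class, its method names, and the policy of picking the lowest-numbered spare are all hypothetical.

```python
class CoreMap:
    """Hypothetical firmware-level table mapping virtual cores to physical cores."""

    def __init__(self, num_physical, num_virtual):
        if num_virtual > num_physical:
            raise ValueError("cannot expose more virtual cores than physical cores")
        self.healthy = set(range(num_physical))
        # Initially virtual core i runs on physical core i; the rest are spares.
        self.v2p = {v: v for v in range(num_virtual)}

    def spares(self):
        """Healthy physical cores that are not backing any virtual core."""
        return sorted(self.healthy - set(self.v2p.values()))

    def report_hard_failure(self, physical):
        """Mark a physical core as failed and remap the virtual core that used it."""
        self.healthy.discard(physical)
        for v, p in self.v2p.items():
            if p == physical:
                spare = self.spares()
                if not spare:
                    # Assumed policy: without a spare, the failure can no
                    # longer be hidden from software.
                    raise RuntimeError("no spare core left")
                self.v2p[v] = spare[0]


cores = CoreMap(num_physical=4, num_virtual=3)
cores.report_hard_failure(1)
print(cores.v2p)  # {0: 0, 1: 3, 2: 2}
```

Software keeps addressing virtual core 1 throughout; only the table entry behind it changes, which is what hides the failure.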
Variability Tolerant Architectures
Technology scaling will increase the impact of circuit-level parameter variation, attacking the fundamental notion that architects can design a processor based on fixed circuit latency, power consumption, and reliability characteristics.
Parameter variations come in two forms: (1) process variation due to manufacturing errors and (2) environmental variation due to conditions such as temperature and supply voltage. Together these variations can cause each instantiation of a given physical structure to have different circuit latency, power consumption, and susceptibility to noise. This applies spatially and temporally to structures on the same chip as well as to structures on chips fabricated on the same wafer. Furthermore, the deviation in these parameters is significant with respect to their magnitudes, diminishing the effectiveness of "safety margins".
Under parameter variation, efficient dynamic resource management becomes considerably harder, but offers increasing benefits. The chief difficulty is that each instantiation of a physical resource will have its own average and worst-case behavior under different environmental conditions. This demands that resource optimization be tailored to a specific processor and be cognizant of the environmental situation. The obvious benefit is that variation-aware management can provide better results than approaches that assume either average-case or worst-case parameters.
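As a rough illustration of why variation-aware management can beat a worst-case assumption, the sketch below picks a clock frequency per core. Everything in it is assumed for illustration: the function names, the 5% guard band, and the delay values are hypothetical and are not measurements from this project.

```python
def max_frequency_ghz(delay_ns, guard_band=0.05):
    """Highest clock (GHz) whose period covers the delay plus a guard band."""
    return 1.0 / (delay_ns * (1.0 + guard_band))


def worst_case_setting(delays_ns):
    """One frequency for every core, set by the slowest core in the worst condition."""
    slowest = max(max(per_core.values()) for per_core in delays_ns.values())
    f = max_frequency_ghz(slowest)
    return {core: f for core in delays_ns}


def variation_aware_setting(delays_ns, condition):
    """Per-core frequency from that core's own delay under the current condition."""
    return {
        core: max_frequency_ghz(per_core[condition])
        for core, per_core in delays_ns.items()
    }


# Hypothetical critical-path delays (ns) per core under two thermal conditions.
delays_ns = {
    "core0": {"cool": 0.30, "hot": 0.36},
    "core1": {"cool": 0.34, "hot": 0.40},
}

static = worst_case_setting(delays_ns)
adaptive = variation_aware_setting(delays_ns, "cool")
for core in delays_ns:
    print(core, round(static[core], 2), round(adaptive[core], 2))
```

Because each core is clocked from its own delay under the current condition, no core runs slower than it would under the single worst-case setting, and faster cores recover the margin they would otherwise give up.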
Dynamic Power Management
Today, power is a first-class design constraint, and technology scaling will ensure that it retains this prominent position for the foreseeable future. In truth, the term power generally refers to a number of related problems that are all rooted in the same physical phenomena. The energy or average power problem relates to the amount of dynamic and static power consumed by circuits over time, and has direct consequences for battery life in mobile devices, utility costs in large server installations, and, given the number of computers in operation worldwide, the efficient use of natural resources.
The thermal or max power problem relates to the temperature of "hotspots" on the die, which, left unaddressed, can cause permanent hardware failure. Due to increasing power density under technology scaling, and the inability to adequately cool high-frequency CPUs, thermal effects will play a pronounced role in future processors. Finally, an important and growing issue is power delivery, or the delta power problem. It is concerned with imperfections in the electrical networks that supply power to the circuits on chip. Severe rises and falls in the power demanded by the CPU can degrade the reference voltages via non-idealities in the delivery mechanisms and ultimately produce soft failures. Together, these principal problems will force architects to consider power management issues early in the design cycle.
This project examines dynamic microarchitectural techniques that can solve power management as a multiobjective problem.
The above picture shows the spatial distribution of voltage droops on a high-performance CPU.
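The three power problems described in this section can be made concrete by computing one figure for each from a sampled power trace. This is a minimal sketch under stated assumptions: the `power_metrics` function, its field names, and the sample trace are hypothetical, and peak power is used only as a crude proxy for the thermal problem, which really depends on hotspot temperature.

```python
def power_metrics(trace_watts, dt_s):
    """Summarize a uniformly sampled power trace (watts, one sample every dt_s seconds)."""
    if len(trace_watts) < 2:
        raise ValueError("need at least two samples")
    energy_j = sum(trace_watts) * dt_s  # energy / average power problem
    return {
        "energy_j": energy_j,
        "avg_power_w": energy_j / (len(trace_watts) * dt_s),
        "max_power_w": max(trace_watts),  # proxy for the thermal / max power problem
        # Largest sample-to-sample swing: the delta power problem.
        "max_delta_w": max(abs(b - a) for a, b in zip(trace_watts, trace_watts[1:])),
    }


# Hypothetical trace: four samples taken 0.5 s apart.
print(power_metrics([10, 30, 20, 40], dt_s=0.5))
```

Treating power management as a multi-objective problem then amounts to keeping all of these figures within their budgets at once, rather than minimizing any single one.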