Recent Talks:
Dharmendra S. Modha
IBM Almaden Research Center
Monday 5/3/04, 4pm-5pm, 104 Gates Hall
Abstract
We consider the problem of cache
management in a demand paging scenario with uniform page sizes. We propose a
new cache management policy, namely, Adaptive Replacement Cache (ARC), that
has several advantages. In response to evolving and changing access patterns,
ARC dynamically, adaptively, and continually balances between the recency and frequency
components in an online, self-tuning fashion. ARC is simple to implement and, like LRU, has constant complexity per request. The
policy ARC is scan-resistant: it allows one-time sequential requests to pass through without polluting the cache.
On many real-life traces as well as on a synthetic benchmark trace, ARC leads
to substantial performance gains over LRU for a wide range of cache sizes. As anecdotal evidence, for the Storage Performance Council Benchmark,
at 4GB cache, LRU delivers a hit ratio of 9.19% while ARC achieves a hit ratio
of 20%.
This is joint work with Nimrod Megiddo. Links: To learn more about ARC, see: http://www.almaden.ibm.com/StorageSystems/autonomic_storage/ARC/index.shtml
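To make the recency/frequency balance concrete, here is a minimal Python sketch of the four-list bookkeeping commonly described for ARC. It is an illustration under stated assumptions, not the speakers' implementation: the class name `ARCCache`, the `access` method, and the integer approximation of the adaptation step are all choices made for this example.

```python
from collections import OrderedDict


class ARCCache:
    """Illustrative ARC sketch (hypothetical names; keys only, no values)."""

    def __init__(self, capacity):
        self.c = capacity
        self.p = 0                # adaptive target size for t1
        self.t1 = OrderedDict()   # cached pages seen once recently
        self.t2 = OrderedDict()   # cached pages seen at least twice
        self.b1 = OrderedDict()   # ghost entries recently evicted from t1
        self.b2 = OrderedDict()   # ghost entries recently evicted from t2

    def _replace(self, hit_in_b2):
        # Evict from t1 when it exceeds its target, otherwise from t2.
        if self.t1 and (len(self.t1) > self.p or
                        (hit_in_b2 and len(self.t1) == self.p)):
            key, _ = self.t1.popitem(last=False)
            self.b1[key] = None
        else:
            key, _ = self.t2.popitem(last=False)
            self.b2[key] = None

    def access(self, key):
        """Reference a page; return True on a cache hit."""
        if key in self.t1:                    # second reference: promote
            del self.t1[key]
            self.t2[key] = None
            return True
        if key in self.t2:                    # refresh recency within t2
            self.t2.move_to_end(key)
            return True
        if key in self.b1:                    # ghost hit: favor recency
            self.p = min(self.c, self.p + max(1, len(self.b2) // len(self.b1)))
            self._replace(False)
            del self.b1[key]
            self.t2[key] = None
            return False
        if key in self.b2:                    # ghost hit: favor frequency
            self.p = max(0, self.p - max(1, len(self.b1) // len(self.b2)))
            self._replace(True)
            del self.b2[key]
            self.t2[key] = None
            return False
        # Complete miss: make room, then insert at the MRU end of t1.
        l1 = len(self.t1) + len(self.b1)
        total = l1 + len(self.t2) + len(self.b2)
        if l1 == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False)
                self._replace(False)
            else:
                self.t1.popitem(last=False)
        elif total >= self.c:
            if total == 2 * self.c:
                self.b2.popitem(last=False)
            self._replace(False)
        self.t1[key] = None
        return False
```

In this sketch, pages referenced twice move to `t2`, so a one-time sequential scan only churns `t1` and its ghost list. For example, with a capacity of 4, two pages that were each referenced twice still hit after a 100-page scan, whereas a 4-page LRU cache would have evicted them.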
Bio
Dharmendra Modha is a researcher at the IBM Almaden Research Center. He holds a Ph.D. in Electrical Engineering from UC San Diego. He is interested in Caching Algorithms, Information Theory, and Machine Learning. For more details, see: http://www.almaden.ibm.com/cs/people/dmodha/
Abstract
The current proliferation of mobile devices has resulted in a large diversity of designs, each optimized for a specific application, form-factor, battery life, and functionality (e.g., cell phone, pager, MP3 player, PDA, tablet, laptop). Recent trends, motivated by user preferences towards carrying less, have focused on integrating these different applications in a single general-purpose device, often resulting in much higher energy consumption and consequently much reduced battery life. Our research argues that in order to achieve longer battery life, such systems should be designed to include requirements-aware energy scale-down techniques. Such techniques would allow a general-purpose device to use hardware mechanisms and software policies to adapt energy use to the user's requirements for the task at hand, potentially approaching the low energy use of a special-purpose device.
We make two main contributions. We first provide a model for energy scale-down. We argue that one approach to designing for scale-down is to use special-purpose devices as examples of power-efficient design points, and to structure adaptivity using insights from these design points. To understand the magnitude of the potential benefits, we present an energy comparison of a wide spectrum of mobile devices (to the best of our knowledge, the first study to do so). Based on the insights from this study, we propose and evaluate three specific requirements-aware energy scale-down optimizations, in the context of the display, wireless, and CPU components of the system. Our optimizations reduce the energy consumption of each of their targeted subsystems by factors of 2 to 10, demonstrating the importance of energy scale-down in future designs.
Bio
Partha Ranganathan is currently a research scientist at Hewlett Packard Labs. His research interests are in low-power system design, system architecture, and performance evaluation. His recent research focuses on designing power- and energy-efficient systems for future computing environments (from small mobile devices to dense servers in data centers). This work has led to a class of "energy scale-down" optimizations that use adaptivity in resources to match system energy efficiency with desired user functionality to achieve significant energy savings. Partha is currently exploring the potential of energy scale-down optimizations in the context of the data center as part of the data center architecture team. Partha received his B.Tech degree from the Indian Institute of Technology, Madras and his M.S. and Ph.D. from Rice University, Houston. He is a primary developer of the publicly distributed Rice Simulator for ILP Multiprocessors (RSIM), and a recipient of the Lodieska Stockbridge Vaughan fellowship and an IIT Madras Alumni Award.
Saman Amarasinghe
MIT
Monday, 10/20/03, 4pm-5pm, 104 Gates Hall
Abstract
Designing an optimizing compiler is a
black art. Compiler writers are expected to create effective and inexpensive
solutions to NP-hard problems such as instruction scheduling and register
allocation. To make matters worse, separate optimization phases have strong
interactions and competing resource constraints. Complexities of modern
architectures further muddy the solution space in which they work. Compilers
cannot practically find optimal solutions to NP-hard problems. Therefore,
compiler writers divide the problem into multiple phases and devise heuristics
that find approximate solutions for a large class of applications.
However, to achieve satisfactory performance, developers are forced to
iteratively tweak their heuristics and change the ordering of the phases.
In this talk I will introduce two techniques that simplify optimizer
development: Meta Optimization and Convergent Scheduling.
Meta Optimization is a methodology for
automatically fine-tuning compiler heuristics. Meta Optimization uses
machine-learning techniques to automatically search the space of compiler
heuristics. These techniques reduce
compiler design complexity by relieving compiler writers of the tedium of
heuristic tuning. Our machine-learning system uses an evolutionary
algorithm to automatically find effective compiler heuristics.
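The idea of searching the heuristic space with an evolutionary algorithm can be sketched in a few lines of Python. Everything in this example is an assumption made for illustration: the toy dependence graph `DAG`, the single-issue machine model, the three features, and the simple truncation-selection loop in `evolve` are hypothetical stand-ins, not the system described in the talk. The heuristic being tuned is the priority function of a greedy list scheduler, and the fitness is the resulting schedule length.

```python
import random

# Hypothetical toy dependence graph: name -> (latency, successors).
DAG = {
    "a": (3, ["d"]), "b": (1, ["d"]), "c": (1, ["e"]),
    "d": (2, ["f"]), "e": (4, ["f"]), "f": (1, []),
}


def features(dag):
    """Per-instruction features: (latency, latency-weighted height, fan-out)."""
    height = {}

    def h(n):
        if n not in height:
            lat, succs = dag[n]
            height[n] = lat + max((h(s) for s in succs), default=0)
        return height[n]

    return {n: (dag[n][0], h(n), len(dag[n][1])) for n in dag}


def schedule_length(dag, weights):
    """Greedy list scheduling on a single-issue machine.

    Each cycle, the ready instruction with the highest weighted feature sum
    is issued. Returns the cycle at which the last instruction completes.
    """
    feats = features(dag)
    preds_left = {n: 0 for n in dag}
    for _, succs in dag.values():
        for s in succs:
            preds_left[s] += 1
    earliest = {n: 0 for n in dag}
    ready_at = {n: 0 for n in dag if preds_left[n] == 0}
    cycle = finish = 0
    while ready_at:
        ready = [n for n, t in ready_at.items() if t <= cycle]
        if ready:
            pick = max(ready, key=lambda n: sum(
                w * f for w, f in zip(weights, feats[n])))
            del ready_at[pick]
            done = cycle + dag[pick][0]
            finish = max(finish, done)
            for s in dag[pick][1]:
                earliest[s] = max(earliest[s], done)
                preds_left[s] -= 1
                if preds_left[s] == 0:
                    ready_at[s] = earliest[s]
        cycle += 1
    return finish


def evolve(dag, generations=20, pop_size=8, seed=0):
    """Keep the better half each generation and mutate copies of survivors."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(pop_size)]

    def cost(w):
        return schedule_length(dag, w)

    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]
        children = [[g + rng.gauss(0, 0.3) for g in rng.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=cost)
```

Because the best candidates always survive into the next generation, the best schedule length found never gets worse as generations are added. In a real setting the fitness function would be a compile-and-run measurement over a benchmark suite rather than this toy cost model.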
Convergent Scheduling is a general
framework for instruction scheduling on spatial architectures. A convergent
scheduler is composed of independent passes, each implementing a heuristic that
addresses a particular problem or constraint. The passes share a simple,
common interface that provides spatial and temporal preferences for each
instruction. Preferences are not absolute; instead, the interface allows a pass
to express the confidence of its preference. By applying a series of passes that
address all the relevant constraints, we show that a convergent scheduler
can produce better schedules than state-of-the-art schedulers.
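The shared preference interface described above can be illustrated with a small Python sketch. The names here (`Preferences`, `time_pass`, `placement_pass`, `communication_pass`, `converge`) and the confidence values are hypothetical and chosen only for this example; the sketch assumes each instruction carries a normalized weight for every (cluster, time slot) pair, and that a pass expresses confidence by how strongly it scales those weights.

```python
class Preferences:
    """Per-instruction weights over (cluster, time slot); higher is preferred."""

    def __init__(self, instrs, n_clusters, n_slots):
        u = 1.0 / (n_clusters * n_slots)
        self.w = {i: [[u] * n_slots for _ in range(n_clusters)] for i in instrs}

    def scale(self, instr, factor, cluster=None, slot=None):
        """Multiply the matching weights by `factor`, then renormalize."""
        grid = self.w[instr]
        for c, row in enumerate(grid):
            for t in range(len(row)):
                if (cluster is None or c == cluster) and (slot is None or t == slot):
                    row[t] *= factor
        total = sum(map(sum, grid))
        for row in grid:
            for t in range(len(row)):
                row[t] /= total

    def preferred(self, instr):
        """Return the (cluster, slot) pair with the largest weight."""
        grid = self.w[instr]
        cells = [(c, t) for c in range(len(grid)) for t in range(len(grid[c]))]
        return max(cells, key=lambda ct: grid[ct[0]][ct[1]])


def time_pass(prefs, edges, confidence=4.0):
    """Prefer the time slot equal to each instruction's dependence depth."""
    depth = {i: 0 for i in prefs.w}
    for u, v in edges:  # assumption: edges are listed in topological order
        depth[v] = max(depth[v], depth[u] + 1)
    for i, d in depth.items():
        prefs.scale(i, confidence, slot=d)


def placement_pass(prefs, home, confidence=3.0):
    """Prefer a fixed home cluster for instructions that have one."""
    for i, cluster in home.items():
        prefs.scale(i, confidence, cluster=cluster)


def communication_pass(prefs, edges, confidence=2.0):
    """Pull each consumer toward the cluster its producer currently prefers."""
    for u, v in edges:
        cluster, _ = prefs.preferred(u)
        prefs.scale(v, confidence, cluster=cluster)


def converge(instrs, edges, home, n_clusters=2, n_slots=3):
    """Apply the passes in sequence and read off the final preferences."""
    prefs = Preferences(instrs, n_clusters, n_slots)
    time_pass(prefs, edges)
    placement_pass(prefs, home)
    communication_pass(prefs, edges)
    return {i: prefs.preferred(i) for i in instrs}
```

For a three-instruction chain `load -> add -> store` with `load` homed on cluster 1, `converge(["load", "add", "store"], [("load", "add"), ("add", "store")], {"load": 1})` places all three on cluster 1 in slots 0, 1, and 2. No pass dictates the result outright; each only shifts the weights, which is the point of keeping preferences non-absolute.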
Bio
Saman P. Amarasinghe is an Associate
Professor in the Department of Electrical Engineering and Computer Science at
Massachusetts Institute of Technology and a member of the Computer Science and
Artificial Intelligence Laboratory (CSAIL). He received his BS in Electrical
Engineering and Computer Science from Cornell University in 1988, and his MSEE
and Ph.D. from Stanford University in 1990 and 1997, respectively. Currently he
leads the Commit compiler group and is the co-leader of the MIT Raw project. His
research interests are in discovering novel approaches to improve the
performance of modern computer systems without unduly increasing the complexity
faced by either application developers, compiler writers, or computer
architects.
Tim Sherwood, UC
Monday,
Abstract
Understanding program behavior is at the foundation of
computer architecture and program optimization. Many programs have wildly
different behavior on even the very largest of scales (over the complete
execution of the program). This realization has ramifications for many
architectural and compiler techniques, from thread scheduling, to feedback
directed optimizations, to the way programs are simulated. However, in
order to take advantage of time-varying behavior, we must first develop the
analytical tools necessary to automatically and efficiently analyze program
behavior over large sections of execution.
In this talk I will describe a new technique, Basic Block Distribution
Analysis, as a means of summarizing, visualizing, and exploiting the time
varying behavior of programs. With this as a framework, we can see that
many programs execute as a set of phases. Using techniques from machine
learning, I will show how we can find these phases automatically at either
profile or run-time, and then use this information to reduce simulation time
and help guide expensive run-time optimizations.
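As a rough illustration of the idea, the Python sketch below summarizes each fixed-size interval of execution by a normalized basic-block frequency vector and groups similar intervals into phases. It deliberately swaps in a simple threshold-based grouping (each interval joins the first phase whose representative vector is within a Manhattan-distance threshold) in place of the machine-learning techniques mentioned in the talk. The function names, the interval size, and the threshold are assumptions made for this example.

```python
from collections import Counter


def basic_block_vectors(trace, interval):
    """Split a trace of basic-block IDs into intervals of `interval` entries
    and return one normalized frequency vector (a dict) per interval."""
    vectors = []
    for start in range(0, len(trace), interval):
        chunk = trace[start:start + interval]
        counts = Counter(chunk)
        vectors.append({bb: n / len(chunk) for bb, n in counts.items()})
    return vectors


def manhattan(u, v):
    """Manhattan distance between two sparse frequency vectors."""
    return sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in set(u) | set(v))


def find_phases(vectors, threshold=0.5):
    """Label each interval with a phase ID.

    An interval joins the first existing phase whose representative vector
    is closer than `threshold`; otherwise it starts a new phase.
    """
    leaders, labels = [], []
    for vec in vectors:
        for pid, leader in enumerate(leaders):
            if manhattan(vec, leader) < threshold:
                labels.append(pid)
                break
        else:
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels
```

For a synthetic trace that loops over blocks 1, 2, 3, then over blocks 7, 8, then returns to 1, 2, 3, the labels come out as one phase, a second phase, and then the first phase again. Once intervals are labeled this way, a simulator could in principle run one representative interval per phase instead of the whole execution.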
Bio
Tim Sherwood is currently a Ph.D. candidate in Computer
Science at the
Doug Burger, UT
Friday, 4/18/2003, 2-3pm, 104 Gates Hall
Abstract
In this talk, I will first present an overview of the
Polymorphous TRIPS architecture, which addresses the wire delay problem in both
the processing core and memory system, permits scalable wide-issue out-of-order
performance from 8- to 64-issue, and can be morphed to meet the needs of diverse applications, including single-threaded,
multithreaded (server), and data-intensive (scientific, streaming, and signal processing) workloads. I will then present a new technique called
Speculative Dataflow Traversal that permits simple, efficient, and effectively free partial rollbacks when a data mis-speculation occurs.
I will describe how we use it to reduce the cost of memory disambiguation
and to permit aggressive data value speculation. I will also show how this technique enables new classes of
speculation, such as coherence speculation, which can allow shared-memory systems to scale to larger sizes.
Bio
Doug Burger is an Assistant Professor of Computer Sciences at the University of Texas at Austin. He received his Ph.D. in Computer Sciences from the University of Wisconsin-Madison in 1998, after seven years of gorging on tasty Wisconsin cheese. His main research area is computer architecture, and his interests span compilers, operating systems, and emerging technologies. He is co-leader of the TRIPS project at UT-Austin, which is building a system from the microprocessors on up that is targeted at technologies in the 2010 time-frame, and he coaches the UT Marathon Team.
Kees Vissers, Chameleon Systems & UC Berkeley
Monday,
Abstract
The purpose of this talk is to give an overview of the field and show that great progress has been made in reconfigurable computing, but that we are only at the beginning of a very interesting area of multi-processor systems. Current silicon technology allows the integration of several hundred ALUs. Historically, multi-processor systems have failed to make a significant industrial impact. However, in the domain of embedded systems, many ad-hoc multiprocessor systems have emerged. Unfortunately, these have often been programmed in Verilog or other dedicated, non-standard environments. The talk will focus on which interesting architectures are under research, what programming problems remain, and why research for embedded system design might provide the breakthrough for novel multi-processing architectures. I will discuss three different architectures and their programming environments that I implemented over my career: a fine-grain parallelism dataflow machine with a multi-rate Signal Flow Graph programming environment; a high-performance VLIW, embedded in an SoC, with C/C++ and streaming APIs; and a network of ALUs with a Simulink programming environment. I will show the pros and cons of all these approaches. The implementations will be put into perspective for high-performance video processing.
Bio
Kees Vissers
graduated in 1980 from
Peter Bannon, HP
Monday, 5/5/2003, 4-5pm, 104 Gates Hall
Abstract
The Alpha 21364 integrates a high-performance RISC core with an L2 cache, a Direct RDRAM memory controller, and a router. The entire design operates at 1.2 GHz, providing exceptional performance for applications requiring memory systems with high capacity, low latency, and high bandwidth. The talk will provide an overview of the chip's microarchitecture along with selected performance data.
Bio
Pete Bannon is a Staff Fellow in HP's Alpha Development Group. He has participated in the design and verification
of several microprocessor chips including the Alpha 21164 and Alpha
21164PC. He is currently the architect
of the Alpha 21364 design. Pete joined
Digital in 1984 after receiving a B.S. in computer system design from the