Recent Talks:

ARC: A Self-Tuning, Low Overhead Replacement Cache

Dharmendra S. Modha

IBM Almaden Research Center 

Monday 5/3/04, 4pm-5pm, 104 Gates Hall 

 

Abstract

We consider the problem of cache management in a demand paging scenario with uniform page sizes. We propose a new cache management policy, namely, Adaptive Replacement Cache (ARC), that has several advantages. In response to evolving and changing access patterns, ARC dynamically, adaptively, and continually balances between the recency and frequency components in an online and self-tuning fashion. The ARC policy is simple to implement and, like LRU, has constant complexity per request. ARC is also scan-resistant: it allows one-time sequential requests to pass through without polluting the cache.

On many real-life traces as well as on a synthetic benchmark trace, ARC leads to substantial performance gains over LRU for a wide range of cache sizes. As anecdotal evidence, for the Storage Performance Council Benchmark, at 4GB cache, LRU delivers a hit ratio of 9.19% while ARC achieves a hit ratio of 20%.
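
To make the scan-resistance claim above concrete, here is a minimal Python sketch of the LRU baseline only; it is not ARC. The `LRUCache` class, the `hit_ratio` helper, and the tiny trace are illustrative assumptions and are not taken from the talk. The sketch shows how a one-time sequential scan can flush a hot working set out of an LRU cache, which is the kind of pollution ARC is described as avoiding.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity LRU cache over page ids (hypothetical helper, not ARC)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # oldest entry first, most recent last

    def access(self, page):
        """Return True on a hit, False on a miss (the page is then loaded)."""
        if page in self.pages:
            self.pages.move_to_end(page)  # mark as most recently used
            return True
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        self.pages[page] = None
        return False


def hit_ratio(cache, trace):
    """Fraction of the requests in `trace` that hit in `cache`."""
    hits = sum(cache.access(page) for page in trace)
    return hits / len(trace)


hot_set = [0, 1, 2, 3]        # pages that are requested repeatedly
scan = [100, 101, 102, 103]   # one-time sequential requests

cache = LRUCache(capacity=4)
hit_ratio(cache, hot_set)                      # warm-up pass: all misses
ratio_before_scan = hit_ratio(cache, hot_set)  # hot set is fully cached
hit_ratio(cache, scan)                         # the scan evicts every hot page
ratio_after_scan = hit_ratio(cache, hot_set)   # hot set must be reloaded

print(ratio_before_scan, ratio_after_scan)
```

In this toy trace the hot set hits on every request before the scan and misses on every request in the pass right after it. A scan-resistant policy would aim to keep the hot pages resident through the scan.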

This is joint work with Nimrod Megiddo. To learn more about ARC, see: http://www.almaden.ibm.com/StorageSystems/autonomic_storage/ARC/index.shtml

Bio

Dharmendra Modha is a researcher at the IBM Almaden Research Center. He holds a Ph.D. in Electrical Engineering from UC San Diego. He is interested in caching algorithms, information theory, and machine learning. For more details, see: http://www.almaden.ibm.com/cs/people/dmodha/

 


Energy Consumption in Mobile Devices: Why Future Systems Need Requirements-Aware Energy Scale-Down

Parthasarathy Ranganathan
Hewlett Packard Labs
Partha.ranganathan@hp.com
Monday, 12/8/03, 4pm-5pm, 104 Gates Hall

 

Abstract

The current proliferation of mobile devices has resulted in a large diversity of designs, each optimized for a specific application, form-factor, battery life, and functionality (e.g., cell phone, pager, MP3 player, PDA, tablet, laptop). Recent trends, motivated by user preferences towards carrying less, have focused on integrating these different applications in a single general-purpose device, often resulting in much higher energy consumption and consequently much reduced battery life. Our research argues that in order to achieve longer battery life, such systems should be designed to include requirements-aware energy scale-down techniques. Such techniques would allow a general-purpose device to use hardware mechanisms and software policies to adapt energy use to the user's requirements for the task at hand, potentially approaching the low energy use of a special-purpose device.

 

We make two main contributions. We first provide a model for energy scale-down. We argue that one approach to designing for scale-down is to use special-purpose devices as examples of power-efficient design points, and to structure adaptivity using insights from these design points. To understand the magnitude of the potential benefits, we present an energy comparison of a wide spectrum of mobile devices (to the best of our knowledge, the first study to do so). Based on the insights from this study, we propose and evaluate three specific requirements-aware energy scale-down optimizations, in the context of the display, wireless, and CPU components of the system. Our optimizations reduce the energy consumption of each of their targeted subsystems by factors of 2 to 10, demonstrating the importance of energy scale-down in future designs.

 

Bio

Partha Ranganathan is currently a research scientist at Hewlett Packard Labs. His research interests are in low-power system design, system architecture, and performance evaluation. His recent research focuses on designing power- and energy-efficient systems for future computing environments (from small mobile devices to dense servers in data centers). This work has led to a class of "energy scale-down" optimizations that use adaptivity in resources to match system energy efficiency with desired user functionality to achieve significant energy savings. Partha is currently exploring the potential of energy scale-down optimizations in the context of the data center as part of the data center architecture team. Partha received his B.Tech degree from the Indian Institute of Technology, Madras and his M.S. and Ph.D. from Rice University, Houston. He is a primary developer of the publicly distributed Rice Simulator for ILP Multiprocessors (RSIM), and a recipient of the Lodieska Stockbridge Vaughan fellowship and an IIT Madras Alumni Award.

 


Alternative Solutions for Hard Compiler Problems

Saman Amarasinghe

MIT

Monday, 10/20/03, 4pm-5pm, 104 Gates Hall

 

Abstract

Designing an optimizing compiler is a black art. Compiler writers are expected to create effective and inexpensive solutions to NP-hard problems such as instruction scheduling and register allocation. To make matters worse, separate optimization phases have strong interactions and competing resource constraints. Complexities of modern architectures further muddy the solution space in which they work. Compilers cannot practically find optimal solutions to NP-hard problems.  Therefore, compiler writers divide the problem into multiple phases and devise heuristics that find approximate solutions for a large class of applications.  However, to achieve satisfactory performance, developers are forced to iteratively tweak their heuristics and change the ordering of the phases.  In this talk I will introduce two techniques that simplify optimizer development:  Meta Optimization and Convergent Scheduling.

Meta Optimization is a methodology for automatically fine-tuning compiler heuristics. Meta Optimization uses machine-learning techniques to automatically search the space of compiler heuristics. These techniques reduce compiler design complexity by relieving compiler writers of the tedium of heuristic tuning. Our machine-learning system uses an evolutionary algorithm to automatically find effective compiler heuristics.

Convergent Scheduling is a general framework for instruction scheduling on spatial architectures. A convergent scheduler is composed of independent passes, each implementing a heuristic that addresses a particular problem or constraint. The passes share a simple, common interface that provides spatial and temporal preferences for each instruction. Preferences are not absolute; instead, the interface allows a pass to express the confidence of its preference. By applying a series of passes that address all the relevant constraints, we show that a convergent scheduler can produce better schedules than state-of-the-art schedulers.
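
The shared preference interface described above can be illustrated with a small Python sketch. This is only a toy reading of the abstract, not the speaker's implementation: the weight table, the two example passes (`prefer_cluster`, `prefer_slot`), the use of a multiplicative confidence factor, and the instruction names are all hypothetical.

```python
def make_preferences(instructions, clusters, slots):
    """Give every instruction a uniform weight for each (cluster, time slot)."""
    uniform = 1.0 / (clusters * slots)
    return {
        instr: {(c, t): uniform for c in range(clusters) for t in range(slots)}
        for instr in instructions
    }


def normalize(weights):
    """Rescale each instruction's weights so that they sum to 1."""
    for prefs in weights.values():
        total = sum(prefs.values())
        for key in prefs:
            prefs[key] /= total


def prefer_cluster(instr, cluster, confidence):
    """Pass that nudges `instr` toward `cluster` (spatial preference)."""
    def run(weights):
        for (c, t) in weights[instr]:
            if c == cluster:
                weights[instr][(c, t)] *= confidence  # larger factor = more confident
    return run


def prefer_slot(instr, slot, confidence):
    """Pass that nudges `instr` toward time slot `slot` (temporal preference)."""
    def run(weights):
        for (c, t) in weights[instr]:
            if t == slot:
                weights[instr][(c, t)] *= confidence
    return run


def converge(weights, passes):
    """Apply each pass in turn, then pick the strongest preference per instruction."""
    for apply_pass in passes:
        apply_pass(weights)
        normalize(weights)
    return {instr: max(prefs, key=prefs.get) for instr, prefs in weights.items()}


passes = [
    prefer_cluster("load", 0, 4.0),
    prefer_slot("load", 0, 2.0),
    prefer_cluster("add", 1, 3.0),
    prefer_slot("add", 1, 2.0),
]
schedule = converge(make_preferences(["load", "add"], clusters=2, slots=2), passes)
print(schedule)
```

Because no pass fixes an assignment outright, later passes can still shift the outcome. The final placement is simply the (cluster, slot) pair with the largest accumulated weight for each instruction.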

Bio

Saman P. Amarasinghe is an Associate Professor in the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). He received his BS in Electrical Engineering and Computer Science from Cornell University in 1988, and his MSEE and Ph.D. from Stanford University in 1990 and 1997, respectively. Currently he leads the Commit compiler group and is the co-leader of the MIT Raw project. His research interests are in discovering novel approaches to improve the performance of modern computer systems without unduly increasing the complexity faced by application developers, compiler writers, or computer architects.


 

Catching the Time Varying Behavior in Programs

Tim Sherwood, UC San Diego

Monday, 4/14/2003, noon-1pm, 101 Packard Hall

 

Abstract

Understanding program behavior is at the foundation of computer architecture and program optimization.  Many programs have wildly different behavior on even the very largest of scales (over the complete execution of the program).  This realization has ramifications for many architectural and compiler techniques, from thread scheduling, to feedback directed optimizations, to the way programs are simulated.  However, in order to take advantage of time-varying behavior, we must first develop the analytical tools necessary to automatically and efficiently analyze program behavior over large sections of execution.

In this talk I will describe a new technique, Basic Block Distribution Analysis, as a means of summarizing, visualizing, and exploiting the time-varying behavior of programs. With this as a framework, we can see that many programs execute as a set of phases. Using techniques from machine learning, I will show how we can find these phases automatically at either profile time or run time, and then use this information to reduce simulation time and help guide expensive run-time optimizations.
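
As a rough illustration of the idea of finding phases from basic block distributions, the Python sketch below summarizes each fixed-length interval of a trace by its basic block frequencies and groups similar intervals. Everything here is an assumption made for illustration: the function names, the Manhattan distance, the interval length, and the threshold are hypothetical, and the greedy grouping is a simple stand-in for the machine-learning techniques mentioned in the abstract.

```python
from collections import Counter


def block_vector(interval):
    """Normalized execution frequency of each basic block id in one interval."""
    counts = Counter(interval)
    total = len(interval)
    return {block: n / total for block, n in counts.items()}


def manhattan(a, b):
    """Sum of absolute frequency differences between two block vectors."""
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))


def label_phases(trace, interval_len, threshold):
    """Assign each full interval to the first earlier phase it resembles."""
    representatives = []  # one block vector per discovered phase
    labels = []
    for start in range(0, len(trace) - interval_len + 1, interval_len):
        vec = block_vector(trace[start:start + interval_len])
        for phase_id, rep in enumerate(representatives):
            if manhattan(vec, rep) < threshold:
                labels.append(phase_id)
                break
        else:
            # No existing phase is close enough, so this interval starts a new one.
            representatives.append(vec)
            labels.append(len(representatives) - 1)
    return labels


# Toy trace of basic block ids: an A/B loop, then a C/D loop, then A/B again.
trace = ["A", "B"] * 4 + ["C", "D"] * 4 + ["A", "B"] * 4
labels = label_phases(trace, interval_len=4, threshold=0.5)
print(labels)
```

On this toy trace the first and last thirds receive the same phase label, so a simulator could, in principle, study one representative interval per phase instead of the whole run.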

Bio

Tim Sherwood is currently a Ph.D. candidate in Computer Science at the University of California, San Diego. There he works with Professor Brad Calder on a variety of research projects in computer architecture including program phase detection and optimization, customized processor design, and hardware-based, profile-driven optimization. While at UCSD he has worked at both the Compaq Western Research Lab and Hewlett Packard Research Labs in Palo Alto. Before that, he received his BS from UC Davis in 1998, where he worked on the Active Pages project.


Low-Overhead Selective Re-Execution in the Polymorphic TRIPS Processor

Doug Burger, UT Austin

Friday, 4/18/2003, 2-3pm, 104 Gates Hall

 

Abstract

In this talk, I will first present an overview of the Polymorphous TRIPS architecture, which addresses the wire delay problem in both the processing core and memory system, permits scalable wide-issue out-of-order performance from 8- to 64-issue, and can be morphed to meet the needs of diverse applications, including single-threaded, multithreaded (server), or data-intensive (scientific, streaming, and signal processing) workloads. I will then present a new technique called Speculative Dataflow Traversal that permits extremely efficient recovery from data mis-speculations, allowing simple, efficient, and effectively free partial rollbacks when a data mis-speculation occurs. I will describe how we use it to reduce the cost of memory disambiguation and to permit aggressive data value speculation. I will also show how this technique enables new classes of speculation, such as coherence speculation, which can allow shared-memory systems to scale to larger sizes.

Bio

Doug Burger is an Assistant Professor of Computer Sciences at the University of Texas at Austin.  He received his Ph.D. in Computer Sciences from the University of Wisconsin-Madison in 1998, after seven years of gorging on tasty Wisconsin cheese. His main research area is computer architecture, and his interests span compilers, operating systems, and emerging technologies.  He is co-leader of the TRIPS project at UT-Austin, which is building a system from the microprocessors on up that is targeted at technologies in the 2010 time-frame, and he coaches the UT Marathon Team.


Reconfigurable Systems in terms of Computer Architectures

Kees Vissers, Chameleon Systems & UC Berkeley

Monday, 4/21/2003, 4-5pm, 104 Gates Hall

 

Abstract

The purpose of this talk is to give an overview of the field and show that great progress has been made in reconfigurable computing, but that we are only at the beginning of a very interesting area of multi-processor systems. Current silicon technology allows the integration of several hundred ALUs. Historically, multi-processor systems have failed to make a significant industrial impact. However, in the domain of embedded systems many ad hoc multiprocessor systems have emerged. Unfortunately, these have often been programmed in Verilog or other dedicated, non-standard environments. The talk will focus on what interesting architectures are under research, what programming problems remain, and why research on embedded system design might provide the breakthrough for novel multi-processing architectures. I will discuss three different architectures and their programming environments that I implemented over my career: a fine-grain parallel dataflow machine with a multi-rate Signal Flow Graph programming environment; a high-performance VLIW, embedded in an SoC, with C/C++ and streaming APIs; and a network of ALUs with a Simulink programming environment. I will show the pros and cons of all these approaches. The implementations will be put into perspective for high-performance video processing.


Bio

Kees Vissers graduated in 1980 from Delft University in the Netherlands. He joined Philips Research in the Netherlands in 1980. He worked on several EDA algorithms, simulation implementations, and dataflow systems, including the programming environment and the actual ICs and boards. He has extensive experience in high-performance video processing systems. He performed research on the design space exploration of streaming applications, the design space exploration of next-generation VLIW processors, and the systematic trade-off between programmable and dedicated implementations. He worked as an industrial fellow at Carnegie Mellon University and at UC Berkeley. He was Director of Architecture at Trimedia Technologies Inc., and most recently Chief Technology Officer at Chameleon Systems Inc. At Chameleon Systems he was responsible for all hardware and software design of a high-performance reconfigurable system. He is a part-time research fellow at UC Berkeley, where his interest is in mapping multiple processes onto a network of processors, applied to network processing problems, e.g., IPv4 routing.

 


The Alpha 21364 Processor

Peter Bannon, HP

Monday, 5/5/2003, 4-5pm, 104 Gates Hall

 

Abstract

The Alpha 21364 integrates a high-performance RISC core with an L2 cache, a Direct RDRAM memory controller, and a router. The entire design operates at 1.2 GHz, providing exceptional performance for applications requiring memory systems with high capacity, low latency, and high bandwidth. The talk will provide an overview of the chip's microarchitecture along with selected performance data.


Bio

Pete Bannon is a Staff Fellow in HP's Alpha Development Group.  He has participated in the design and verification of several microprocessor chips including the Alpha 21164 and Alpha 21164PC.  He is currently the architect of the Alpha 21364 design.  Pete joined Digital in 1984 after receiving a B.S. in computer system design  from the University of Massachusetts. He holds ten patents for VAX and Alpha CPU design.