Evaluation and Comparison of Existing Cache Designs Implemented as an IRAM

Processors implemented in a DRAM process (IRAM) will offer, among other benefits, much higher memory bandwidth (roughly 100 times) and much lower memory latency (roughly one tenth) than conventional systems. Yet logic in a DRAM process is likely to be significantly slower, at least with current process technology. The goal of this project is to quantify how much slower (or faster) logic, SRAM, and DRAM can be in an IRAM while still providing a performance advantage over a conventional system implementation.
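One way to make this break-even question concrete is a simple additive model, assuming that computation, SRAM cache accesses, and DRAM accesses do not overlap in time. All symbols here are introduced for illustration only: C is the number of computation cycles, N_S the number of accesses served by SRAM caches, and N_D the number served by DRAM, all measured on the conventional system; t_c, t_S, and t_D are the corresponding cycle and access times; s_L, s_S, and s_D are the IRAM-to-conventional ratios of logic, SRAM, and DRAM latency (a value above 1 means slower).

```latex
T_{\mathrm{conv}} = C\,t_c + N_S\,t_S + N_D\,t_D
\qquad
T_{\mathrm{IRAM}} = s_L\,C\,t_c + s_S\,N_S\,t_S + s_D\,N_D\,t_D

T_{\mathrm{IRAM}} < T_{\mathrm{conv}}
\iff
s_L < 1 + \frac{(1 - s_S)\,N_S\,t_S + (1 - s_D)\,N_D\,t_D}{C\,t_c}
```

For example, if DRAM accesses become ten times faster (s_D = 0.1) and SRAM speed is unchanged (s_S = 1), logic can be slower by a factor of up to 1 + 0.9 N_D t_D / (C t_c) before the IRAM loses its advantage. For a processor that overlaps miss latency with computation, such as an out-of-order design, this bound is likely optimistic.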

Within this project, we will measure the execution behavior of several characteristic programs and benchmarks on two high-performance microprocessors: the 21064 Alpha and the Pentium Pro. These programs will vary in CPU requirements, memory requirements (size, latency, bandwidth), and the amount of OS activity they generate. The measurements will include computation cycles, cache misses at each level of the hierarchy, and stall cycles due to branch mispredictions, structural hazards, and other causes, for both user and kernel activity. Using these measurements, we will be able to plot execution time (i.e., system performance) as a function of logic, SRAM, and DRAM speed in a DRAM process.
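The step from measured counts to an execution-time curve can be sketched as follows. This is a minimal sketch, not the project's actual tooling: it assumes an additive, non-overlapping time model (which ignores the latency hiding of an out-of-order core like the Pentium Pro), and the names `Breakdown`, `Timing`, `exec_time_ns`, and `speedup` are hypothetical. The numbers in the example are illustrative placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Breakdown:
    """Hypothetical per-program event counts measured on a conventional system."""
    logic_cycles: float   # computation + pipeline stall cycles (scale with logic speed)
    sram_accesses: float  # L1 misses served by an SRAM cache level
    dram_accesses: float  # misses served by main memory (DRAM)


@dataclass
class Timing:
    """Assumed latencies of the conventional implementation, in nanoseconds."""
    cycle_ns: float  # processor cycle time
    sram_ns: float   # SRAM cache access latency
    dram_ns: float   # DRAM access latency


def exec_time_ns(b: Breakdown, t: Timing,
                 s_logic: float = 1.0, s_sram: float = 1.0,
                 s_dram: float = 1.0) -> float:
    """Additive model (assumption): each component's latency is scaled by s_*.

    s_* > 1 means that component is slower in the IRAM, s_* < 1 faster.
    Overlap between misses and computation is ignored.
    """
    return (b.logic_cycles * t.cycle_ns * s_logic
            + b.sram_accesses * t.sram_ns * s_sram
            + b.dram_accesses * t.dram_ns * s_dram)


def speedup(b: Breakdown, t: Timing, **scales: float) -> float:
    """Conventional time divided by IRAM time; > 1 means the IRAM is faster."""
    return exec_time_ns(b, t) / exec_time_ns(b, t, **scales)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    b = Breakdown(logic_cycles=1e9, sram_accesses=2e7, dram_accesses=5e6)
    t = Timing(cycle_ns=5.0, sram_ns=20.0, dram_ns=200.0)
    # Sweep logic slowdown, assuming DRAM latency drops to one tenth in the IRAM.
    for s_logic in (1.0, 1.1, 1.2, 1.5, 2.0):
        s = speedup(b, t, s_logic=s_logic, s_dram=0.1)
        print(f"logic slowdown x{s_logic:.1f}: speedup {s:.2f}")
```

Sweeping one scale factor while holding the others fixed yields one curve of the execution-time plot; repeating the sweep for each program and each processor's measured breakdown gives the family of curves.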

A secondary goal of the project is to compare the performance of the two microprocessors when each is implemented as an IRAM, and to identify the specific cache design features that lead to poor or good performance. This will be useful in designing an optimal IRAM microarchitecture in the future. In addition, we may draw some conclusions on whether a simpler architecture (like that of the 21064 Alpha) or a more advanced and complex design (like that of the Pentium Pro) is more appropriate for an IRAM implementation.

Christoforos E. Kozyrakis & Helen Wang: Last Update 10/15/96