Evaluation and Comparison of Existing Cache
Designs Implemented as an IRAM
The increasing processor-memory speed gap has become a performance
limitation for current microprocessors. The integration of processor
and memory in a single DRAM chip has been proposed in order to overcome this
problem. Such an architecture will provide high memory bandwidth and
low memory latency, but may have to compensate for slower logic.
In this paper, we use a study of program execution times and
an analytical model to evaluate the potential performance of IRAM
architectures as a function of process parameters, such as the speed of logic
and memory access in a DRAM chip. For memory-intensive applications,
IRAM is faster than conventional implementations, even when logic is 1.5
times slower than in microprocessor processes. The maximum speedup achieved
varies between 1.3 and 1.9. For CPU-intensive applications, logic must
suffer almost no slowdown for IRAM to achieve comparable
performance. We compare the IRAM implementations of a simple and a complex
processor/cache architecture and find that the former performs comparably
to and, for some applications, even better than the latter. Finally,
we discover that the IRAM implementation of a simple architecture
can be 1.5 to 2.8 times faster than the conventional implementation of
a complex one.
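
The trade-off between slower logic and faster memory access can be illustrated with a minimal sketch. This is not the analytical model used in the paper: the two-component split of execution time, the function name `iram_speedup`, its parameters, and the sample values are all assumptions chosen only to show the shape of the argument.

```python
def iram_speedup(mem_fraction, logic_slowdown, mem_speedup):
    """Estimate IRAM speedup over a conventional system (illustrative only).

    Assumed model: conventional execution time is normalized to 1.0 and
    split into a compute part and a memory-stall part.

    mem_fraction   -- share of conventional time spent waiting on memory (0..1)
    logic_slowdown -- how many times slower logic runs in a DRAM process (>= 1)
    mem_speedup    -- how many times shorter memory stalls become on IRAM (>= 1)
    """
    if not 0.0 <= mem_fraction <= 1.0:
        raise ValueError("mem_fraction must be between 0 and 1")
    if logic_slowdown <= 0 or mem_speedup <= 0:
        raise ValueError("logic_slowdown and mem_speedup must be positive")

    compute_time = (1.0 - mem_fraction) * logic_slowdown  # slower logic
    memory_time = mem_fraction / mem_speedup               # faster memory
    return 1.0 / (compute_time + memory_time)


if __name__ == "__main__":
    # Hypothetical inputs, not measurements from the paper.
    for label, frac in (("memory-intensive", 0.6), ("CPU-intensive", 0.1)):
        s = iram_speedup(mem_fraction=frac, logic_slowdown=1.5, mem_speedup=5.0)
        print(f"{label}: estimated speedup {s:.2f}")
```

Under these assumed inputs, a workload dominated by memory stalls still comes out ahead despite the 1.5x logic slowdown, while a compute-dominated workload falls behind, which matches the qualitative pattern described above.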
Christoforos E. Kozyrakis & Helen Wang: Last Update 10/15/96