Evaluation and Comparison of Existing Cache Designs Implemented as an IRAM

The growing processor-memory speed gap has become a performance bottleneck for current microprocessors. Integrating the processor and memory on a single DRAM chip, an approach known as IRAM, has been proposed to overcome this problem. Such an architecture provides high memory bandwidth and low memory latency, but may have to compensate for slower logic. In this paper, we use a study of programs' execution time and an analytical model to evaluate the potential performance of IRAM architectures as a function of process parameters, such as the speed of logic and of memory access in a DRAM chip. For memory-intensive applications, IRAM is faster than conventional implementations even when its logic is 1.5 times slower than in a microprocessor process. The maximum speedup achieved varies between 1.3 and 1.9. For CPU-intensive applications, IRAM achieves comparable performance only when its logic suffers almost no slowdown. We compare the IRAM implementations of a simple and a complex processor/cache architecture and find that the former performs comparably to the latter and, for some applications, even better. Finally, we find that the IRAM implementation of a simple architecture can be 1.5 to 2.8 times faster than the conventional implementation of a complex one.
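
The trade-off between slower logic and faster memory access can be illustrated with a minimal sketch. This is not the analytical model used in the paper; it assumes execution time splits cleanly into a compute part that scales with the logic slowdown and a memory-stall part that shrinks with the memory speedup. The function names, parameters, and all numbers below are hypothetical.

```python
# Illustrative sketch only: a two-term execution-time model.
# Assumption: time = compute part * logic slowdown + memory part / memory speedup.
# None of these names or values come from the paper.

def exec_time(cpu_time, mem_time, logic_slowdown=1.0, mem_speedup=1.0):
    """Execution time with scaled compute and memory-stall components."""
    return cpu_time * logic_slowdown + mem_time / mem_speedup


def iram_speedup(mem_fraction, logic_slowdown, mem_speedup):
    """Speedup of the hypothetical IRAM over the conventional baseline.

    mem_fraction is the share of baseline time spent stalled on memory.
    """
    cpu_time = 1.0 - mem_fraction
    baseline = exec_time(cpu_time, mem_fraction)
    iram = exec_time(cpu_time, mem_fraction, logic_slowdown, mem_speedup)
    return baseline / iram


def break_even_slowdown(mem_fraction, mem_speedup):
    """Largest logic slowdown at which the IRAM still matches the baseline."""
    cpu_time = 1.0 - mem_fraction
    return (1.0 - mem_fraction / mem_speedup) / cpu_time


if __name__ == "__main__":
    # Hypothetical workloads: one memory-bound, one compute-bound.
    for label, frac in (("memory-intensive", 0.5), ("CPU-intensive", 0.1)):
        s = iram_speedup(frac, logic_slowdown=1.5, mem_speedup=5.0)
        b = break_even_slowdown(frac, mem_speedup=5.0)
        print(f"{label}: speedup={s:.2f}, break-even slowdown={b:.2f}")
```

Under these assumed numbers, the memory-bound workload tolerates a much larger logic slowdown than the compute-bound one, which is the qualitative behavior the abstract describes.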

Christoforos E. Kozyrakis & Helen Wang: Last Update 10/15/96