4. Design Implementations of the Memory Crossbar

4.1 Crossbar Overview: A Large crossbar

Figure 4.1: Implementation of the memory crossbar. Horizontal buses spread across the chip. Each bus is hardwired to a single L/S unit and has one switch per memory section.

4.1.1 Buffers

4.1.2 Architectural Issues

4.2 Small crossbar

Figure 4.3: Alternative implementation of the memory crossbar. A small fully-connected crossbar is pitch matched to the vector unit. Its outputs are connected to horizontal buses that run the width of the chip. The memory sections are hardwired to these buses.

4.3 Self Routing Crossbar

4.4 Layout issues for the Crossbar

4.5 Layout results

4.6 Simulation Environment

Figure 4.10: Critical Path of the Crossbar.

4.7 Sizing and Energy-Delay product

A frequently quoted figure of merit is the energy-delay product (EDP). Figure 4.11 shows how the EDP changes as the sizes of the driver and the switches are varied. Although the data given here is for the small crossbar, the shape of the curve applies to all the implementations presented in this paper.

Figure 4.11: Simulated energy-delay product for the small crossbar as the driver and switch (pass-gate) sizes are varied. Each size is normalized to the minimum size.

The first thing to notice is that the EDP decreases as the driver sizes are increased. Although this may seem counterintuitive, as mentioned above, the bus represents a large capacitive load; as a result, the signal transition times are slow for the smaller drivers. This causes the drivers (inverters in this case) to spend extra time conducting direct-path current, which increases the total energy cost. This can be seen more clearly in figure 4.12, which plots energy vs. delay for a single driver. Notice that increasing the size of the driver from minimum size to approximately 10 times minimum size improves both energy and delay. Beyond this point, the energy increases very quickly for larger drivers because of their extra capacitance.

Figure 4.12: Energy vs. delay for the small crossbar. The driver size is varied from 1x to 100x minimum size while the switch size is fixed at 4x minimum size. The minimum occurs for a driver size of 10x.

In addition to changing the size of the driver, the delay and energy can also be improved by increasing the size of the pass-gate switch. As mentioned earlier, the additional capacitance of a larger device has a minimal effect on the delay time. The decrease in resistance, on the other hand, is fairly significant. This is shown in figure 4.13.

Figure 4.13: Simulated average resistance of a pass-gate switch (one NMOS and one PMOS) as the size of both transistors, normalized to a minimum-size switch, is varied.

As the pass-gate is enlarged, the resistance decreases. Since the intrinsic resistance of the bus is approximately 1 kΩ, sizing the switch for resistances below this value does not significantly improve performance. This effect can be clearly seen in figure 4.11. Notice that the EDP decreases as the pass-gates are sized up from minimum size; once the switch resistance falls below that of the bus, however, only small improvements are seen.
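The diminishing return described above can be illustrated with a toy series-resistance model. This is only a sketch: the 1 kΩ bus resistance comes from the text, but the minimum-size switch resistance (`R_SWITCH_MIN_OHMS`) and the assumption that switch resistance scales inversely with size are hypothetical stand-ins for the simulated data of figure 4.13.

```python
R_BUS_OHMS = 1_000.0          # intrinsic bus resistance quoted in the text
R_SWITCH_MIN_OHMS = 10_000.0  # ASSUMED minimum-size pass-gate resistance (not from the source)


def path_resistance(switch_size):
    """Series resistance of switch plus bus, assuming R_switch scales as 1/size."""
    return R_SWITCH_MIN_OHMS / switch_size + R_BUS_OHMS


sizes = [1, 2, 4, 8, 16, 32]
resistances = [path_resistance(s) for s in sizes]

for size, r in zip(sizes, resistances):
    print(f"{size:3d}x switch: {r:8.1f} ohm total")
```

Under these assumptions, each doubling of the switch removes less resistance than the one before it, and the total can never drop below the bus resistance itself.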

In summary, by changing the sizes of the drivers and switches, it is possible to vary the EDP significantly; however, the optimal point may not be a realistic one. From the data presented in figure 4.11, the optimal point, corresponding to an EDP of 35 pJ*ns, was found to occur for a driver size of 100 times minimum size. This is clearly not realistic, especially considering that there could be over 700 of these drivers. On the other hand, by allowing the EDP to double, the driver size falls by a factor of 12.5, from 100 to 8 times the minimum size.
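The arithmetic behind this trade-off can be checked with a few lines. The EDP and driver sizes are the values quoted in the text; the driver count of 700 is used only as the lower bound implied by "over 700", and the total-width figures are an illustrative measure rather than a layout result.

```python
# Values quoted in the text (figure 4.11).
optimal_edp_pj_ns = 35.0      # EDP at the optimal point
optimal_driver_size = 100.0   # driver size at the optimum (multiples of minimum size)
relaxed_driver_size = 8.0     # driver size once the EDP is allowed to double
driver_count = 700            # lower bound: the text says "over 700" drivers

relaxed_edp_pj_ns = 2 * optimal_edp_pj_ns
size_reduction = optimal_driver_size / relaxed_driver_size

# Total driver width, in units of one minimum-size driver.
total_width_optimal = driver_count * optimal_driver_size
total_width_relaxed = driver_count * relaxed_driver_size

print(f"relaxed EDP: {relaxed_edp_pj_ns} pJ*ns")
print(f"driver size reduction: {size_reduction}x")
print(f"total driver width: {total_width_optimal:.0f} -> {total_width_relaxed:.0f} minimum-size units")
```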

4.8 Crossbar Simulation Results

                                      Large Crossbar          Small Crossbar with Bus   Self-Routing Crossbar
                                      Full Swing  Low Swing   Full Swing  Low Swing     Full Swing
Delay (ns)                            7.9         9.5         7.7         10.0          6.0
Energy per Transition (pJ)            52.0        30.7        65.2        47.2          54.3
Static Power (nW)                     1.3         65.2        2.5         17.7          5
Energy-Delay Product (10^-21 J*s)     411         292         502         472           326
Area per 64-bit bus (mm^2)            4.22                    2.82                      3.16
Driver Area per 64 bits (mm^2)        0.015       0.064       0.015       0.064         0.015

Table 4.3: Simulation Results of Different Crossbar Implementations
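As a consistency check on Table 4.3, the energy-delay product row can be recomputed from the delay and energy rows. The sketch below copies those two rows from the table; the dictionary keys are shorthand labels introduced here for illustration.

```python
# (delay in ns, energy per transition in pJ), copied from Table 4.3.
results = {
    "Large, full swing": (7.9, 52.0),
    "Large, low swing": (9.5, 30.7),
    "Small with bus, full swing": (7.7, 65.2),
    "Small with bus, low swing": (10.0, 47.2),
    "Self-routing, full swing": (6.0, 54.3),
}


def edp(delay_ns, energy_pj):
    """Energy-delay product; 1 ns * 1 pJ = 1e-21 J*s, so the result is in 1e-21 J*s."""
    return delay_ns * energy_pj


edps = {name: edp(*values) for name, values in results.items()}

# List the implementations from lowest to highest EDP.
for name, value in sorted(edps.items(), key=lambda item: item[1]):
    print(f"{name:28s} {value:6.1f} x 1e-21 J*s")
```

The recomputed values round to the figures in the table, with the low-swing large crossbar having the lowest EDP and the self-routing crossbar the shortest delay.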
