Delay Interpolator

Block Diagram

4 equally spaced edges generated by 4 delay elements and fed to mux
2b counter chooses which 2 clocks to interpolate between
Weight of interpolator edges is determined by 16-bit shift register

The delay elements are identical to the delay elements of the DLL and are biased off the same references, so their delays should be the same. For 1 GHz data, the delay elements of the receive DLL will be spaced by 0.5 ns in order to sample both data and edge. Therefore, the four inputs to the phase interpolator will span a total of 1.5 ns. Since the data eye is only 1 ns, the range of the phase interpolator is 0.5 ns larger than necessary. This is important because the delay elements in the DLL and the interpolator may not be precisely matched, since their loads may differ slightly. If we only used three inputs to the interpolator, spanning a total of 1.0 ns, and the actual span turned out to be slightly less than 1 ns, the center of the data eye could fall outside the interpolator's range. By using a larger than necessary range, we can guarantee that the center of at least one data eye will fall within the range.
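
The margin argument can be checked with a little arithmetic. The Python sketch below is illustrative only: the function name and the 5% delay shortfall are assumptions chosen for the example, not measured values from the design.

```python
def interpolator_span_ns(num_inputs, element_delay_ns):
    """Total delay spanned by num_inputs edges spaced element_delay_ns apart."""
    return (num_inputs - 1) * element_delay_ns

DATA_EYE_NS = 1.0        # 1 GHz data (from the text)
NOMINAL_DELAY_NS = 0.5   # delay element spacing (from the text)

span3 = interpolator_span_ns(3, NOMINAL_DELAY_NS)  # three inputs: 1.0 ns
span4 = interpolator_span_ns(4, NOMINAL_DELAY_NS)  # four inputs: 1.5 ns

# Hypothetical mismatch: interpolator delay elements come out 5% fast.
short_delay_ns = NOMINAL_DELAY_NS * 0.95
covers3 = interpolator_span_ns(3, short_delay_ns) >= DATA_EYE_NS
covers4 = interpolator_span_ns(4, short_delay_ns) >= DATA_EYE_NS

print(span3, span4, covers3, covers4)
```

With the assumed shortfall, three inputs no longer span a full data eye, while four inputs still do.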

The phase interpolator has a granularity of 16 steps per clock phase, so that there are a total of 64 phase steps, each spaced by 31.25 ps. We could have just interpolated between the first and last clock edges, spaced 1.5 ns apart, but instead chose to always interpolate between adjacent edges, spaced 0.5 ns apart. This is because the interpolator is not very linear, and by interpolating between smaller equally spaced steps we impose some linearity.
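
The step size follows directly from the edge spacing. The sketch below models an ideal, perfectly linear interpolator; the names are hypothetical, and the real circuit deviates from this straight line as discussed in the simulation section.

```python
EDGE_SPACING_PS = 500.0    # adjacent clock edges are 0.5 ns apart (from the text)
STEPS_PER_INTERVAL = 16    # 16 steps per clock phase (from the text)

step_ps = EDGE_SPACING_PS / STEPS_PER_INTERVAL  # 31.25 ps

def ideal_phase_ps(interval, code):
    """Ideal output delay for weight `code` (0..16) within the adjacent-edge
    interval `interval` (0 means between the first and second edges)."""
    if not 0 <= code <= STEPS_PER_INTERVAL:
        raise ValueError("code must be in 0..16")
    return interval * EDGE_SPACING_PS + code * step_ps

# The top of one interval and the bottom of the next land on the same edge.
print(step_ps, ideal_phase_ps(0, 16), ideal_phase_ps(1, 0))
```

Because every interval starts and ends exactly on a delay-element edge, any nonlinearity is confined to a 0.5 ns segment rather than accumulating over the full 1.5 ns.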

The weight of the phase interpolator is controlled by a 16-bit shift register. We could have also chosen to use a 4-bit up/down counter, where each bit had twice the weight of the previous bit, but chose the shift register because it was simpler to implement and simpler to complement, and we also needed the complement of the 16-bit weight. The area and power of the shift register were both insignificant (37,500 um² and 53 uW).
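
One way to see why the shift-register (thermometer) code is easy to complement is sketched below. The bit ordering (ones packed into the low bits) and the helper names are assumptions made for the example.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1

def thermometer(n):
    """16-bit word holding n ones (0..16), as a shift register would."""
    if not 0 <= n <= WIDTH:
        raise ValueError("n must be in 0..16")
    return (1 << n) - 1

def complement(word):
    """Bitwise inversion restricted to 16 bits."""
    return ~word & MASK

def ones(word):
    """Number of set bits, i.e. the weight the word applies."""
    return bin(word).count("1")

def binary_inverted(n):
    """Bitwise inversion of a 4-bit binary count."""
    return ~n & 0xF

# Thermometer code: inverting the bits gives exactly the weight 16 - n.
print(ones(thermometer(5)), ones(complement(thermometer(5))))
# Binary code: inverting the bits gives 15 - n, so 16 - n needs extra logic.
print(binary_inverted(5))
```

With the thermometer code, the two weights always sum to 16 using inverters alone; a binary-weighted count would need an additional increment to get the same result.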

The inputs to the shift register are the up and down signals produced by the bit lock logic. The shift register shifts 1's in on the left and 0's in on the right. Control logic senses when the shift register contains all 1's or all 0's. This logic will then increment or decrement the 2b counter appropriately, which will shift the two edges being interpolated between. The shift register will then continue shifting in the opposite direction.
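
The hand-off between the shift register and the 2b counter can be sketched as a small behavioral model. This is a sketch under stated assumptions, not the actual control logic: the class and attribute names are hypothetical, the pulse that hits a full register is assumed to move only the counter, and the counter is assumed to wrap modulo 4 (the reset and range-limiting logic is not modeled).

```python
STEPS = 16

class InterpolatorControl:
    """Behavioral model: number of ones in the register plus a 2-bit counter."""

    def __init__(self):
        self.ones = STEPS // 2   # reset state: 8 ones, 8 zeros
        self.counter = 0         # selects which two edges are interpolated
        self.direction = +1      # +1: an up pulse shifts a one in; -1: a zero

    def pulse(self, up):
        delta = self.direction if up else -self.direction
        nxt = self.ones + delta
        if 0 <= nxt <= STEPS:
            self.ones = nxt      # normal case: shift by one position
            return
        # Register is all ones or all zeros: step the counter (assumed to wrap
        # modulo 4) and reverse the shift sense for the new pair of edges.
        self.counter = (self.counter + (1 if up else -1)) % 4
        self.direction = -self.direction

ctl = InterpolatorControl()
for _ in range(10):
    ctl.pulse(up=True)
print(ctl.ones, ctl.counter, ctl.direction)
```

In this model, eight up pulses fill the register, the ninth moves the counter, and the tenth starts shifting zeros in, which mirrors the "continue shifting in the opposite direction" behavior described above.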

Phase Interpolator Schematic

clk inputs are separated from outputs by control switches, to avoid coupling through C_gd
nfet current sources supply 1/16 of total current
same biases as DLL

We chose to implement a current-controlled phase interpolator, as described in [Sid97], rather than a voltage-controlled interpolator, as described in [Enam92]. The voltage-controlled interpolators were much less linear and more difficult to control, since they required analog control inputs. Although the current-controlled interpolator was bigger and consumed more power (both still turned out to be insignificant: 4500 um² and 924 uW), its outputs were much more linear and it took digital inputs.

The nmos part of the block shown above is repeated 16 times, so that there are 16 phi diff pairs and 16 psi diff pairs. The total current through the interpolator is constant and is divided among the phi and psi diff pairs based on the digital interpolator weight, which turns the appropriate number of phi and psi pairs on or off through the complementary ctl[i] and ctlb[i] inputs. Depending on the current through phi and psi, the phase of omega will shift between the two.
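
The current division can be illustrated with a first-order model. The sketch below assumes that ctl[i] enables phi pair i and ctlb[i] enables psi pair i (the text does not say which control goes to which side), and it treats the output delay as a current-weighted average, which is an idealization of the real circuit.

```python
NUM_PAIRS = 16

def current_split(ctl, i_total=1.0):
    """Return (i_phi, i_psi) for a list of 16 booleans.
    Each enabled pair carries 1/16 of the total current."""
    if len(ctl) != NUM_PAIRS:
        raise ValueError("ctl must have 16 entries")
    ctlb = [not bit for bit in ctl]          # complementary controls
    unit = i_total / NUM_PAIRS
    return sum(ctl) * unit, sum(ctlb) * unit

def ideal_output_delay_ps(phi_ps, psi_ps, ctl):
    """Idealized output delay: current-weighted average of the two inputs."""
    i_phi, i_psi = current_split(ctl)
    return i_phi * phi_ps + i_psi * psi_ps

ctl = [True] * 4 + [False] * 12              # 4 phi pairs on, 12 psi pairs on
print(current_split(ctl), ideal_output_delay_ps(0.0, 500.0, ctl))
```

Whatever the weight, the two branch currents always sum to the same total, which is what keeps the output swing constant as the phase moves.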

We examined two types of current-controlled phase interpolators, type-I and type-II, as described in [Sid97]. Type-I places the control switch below the clock inputs, so that the clock inputs can be shared among all 16 diff pairs. Although this would be a smaller design, it is less linear than the type-II that we chose to implement. The reason is that one side cannot be entirely shut off with the type-I interpolator. Even if all of the current is through the phi branch, the psi inputs will still affect the output due to coupling from the gate to the drain of the diff pair inputs.

Phase Interpolator Simulation

Yellow curves are 4 equally spaced clocks
Red curves are 16 phase steps between first two clocks
nominal phase step = 31 ps (22.5 deg.)
maximum phase step = 70 ps (50 deg.)

This plot shows the outputs of the phase interpolator. The hspice simulations are from the extracted layout. The yellow lines are the four equally spaced outputs of the delay elements, and represent the phi and psi inputs to the interpolator. The red lines are the 16 phase steps between the first two clock edges. Although the steps are not perfectly uniform, the largest step of 70 ps is only 39 ps greater than the nominal phase step. The smallest step is 1 ps.
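
The quoted step sizes can be put on a common scale with a short calculation. The sketch below uses only the numbers given above; the convention that the 0.5 ns edge spacing corresponds to 360 degrees is inferred from the quoted degree figures rather than stated, and the helper names are hypothetical.

```python
EDGE_SPACING_PS = 500.0
STEPS = 16
nominal_ps = EDGE_SPACING_PS / STEPS   # 31.25 ps, quoted as 31 ps

MAX_STEP_PS = 70.0   # largest simulated step (from the text)
MIN_STEP_PS = 1.0    # smallest simulated step (from the text)

def to_degrees(step_ps, reference_ps=EDGE_SPACING_PS):
    """Express a step as an angle, taking reference_ps as 360 degrees."""
    return 360.0 * step_ps / reference_ps

def step_error(step_ps):
    """Deviation from the nominal step, in units of the nominal step."""
    return step_ps / nominal_ps - 1.0

print(to_degrees(nominal_ps), to_degrees(MAX_STEP_PS))
print(step_error(MAX_STEP_PS), step_error(MIN_STEP_PS))
```

Under that convention the nominal step is 22.5 degrees and the largest step is about 50 degrees, matching the figures listed above; the largest step is a little more than twice the nominal one.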

When in lock, the phase interpolator will bounce back and forth between two adjacent steps. Therefore, the largest phase step determines the maximum peak-to-peak jitter. We looked at using 8 steps instead of 16, but the maximum phase step was 145 ps, which would have led to unacceptable peak-to-peak jitter. Using more than 16 phase steps is difficult since all the gates of all phi and psi inputs must be driven by the same delay element. In order to have 16 steps, these gates were made small, but it would be difficult to make them much smaller and still have acceptable transconductances.
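
The link between step size and jitter can be seen in a toy bang-bang loop. The sketch below is a simplified model, not the bit lock logic itself: the function name is hypothetical, and the phase grid is made up for illustration, with one 70 ps gap standing in for the largest simulated step.

```python
def dither(phases_ps, target_ps, start_index, n_updates):
    """Move one step toward target_ps on every update and record the
    phase selected at each update."""
    i = start_index
    visited = []
    for _ in range(n_updates):
        visited.append(phases_ps[i])
        if phases_ps[i] < target_ps and i < len(phases_ps) - 1:
            i += 1   # sampling early: step later
        elif phases_ps[i] >= target_ps and i > 0:
            i -= 1   # sampling late: step earlier
    return visited

# Illustrative grid only: the gap between 30 ps and 100 ps is 70 ps.
grid_ps = [0.0, 30.0, 100.0, 130.0]
trace = dither(grid_ps, target_ps=60.0, start_index=0, n_updates=10)
locked = trace[-4:]
print(trace, max(locked) - min(locked))
```

Once the loop reaches the target it alternates between the two steps on either side of it, so the peak-to-peak jitter equals the size of that one step.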

Phase interpolator layout

layout of nfets for phase interpolator
want to balance phi, phib paths, match nbias nfets

This is the layout of one phi diff pair. There are 16 phi pairs and 16 psi pairs. The nbias transistor is split into two in order to balance the current through the two branches. The pmos part is identical to the symmetric loads of the delay elements, except that there are two of them to match the double current through the nmos devices.

Control Logic Schematic

Control logic for phase interpolator
16-bit shift register, 2-bit counter, 2 flip-flops and several gates

The control logic controls both the weight of the interpolator, through ictl[15:0], and the two clock edges that are input to the interpolator, through sela-seld.

The logic on the top left translates the up/down pulses from the bit-lock logic into left and right signals for the shift register. It bases this decision on the state held by the 2-bit counter, which keeps track of which two clocks are being interpolated between. The outputs of the 2-bit counter feed the logic on the bottom right to select the correct muxes.

Two flip-flops, dff_0 and dff_1, determine when the shift register is full of ones or full of zeros, and increment or decrement the 2-bit counter appropriately.

I have only done layout for the shift register. We still have not settled on the reset logic. We would like to wait to evaluate the behavior of the entire system before we decide on this logic. The issue involves exceeding the interpolator's range. If we reach the edge of the range after we have locked, we would either lose a bit or sample an extra bit if we wrapped around to the other end of the range. Therefore, we would like to guarantee that the interpolator never wraps around during operation. We can do this by limiting the range during lock acquisition, which provides some padding at the extreme edges of the range. To determine exactly how much range to allow, we need to know more about how closely the interpolator input delays match the receive DLL delays.

Shift Register Layout

16-bit shift register: shift ones in on top left, shift zeros in on bottom right
outputs at bottom
resets to 8 ones, 8 zeros
size is 250 um x 150 um = 37,500 um²

The shift register controls the weight of the interpolator phase by shifting ones and zeros back and forth. Ones are shifted in on the top left and zeros are shifted in on the bottom right. Control logic detects when the shift register is full of ones or zeros and will toggle the relationship between early and late and shift left and shift right.

A shift register cell consists of a D flip-flop and a 3:1 mux, as shown in the yellow outline. The mux chooses between the outputs of the cell and its two neighbors depending on the control signals left, right, and hold. Since the pattern of choosing between left, right, and hold repeats every three cells, I chose to put three cells in a row to make layout simpler. This also pitch-matches the outputs, which are shown with red arrows on the bottom, with the interpolator. The data snakes around as shown by the white lines.
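
The cell behavior can be written as a small functional model. In the sketch below the meaning of the control names is an assumption: "right" is taken to move data one cell to the right and bring a one in at the left end, and "left" is taken to move data one cell to the left and bring a zero in at the right end. The function names are hypothetical.

```python
def cell_next(own, left_neighbor, right_neighbor, select):
    """3:1 mux in front of the flip-flop: choose the next value of one cell."""
    if select == "hold":
        return own
    if select == "right":
        return left_neighbor    # data moves right
    if select == "left":
        return right_neighbor   # data moves left
    raise ValueError("select must be 'hold', 'right' or 'left'")

def register_next(q, select):
    """Next state of the whole register. A one is fed in at the left end
    and a zero at the right end."""
    n = len(q)
    return [
        cell_next(q[i],
                  q[i - 1] if i > 0 else 1,
                  q[i + 1] if i < n - 1 else 0,
                  select)
        for i in range(n)
    ]

reset_state = [1] * 8 + [0] * 8          # resets to 8 ones, 8 zeros
print(register_next(reset_state, "right"))
print(register_next(reset_state, "left"))
```

Because ones only ever enter at one end and zeros at the other, the register contents stay a contiguous block of ones followed by zeros, so the number of ones is the interpolator weight.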

The height of the shift register could be reduced by 30 um, to 80% of its current height, by sharing control lines between adjacent rows. This was not done originally, for simplicity.