The delay elements are identical to the delay elements of the DLL and
are biased off the same references, so that their delays should be the
same. For 1 GHz data, the delay elements of the receive DLL will be spaced
by 0.5 ns in order to sample both data and edge. Therefore, the four inputs
to the phase interpolator will be spaced by a total of 1.5 ns. Since the
data eye is only 1 ns, the range of the phase interpolator is 0.5 ns larger
than necessary. This is important because the delay elements in the DLL
and the interpolator may not be precisely matched since their loads may
differ slightly. If we only used three inputs to the interpolator, spaced
a total of 1.0 ns apart, and the delay turned out to be slightly less than
1 ns, there is the possibility that the center of the data eye would fall
outside of the interpolator's range. By using a larger than necessary range,
we can guarantee that the center of at least one data eye will fall within
the range.
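The margin argument above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function names and the `mismatch` parameter (a fractional error in the interpolator's delay-element spacing relative to the DLL) are assumptions, not part of the design.

```python
# Values taken from the text: 1 GHz data (1 ns eye), delay elements spaced 0.5 ns apart.
TAP_SPACING_NS = 0.5
DATA_EYE_NS = 1.0


def interpolator_range_ns(num_inputs, tap_spacing_ns=TAP_SPACING_NS):
    """Total span covered by num_inputs equally spaced clock edges."""
    return (num_inputs - 1) * tap_spacing_ns


def margin_ns(num_inputs, mismatch=0.0):
    """Range left over beyond one data eye.

    mismatch is a hypothetical fractional delay error, e.g. -0.05 means the
    interpolator's delay elements are 5% faster than the DLL's.
    """
    spacing = TAP_SPACING_NS * (1.0 + mismatch)
    return interpolator_range_ns(num_inputs, spacing) - DATA_EYE_NS


if __name__ == "__main__":
    for inputs in (3, 4):
        print(inputs, "inputs, matched:", margin_ns(inputs), "ns of margin")
        print(inputs, "inputs, 5% fast:", margin_ns(inputs, -0.05), "ns of margin")
```

With three inputs, any shortfall in delay makes the margin negative, whereas four inputs leave roughly 0.5 ns of slack.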
The phase interpolator has a granularity of 16 steps per clock phase, so that there are a total of 64 phase steps, each nominally spaced by 31.25 ps (0.5 ns / 16). We could have interpolated directly between the first and last clock edges, spaced 1.5 ns apart, but instead chose to always interpolate between adjacent edges, spaced 0.5 ns apart. This is because the interpolator is not very linear; by interpolating between smaller, equally spaced intervals, the output is pinned to the correct phase at every clock edge, which imposes some linearity.
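As a rough illustration of interpolating between adjacent edges, the sketch below maps a step index to the pair of edges being interpolated and to an ideal output phase. It assumes a perfectly linear interpolator, which the real circuit is not, and the names `ideal_phase_ps` and `step_index` are hypothetical.

```python
TAP_SPACING_PS = 500.0   # adjacent clock edges are 0.5 ns apart (from the text)
STEPS_PER_PHASE = 16     # interpolator steps between adjacent edges (from the text)


def ideal_phase_ps(step_index):
    """Ideal output phase, relative to the first clock edge, for a step index.

    The quotient selects which pair of adjacent edges is interpolated;
    the remainder is the weight within that pair.
    """
    pair, weight = divmod(step_index, STEPS_PER_PHASE)
    early_edge_ps = pair * TAP_SPACING_PS
    return early_edge_ps + weight * TAP_SPACING_PS / STEPS_PER_PHASE


if __name__ == "__main__":
    for k in (0, 1, 15, 16, 17):
        print(k, ideal_phase_ps(k))
```

Because each interval starts exactly on a clock edge, any nonlinearity is confined to one 0.5 ns interval and cannot accumulate across the full range.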
The weight of the phase interpolator is controlled by a 16-bit shift register. We could instead have used a 4-bit up/down counter, in which each bit has twice the weight of the previous bit, but chose the shift register because it was simpler to implement and simpler to complement; we also need the complement of the 16-bit weight. The area and power of the shift register were both insignificant (37,500 µm² and 53 µW).
The inputs to the shift register are the up and down signals produced by the bit-lock logic. The shift register shifts 1s in on the left and 0s in on the right. Control logic senses when the shift register contains all 1s or all 0s. This logic then increments or decrements the 2-bit counter appropriately, which shifts the two edges being interpolated between. The shift register then continues shifting in the opposite direction.
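The interaction between the shift register and the 2-bit counter can be sketched behaviorally as follows. This is a simplified model under stated assumptions, not the actual logic: it assumes three adjacent edge pairs with no wrap-around, that the boundary crossing takes one up/down pulse, and that the control saturates at the ends of the range (the reset and range-limit behavior is not settled in the design). The class and method names are hypothetical.

```python
N_BITS = 16  # width of the thermometer-coded shift register (from the text)


class InterpolatorControl:
    """Behavioral sketch of the shift register plus 2-bit edge-select counter."""

    def __init__(self, num_segments=3):
        self.num_segments = num_segments  # assumed: adjacent pairs, no wrap-around
        self.segment = 0                  # models the 2-bit counter
        self.ones = 0                     # number of 1s held in the shift register

    def bits(self):
        """Register contents: 1s enter on the left, 0s enter on the right."""
        return [1] * self.ones + [0] * (N_BITS - self.ones)

    def position(self):
        """Phase position in steps, counted from the first clock edge."""
        frac = self.ones if self.segment % 2 == 0 else N_BITS - self.ones
        return self.segment * N_BITS + frac

    def step(self, up):
        """Apply one up (later phase) or down (earlier phase) pulse."""
        # The shift direction that means "later" alternates from one segment to the next.
        add_one = up if self.segment % 2 == 0 else not up
        full = self.ones == N_BITS
        empty = self.ones == 0
        if (add_one and full) or (not add_one and empty):
            new_segment = self.segment + (1 if up else -1)
            if 0 <= new_segment < self.num_segments:
                # Register contents are kept; only the selected edge pair changes.
                self.segment = new_segment
            return  # at the end of the range the model simply saturates (assumed)
        self.ones += 1 if add_one else -1
```

In this model, 16 up pulses fill the register, the 17th moves the counter to the next pair of edges, and subsequent up pulses shift 0s in, so the register empties while the phase keeps advancing.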
We chose to implement a current-controlled phase interpolator, as described in [Sid97], rather than a voltage-controlled interpolator, as described in [Enam92]. The voltage-controlled interpolators were much less linear and more difficult to control, since they required analog control inputs. Although the current-controlled interpolator was bigger and consumed more power (both still turned out to be insignificant: 4500 µm² and 924 µW), its outputs were much more linear and it took digital inputs.
The nmos part of the block shown above is repeated 16 times, so that there are 16 phi diff pairs and 16 psi diff pairs. The total current through the interpolator is constant and is divided among the phi and psi diff pairs based on the digital interpolator weight, which turns the appropriate number of phi and psi pairs on or off through the complementary ctl[i] and ctlb[i] inputs. Depending on how the current is split between phi and psi, the phase of omega shifts between the two.
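The current-steering idea can be approximated by a weighted average of the two input edge times. The sketch below is an idealized, perfectly linear model, whereas the real circuit is only approximately linear. It assumes that ctl[i] enables a phi pair and ctlb[i] enables the matching psi pair; the function names are hypothetical.

```python
def complement(ctl):
    """Bitwise complement of the weight word (ctlb[i] = not ctl[i])."""
    return [1 - bit for bit in ctl]


def ideal_edge_time_ps(ctl, t_phi_ps, t_psi_ps):
    """Idealized output edge time of omega for a given weight word.

    Each enabled diff pair steers an equal share of the constant total
    current, so the output is modeled as a current-weighted average.
    """
    ctlb = complement(ctl)
    n_phi = sum(ctl)    # phi pairs switched on (assumed mapping)
    n_psi = sum(ctlb)   # psi pairs switched on
    return (n_phi * t_phi_ps + n_psi * t_psi_ps) / len(ctl)


if __name__ == "__main__":
    weight = [1] * 12 + [0] * 4          # 12 phi pairs on, 4 psi pairs on
    print(ideal_edge_time_ps(weight, 0.0, 500.0))
```

Note that n_phi + n_psi always equals 16, which reflects the constant total current.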
We examined two types of current-controlled phase interpolators, type-I and type-II, as described in [Sid97]. Type-I places the control switch below the clock inputs, so that the clock inputs can be shared among all 16 diff pairs. Although this would be a smaller design, it is less linear than the type-II that we chose to implement. The reason is that one side cannot be entirely shut off in the type-I interpolator. Even if all of the current flows through the phi branch, the psi inputs still affect the output due to coupling from the gate to the drain of the diff pair inputs.
This plot shows the outputs of the phase interpolator. The HSPICE
simulations are from the extracted layout. The yellow lines are the four
equally spaced outputs of the delay elements, and represent the phi and
psi inputs to the interpolator. The red lines are the 16 phase steps between
the first two clock edges. Although the steps are not perfectly uniform,
the largest step of 70 ps is only 39 ps greater than the nominal phase
step of 31.25 ps. The smallest step is 1 ps.
When in lock, the phase interpolator will bounce back and forth between
two adjacent steps. Therefore, the largest phase step determines the maximum
peak-to-peak jitter. We looked at using 8 steps instead of 16, but the
maximum phase step was 145 ps, which would have led to unacceptable peak-to-peak
jitter. Using more than 16 phase steps is difficult since all the gates
of all phi and psi inputs must be driven by the same delay element. In
order to have 16 steps, these gates were made small, but it would be difficult
to make them much smaller and still have acceptable transconductances.
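The step-size comparison can be reproduced with simple arithmetic. The sketch below uses only the figures quoted in the text (0.5 ns edge spacing, largest steps of 70 ps and 145 ps); the function names are hypothetical.

```python
TAP_SPACING_PS = 500.0  # spacing between adjacent clock edges (from the text)


def nominal_step_ps(steps_per_phase, tap_spacing_ps=TAP_SPACING_PS):
    """Ideal phase step if the interval were divided perfectly evenly."""
    return tap_spacing_ps / steps_per_phase


def excess_over_nominal_ps(largest_step_ps, steps_per_phase):
    """How much the largest simulated step exceeds the nominal step."""
    return largest_step_ps - nominal_step_ps(steps_per_phase)


if __name__ == "__main__":
    # In lock the interpolator dithers between two adjacent steps, so the
    # largest step bounds the peak-to-peak jitter from dithering.
    print("16 steps:", nominal_step_ps(16), "nominal,",
          excess_over_nominal_ps(70.0, 16), "excess")
    print(" 8 steps:", nominal_step_ps(8), "nominal,",
          excess_over_nominal_ps(145.0, 8), "excess")
```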
This is the layout of one phi diff pair. There are 16 phi pairs and 16 psi pairs. The nbias transistor is split into two in order to balance the current through the two branches. The pmos part is identical to the symmetric loads of the delay elements, except that there are two of them to match the double current through the nmos devices.
The control logic controls both the weight of the interpolator, through ictl[15:0], and the two clock edges that are input to the interpolator, through sela-seld.
The logic on the top left translates the up/down pulses from the bit-lock logic into left and right signals for the shift register. It bases this decision on the state held by the 2-bit counter, which keeps track of which two clocks are being interpolated between. The outputs of the 2-bit counter feed the logic on the bottom right to select the correct muxes.
Two flip-flops, dff_0 and dff_1, determine when the shift register is full of ones or full of zeros and increment or decrement the 2-bit counter appropriately.
I have only done layout for the shift register. We still have not settled on the reset logic. We would like to wait to evaluate the behavior of the entire system before we decide on this logic. The issue involves exceeding the interpolator's range. If we reached the edge of the range after we had locked and wrapped around to the other end of the range, we would either lose a bit or sample an extra bit. Therefore, we would like to guarantee that the interpolator never wraps around during operation. We can do this by limiting the range during lock acquisition, which provides some padding at the extreme edges of the range. To determine exactly how much range to allow, we need to know more about how closely the interpolator input delays match the receive DLL delays.
The shift register controls the weight of the interpolator phase by shifting ones and zeros back and forth. Ones are shifted in on the top left and zeros are shifted in on the bottom right. Control logic detects when the shift register is full of ones or zeros and will toggle the relationship between early and late and shift left and shift right.
A shift register cell consists of a D flip-flop and a 3:1 mux, as shown in the yellow outline. The mux chooses between the output of the cell itself and the outputs of its two neighbors, depending on the control signals left, right, and hold. Since the pattern of choosing between left, right, and hold repeats every three cells, I chose to put three cells in a row to make layout simpler. This also pitch-matches the outputs, which are shown with red arrows on the bottom, with the interpolator. The data snakes around as shown by the white lines.
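The cell behavior can be sketched as a small behavioral model. It is an illustration only: the mapping of the left, right, and hold controls to shift directions is an assumption (here "right" moves data toward the right, so a 1 enters at the left end, and "left" moves data toward the left, so a 0 enters at the right end), and the function names are hypothetical.

```python
def mux3(own, from_left, from_right, mode):
    """3:1 mux in front of each flip-flop: keep own value or take a neighbor's."""
    if mode == "hold":
        return own
    if mode == "right":   # data moves right: take the left neighbor's output
        return from_left
    if mode == "left":    # data moves left: take the right neighbor's output
        return from_right
    raise ValueError("mode must be 'left', 'right', or 'hold'")


def clock(bits, mode):
    """One clock edge applied to every cell of the register at once."""
    n = len(bits)
    return [
        mux3(
            bits[i],
            bits[i - 1] if i > 0 else 1,       # a 1 is fed in at the left end
            bits[i + 1] if i < n - 1 else 0,   # a 0 is fed in at the right end
            mode,
        )
        for i in range(n)
    ]


if __name__ == "__main__":
    state = [1, 1, 0, 0]
    print(clock(state, "right"))
    print(clock(state, "left"))
    print(clock(state, "hold"))
```

Because 1s only ever enter at one end and 0s at the other, the register contents always remain a thermometer code.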
The height of the shift register could be reduced by 30 µm, or 80%, by sharing control lines between adjacent rows. This was not done originally, for simplicity.