13.7 Static Timing Analysis

We return to the comparator/MUX example to see how timing analysis is applied to sequential logic. We shall use the same input code (comp_mux.v in Section 13.2), but this time we shall target the design to an Actel FPGA. Before routing we obtain the following static timing analysis:

    Instance name   in pin-->out pin   tr   total   incr   cell
    --------------------------------------------------------------------
    OUT1          : D--->PAD           R    27.26   7.55   OUTBUF
    I_1_CM8       : S11--->Y           R    19.71   4.40   CM8
    I_2_CM8       : S11--->Y           R    15.31   5.20   CM8
    I_3_CM8       : S11--->Y           R    10.11   4.80   CM8
    IN1           : PAD--->Y           R     5.32   5.32   INBUF

The estimated prelayout critical path delay is nearly 30 ns (27.26 ns in the report above) including the I/O-cell delays (ACT 3, worst-case, standard speed grade). This limits the operating frequency to about 33 MHz (assuming we can get the signals to and from the chip pins with no further delays, which is highly unlikely). The operating frequency can be increased by pipelining the design as follows (by including three register stages: at the inputs, the outputs, and between the comparison and the select functions):

    module comp_mux_rrr(a, b, clock, outp);
    input [2:0] a, b; output [2:0] outp; input clock;
    reg [2:0] a_r, a_rr, b_r, b_rr, outp; reg sel_r;
    // comparison, performed on the first-stage registers
    wire sel = ( a_r <= b_r ) ? 0 : 1;
    // stage 1: register the inputs
    always @ ( posedge clock) begin a_r <= a; b_r <= b; end
    // stage 2: delay the data to stay aligned with sel_r
    always @ ( posedge clock) begin a_rr <= a_r; b_rr <= b_r; end
    // stage 3: register the selected output
    always @ ( posedge clock) outp <= sel_r ? b_rr : a_rr;
    // stage 2: register the comparison result
    always @ ( posedge clock) sel_r <= sel;
    endmodule

Following synthesis we optimize module comp_mux_rrr for maximum speed.
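To see what the three register stages do to latency, the following is a minimal cycle-level sketch of comp_mux_rrr in Python. It is only an illustration: the function names step and run are hypothetical, the registers are assumed to start at zero, and the model says nothing about delays. It mimics the nonblocking assignments by computing every new register value from the old state.

```python
def step(state, a, b):
    """Apply one rising clock edge; every register reads the OLD state."""
    sel = 0 if state["a_r"] <= state["b_r"] else 1  # wire sel in the Verilog
    return {
        "a_r": a, "b_r": b,                          # input registers
        "a_rr": state["a_r"], "b_rr": state["b_r"],  # aligned data registers
        "sel_r": sel,                                # registered comparison
        "outp": state["b_rr"] if state["sel_r"] else state["a_rr"],
    }


def run(inputs):
    """Clock a list of (a, b) pairs through the pipeline; return outp per edge."""
    # Assumption for this sketch only: all registers reset to zero.
    state = dict.fromkeys(("a_r", "b_r", "a_rr", "b_rr", "sel_r", "outp"), 0)
    outputs = []
    for a, b in inputs:
        state = step(state, a, b)
        outputs.append(state["outp"])
    return outputs


if __name__ == "__main__":
    # The pair (5, 3) is captured on the first edge; its result (3) is on
    # outp after the third edge.
    print(run([(5, 3), (2, 6), (7, 7), (0, 0), (0, 0)]))
```

In this model the smaller of each input pair reaches outp after the third clock edge, counting the edge that captures the pair, which is the latency cost of pipelining.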
Static timing analysis gives the following preroute critical paths:

    ---------------------INPAD to SETUP longest path---------------------
    Instance name   in pin-->out pin   tr   total   incr   cell
    --------------------------------------------------------------------
    INBUF_24      : PAD--->Y           R     4.52   4.52   INBUF

    ---------------------CLOCK to SETUP longest path---------------------
    Instance name   in pin-->out pin   tr   total   incr   cell
    --------------------------------------------------------------------
    I_1_CM8       : S10--->Y           R     9.99   0.00   CM8
    I_3_CM8       : S00--->Y           R     9.99   4.40   CM8
    a_r_ff_b1     : CLK--->Q           R     5.60   5.60   DF1

    ---------------------CLOCK to OUTPAD longest path--------------------
    Instance name   in pin-->out pin   tr   total   incr   cell
    --------------------------------------------------------------------
    OUTBUF_31     : D--->PAD           R    11.95   7.55   OUTBUF
    outp_ff_b2    : CLK--->Q           R     4.40   4.40   DF1

The timing analyzer has examined the following three kinds of path, one per report:

1. Entry paths (INPAD to SETUP), from an input pad to the data input of a sequential cell.
2. Stage paths (CLOCK to SETUP), from the clock pin of a sequential cell to the data input of a sequential cell.
3. Exit paths (CLOCK to OUTPAD), from the clock pin of a sequential cell to an output pad.
By pipelining the design we added three clock periods of latency, but we increased the estimated operating speed. The longest prelayout critical path is now an exit delay of approximately 12 ns, which more than doubles the maximum operating frequency. Next, we route the registered version of the design. The Actel software informs us that the postroute maximum stage delay is 11.3 ns (close to the preroute estimate of 9.99 ns). To check this figure we can perform another timing analysis. This time we shall measure the stage delays (the start points are all clock pins, and the end points are all inputs to sequential cells, in our case the D input to a D flip-flop). We need to define the sets of nodes at which to start and end the timing analysis (similar to the path clusters we used to specify timing constraints in logic synthesis). In the Actel timing analyzer we can use predefined sets 'clock' (flip-flop clock pins) and 'gated' (flip-flop inputs) as follows:

    1st longest path to all endpins
    Rank  Total  Start pin       First Net  End Net       End pin
    0     11.3   a_r_ff_b2:CLK   a_r_2_     block_0_OUT1  sel_r_ff:D
    1      6.6   sel_r_ff:CLK    sel_r      DEF_NET_50    outp_ff_b0:D
    ... 8 similar lines omitted ...

We could try to reduce the long stage delay (11.3 ns), but we have already seen from the preroute timing estimates that an exit delay may be the critical path. Next, we check some other important timing parameters.

13.7.1 Hold Time

Hold-time problems can occur if there is clock skew between adjacent flip-flops, for example. We first need to check for the shortest stage delays using the same sets that we used to check the longest stage delays:

    1st shortest path to all endpins
    Rank  Total  Start pin        First Net  End Net     End pin
    0      4.0   b_rr_ff_b1:CLK   b_rr_1_    DEF_NET_48  outp_ff_b1:D
    1      4.1   a_rr_ff_b2:CLK   a_rr_2_    DEF_NET_46  outp_ff_b2:D
    ... 8 similar lines omitted ...
The shortest path delay, 4 ns, is between the clock input of a D flip-flop with instance name b_rr_ff_b1 (call this X) and the D input of flip-flop instance name outp_ff_b1 (Y). Due to clock skew, the clock signal may not arrive at both flip-flops simultaneously. Suppose the clock arrives at flip-flop Y 3 ns later than at flip-flop X. The new data launched by X then reaches the D input of Y 4 ns after the clock edge at X, so the D input to flip-flop Y is only stable for (4 - 3) = 1 ns after the clock edge at Y. To check for hold-time violations we thus need to find the clock skew corresponding to each clock-to-D path. This is tedious, and timing-analysis tools normally check hold-time requirements automatically, but we shall show the steps to illustrate the process.

13.7.2 Entry Delay

Before we can measure clock skew, we need to analyze the entry delays, including the clock tree. The synthesis tools automatically add I/O pads and the clock cells. This means that extra nodes are added to the netlist with automatically generated names. The EDIF conversion tools may then modify these names. Before we can perform an analysis of entry delays and the clock network delay, we need to find the input node names. By looking for the EDIF 'rename' construct in the EDIF netlist we can associate the input and output node names in the behavioral Verilog model, comp_mux_rrr, with the EDIF names:

    piron% grep rename comp_mux_rrr_o.edn
    (port (rename a_2_ "a[2]") (direction INPUT))
    ... 8 similar lines renaming ports omitted ...
    (net (rename a_rr_0_ "a_rr[0]") (joined
    ... 9 similar lines renaming nets omitted ...

Thus, for example, the EDIF conversion program has renamed input port a[2] to a_2_ because the design tools do not like the Verilog bus notation using square brackets. Next we find the connections between the ports and the added I/O cells by looking for 'PAD' in the Actel format netlist, which indicates a connection to a pad and the pins of the chip, as follows:

    piron% grep PAD comp_mux_rrr_o.adl
    NET DEF_NET_148; outp_2_, OUTBUF_31:PAD.
    NET DEF_NET_151; outp_1_, OUTBUF_32:PAD.
    NET DEF_NET_154; outp_0_, OUTBUF_33:PAD.
    NET DEF_NET_127; a_2_, INBUF_24:PAD.
    NET DEF_NET_130; a_1_, INBUF_25:PAD.
    NET DEF_NET_133; a_0_, INBUF_26:PAD.
    NET DEF_NET_136; b_2_, INBUF_27:PAD.
    NET DEF_NET_139; b_1_, INBUF_28:PAD.
    NET DEF_NET_142; b_0_, INBUF_29:PAD.
    NET DEF_NET_145; clock, CLKBUF_30:PAD.

This tells us, for example, that the node we called clock in our behavioral model has been joined to a node (with automatically generated name) called CLKBUF_30:PAD, using a net (connection) named DEF_NET_145 (again automatically generated). This net is the connection between the node clock that is dangling in the behavioral model and the clock-buffer pad cell that the synthesis tools automatically added.

13.7.3 Exit Delay

We now know that the clock-pad input is CLKBUF_30:PAD, so we can find the exit delays (the longest path between clock-pad input and an output) as follows (using the clock-pad input as the start set):

    Working startset 'clockpad' contains 0 pins.
    Working startset 'clockpad' contains 2 pins.

I shall explain presently why this set contains two pins and not just one. Next, we define the end set and trace the longest exit paths as follows:

    Working endset 'outpad' contains 3 pins.
    1st longest path to all endpins
    Rank  Total  Start pin          First Net    End Net      End pin
    0     16.1   CLKBUF_30/U0:PAD   DEF_NET_144  DEF_NET_154  OUTBUF_33:PAD
    1     16.0   CLKBUF_30/U0:PAD   DEF_NET_144  DEF_NET_151  OUTBUF_32:PAD
    2     16.0   CLKBUF_30/U0:PAD   DEF_NET_144  DEF_NET_148  OUTBUF_31:PAD

This tells us we have three paths from the clock-pad input to the three output pins (outp[0], outp[1], and outp[2]).
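As a rough sketch of how these figures bound the clock rate, the Python fragment below picks the longest of the three reported exit paths, compares it with the 11.3 ns postroute stage delay quoted earlier, and converts the larger value to a frequency. The variable and function names are hypothetical, and treating the pad-to-pad exit delay as the limit on the clock period follows the reasoning of this section rather than any general rule.

```python
# Exit delays (ns) from the 'clockpad' to 'outpad' report, keyed by end pin.
exit_paths_ns = {
    "OUTBUF_33:PAD": 16.1,
    "OUTBUF_32:PAD": 16.0,
    "OUTBUF_31:PAD": 16.0,
}
# Postroute maximum stage delay (ns) reported earlier in this section.
stage_delay_ns = 11.3


def max_frequency_mhz(delay_ns):
    """Frequency (MHz) whose period equals the given delay in ns."""
    return 1000.0 / delay_ns


# Longest exit path and the pin it ends on.
worst_pin, worst_exit_ns = max(exit_paths_ns.items(), key=lambda kv: kv[1])
# The critical delay is the larger of the stage delay and the worst exit delay.
critical_ns = max(worst_exit_ns, stage_delay_ns)

if __name__ == "__main__":
    print(worst_pin, critical_ns, round(max_frequency_mhz(critical_ns), 1))
```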
We can examine the longest exit delay in more detail as follows:

    1st longest path to OUTBUF_33:PAD (rising) (Rank: 0)
    Total  Delay  Typ  Load  Macro     Start pin          Net name
    16.1   3.7    Tpd  0     OUTBUF    OUTBUF_33:D        DEF_NET_154
    12.4   4.5    Tpd  1     DF1       outp_ff_b0:CLK     DEF_NET_1530
     7.9   7.9    Tpd  16    CLKEXT_0  CLKBUF_30/U0:PAD   DEF_NET_144

The input-to-clock delay, $t_{IC}$, due to the clock-buffer cell (or macro) CLKEXT_0, instance name CLKBUF_30/U0, is 7.9 ns. The clock-to-Q delay, $t_{CQ}$, of flip-flop cell DF1, instance name outp_ff_b0, is 4.5 ns. The delay, $t_{QO}$, due to the output buffer cell OUTBUF, instance name OUTBUF_33, is 3.7 ns. The longest path between clock-pad input and the output, $t_{CO}$, is thus

$$ t_{CO} = t_{IC} + t_{CQ} + t_{QO} = 7.9 + 4.5 + 3.7 = 16.1\ \text{ns}. $$

This is the critical path and limits the operating frequency to (1 / 16.1 ns) ≈ 62 MHz.

When we created a start set using CLKBUF_30:PAD, the timing analyzer told us that this set consisted of two pins. We can list the names of the two pins as follows:

    CLKBUF_30/U0:PAD   <no net>      CLKEXT_0
    CLKBUF_30/U1:PAD   DEF_NET_145   CLKTRI_0

The clock-buffer instance name, CLKBUF_30/U0, is hierarchical (with a '/' hierarchy separator). This indicates that there is more than one instance inside the clock-buffer cell, CLKBUF_30. Instance CLKBUF_30/U0 is the input driver; instance CLKBUF_30/U1 is the output driver (which is disabled and unused in this case).

13.7.4 External Setup Time

Each of the six chip data inputs must satisfy the following set-up equation:

$$ t_{SU}(\text{external}) > t_{SU}(\text{internal}) - (\text{clock delay}) + (\text{data delay}) \qquad (13.24) $$
(where both clock and data delays end at the same flip-flop instance). We find the clock delays in Eq. 13.24 using the clock input pin as the start set and the end set 'clock'. The timing analyzer tells us all 16 clock path delays are the same at 7.9 ns in our design, and the clock skew is thus zero. Actel's clock distribution system minimizes clock skew, but clock skew will not always be zero. From the discussion in Section 13.7.1, we see there is no possibility of internal hold-time violations with a clock skew of zero. Next, we find the data delays in Eq. 13.24 using a start set of all input pads and an end set of 'gated':

    1st longest path to all endpins
    Rank  Total  Start pin      First Net     End Net       End pin
    10    10.0   INBUF_26:PAD   DEF_NET_1320  DEF_NET_1320  a_r_ff_b0:D
    11     9.7   INBUF_28:PAD   DEF_NET_1380  DEF_NET_1380  b_r_ff_b1:D
    12     9.4   INBUF_25:PAD   DEF_NET_1290  DEF_NET_1290  a_r_ff_b1:D
    13     9.3   INBUF_27:PAD   DEF_NET_1350  DEF_NET_1350  b_r_ff_b2:D
    14     9.2   INBUF_29:PAD   DEF_NET_1410  DEF_NET_1410  b_r_ff_b0:D
    15     9.1   INBUF_24:PAD   DEF_NET_1260  DEF_NET_1260  a_r_ff_b2:D

We are only interested in the last six paths of this analysis (rank 10–15), which describe the delays from each data input pad (a[0], a[1], a[2], b[0], b[1], b[2]) to the D input of a flip-flop. The maximum data delay, 10 ns, occurs on input buffer instance name INBUF_26 (pad 26); pin INBUF_26:PAD is node a_0_ in the EDIF file or input a[0] in our behavioral model. The six $t_{SU}(\text{external})$ equations corresponding to Eq. 13.24 may be reduced to the following worst-case relation:

$$ t_{SU}(\text{external})_{\max} > t_{SU}(\text{internal}) - 7.9\ \text{ns} + 10.0\ \text{ns} = t_{SU}(\text{internal}) + 2.1\ \text{ns} $$

We calculated the clock and data delay terms in Eq. 13.24 separately, but timing analyzers can normally perform a single analysis as follows:
Finally, we check that there is no external hold-time requirement. That is to say, we must check that $t_{SU}(\text{external})$ is never negative, or

$$ t_{SU}(\text{external})_{\min} > t_{SU}(\text{internal}) - 7.9\ \text{ns} + 9.1\ \text{ns} = t_{SU}(\text{internal}) + 1.2\ \text{ns} > 0 $$

Since $t_{SU}(\text{internal})$ is always positive on Actel FPGAs, $t_{SU}(\text{external})_{\min}$ is always positive for this design. In large ASICs, with large clock delays, it is possible to have external hold-time requirements on inputs. This is the reason that some FPGAs (Xilinx, for example) have programmable delay elements that deliberately increase the data delay and eliminate irksome external hold-time requirements.
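The external set-up and hold checks of this section reduce to simple arithmetic on the reported delays. The sketch below assumes the external set-up relation has the form t_SU(external) > t_SU(internal) - (clock delay) + (data delay), and computes only the offset (data delay minus clock delay) for each input, because the value of t_SU(internal) is not given here. The dictionary and function names are hypothetical; the delays are the rank 10–15 entries of the entry-path report, mapped to port names through the pad listing in Section 13.7.2.

```python
# Clock delay (ns) from the clock pad to every flip-flop clock pin.
CLOCK_DELAY_NS = 7.9

# Data delay (ns) from each input pad to the D input of its flip-flop.
data_delay_ns = {
    "a[0]": 10.0,  # INBUF_26
    "b[1]": 9.7,   # INBUF_28
    "a[1]": 9.4,   # INBUF_25
    "b[2]": 9.3,   # INBUF_27
    "b[0]": 9.2,   # INBUF_29
    "a[2]": 9.1,   # INBUF_24
}


def setup_offsets_ns(data_delays, clock_delay):
    """Offset added to t_SU(internal) for each input: data delay - clock delay."""
    return {port: round(d - clock_delay, 1) for port, d in data_delays.items()}


offsets = setup_offsets_ns(data_delay_ns, CLOCK_DELAY_NS)
worst_setup_offset = max(offsets.values())  # sets the external set-up time
least_setup_offset = min(offsets.values())  # positive: no external hold time

if __name__ == "__main__":
    print(offsets)
    print(worst_setup_offset, least_setup_offset)
```

Because the smallest offset is positive and t_SU(internal) is positive, the external set-up time never goes negative for any of the six inputs, which is the condition for having no external hold-time requirement.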