

| Clk | Operations            |                       |                       |
|-----|-----------------------|-----------------------|-----------------------|
|     | Sample A              | Sample B              | Sample C              |
| 1   | N4(*), N5(*), Input X |                       |                       |
| 2   | N2(*), N3(*), N7(+)   |                       |                       |
| 3   | N6(+)                 | N4(*), N5(*), Input X |                       |
| 4   | N8(+)                 | N2(*), N3(*), N7(+)   |                       |
| 5   |                       | N6(+)                 | N4(*), N5(*), Input X |
| 6   |                       | N8(+)                 | N2(*), N3(*), N7(+)   |
| 7   |                       |                       | N6(+)                 |
| 8   |                       |                       | N8(+)                 |

Two multiplies per clock, so we need two multipliers (A, B). In clock #4 and clock #6 we have two additions, so we need two adders (A, B).
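The resource count above can be checked mechanically. The following is a minimal sketch, assuming the per-sample schedule from the table (N4, N5 in the first clock; N2, N3, N7 in the second; N6 in the third; N8 in the fourth) and a new sample starting every `init_rate` clocks. The names `SCHEDULE` and `units_needed` are hypothetical, introduced only for this illustration.

```python
from collections import Counter

# Per-sample schedule: clock offset -> operations ('*' = multiply, '+' = add).
SCHEDULE = {
    0: {"N4": "*", "N5": "*"},
    1: {"N2": "*", "N3": "*", "N7": "+"},
    2: {"N6": "+"},
    3: {"N8": "+"},
}

def units_needed(schedule, init_rate):
    """Return (multipliers, adders) needed when a sample starts every init_rate clocks."""
    # In steady state, stages whose offsets agree modulo init_rate share a clock.
    per_phase = {}
    for offset, ops in schedule.items():
        phase = per_phase.setdefault(offset % init_rate, Counter())
        phase.update(ops.values())
    mults = max(c["*"] for c in per_phase.values())
    adds = max(c["+"] for c in per_phase.values())
    return mults, adds

print(units_needed(SCHEDULE, 2))  # (2, 2)
```

With an initiation rate of 2, the first and third stages share a clock, as do the second and fourth, which is where the second adder comes from.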

## BR 1/99




## Generalized Schedule

| Clk | Operations            |                       |                       |
|-----|-----------------------|-----------------------|-----------------------|
|     | Sample J-1            | Sample J              | Sample J+1            |
| I-2 | N4(*), N5(*), Input X |                       |                       |
| I-1 | N2(*), N3(*), N7(+)   |                       |                       |
| I+0 | N6(+)                 | N4(*), N5(*), Input X |                       |
| I+1 | N8(+)                 | N2(*), N3(*), N7(+)   |                       |
| I+2 |                       | N6(+)                 | N4(*), N5(*), Input X |
| I+3 |                       | N8(+)                 | N2(*), N3(*), N7(+)   |

Note: The initiation rate must divide the latency evenly (here, 4 / 2 = 2) in order to generalize the table.
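A generalized row can be derived from the per-sample schedule rather than written out by hand. This is a rough sketch, assuming sample J starts at clock I and sample J+m starts at clock I + m * init_rate. `STAGES` and `generalized_row` are hypothetical names, and the divisibility check simply mirrors the note above.

```python
# One entry per clock of a single sample's computation (latency = 4 clocks).
STAGES = ["N4(*), N5(*), Input X", "N2(*), N3(*), N7(+)", "N6(+)", "N8(+)"]

def generalized_row(stages, init_rate, k):
    """Operations active at clock I+k, keyed by sample offset m (sample J+m)."""
    latency = len(stages)
    if latency % init_rate != 0:
        raise ValueError("initiation rate must divide the latency")
    row = {}
    for stage, ops in enumerate(stages):
        # Sample J+m executes this stage at clock I + m*init_rate + stage.
        m, remainder = divmod(k - stage, init_rate)
        if remainder == 0:
            row[m] = ops
    return row

for k in range(2):
    print(f"I+{k}:", generalized_row(STAGES, 2, k))
```

For an initiation rate of 2, clock I+0 maps offset -1 (Sample J-1) to N6(+) and offset 0 (Sample J) to the N4, N5, Input X stage; every later row is the row two clocks earlier with the sample offsets increased by one.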


## Schedule Clk I+0

What do we need in registers at Clock I+0?

For Sample J-1: N2, N3, N7

For Sample J: x@3, x@2, x@1

For Sample J+1: No operations.

Registers: RA: x@3, RB: x@2, RC: x@1, RD: N2, RE: N3, RF: N7

| Schedule Clk I+0      | Register transfer       | Comment                          |
|-----------------------|-------------------------|----------------------------------|
| Sample J-1: N6(N3+N7) | $RF \leftarrow RE + RF$ | overwrite N7 value, don't need.  |
| Sample J: Input X     | $RE \leftarrow X$       | overwrite N3 value, don't need.  |
| Sample J: N4(x@2*a2)  | $RG \leftarrow RB * a2$ | add new register RG to hold N4   |
| Sample J: N5(x@3*a3)  | $RA \leftarrow RA * a3$ | overwrite x@3 value, don't need. |

Finished: Added extra Register RG.
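The register transfers for clock I+0 all happen on the same clock edge, so every right-hand side reads the value held before the edge. That is why RF can take RE + RF in the same clock in which RE is overwritten by X. A small sketch of that behaviour follows; the numeric values, the coefficient values, and the helper name `clock_edge` are made up for illustration, and N5 is computed with a3, following the label N5(x@3*a3).

```python
def clock_edge(regs, transfers):
    """Apply all register transfers at once: each one reads the old register values."""
    new_values = {dest: fn(regs) for dest, fn in transfers.items()}
    return {**regs, **new_values}

# Hypothetical values, chosen only to make the result easy to follow.
a2, a3, X = 2, 3, 7
regs = {"RA": 10, "RB": 20, "RC": 30, "RD": 40, "RE": 50, "RF": 60}
#        x@3       x@2       x@1       N2        N3        N7

transfers = {
    "RF": lambda r: r["RE"] + r["RF"],  # N6 = N3 + N7, overwrites N7
    "RE": lambda r: X,                  # Input X, overwrites N3
    "RG": lambda r: r["RB"] * a2,       # N4 = x@2 * a2, new register RG
    "RA": lambda r: r["RA"] * a3,       # N5 = x@3 * a3, overwrites x@3
}

after = clock_edge(regs, transfers)
print(after["RF"], after["RE"], after["RG"], after["RA"])  # 110 7 40 30
```

RF ends up holding the old RE plus the old RF even though RE changes on the same edge, and RG is the one register that did not exist before this clock.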






| Init Rate | Resources   |        |           |
|-----------|-------------|--------|-----------|
|           | Multipliers | Adders | Registers |
| 4         | 2           | 1      | 10        |
| 2         | 2           | 2      | 11        |

Going from InitRate = 4 to InitRate = 2 costs only one extra adder and one extra register, because the execution units for InitRate = 4 were not fully utilized!
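How busy each execution unit is can be tallied directly from the schedule. The sketch below assumes the four-clock per-sample schedule (two multiplies; two multiplies and an add; one add; one add) and assumes that operations in a clock are assigned to unit A before unit B. `SCHEDULE` and `utilization` are hypothetical names.

```python
from fractions import Fraction

# Operations per stage of one sample: '*' = multiply, '+' = add.
SCHEDULE = {0: "**", 1: "**+", 2: "+", 3: "+"}

def utilization(schedule, init_rate, op, unit_index):
    """Fraction of clocks in which unit unit_index (0 = A, 1 = B) of type op is busy."""
    busy = 0
    for phase in range(init_rate):
        # Stages whose offsets agree modulo init_rate execute in the same clock.
        count = sum(ops.count(op) for offset, ops in schedule.items()
                    if offset % init_rate == phase)
        if count > unit_index:  # unit A is filled first, then unit B
            busy += 1
    return Fraction(busy, init_rate)

print(utilization(SCHEDULE, 4, "*", 0))  # 1/2
print(utilization(SCHEDULE, 4, "+", 0))  # 3/4
print(utilization(SCHEDULE, 2, "+", 1))  # 1/2
```

At an initiation rate of 4 both multipliers sit idle half the time, so a second sample can be overlapped without adding multipliers.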


## Execution Unit Utilization

| Init Rate | Execution Unit Utilization |            |            |           |
|-----------|----------------------------|------------|------------|-----------|
|           | Mult A                     | Mult B     | Add A      | Add B     |
| 4         | 50% (2/4)                  | 50% (2/4)  | 75% (3/4)  | N/A       |
| 2         | 100% (2/2)                 | 100% (2/2) | 100% (2/2) | 50% (1/2) |

Note: the adders were not fully utilized for InitRate = 2 (Add B is busy in only 1 clock out of 2), and no execution unit was fully utilized for InitRate = 4.

| Clk | Operations            |                       |                       |                       |
|-----|-----------------------|-----------------------|-----------------------|-----------------------|
|     | Sample A              | Sample B              | Sample C              | Sample D              |
| 1   | N4(*), N5(*), Input X |                       |                       |                       |
| 2   | N2(*), N3(*), N7(+)   | N4(*), N5(*), Input X |                       |                       |
| 3   | N6(+)                 | N2(*), N3(*), N7(+)   | N4(*), N5(*), Input X |                       |
| 4   | N8(+)                 | N6(+)                 | N2(*), N3(*), N7(+)   | N4(*), N5(*), Input X |
| 5   |                       | N8(+)                 | N6(+)                 | N2(*), N3(*), N7(+)   |
| 6   |                       |                       | N8(+)                 | N6(+)                 |
| 7   |                       |                       |                       | N8(+)                 |


## Resource Comparison Again

| Init Rate | Resources   |        |           |
|-----------|-------------|--------|-----------|
|           | Multipliers | Adders | Registers |
| 4         | 2           | 1      | 10        |
| 2         | 2           | 2      | 11        |
| 1         | 4           | 3      | ??        |

Latency for all of these designs is 4 clocks.

This table clearly illustrates the time versus area tradeoff in Digital Systems.

It will cost you MORE resources to do something in LESS time!
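One way to see the trend is a simple counting bound: if every sample needs a fixed number of operations of one kind and a new sample starts every `init_rate` clocks, each unit can do at most `init_rate` of those operations per sample, so at least ceil(ops / init_rate) units are needed. The sketch below assumes the 4 multiplies (N2, N3, N4, N5) and 3 adds (N6, N7, N8) per sample from the schedules above; `min_units` is a hypothetical helper, and the bound ignores latency and register constraints.

```python
def min_units(ops_per_sample, init_rate):
    """Lower bound on execution units: ceil(ops_per_sample / init_rate)."""
    return -(-ops_per_sample // init_rate)  # integer ceiling division

MULTIPLIES_PER_SAMPLE = 4  # N2, N3, N4, N5
ADDS_PER_SAMPLE = 3        # N6, N7, N8

for init_rate in (4, 2, 1):
    print(init_rate,
          min_units(MULTIPLIES_PER_SAMPLE, init_rate),
          min_units(ADDS_PER_SAMPLE, init_rate))
```

For initiation rates of 2 and 1 the bound equals the multiplier and adder counts in the table. For an initiation rate of 4 the bound is one multiplier, while the tabulated design uses two because its schedule issues two multiplies in the same clock.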


|       | Adder | MultA                         | MultB                         | IO      |
|-------|-------|-------------------------------|-------------------------------|---------|
| Clk 1 | idle  | N5                            | N4                            | Input X |
| Clk 2 |       | N3                            | N2                            |         |
| Clk 3 |       | (N5 ready, saved to register) | (N4 ready, saved to register) |         |
| Clk 4 | N7    | (N3 ready, saved to register) | (N2 ready, saved to register) |         |
| Clk 5 | N6    |                               |                               |         |
| Clk 6 | N8    |                               |                               |         |




| Clk | Operations            |                       |                       |                       |           |
|-----|-----------------------|-----------------------|-----------------------|-----------------------|-----------|
|     | Sample A              | Sample B              | Sample C              | Sample D              |           |
| 1   | N4(*), N5(*), Input X |                       |                       |                       |           |
| 2   | N2(*), N3(*)          |                       |                       |                       |           |
| 3   |                       | N4(*), N5(*), Input X |                       |                       | Repeating |
| 4   | N7(+)                 | N2(*), N3(*)          |                       |                       | clocks    |
| 5   | N6(+)                 |                       | N4(*), N5(*), Input X |                       |           |
| 6   | N8(+)                 | N7(+)                 | N2(*), N3(*)          |                       |           |
| 7   |                       | N6(+)                 |                       | N4(*), N5(*), Input X |           |
| 8   |                       | N8(+)                 | N7(+)                 | N2(*), N3(*)          |           |
| 9   |                       |                       | N6(+)                 |                       |           |
| 10  |                       |                       | N8(+)                 | N7(+)                 |           |

| Clk | Operations            |                       |                       |                       |           |
|-----|-----------------------|-----------------------|-----------------------|-----------------------|-----------|
|     | Sample J-2            | Sample J-1            | Sample J              | Sample J+1            |           |
| I-4 | N4(*), N5(*), Input X |                       |                       |                       |           |
| I-3 | N2(*), N3(*)          |                       |                       |                       |           |
| I-2 |                       | N4(*), N5(*), Input X |                       |                       | Repeating |
| I-1 | N7(+)                 | N2(*), N3(*)          |                       |                       | clocks    |
| I   | N6(+)                 |                       | N4(*), N5(*), Input X |                       |           |
| I+1 | N8(+)                 | N7(+)                 | N2(*), N3(*)          |                       |           |
| I+2 |                       | N6(+)                 |                       | N4(*), N5(*), Input X |           |
| I+3 |                       | N8(+)                 | N7(+)                 | N2(*), N3(*)          |           |
| I+4 |                       |                       | N6(+)                 |                       |           |
| I+5 |                       |                       | N8(+)                 | N7(+)                 |           |

## Do a solution with Initiation Rate = 2, Latency = 6; 2 mult, 2 adders

|         | Adder A  | Adder B  | MultA    | MultB    | IO      |
|---------|----------|----------|----------|----------|---------|
| Clk I   | N6 (j-2) | idle     | N5 (j)   | N4 (j)   | Input X |
| Clk I+1 | N8 (j-2) | N7 (j-1) | N3 (j)   | N2 (j)   |         |
| Clk I+2 | N6 (j-1) | idle     | N5 (j+1) | N4 (j+1) | Input X |
| Clk I+3 | N8 (j-1) | N7 (j)   | N3 (j+1) | N2 (j+1) |         |
| Clk I+4 | N6 (j)   | idle     | N5 (j+2) | N4 (j+2) | Input X |
| Clk I+5 | N8 (j)   | N7 (j+1) | N3 (j+2) | N2 (j+2) |         |

General schedule is I, I+1. 6 clocks shown to complete one computation. Initiation Rate = 2, Latency = 6. Overlapping computations of three samples.
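The count of three overlapping samples follows from the two numbers above: a sample occupies 6 clocks and a new one starts every 2 clocks. A minimal sketch, assuming sample j+m starts at clock I + m * init_rate and stays in flight for `latency` clocks (`samples_in_flight` is a hypothetical helper):

```python
def samples_in_flight(k, init_rate, latency):
    """Offsets m (sample j+m) whose computation is under way at clock I+k."""
    # Sample j+m is in flight when 0 <= k - m*init_rate < latency.
    m_max = k // init_rate
    m_min = (k - latency) // init_rate + 1
    return list(range(m_min, m_max + 1))

print(samples_in_flight(0, init_rate=2, latency=6))  # [-2, -1, 0]
print(samples_in_flight(1, init_rate=2, latency=6))  # [-2, -1, 0]
```

At clock I the samples j-2, j-1 and j are all in flight, even though sample j-1 has no operation in that clock; it is between its multiplies and its first add.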
