Sale!

CS3350B Assignment 4 Solved

$35.00

Category:

Description

5/5 - (2 votes)

Exercise 1. [20 Marks]

Consider the multi-cycle MIPS datapath presented in Figure 1,
it shows 4 inter-stage registers: IF/ID, ID/EX, EX/MEM, and MEM/WB. Consider also
the control signals presented in the diagram in blue. Assume the ALUOp control signal is
3 bits. Ignore control signals not shown (i.e. the ones controlling forwarding). Determine
the minimum size, in bits, of each of the four inter-stage registers. For part marks, ensure to
include workings and not just the total values.

Exercise 2. [15 Marks]

Consider a pipelined processor with 𝑁 stages. Each stage except
the last runs in 𝑡 units of time (say pico-seconds) meanwhile the last stage runs in 𝑚×𝑡 units
of time where 𝑚 > 1. Thus, the last stage is slower than the other ones. Assume there are 𝑐
independent and hazard-free instructions being processed by this pipeline.
(a) Compute an expression for the actual pipeline speedup based on the variables 𝑁, 𝑡, 𝑚,
and 𝑐. Show your workings.
(b) Compute an expression for the ratio of time for which the pipeline runs at full occupancy
based on the variables 𝑁, 𝑡, 𝑚, and 𝑐. Show your workings. Recall full occupancy means
every pipeline stage is actively computing work.
2

Exercise 3.

Consider the following C code:
for (i = 0; i < n; ++i)
a[i] = b[i] + i;
where a and b are 32-bit integer arrays of size n.
We have a corresponding MIPS instruction sequence:
add $t0, $0, $0 # $t0 = 0, which corresponds to i in C code
loop: lw $s1, 0($s4) # assume $s4 stores the base address of array b
add $s0, $s1, $t0 # $s0 gets b[i] + i
sw $s0, 0($s2) # assume $s2 stores the base address of array a
addi $t0, $t0, 1 # ++i

addi $s2, $s2, 4 # get address of a[i+1]
addi $s4, $s4, 4 # get address of b[i+1]
slt $t2, $t0, $s5 # assume that $s5 holds n
bne $t2, $0, loop # if $t2 == 1, go to loop

Assume that the above MIPS instructions will be executed on a 5-stage pipelined processor.
Ignore structural hazards and assume the branch condition is computed in the EX stage.
(a) [10 Marks] Draw the pipeline execution diagram for one iteration of the loop, that is,
starting at the label loop. Assume there is no data forwarding. Compute the CPI (clock
cycles per instruction) for that one iteration. Do not re-order the instructions.

(b) [10 Marks] Draw the pipeline execution diagram for one iteration of the loop, that
is, starting at the label loop. This time, assume all possible data forwarding is used. For
each instance of data forwarding, annotate your pipeline diagram (say, with arrows, colors,
or footnotes) to indicate the source and destination of the forwarding; also say what kind of
forwarding it is. Compute the CPI (clock cycles per instruction) for that one iteration. Do
not re-order the instructions.

(c) [5 Marks] Apply loop unrolling (as well as instruction re-ordering, if you like) on the
above MIPS code for two iterations. Write out the corresponding MIPS instruction code. Make
sure to use offsets appropriately to avoid unnecessary instructions.
3
(d) [15 Marks] Consider a 2-issue extension of MIPS. That is, a VLIW extension of MIPS
where two instructions occur in each instruction word. The issue packet has the following
format: the first instruction must be arithmetic or branch, and the second instruction must
be a data transfer instruction (lw or sw). Still assume all types of forwarding.

Using the code of your unrolled loop of part (c), statically schedule one iteration of the (now
unrolled) loop to run optimally on this 2-issue machine. You may wish to modify your code
from part (c) to have a different order and use additional registers to accomplish this task.
Consider using the following table as a starting point and to help guide you through the
process
ALU or branch Data transfer CC
loop: 1
2
3
4
5
6
7
8
9

4

Exercise 4. [25 Marks]

Consider a multi-core processor with 2 cores, named P1 and P2.
Each core has a dedicated cache with the following characteristics:
• 2-way set associative and a 16-byte capacity;
• is initially empty;
• follows the MESI snooping protocol;
• follows write-back and write-allocate protocols; and
• follows a pseudo-LRU replacement policy where
(𝑖) empty cache lines in a set are filled first, then,

(𝑖) if there are any invalid cache lines in a set replace them, then,
(𝑖𝑖) if no invalid cache lines are present, follows a typical LRU replacement policy.
Given the following list of serialized memory byte address accesses by the cores, determine:
(a) whether each access results in a cache hit, cold miss, conflict miss, capacity miss, true
share miss, or false share miss;
(b) the data stored in each cache after all addresses in the list have been accessed; and
(c) the MESI state of each cache block after all addresses in the list have been accessed.
Time Memory Access Hit/Miss Type
1 P1 Reads 5
2 P2 Writes 8
3 P1 Reads 9
4 P1 Writes 14

5 P1 Reads 3
6 P1 Writes 12
7 P2 Reads 6
8 P2 Reads 17
9 P1 Reads 20
10 P2 Reads 4
Time Memory Access Hit/Miss Type
11 P2 Writes 9
12 P2 Writes 10
13 P2 Reads 2
14 P1 Writes 7
15 P1 Reads 8
16 P1 Reads 4

17 P2 Reads 12
18 P2 Reads 7
19 P1 Writes 2
20 P1 Reads 11
To answer part (a) use the above table. To answer parts (b) and (c), use the below tables.
P1 Cache
Set Cache Block Data State
0
1
2
3
P2 Cache
Set Cache Block Data State
0
1
2
3
5