## Description

1. Introduction

This assignment involves an introduction to parallel programming, i.e., software designed to

execute on multiple processors of a computer. Here, you will use a software library called OpenMP

to write your parallel code. OpenMP uses a shared memory programming model. You will work

on an individual basis for this assignment (no teams).

The assignment involves writing two versions of a parallel code to perform matrix multiplication,

i.e., compute C=B*A where A, B, and C are matrices and * is matrix multiplication. You may

assume the matrices are all square with N rows and N columns.

Parallelization can be accomplished by observing that C[i,j] is simply the dot product of row i

of A and column j of B. The N*N dot products could all be performed in parallel by mapping each

to a different thread. Here, you will use computation of a single row of the result matrix (N dot

products) as the unit of computation distributed among the threads, resulting in a maximum of Nfold parallelism.

You will develop two different versions of the matrix multiplication code and compare the

performance achieved by each with a sequential execution, i.e., execution using a single thread.

These two codes differ in the approach used to distribute the computations among the threads. The

first uses a static mapping approach while the second uses a dynamic “worker” approach. Your

code must be designed to work so the number of threads can be easily changed (e.g., using

#define or a command line argument without making significant changes to the code.

2. Static Mapping

Assume there are K threads performing the computation. In this approach, each thread is statically

assigned a set of rows of the result matrix C for which it is responsible for computing results.

Specifically, thread 0 is responsible for computing values for rows 0, K, 2K, … of C. Thread 1

computes values for rows 1, K+1, 2K+1, … In other words, the rows of the matrix are assigned in

round robin fashion to the different threads.

This approach is called a static mapping because the assignment of rows (computations) to threads

is determined in advance, before the program begins to execute.

3. Dynamic Mapping

In the dynamic approach a pool of “worker threads” is used. This means you create K threads to

perform the computation. Each worker thread repeatedly accesses a global data structure to

allocate a single row of the result matrix to compute. It then performs this computation, and then

goes back to the global data structure to allocate another piece of work to do. This process

continues until the entire matrix computation is complete. Each row of the result matrix must be

assigned to exactly one thread.

2

In other words, each worker thread executes the following loop:

While (rows of the result matrix have not been computed) {

Allocate a row of the result matrix to compute

Compute results for this row of the result matrix

}

Note that when using this approach, the specific rows of the matrix assigned to a certain thread is

not known until during the execution of the computation itself. Further, in repeated executions of

the same code a particular thread may compute different rows of the result matrix. However, the

same results should always be computed independent of how the rows are mapped to threads.

To implement the dynamic allocation approach, you can define a single global variable, initialized

to zero, that indicates the number of the row that should be allocated next to a worker thread. Each

worker thread can simply read this variable to allocate a row to compute, and then increment the

variable so the next worker is allocated a different row. Be sure to place accesses to this global

variable in a critical section to avoid race conditions which may lead to erroneous results, e.g., a

row being computed twice!

4. Experiments

Construct a set of experiments to measure the performance (execution time) of the two parallel

implementations and a sequential implementation. Note that in general, an execution using one

thread is not the same as a sequential execution because it will have parallel processing overheads.

However, for this problem, these will likely result in similar run times. SpeedUp(K) is defined as

the execution time of the sequential code divided by the execution time of the parallel code using

K threads, and indicates now many times faster the parallel code executes.

Create two different “sizes” of the matrix multiplication computation. Define a “small” matrix as

one with 50 rows and 50 columns. Define a “large” matrix by setting the number of rows and

columns (N) to a value where the execution of the sequential code is, say approximately 10 or 20

seconds.

Your code should fill the matrix with random numbers in the interval [0.0, 1.0]. All values should

be double precision floating point numbers. Measure the runtime of the three codes (sequential

static, dynamic) for the small and large matrices varying the number of threads. Show plots of

speedup as the number of threads is varied for the small and large matrices. Explain your results

in the project report.

You will need a multiprocessor computer to perform your experiments. Use the deepthought

cluster for this purpose.

5. Implementation Notes

You may wish to refer to the sample code for computing dot products in developing your code for

the static mapping. You may utilize the dot product code as a starting point in creating your matrix

multiplication code, and reuse any portions you desire.

You will need a timer to determine the execution time of your code. The C library

can be used for this purpose. This will enable your program to read wallclock time, i.e., time on

3

the computer, enabling you to compute elapsed time values at very high precision. This will be

important, especially for the small matrix computation.

You may allocate memory for the arrays statically as global variables, or dynamically using

malloc(). Note the arrays must be defined as shared variables for your program to work

correctly.

6. Student Taking CX 4010

Turn in a report including a brief description of your software and explanation how you verified it

computes correct results. Turn in your software in a single zip file. Your software must be well

documented and include comments so the code is easy to understand. You must include a

README file with instructions on how to compile and run your program.

Write up the results of your timing experiments and show graphs depicting the speedup results you

obtained. Explain any anomalous, unexpected results. Which parallel version yields better

performance? Explain why. Further, speculate in what situations (in general) a dynamic mapping

approach might perform better than the static approach, and what situations the opposite might be

true.

7. Students Taking CSE 6010

Complete the steps described above for students taking CX 4010. In addition, conduct a literature

search to find an efficient approach to computing matrix multiplication using a message passing

approach (OpenMP uses a shared memory model). Write a brief description of a parallel matrix

multiplication algorithm using message passing, and include references to relevant literature.

8. Collaboration, Citing, and Honor Code

Finally, we remind you of class policies regarding collaboration, citations, and the honor code.

All students are expected to follow the Georgia Tech Honor Code. You are encouraged to work

together and help each other on assignments and preparing for exams, but all work you turn in

must be entirely your own work.

We encourage discussion with other students, but all code you turn in must be written entirely by

you. We recommend you destroy any notes taken while consulting with other students before

writing your own code. Dissemination of code to other students is not allowed.

The Internet is a useful resource to solve specific programming problems. You are encouraged to

use the Web for information or to answer specific questions, but copying or utilizing code from

the Web violates the Honor Code. If you use the Web or other sources for any information, you

must cite these sources in your assignment submission. If you are unsure what is permissible,

consult with the instructor or a TA.