## Description

In this assignment, you will experiment with thread creation and synchronization using both busy waiting and semaphores. You will use the POSIX thread library. Your job is to write a program that generates a large array of random numbers as specified below and then computes the product of all the numbers in the array mod NUM_LIMIT (defined in the given template) using the four different schemes described below. In this assignment you will be measuring the time taken in each case. You are given a template (MTFindProd_template.c) that has the timing code in it. Use the given template. When computing the modular product of the array you MUST mod by NUM_LIMIT after each multiplication or the variable that you are using to store the product will overflow.

## Your program will take the following three command-line arguments:

- The array size (a positive integer between 1 and 100 000 000)
- The number of threads (a positive integer between 1 and 16)
- The index at which a zero must be placed in the input array. If this index is negative, don’t put a zero in the array.

## Your program will perform the following steps:

- Generate the input data by first filling the array with random numbers in the range 1 to MAX_RANDOM_NUMBER (defined in the given template). Your program should then check for the third command-line argument. If it is a valid non-negative index, your program must set the value at that index to zero.
- Compute the product of all the numbers in the array mod NUM_LIMIT without creating any new threads and print the computation time. Your single threaded multiplier must terminate as soon as a zero is found.
- Based on the number of threads (T), compute the indices for dividing the array into T equal divisions. For each division i, indices[i][0] should be set to the division number (i),

indices[i][1] should be set to the start index, and indices[i][2] should be set to the end index. For example, if the array size (N) is 1000 and T is 2, the indices will be

indices[0][0]=0, indices[0][1]=0, indices[0][2]=499

indices[1][0]=1, indices[1][1]=500, indices[1][2]=999

- Compute the product mod NUM_LIMIT using multi-threading by creating T threads and having each thread compute the product of one division of the array. Multiply the products for each division to get the modular product for the full array. You will need to implement the following three synchronization schemes and print the computation time for each scheme (as shown in the given template):

**First Scheme**: The parent waits for all children.

**Second Scheme**: The parent keeps checking on the children in a busy waiting loop and terminates as soon as one child finds a zero or when all children finish computing the modular product of the numbers in their divisions. The parent must then cancel all threads. ** Do not use any semaphores in this scheme**.

**Third Scheme**: The parent waits on a semaphore that gets signaled by one of the children either when that child finds a zero or when all children finish getting the modular product of the numbers in their divisions. The parent must then cancel all threads.

Use the following command to compile your code:

g++ -O3 MTFindProd.c -o MTFindProd -lpthread

Note that we are using the –O3 option to enable a high level of compiler optimization.

When your program is working correctly, run the following tests on a *dedicated machine*:

- Array size = 100M, T=2, index for zero =50M+1
- Array size = 100M, T=4, index for zero =75M+1
- Array size = 100M, T=8, index for zero =88M
- Array size = 100M, T=2, index for zero =-1 (no zero)
- Array size = 100M, T=4, index for zero =-1 (no zero)
- Array size = 100M, T=8, index for zero =-1 (no zero)

The tests must be run on a dedicated machine to ensure the accuracy of the timing results.

** **

Note: In the code that implements the second scheme **(**parent keeps checking on the children in a busy-waiting loop), make sure that the shared variables that the busy-waiting loop checks on are declared as *volatile*. This will prevent the compiler from optimizing out the checking code and replacing the loop with an infinite loop (while(true)).

** **

**Submission**

Submit the following on Canvas:

- Your source code in one source file named MTFindProd.c

Please don’t use any other file names.

In your source file, please include a header that has the following information:

Your name

Your section number

The OSs on which you have tested your program (Linux, Mac, etc). __Your code must compile and work correctly in our ECS computing environment.__

The hardware configuration of the machine that you ran your tests on. To get cpu information on Linux, you can use one of the following commands:

lscpu

cat /proc/cpuinfo

- A brief report including the following:
- Six tables summarizing the results of the tests that you ran (one table for each test)
- A discussion of the results and the conclusions that you have drawn. Don’t forget to
on the machine that you have used. The interpretation of the results will be highly dependent on the number of CPUs on your machine. Timing tests must be run on a dedicated machine.__report the number of physical cores__

**Extra Work (25%)**

Redo this assignment using multiple processes instead of multiple threads. Use the fork() system call to create multiple processes. Each process will compute the modular product in one division of the array. The array should be divided exactly the same way, that is, the code that divides the array in the parent will be exactly the same. Also, the parent process should synchronize with the child processes using the same synchronization schemes described above (wait for all, busy wait, and semaphore-based synchronization).

In this case, shared variables must be placed in shared memory instead of being declared as global variables, because each process has its own address space. After doing the implementation, run the same timing experiments that you ran with multiple threads. Then write a brief report discussing your results and comparing the multi-thread solution with the multi-process solution in terms of speed and implementation complexity.

The extra work will be graded out of 25, and the required work will be graded out of 100. So, you can earn up to 25% more points on this assignment if you do the extra work. The number of extra points that you will get will depend on the quality of your work.