CS6023 GPU Programming Assignment 2 Solved


1. Problem Statement

Given four input matrices A, B, C, and D, compute the output matrix X = (A + B^T) C D^T, where ^T denotes the transpose. Write efficient code to compute the output matrix. While writing the code, consider aspects such as memory coalescing, shared memory, and the degree of thread divergence.
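As an illustration of the coalescing and shared-memory concerns for the first step, S = A + B^T, here is a minimal sketch. It assumes row-major int matrices (the handout does not state the element type); the names addTranspose, addTransposeOnDevice and TILE are hypothetical and are not part of the provided file. B is staged through a shared-memory tile so that both the global read of B and the global write of S touch consecutive addresses within a warp.

```cuda
#include <cuda_runtime.h>

#define TILE 32  // tile edge; an assumed value, not given in the handout

// S = A + B^T, where A and S are p x q and B is q x p (all row-major).
__global__ void addTranspose(const int *A, const int *B, int *S, int p, int q)
{
    // The +1 padding avoids shared-memory bank conflicts on the transposed read.
    __shared__ int tile[TILE][TILE + 1];

    // Read phase: threadIdx.x walks along a row of B, so the read is coalesced.
    int bRow = blockIdx.x * TILE + threadIdx.y;  // row of B, in [0, q)
    int bCol = blockIdx.y * TILE + threadIdx.x;  // column of B, in [0, p)
    if (bRow < q && bCol < p)
        tile[threadIdx.y][threadIdx.x] = B[bRow * p + bCol];
    __syncthreads();

    // Write phase: threadIdx.x walks along a row of S, so A and S are coalesced too.
    int i = blockIdx.y * TILE + threadIdx.y;  // row of S, in [0, p)
    int j = blockIdx.x * TILE + threadIdx.x;  // column of S, in [0, q)
    if (i < p && j < q)
        S[i * q + j] = A[i * q + j] + tile[threadIdx.x][threadIdx.y];  // B[j][i]
}

// Hypothetical host helper: copies inputs over, launches, copies S back.
void addTransposeOnDevice(const int *hA, const int *hB, int *hS, int p, int q)
{
    size_t bytes = (size_t)p * q * sizeof(int);
    int *dA, *dB, *dS;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dS, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((q + TILE - 1) / TILE, (p + TILE - 1) / TILE);
    addTranspose<<<grid, block>>>(dA, dB, dS, p, q);

    cudaMemcpy(hS, dS, bytes, cudaMemcpyDeviceToHost);  // also waits for the kernel
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dS);
}
```

Reading B[j][i] directly from global memory would make neighbouring threads stride by p elements; the tile trades that strided access for a shared-memory access, which is the usual reason for this pattern.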

2. Input and Output

2.1. Input

● 4 integers: p, q, r and s
● Matrix A of size p × q
● Matrix B of size q × p
● Matrix C of size q × r
● Matrix D of size s × r

2.2. Output

● Matrix X of size p × s

2.3. Constraints

● 2 ≤ p, q, r, s ≤ 2^10
● All the elements in the input matrices will be in the range [-10, 10]

3. Sample Testcase

● Input matrices A, B, C and D. Input will be given as:

2 3 3 2
2 5 0
3 -2 1
6 1
-4 2
1 3
1 9 6
-6 7 2
2 4 -3
10 0 5
1 3 -3

The first line gives the values p, q, r and s.
The next p lines give the rows of matrix A.
The next q lines give the rows of matrix B.
The next q lines give the rows of matrix C.
The next s lines give the rows of matrix D.

● (A + B^T), computed from the input above:

8 1 1
4 0 4

● Output matrix, X = (A + B^T) C D^T, computed from the input above:

275 112
180 132
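The sample can be checked with a small host-only reference. The sketch below is plain host code that compiles in a .cu file; it assumes row-major int matrices, and the name cpuReference is hypothetical. It evaluates the expression left to right as S = A + B^T, then Y = S C, then X = Y D^T.

```cuda
#include <vector>

// Host-only reference: X = (A + B^T) * C * D^T, all matrices row-major.
// A is p x q, B is q x p, C is q x r, D is s x r, X is p x s.
std::vector<int> cpuReference(const std::vector<int> &A, const std::vector<int> &B,
                              const std::vector<int> &C, const std::vector<int> &D,
                              int p, int q, int r, int s)
{
    // S = A + B^T (p x q); B^T[i][j] is B[j][i].
    std::vector<int> S(p * q);
    for (int i = 0; i < p; ++i)
        for (int j = 0; j < q; ++j)
            S[i * q + j] = A[i * q + j] + B[j * p + i];

    // Y = S * C (p x r); the k-before-j loop order keeps C accesses row-wise.
    std::vector<int> Y(p * r, 0);
    for (int i = 0; i < p; ++i)
        for (int k = 0; k < q; ++k)
            for (int j = 0; j < r; ++j)
                Y[i * r + j] += S[i * q + k] * C[k * r + j];

    // X = Y * D^T (p x s); D^T[k][j] is D[j][k], so rows of D are read in order.
    std::vector<int> X(p * s, 0);
    for (int i = 0; i < p; ++i)
        for (int j = 0; j < s; ++j)
            for (int k = 0; k < r; ++k)
                X[i * s + j] += Y[i * r + k] * D[j * r + k];
    return X;
}
```

With the sample input (p = 2, q = 3, r = 3, s = 2), the intermediate S is 8 1 1 / 4 0 4 and the function returns 275 112 / 180 132 in row-major order. Under the stated constraints the largest possible magnitude of an entry of X is 1024 × 1024 × 20 × 10 × 10 = 2,097,152,000, which still fits in a 32-bit signed int.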

4. Points to be noted

● The file provided by us contains the code that handles reading the input, printing the result and printing the execution time.
● Don't write any code in the main() function.
● You need to implement the compute() function provided in that file.
● You are free to use any number of functions/kernels.
● You can launch the kernels as you wish.
● It is compulsory to optimize for coalesced accesses. Also, make use of shared memory.
● Do not write any print statements.
● Test your code on large input matrices.
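The shared-memory and coalescing requirements are commonly met with a tiled matrix multiplication. The sketch below is one such kernel, not the expected solution: it assumes row-major int matrices, and matmulTiled, matmulOnDevice and TILE are hypothetical names. The signature of compute() is not shown in this handout, so the host helper is only a stand-in for whatever compute() would do.

```cuda
#include <cuda_runtime.h>

#define TILE 16  // tile edge; an assumed value, not given in the handout

// out (rows x cols) = M (rows x inner) * N (inner x cols), all row-major.
__global__ void matmulTiled(const int *M, const int *N, int *out,
                            int rows, int inner, int cols)
{
    __shared__ int tileM[TILE][TILE];
    __shared__ int tileN[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    int acc = 0;

    for (int t = 0; t < (inner + TILE - 1) / TILE; ++t) {
        int mCol = t * TILE + threadIdx.x;  // threadIdx.x walks along a row of M
        int nRow = t * TILE + threadIdx.y;  // threadIdx.x walks along a row of N
        // Out-of-range entries are zero-filled so every thread runs the same
        // inner loop, which keeps divergence out of the hot path.
        tileM[threadIdx.y][threadIdx.x] =
            (row < rows && mCol < inner) ? M[row * inner + mCol] : 0;
        tileN[threadIdx.y][threadIdx.x] =
            (nRow < inner && col < cols) ? N[nRow * cols + col] : 0;
        __syncthreads();

        for (int k = 0; k < TILE; ++k)
            acc += tileM[threadIdx.y][k] * tileN[k][threadIdx.x];
        __syncthreads();  // finish using the tiles before they are overwritten
    }

    if (row < rows && col < cols)
        out[row * cols + col] = acc;
}

// Hypothetical host helper: copies inputs over, launches, copies the result back.
void matmulOnDevice(const int *hM, const int *hN, int *hOut,
                    int rows, int inner, int cols)
{
    int *dM, *dN, *dOut;
    cudaMalloc(&dM, (size_t)rows * inner * sizeof(int));
    cudaMalloc(&dN, (size_t)inner * cols * sizeof(int));
    cudaMalloc(&dOut, (size_t)rows * cols * sizeof(int));
    cudaMemcpy(dM, hM, (size_t)rows * inner * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dN, hN, (size_t)inner * cols * sizeof(int), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((cols + TILE - 1) / TILE, (rows + TILE - 1) / TILE);
    matmulTiled<<<grid, block>>>(dM, dN, dOut, rows, inner, cols);

    cudaMemcpy(hOut, dOut, (size_t)rows * cols * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dM);
    cudaFree(dN);
    cudaFree(dOut);
}
```

Each element of M and N is read from global memory once per tile rather than once per multiply, which is where the shared-memory benefit comes from. For the final product with D^T, one option is to load the tile of D with the row and column roles swapped instead of materialising the transpose; that variant is not shown here.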

5. Submission Guidelines

● Use the file provided by us.
● Don't change anything in the main() function.
● Rename the file, which contains the implementation of the above-described functionality, to .cu
● For example, if your roll number is CS20M039, then the name of the file you submit on Moodle should be (submit only the .cu file).
● After submission, download the file and make sure it is the one you intended to submit.

6. Learning Suggestions

● Write a CPU version of the code that achieves the same functionality. Time the CPU code and the GPU code separately for large matrices and compare their performance.
● Exploit shared memory as much as possible to gain performance benefits.
● Try to reduce thread divergence as much as possible.
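For the CPU versus GPU comparison, one possible way to take the two timings is sketched below. It uses std::chrono on the host and CUDA events on the device. The helpers timeCpuMs, timeGpuMs and timeBoth, and the placeholder kernel doubleAll, are hypothetical; the provided file already prints its own execution time, so this is meant for separate experiments only.

```cuda
#include <chrono>
#include <vector>
#include <cuda_runtime.h>

// Wall-clock time of a host callable, in milliseconds.
template <typename F>
float timeCpuMs(F work)
{
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<float, std::milli>(stop - start).count();
}

// Device time for whatever `launch` enqueues on the default stream, in milliseconds.
template <typename F>
float timeGpuMs(F launch)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    launch();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel and the stop event finish
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Placeholder workload (hypothetical): doubles every element.
__global__ void doubleAll(int *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] *= 2;
}

// Times the placeholder on both sides and reports whether the results agree.
bool timeBoth(int n, float *cpuMs, float *gpuMs)
{
    std::vector<int> host(n, 1), fromGpu(n);
    int *dev;
    cudaMalloc(&dev, (size_t)n * sizeof(int));
    cudaMemcpy(dev, host.data(), (size_t)n * sizeof(int), cudaMemcpyHostToDevice);

    *gpuMs = timeGpuMs([&] { doubleAll<<<(n + 255) / 256, 256>>>(dev, n); });
    *cpuMs = timeCpuMs([&] { for (int &v : host) v *= 2; });

    cudaMemcpy(fromGpu.data(), dev, (size_t)n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host == fromGpu;
}
```

The GPU figure here covers kernel execution only and excludes host-to-device transfers; whether to include them depends on what is being compared. Replacing the placeholder with the real kernels and a CPU reference gives the comparison suggested above.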