Description
Your task is to accelerate the computation of a convolutional neural network (CNN) using OpenCL on
a multi-core CPU. The details of the algorithm can be found in the lecture slides. Figure 1 shows the
CNN layer that we will work on in this lab.
Tip
● We will use m5.2xlarge instances for grading.
Preparation
Create an AWS Instance
Please refer to the tutorial slides and create an m5.2xlarge instance with an Ubuntu 18.04 AMI.
Please use your AWS Educate classroom account for this lab.
Run OpenCL Example: Vector Add
We have prepared the host code for you on GitHub. Log in to your instance and run the following
commands:
git clone https://github.com/UCLA-VAST/cs-133-19w -o upstream
cd cs-133-19w/lab3
./setup.sh
make test-vadd
It should run without error and finish in a few seconds.
Tips
● To keep your session alive in case you lose your connection, run screen after login. If your
ssh connection drops, you can recover the session with screen -DRR.
● You should stop your instance if you are going to come back and resume your work within a few
hours or days. Your data will be preserved, but you will be charged for EBS storage at $0.10 per
GB per month (with default settings).
● You should terminate your instance if you are not going to come back and resume your work for
days or weeks. Data on the instance will be lost.
● We recommend using the private repos provided by GitHub. Do not put your code in a
public repo.
Run CNN with OpenCL
If you have successfully launched an instance and run the vector add example, you can start to create
your CNN kernel. The provided code will load test data and verify your results against a ground truth.
Your task is to implement a fast, parallel version of the CNN. You can start with the sequential version
provided in cnn.cpp. You should edit intel.cl for this task. To adjust the workgroup parameters, edit
params.sh. For example, if you would like to set the global work size to (1, 1, 1), you should
uncomment the second line of params.sh by deleting the leading pound sign (#). Note that the
workgroup size doesn't have to be 3-dimensional. You cannot put spaces around the equal sign (=) in
params.sh, and if your workgroup size is multi-dimensional, you cannot omit the quote marks (').
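As a concrete starting point, below is a minimal, hypothetical sketch of a convolution kernel in which
each work-item computes one output element, assuming a 3-dimensional global work size of (output
channels, output height, output width). The kernel name, argument order, and the size macros are
illustrative assumptions only; your intel.cl must match the interface that the provided host code
actually expects.
// Hypothetical sketch -- check the host code's expected kernel name,
// argument list, and data layout before relying on anything here.
// Illustrative sizes only; use whatever the lab actually defines.
#define NUM_IN 256 // input channels
#define K 5        // filter height/width
#define H 228      // input height (OH + K - 1)
#define W 228      // input width (OW + K - 1)
#define OH 224     // output height
#define OW 224     // output width
__kernel void conv(__global const float* input,   // [NUM_IN][H][W]
                   __global const float* weight,  // [out ch][NUM_IN][K][K]
                   __global float* output) {      // [out ch][OH][OW]
  // One work-item per output element:
  const int j = get_global_id(0); // output channel
  const int h = get_global_id(1); // output row
  const int w = get_global_id(2); // output column
  float sum = 0.0f;
  for (int i = 0; i < NUM_IN; ++i)    // reduce over input channels
    for (int p = 0; p < K; ++p)       // and over the K x K window
      for (int q = 0; q < K; ++q)
        sum += weight[((j * NUM_IN + i) * K + p) * K + q] *
               input[(i * H + (h + p)) * W + (w + q)];
  output[(j * OH + h) * OW + w] = sum;
}
Along the same lines, here is a hypothetical params.sh fragment illustrating the syntax rules above;
the real file defines its own variable names, so uncomment and edit the lines that are already there
rather than inventing new ones.
# Hypothetical fragment -- variable names are illustrative only.
#GLOBAL_WORK_SIZE=1               # a 1-dimensional size needs no quotes
GLOBAL_WORK_SIZE='(256,224,224)'  # multi-dimensional sizes require quotes
LOCAL_WORK_SIZE='(1,1,1)'         # and no spaces around the equal sign (=)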
Tips
● To check your code into a private GitHub repo, create the repo first, then run:
git branch -m upstream   # rename the cloned branch to upstream
git checkout -b master   # start your own master branch from it
git add intel.cl params.sh
git commit -m "lab3: first version" # change the commit message accordingly
# please replace the URL with your own repo's URL
git remote add origin git@github.com:YourGitHubUserName/your-repo-name.git
git push -u origin master
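Because the earlier clone used -o upstream, the course repo stays available as the upstream
remote; the commands above simply move your work onto a fresh master branch whose default push
target is your own origin repo, so git push goes to your private repo while you can still fetch
course updates from upstream.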
● We recommend running git add and git commit often so that you can keep track of your
history and revert whenever necessary.
● If you move to a new instance, just git clone your repo.
● Run make test to re-compile and test your code.
● You can run the sequential CNN with make test-seq.
● If make test fails, it means your code produces wrong results. Make sure your code produces
correct results!
Submission
You need to report the performance results of your CPU-based OpenCL implementation on an
m5.2xlarge instance. Please express your performance in GFlops and the speedup compared with
the sequential version (see the note after the list below). In particular, you need to submit a brief
report which covers the following:
● Explain the parallelization strategies you applied for each step (convolution, max
pooling, etc.) in this lab. Why did you choose each strategy?
● Describe any optimizations you applied. (Optional, bonus +5: evaluate the
performance of at least 3 different optimization techniques that you incrementally
applied and explain why each optimization improves the performance. Simply changing
parameters does not count; sufficient code change is needed between versions. In your
report, please include the most important changes you applied to your code for each
optimization.)
● In terms of execution time, make a comparison with the given sequential version,
and discuss the scalability of your parallel implementation using 1, 2, 4, 8, 16, and 32 work-items.
What global/local work size gives you the best performance?
● Optional: The challenges you faced, and how you overcame them.
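A note on computing the numbers above, in case the test harness does not print them directly:
throughput is conventionally GFlops = (total floating-point operations in the layer) / (kernel
execution time in seconds × 10^9), where a convolution performs roughly 2 × (output elements) ×
(input channels) × K × K operations (one multiply and one add per term); speedup is the sequential
execution time divided by your parallel execution time.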
You will need to submit your optimized kernel code and your parameter settings. Please do not modify
or submit the host code. Submit to CCLE, and verify the correctness of your code before
submission.
Your final submission should be a tarball which contains, and only contains, the following files:
<your-UID>.tar.gz
├ intel.cl
├ params.sh
└ lab3-report.pdf
File lab3-report.pdf must be in PDF format. You should make the tarball by copying your
lab3-report.pdf into the lab3 directory and running
make tar UID=<your-UID>
If you made the tarball in another way, you MUST put it in the lab3 directory and check it by running
make check UID=<your-UID>
Grading Policy
Submission Format
Your submission will only be graded if it complies with the requirements. In case of a missing report,
missing code, or a compilation error, you will receive 0 for the corresponding category or categories.
Correctness (50%)
Your implementation must produce correct results; please verify with make test before submitting.
Performance (25%)
Your performance will be evaluated based on the workgroup settings you set in params.sh.
Performance points will be awarded only if your results are correct, so please prioritize
correctness over performance. Your performance will be evaluated based on ranges of throughput
(GFlops). We will set five ranges after evaluating all submissions and assign the points as follows:
● Better than TA’s performance: 25 points + 5 points (bonus)
● Range A GFlops: 25 points
● Range B GFlops: 20 points
● Range C GFlops: 15 points
● Range D GFlops: 10 points
● Speedup below Range D: 5 points
● Slowdown: 0 points
Report (25%)
Points may be deducted if your report misses any of the sections described above.

