TDT4265 Assignment 2: Computer Vision and Deep Learning


Introduction. In the previous assignment, you implemented a single-layer neural network to classify
MNIST digits with softmax regression. In this assignment, we will extend this work to a multi-layer
neural network. You will derive update rules for the hidden layers by using backpropagation of the cost
function. Furthermore, you will experiment with several well-known "tricks of the trade" to improve
your network in terms of both accuracy and learning speed. Finally, you will experiment with different
network topologies, testing different numbers of hidden units and hidden layers.
Starter Code. With this assignment, we provide you with starter code for the programming tasks. We
require you to use the provided files, and you are not allowed to create any additional files for your code
(unless stated otherwise in the task). You can download the starter code from:
https://github.com/hukkelas/TDT4265-StarterCode.
Report outline. We've included a Jupyter notebook as a skeleton for your report, so that you don't
spend too much time creating your report. Remember to export the Jupyter notebook to PDF before
submitting it to Blackboard. You're not required to use this report skeleton, and you can write your
report in whatever program you'd like (Markdown, LaTeX, Word, etc.), as long as you deliver the report
as a PDF file.
Recommended reading.
1. Check "Recommended Resources" on Blackboard for updates.
2. Neural Networks and Deep Learning: Chapter 1 and 2
3. 3Blue1Brown: What is Backpropagation Really Doing?
4. 3Blue1Brown: Backpropagation Calculus
Delivery. We ask you to follow these guidelines:
• Report: Deliver your answers as a single PDF file. Include all tasks in the report, and mark
each answer clearly with the task you are answering (Task 1a, Task 1b, Task 2c, etc.). There is no
need to include your code in the report.
• Plots in report: For the plots in the report, ensure that they are large and easily readable.
You might want to use the "ylim" function in the matplotlib package to "zoom" in on your plots.
Label the different graphs such that it is easy for us to see which graphs correspond to the training,
validation, and test sets.
• Source code: Upload your code as a zip file. In the assignment starter code, we have included
a script (create_submission_zip.py) to create your delivery zip. Please use this script, as it will
structure the zip file as we expect. (Run it from the same folder as all the Python files.)
To use the script, simply run: python3 create_submission_zip.py
• Upload to Blackboard: Upload the ZIP file with your source code and the report to Blackboard
before the delivery deadline.
• The delivered code is taken into account in the evaluation. Ensure your code is well documented
and as readable as possible.
Any group that does not follow these guidelines or delivers late will have points deducted.
Task 1. Softmax regression with backpropagation
For multi-class classification on the MNIST dataset, you previously used softmax regression with cross-entropy
error as the objective function to train a single-layer neural network. Now, we will extend these
derivations to work with multi-layer neural networks. We will extend the network by adding a hidden
layer between the input and output that consists of $J$ units with the sigmoid activation function. This
network will have two layers: an input layer, a hidden layer, and an output layer (note that we only
count the layers with actual learnable parameters, i.e. the hidden layer and the output layer).
Notation: We use index $k$ to represent a node in the output layer, index $j$ to represent a node in the
hidden layer, and index $i$ to represent an input unit, i.e. $x_i$. Hence, the weight from node $i$ in the input
layer to node $j$ in the hidden layer is $w_{ji}$. Similarly, the weight from node $j$ in the hidden layer to node $k$ in the
output layer is $w_{kj}$. We write the activation of hidden unit $j$ as $a_j = f(z_j)$, where $z_j = \sum_{i=0}^{I} w_{ji} x_i$.
Here $f$ represents the hidden unit activation function (sigmoid in our case), and $I$ is the dimensionality of
the input. We write the activation of output unit $k$ as $\hat{y}_k = f(z_k)$, where $f$ represents the output unit
activation function (softmax in our case). Note that we use the same symbol $f$ for the hidden and output
activation functions, even though they are different; which $f$ we mean should be clear from the
context. This notation enables us to write the slope of the hidden activation function as $f'(z_j)$. Since
we are using the bias trick (as we did in Assignment 1), you can ignore the bias in your calculations. To
avoid too many superscripts, we assume that there is only one data sample ($N = 1$), but extending our
update rule to $N > 1$ is straightforward.
In the previous assignment you derived the gradient descent update rule for the weights $w_{kj}$ of the output
layer:

$$w_{kj} := w_{kj} - \alpha \frac{\partial C}{\partial w_{kj}} = w_{kj} - \alpha \delta_k a_j, \qquad (1)$$

where $\delta_k = \frac{\partial C}{\partial z_k}$, and ":=" means assignment. For the weights of the hidden layer, the gradient descent
rule with learning rate $\alpha$ is:

$$w_{ji} := w_{ji} - \alpha \frac{\partial C}{\partial w_{ji}}. \qquad (2)$$

Equation 2 can be written as a recursive update rule that can be applied without computing $\frac{\partial C}{\partial w_{ji}}$ directly,
by using the definition $\delta_j = \frac{\partial C}{\partial z_j}$.
Task 1a: Backpropagation (0.75 points)
By using the definition of $\delta_j$, show that Equation 2 can be written as

$$w_{ji} := w_{ji} - \alpha \delta_j x_i, \qquad (3)$$

and show that $\delta_j = f'(z_j) \sum_k w_{kj} \delta_k$.
Hint 1: A good starting point is to try rewriting $\alpha \frac{\partial C}{\partial w_{ji}}$ using the chain rule.
Hint 2: From the previous assignment, we know that $\delta_k = \frac{\partial C}{\partial z_k} = -(y_k - \hat{y}_k)$.
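As a sketch of one possible first step only (you still need to justify each factor and complete the derivation yourself), the chain rule splits both quantities into factors defined in the notation above:

$$\frac{\partial C}{\partial w_{ji}} = \frac{\partial C}{\partial z_j} \frac{\partial z_j}{\partial w_{ji}}, \qquad
\delta_j = \frac{\partial C}{\partial z_j} = \sum_k \frac{\partial C}{\partial z_k} \frac{\partial z_k}{\partial a_j} \frac{\partial a_j}{\partial z_j}.$$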
Task 1b: Vectorize computation (0.25 points)
The computation is much faster when you update all $w_{ji}$ and $w_{kj}$ at the same time, using matrix
multiplications rather than for-loops. Show the update rule for the weight matrix from the hidden layer
to the output layer and for the weight matrix from the input layer to the hidden layer, using matrix/vector notation.
We expect you to clearly define the shape of each vector/matrix in your calculation.
Hint: If you're stuck on this task, take a look at Chapter 2 of Nielsen's book.
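As a sketch only, under one common convention (a single sample $x \in \mathbb{R}^{I}$ with the bias folded in, hidden activations $a \in \mathbb{R}^{J}$, input-to-hidden weights $W_1 \in \mathbb{R}^{J \times I}$, hidden-to-output weights $W_2 \in \mathbb{R}^{K \times J}$, and error vectors $\delta^{(o)} \in \mathbb{R}^{K}$ and $\delta^{(h)} \in \mathbb{R}^{J}$ collecting the $\delta_k$ and $\delta_j$), the updates can be written as:

$$W_2 := W_2 - \alpha\, \delta^{(o)} a^{\top}, \qquad
W_1 := W_1 - \alpha\, \delta^{(h)} x^{\top}, \qquad
\delta^{(h)} = f'(z) \odot \left(W_2^{\top} \delta^{(o)}\right),$$

where $\odot$ denotes element-wise multiplication. Other conventions (e.g. transposed weight matrices) are equally valid; we still expect you to define and justify the shapes in your own derivation.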
Task 2: Softmax Regression with Backpropagation
In this task, we will perform 10-way classification of the digits in the MNIST dataset with a 2-layer
neural network. The network should consist of an input layer, a hidden layer, and an output layer. For
each task, we have set hyperparameters (learning rate and batch size) that should work fine. If you
decide to change them, please state it in your report. We expect you to keep/re-implement the following
functions from the last assignment:
• Implementation of one_hot_encode and cross_entropy_loss (include these in task2a.py; see the
sketch after this list).
• Early stopping in the training loop. This is not required; however, early stopping might enable
you to stop training early and save computation time. For early stopping we used the same
implementation as in the last assignment, except that we increased the number of validation steps
the model may go without improving from 10 to 50.
• Batch shuffling in batch_loader in utils.py (also covered by the sketch after this list).
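For reference, a minimal sketch of what these two helpers could look like. The exact signatures in the starter code may differ (in particular, shuffled_batches is an illustrative name, not the batch_loader interface), so treat this as a sketch rather than the provided API:

import numpy as np

def one_hot_encode(Y, num_classes):
    # Y: integer labels with shape [N, 1]; returns a one-hot matrix with shape [N, num_classes].
    encoded = np.zeros((Y.shape[0], num_classes))
    encoded[np.arange(Y.shape[0]), Y.squeeze()] = 1.0
    return encoded

def shuffled_batches(X, Y, batch_size):
    # Yield mini-batches in a fresh random order (call again each epoch to re-shuffle).
    indices = np.random.permutation(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        batch = indices[start:start + batch_size]
        yield X[batch], Y[batch]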
Input Normalization: Input normalization is a crucial part of optimizing neural networks efficiently.
Convergence is usually faster if the average of each input variable over the training set is close to
zero [LeCun et al., 2012]. (Section 4.3 of [LeCun et al., 2012] explains the effect of input normalization
in detail and presents an extreme case of what can happen if you don't normalize your input data.)
A simple, yet efficient, way to normalize your images is

$$X_{norm} = \frac{X - \mu}{\sigma}, \qquad (4)$$

where $\mu$ and $\sigma$ are the mean pixel value and the standard deviation over the whole training set, respectively.
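A minimal sketch of how Equation 4 and the bias trick could be combined. Note that the actual pre_process_images in the starter code may take only the images as an argument; the mean and std are passed explicitly here just to make the dependence on the training-set statistics visible:

import numpy as np

def pre_process_images(X, mean, std):
    # mean/std are scalars computed once from the *raw training images*
    # and reused unchanged for the validation and test sets.
    X = (X - mean) / std                             # Equation 4
    bias = np.ones((X.shape[0], 1))                  # bias trick: append a column of ones
    return np.concatenate([X, bias], axis=1)

# Example usage (X_train: float array of shape [N, 784]):
# X_train_processed = pre_process_images(X_train, X_train.mean(), X_train.std())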
For your report, please:
(a) [0.4pt] Find a mean and standard deviation value from the whole training set. Then, implement a
function that pre-processes our images in the function pre_process_images in task2a.py. This
should normalize our images with the input normalization trick shown in Equation 4, and it should
apply the bias trick. Note that you should use the same mean and standard deviation value when
you normalize your training set, validation set, and test set!
(b) [1.4pt] Implement the following in task2a.py:
• Implement a function that performs the forward pass through our softmax model. This should
compute $\hat{y}$. Implement this in the function forward (a rough sketch of a possible forward pass
is included after subtask (d)).
• Implement a function that performs the backward pass through our two-layer neural network.
Implement this in the function backward. The backward pass computes the gradient for each
parameter in our network (both the weights in the hidden layer and in the output layer).
We have included a couple of simple tests to help you debug your code.
(c) [0.5pt] The rest of the Task 2 subtasks should be implemented in task2.py.
Implement softmax regression with mini-batch gradient descent for a multi-layer neural network.
The network should consist of a single hidden layer with 64 hidden units and an output layer with
10 outputs. Initialize the weights (before any training) to values randomly sampled from [-1, 1]
(you can use np.random.uniform(-1, 1, (785, 64)) to get a weight matrix with shape [785, 64]
sampled from a uniform distribution). You should only need to change train_step in task2.py
to support multi-layer neural networks.
(report) Include a plot of the training and validation loss and accuracy over training. Have the
number of gradient steps on the x-axis, and the loss/accuracy on the y-axis.
(d) [0.4pt] (report) How many parameters are there in the network defined in Task 2c? (Remember
that the number of parameters = number of weights + number of biases.)
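As mentioned in subtask (b), here is a minimal sketch of what the forward pass of the two-layer network could look like with the bias trick applied to the input. The function and argument names are illustrative only; the forward in task2a.py operates on the model's own weight attributes, so adapt accordingly:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the row-wise maximum for numerical stability before exponentiating.
    z = z - z.max(axis=1, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

def forward(X, w_hidden, w_output):
    # X: [batch size, 785] (bias trick applied), w_hidden: [785, 64], w_output: [64, 10].
    hidden_activations = sigmoid(X @ w_hidden)       # a_j = f(z_j), shape [batch size, 64]
    return softmax(hidden_activations @ w_output)    # y_hat, shape [batch size, 10]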
Task 3: Adding the "Tricks of the Trade"
Read the paper Efficient Backprop [LeCun et al., 2012], Sections 4.1-4.7. The paper can be found with
the assignment files. Implement the following ideas from the paper. Make these changes incrementally,
i.e., report your results, then add another trick, and report your results again. This way you can observe
what effect the different tricks have on learning.
Momentum: LeCun et al. [LeCun et al., 2012] define the momentum update step as

$$\Delta w(t+1) = \alpha \cdot \frac{\partial C}{\partial w} + \gamma \Delta w(t), \qquad (5)$$

where $\Delta w(t+1)$ is the weight update for step $t+1$ and $\gamma$ is the strength of the momentum term. Following
this notation, we can now write Equation 2 as $w_{ji} := w_{ji} - \Delta w_{ji}(t)$. Instead of using LeCun's definition,
we will use the standard way of implementing it in neural network frameworks:

$$\Delta w(t+1) = \frac{\partial C}{\partial w} + \gamma \cdot \Delta w(t) \qquad (6)$$

and update our weights with $w_{ji} := w_{ji} - \alpha \cdot \Delta w_{ji}(t)$, where $\Delta w_{ji}(t)$ is set to 0 for the first gradient
step.
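To make Equation 6 concrete, here is a minimal sketch of a single momentum step. The names are illustrative, not the starter-code interface; previous_update is a per-weight-matrix buffer initialized to zeros at the start of training:

import numpy as np

def momentum_step(w, grad, previous_update, learning_rate=0.02, momentum_gamma=0.9):
    # Equation 6: combine the current gradient with the previous update direction.
    update = grad + momentum_gamma * previous_update
    # Weight update: w := w - alpha * delta_w(t).
    w = w - learning_rate * update
    return w, update  # keep the update so it can be reused at the next step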
Implement the following:
(a) [0.5pt] Initialize the input weights from a normal distribution, where each weight has a mean of
0 and a standard deviation of $1/\sqrt{\text{fan-in}}$. Fan-in is the number of inputs to the unit/neuron.
Implement it in __init__ for your model in task2.py, where use_improved_weight_init can be
used to toggle it on/off (see the sketch after this list).
(b) [0.7pt] For the hidden layer, use the improved sigmoid from Section 4.4. Note that you will need
to derive the slope of this activation function again when you are performing backpropagation.
Implement it in your model in task2.py, where use_improved_sigmoid can be used to toggle it
on/off (also covered by the sketch after this list).
(c) [0.6pt] Add momentum to your gradient update step, using $\gamma = 0.9$. Note
that you will need to reduce your learning rate when applying momentum; for our experiments, a
learning rate of 0.02 worked fine. Implement it in train_step in task2a.py, where use_momentum
can be used to toggle it on/off, and momentum_gamma is the momentum strength.
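For (a) and (b), a minimal sketch follows, under the assumption that the improved sigmoid refers to the scaled tanh recommended in Section 4.4 of [LeCun et al., 2012], $f(z) = 1.7159 \tanh(2z/3)$; verify the constants against the paper before relying on them:

import numpy as np

def improved_weight_init(fan_in, fan_out):
    # Normal distribution with mean 0 and standard deviation 1 / sqrt(fan-in).
    return np.random.normal(0, 1.0 / np.sqrt(fan_in), (fan_in, fan_out))

def improved_sigmoid(z):
    # Scaled tanh from Efficient Backprop, Section 4.4: f(z) = 1.7159 * tanh(2z/3).
    return 1.7159 * np.tanh(2.0 * z / 3.0)

def improved_sigmoid_derivative(z):
    # f'(z) = 1.7159 * (2/3) * (1 - tanh^2(2z/3)), needed in the backward pass.
    return 1.7159 * (2.0 / 3.0) * (1.0 - np.tanh(2.0 * z / 3.0) ** 2)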
In your report, please: Briefly comment on the change in performance, which has hopefully improved
with each addition, at least in terms of learning speed. Note that we expect you to comment on
convergence speed, generalization/overfitting, and the final accuracy/validation loss of the model. Include a
plot of the loss detailing the improvements for each addition (an example is shown in Figure 1). We've
included task3.py as an example of how you can create this comparison plot; you can extend this file
to solve this task.
Figure 1: Cross-entropy loss when training the model from Task 2 with and without shuffling the training
examples.
Task 4: Experiment with network topology
Start with your final network from Task 3. Now, we will consider how the network topology changes the
performance.
In your report, please answer the following:
(a) [0.3pt] (report) Set the number of hidden units to 32. What do you observe if the number of
hidden units is too small?
(b) [0.3pt] (report) Set the number of hidden units to 128. What do you observe if the number is too
large?
For 4a) and 4b) you can include a plot of either loss or accuracy to support your statements.
(c) [1.0pt] Generalize your implementation of your softmax model to handle a variable number of hidden
layers. You can modify your model from Task 2a. The variable neurons_per_layer specifies the
number of layers (the length of the list) and the number of units in each layer.
To test your implementation, you can run the code in task4c.py, where we use gradient approximation
to test your model on a network with 2 hidden layers. (A small sketch of how the weight shapes can
be derived from neurons_per_layer is included at the end of this task.)
Hint: When adding a new hidden layer, you can copy the update rule from the previous hidden
layer (this is the beautiful part of backpropagation!).
(d) [0.4pt] Create a new model with two hidden layers of equal size. The model should have approximately
the same number of parameters as the network from Task 3 (remember, number of parameters =
number of weights + number of biases). Train your new model (you can use the same hyperparameters
as before).
(report) In your report, state the number of parameters in your network from Task 3, and the
number of parameters in your new network with multiple hidden layers. Also, state the number of
hidden units you use for the new network.
(report) Plot the training and validation loss over training. Repeat this plot for the accuracy.
(report) How does the network with multiple hidden layers compare to the previous network?
(e) [0.5pt] Train a model with ten hidden layers, where each layer has 64 hidden nodes.
(report) Plot the training loss in the same graph as your baseline from Task 3 (you can plot it in
the same graph as Task 4d). What happens to the model? What is the reason for the change in
model performance?
Tip: You can set num_epochs to 5 if the training takes a long time.
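For Task 4c, a minimal sketch of how the per-layer weight shapes could be derived from neurons_per_layer, assuming the bias trick on the 784-dimensional input and that neurons_per_layer lists every layer including the output (e.g. [64, 64, 10]); the plain uniform initialization is used here for brevity, so swap in whatever scheme you used in Task 3:

import numpy as np

def init_weights(neurons_per_layer, input_dim=785):
    # One weight matrix per layer, each of shape [size of previous layer, size of this layer].
    weights = []
    prev_size = input_dim
    for layer_size in neurons_per_layer:
        weights.append(np.random.uniform(-1, 1, (prev_size, layer_size)))
        prev_size = layer_size
    return weights

# Example: two hidden layers with 64 units each, followed by the 10-way output layer.
print([w.shape for w in init_weights([64, 64, 10])])  # [(785, 64), (64, 64), (64, 10)]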
References
[LeCun et al., 2012] LeCun, Y. A., Bottou, L., Orr, G. B., and Müller, K.-R. (2012). Efficient backprop.
In Neural Networks: Tricks of the Trade, pages 9-48. Springer.