CS 489/698 Neural Networks, Assignment 4

Autoencoders and RNNs

What you need to do

1. Autoencoder [13 marks total]

In this question, you will create an autoencoder and train it on the MNIST digits.

(a) [4 marks] Consider the cosine proximity loss function
$$C(\vec{y}, \vec{t}\,) = -\frac{\vec{y} \cdot \vec{t}}{\|\vec{y}\|\,\|\vec{t}\|}.$$
It is the negative of the cosine of the angle between $\vec{y}$ and $\vec{t}$. Based on that loss function, we can define the cosine proximity cost as the expected loss,
$$E(Y, T) = \mathbb{E}_{\vec{y} \in Y,\ \vec{t} \in T}\big[\, C(\vec{y}, \vec{t}\,) \,\big].$$
Find a formula for the gradient of the cost function with respect to the output, $\vec{y}$. That is, find a formula for $\partial E / \partial \vec{y}$. Simplify the formula as much as you can.
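One route to the answer (a sketch only; you may be able to simplify further): writing $\hat{y} = \vec{y}/\|\vec{y}\|$ and $\hat{t} = \vec{t}/\|\vec{t}\|$, the quotient rule gives
$$\frac{\partial C}{\partial \vec{y}} = -\frac{\vec{t}}{\|\vec{y}\|\,\|\vec{t}\|} + \frac{(\vec{y} \cdot \vec{t}\,)\,\vec{y}}{\|\vec{y}\|^{3}\,\|\vec{t}\|} = \frac{1}{\|\vec{y}\|}\Big( (\hat{y} \cdot \hat{t}\,)\,\hat{y} - \hat{t} \Big).$$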

(b) [3 marks] Complete the function CosineProximity_p. It computes $\partial E / \partial \vec{y}$ for the entire batch. See the function's documentation for more details.
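For reference, a minimal NumPy sketch of the batch gradient, assuming Y and T are arrays with one sample per row and that the function returns an array of the same shape (the actual conventions, including whether to average over the batch, are in the supplied function's docstring):

import numpy as np

def CosineProximity_p(Y, T):
    '''
    Sketch: dE/dY for a batch. Assumes Y and T have shape
    (batch_size, dim); returns an array of the same shape.
    '''
    ny = np.linalg.norm(Y, axis=1, keepdims=True)   # ||y|| per sample
    nt = np.linalg.norm(T, axis=1, keepdims=True)   # ||t|| per sample
    dot = np.sum(Y * T, axis=1, keepdims=True)      # y . t per sample
    # dC/dy = -t / (||y|| ||t||) + (y . t) y / (||y||^3 ||t||)
    return -T / (ny * nt) + dot * Y / (ny ** 3 * nt)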

(c) [4 marks] Create a 3-layer autoencoder neural network and train it on 10,000 digits from the MNIST dataset. Your network’s input should have 784 neurons, and its output layer should have 784 neurons that use the identity activation function.

The hidden layer should have only 50 logistic neurons. Use stochastic gradient descent to minimize the cosine proximity loss function for at least 20 epochs, with a learning rate of 1. The batch size should be between 30 and 130. You should use the supplied Network class.
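The supplied Network class defines the real API; the following is only a hypothetical sketch of the shape of the solution (the constructor and method names Network, add_layer, and sgd are assumptions, not the actual interface):

# Hypothetical sketch -- adapt the names to the supplied Network class.
net = Network(cost='CosineProximity')
net.add_layer(784)                      # input layer
net.add_layer(50, act='logistic')       # hidden (encoding) layer
net.add_layer(784, act='identity')      # output layer
# For an autoencoder, the targets are the inputs themselves.
net.sgd(train_imgs, train_imgs, epochs=20, lrate=1.0, batch_size=100)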

(d) [2 marks] Show that your hidden layer successfully encodes the digits by encoding and reconstructing at least one sample of each digit class (0 through 9).
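A sketch of one way to display the reconstructions, assuming hypothetical names net, train_imgs (10,000 x 784), train_labels, and a feedforward method (adapt these to the notebook's actual variables):

import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 10, figsize=(12, 3))
for d in range(10):
    idx = np.where(train_labels == d)[0][0]           # first sample of digit d
    x = train_imgs[idx]
    y = np.asarray(net.feedforward(x.reshape(1, -1)))
    axes[0, d].imshow(x.reshape(28, 28), cmap='gray') # original
    axes[1, d].imshow(y.reshape(28, 28), cmap='gray') # reconstruction
    axes[0, d].axis('off')
    axes[1, d].axis('off')
plt.show()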

2. Backprop Through Time [10 marks total]

The figure on the right shows an RNN. Note that
$$\vec{s} = U\vec{x} + W\vec{h} + \vec{b}, \qquad \vec{h} = \sigma(\vec{s}), \qquad \vec{z} = V\vec{h} + \vec{c}, \qquad \vec{y} = \sigma(\vec{z}).$$
Notice that we are using the mathematical convention of assuming vectors are column vectors by default.

For the following questions, assume you are given a dataset that has many samples of sequences of inputs and output targets. Each sequence consists of inputs $\vec{x}_i$, for $i = 1, \ldots, \tau$, which produce a sequence of network outputs $\vec{y}_i$ that you wish to match to a corresponding sequence of targets, $\vec{t}_i$.

The cost function for such a sequence is
$$E(\vec{y}_1, \ldots, \vec{y}_\tau, \vec{t}_1, \ldots, \vec{t}_\tau) = \sum_{i=1}^{\tau} C(\vec{y}_i, \vec{t}_i).$$

(a) [3 marks] Show that the gradient of the cost with respect to the weights $V$ can be written
$$\frac{\partial E}{\partial V} = \sum_{i=1}^{\tau} \left( \frac{\partial C(\vec{y}_i, \vec{t}_i)}{\partial \vec{y}_i} \odot \sigma'(\vec{z}_i) \right) (\vec{h}_i)^{\mathsf{T}}.$$
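A sketch of the reasoning: since $\vec{y}_i = \sigma(\vec{z}_i)$ and $\vec{z}_i = V\vec{h}_i + \vec{c}$, the output at time step $i$ is the only one that depends on $\vec{z}_i$, so the chain rule gives
$$\frac{\partial C}{\partial \vec{z}_i} = \frac{\partial C}{\partial \vec{y}_i} \odot \sigma'(\vec{z}_i), \qquad \frac{\partial E}{\partial V} = \sum_{i=1}^{\tau} \frac{\partial C}{\partial \vec{z}_i}\,(\vec{h}_i)^{\mathsf{T}}.$$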

(b) [2 marks] Suppose you have computed $\frac{\partial E}{\partial \vec{h}_i}$ for $i = 1, \ldots, \tau$. Show that
$$\frac{\partial E}{\partial U} = \sum_{i=1}^{\tau} \left( \frac{\partial E}{\partial \vec{h}_i} \odot \sigma'(\vec{s}_i) \right) (\vec{x}_i)^{\mathsf{T}}.$$

(c) [4 marks] Also, show that
$$\frac{\partial E}{\partial W} = \sum_{i=1}^{\tau-1} \left( \frac{\partial E}{\partial \vec{h}_{i+1}} \odot \sigma'(\vec{s}_{i+1}) \right) (\vec{h}_i)^{\mathsf{T}}.$$

(d) [1 mark] Finally, show that
$$\frac{\partial E}{\partial \vec{b}} = \sum_{i=1}^{\tau} \frac{\partial E}{\partial \vec{h}_i} \odot \sigma'(\vec{s}_i).$$
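In parts (b) through (d), the shared quantity $\frac{\partial E}{\partial \vec{h}_i}$ is obtained by a backward recursion over the time steps. A sketch consistent with the formulas above (the second term is absent at $i = \tau$):
$$\frac{\partial E}{\partial \vec{h}_i} = V^{\mathsf{T}}\!\left( \frac{\partial C}{\partial \vec{y}_i} \odot \sigma'(\vec{z}_i) \right) + W^{\mathsf{T}}\!\left( \frac{\partial E}{\partial \vec{h}_{i+1}} \odot \sigma'(\vec{s}_{i+1}) \right).$$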


3. Recurrent Neural Network [14 marks total]

In this question, you will complete the Python implementation of backprop through time (BPTT) for a simple recurrent neural network (RNN). The notebook contains a definition for the class RNN. The class has a number of methods, including BPTT. However, BPTT is incomplete.

For training and testing, the notebook also reads in a corpus of text (a simplified version of On the Origin of Species by Charles Darwin), along with the character set, and creates about 5000 training samples. The notebook also creates a few utility functions that help convert between the various formats for the data.

(a) [8 marks] Implement the function BPTT so that it computes the gradients of the loss with respect to the connection weight matrices and the biases. Your code should work for different values of seq_length (this is the same as τ in the lecture notes).
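To fix ideas, here is a hypothetical NumPy sketch of the gradient loop, written directly against Q2's formulas. Every name in it (the weights U, W, V, b, c; the cached per-step lists xs, hs, ss, zs, ys; and the helper derivatives dCdy and sigma_p) is an assumption to be mapped onto the actual RNN class. Note that with softmax outputs and categorical cross entropy, the combined output gradient simplifies to $\vec{y}_i - \vec{t}_i$.

import numpy as np

def bptt_gradients(U, W, V, b, c, xs, hs, ss, zs, ys, ts, dCdy, sigma_p):
    # Sketch of BPTT for one sequence of length tau (all names assumed).
    tau = len(xs)
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    db, dc = np.zeros_like(b), np.zeros_like(c)
    ds_next = np.zeros_like(ss[0])                # dE/ds_{i+1}, zero past i = tau
    for i in reversed(range(tau)):
        dz = dCdy(ys[i], ts[i]) * sigma_p(zs[i])  # dC/dz_i (y - t for softmax + CE)
        dV += np.outer(dz, hs[i])                 # Q2(a)
        dc += dz
        dEdh = V.T @ dz + W.T @ ds_next           # backward recursion for dE/dh_i
        ds = dEdh * sigma_p(ss[i])                # dE/ds_i
        dU += np.outer(ds, xs[i])                 # Q2(b)
        db += ds                                  # Q2(d)
        if i > 0:
            dW += np.outer(ds, hs[i - 1])         # Q2(c)
        ds_next = ds
    return dU, dW, dV, db, dc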

(b) [2 marks] Create an instance of the RNN class. The hidden layer should have 400 ReLU neurons. The input to the network is a one-hot vector with 27 elements, one for each character in our character set. The output layer also has 27 neurons, with a softmax activation function.
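A hypothetical sketch of the call; the real constructor arguments are defined by the RNN class in the notebook:

# Assumed argument names -- check the RNN class's constructor.
net = RNN(dims=[27, 400, 27], hidden_act='ReLU', output_act='softmax')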

(c) [2 marks] Train the RNN for about 15 epochs. Use categorical cross entropy as a loss function (see A2 Q2 for help with this). You can use a learning rate of 0.001, but you might want to break the training into 5-epoch segments, reducing the learning rate for each segment. Whatever works.
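A sketch of the segmented schedule, assuming a hypothetical train method (the method name, its arguments, and the decay factor are all placeholders):

lrate = 0.001
for segment in range(3):                 # 3 segments x 5 epochs = 15 epochs
    net.train(train_samples, epochs=5, lrate=lrate)
    lrate *= 0.5                         # smaller steps each segment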

(d) [2 marks] What fraction of the time does your RNN correctly guess the first letter that follows the input? Write a small bit of Python code that counts how many times the next character is correct, and express your answer as a percentage in a print statement.
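A sketch of the counting loop, assuming one-hot targets and hypothetical names test_samples and net.predict (adapt to the notebook's utilities):

import numpy as np

correct = 0
for x_seq, t_seq in test_samples:
    y_seq = net.predict(x_seq)                  # network outputs per time step
    if np.argmax(y_seq[0]) == np.argmax(t_seq[0]):
        correct += 1                            # first predicted character matches
print(f'First-character accuracy: {100 * correct / len(test_samples):.1f}%')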

What to submit

Your assignment submission should be a single Jupyter notebook file, named <WatIAM>_a4.ipynb, where <WatIAM> is your UW WatIAM login ID (not your student number). The notebook must include solutions to all the questions. Submit this file to Desire2Learn. You do not need to submit any of the modules supplied for the assignment.
