Assignment 5: CS 5370 Deep Learning for Vision / AI5100: Deep Learning / AI2100: Deep Learning


1 Theory (15 marks)
1. (2 + 1 + 1 + 1 = 5 marks) One of the biggest advantages of transformers over RNNs is “Parallelism”.
(a) Let t, l and n denote the sequence length, the number of layers and the number of
neurons at each layer, respectively. Contrast the RNN model with the transformer model in terms of
time complexity and space complexity, both at train time and test time. Express your answers in
terms of t, l and n.
(b) What happens to the performance of parallelism if n (number of neurons at each layer) is smaller
than the sequence length t?
(c) The self-attention layer looks across the tokens of a given input sequence. Isn't this a bottleneck
for parallelism? Explain.
(d) Do the feed-forward network and layer norm look across the tokens? Explain how they support
parallelism.
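
To make the contrast in (a)-(d) concrete, here is a minimal PyTorch sketch (module choices, sizes and shapes are illustrative assumptions, not part of the assignment): the RNN cell must be stepped through the sequence one token at a time because $h_t$ depends on $h_{t-1}$, while self-attention processes all t tokens in one batched operation.

```python
import torch
import torch.nn as nn

t, n = 16, 64                      # sequence length, hidden size (assumed values)
x = torch.randn(1, t, n)           # (batch, seq, features)

# RNN: the time loop is inherently sequential -- h_t depends on h_{t-1},
# so the t steps cannot run in parallel across the sequence.
rnn_cell = nn.RNNCell(n, n)
h = torch.zeros(1, n)
for step in range(t):
    h = rnn_cell(x[:, step, :], h)

# Self-attention: all t tokens are handled by one batched matmul,
# so the work parallelizes across the sequence dimension.
attn = nn.MultiheadAttention(embed_dim=n, num_heads=1, batch_first=True)
out, _ = attn(x, x, x)             # queries, keys, values all come from x
print(out.shape)                   # torch.Size([1, 16, 64])
```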
2. (2 + 2 = 4 marks) You learned in the attention lecture about q (query), v (value) and k (key)
vectors. Let us look at the case of single-head attention. Let us say the q, k, v vectors are from
$\mathbb{R}^d$ and there are m value vectors and m key vectors.
Then the attention vector and attention weights can be defined as:

$$z = \sum_{i=1}^{m} \alpha_i v_i, \qquad \alpha_i = \frac{\exp(k_i^\top q)}{\sum_{j=1}^{m} \exp(k_j^\top q)}$$
(a) Let us say z evaluates to $v_j$ for some j. What does this mean? Explain using the query, key and
value vectors.
(b) Now, take an orthogonal set of key vectors $\{k_1, \ldots, k_m\}$, that is, $k_i \perp k_j$
for all $i \neq j$, with $\|k_i\| = 1$ for all i. Let $v_a, v_b \in \{v_1, \ldots, v_m\}$ be two of the
value vectors from a set of m arbitrary vectors. Express a query vector q such that the output z
is roughly equal to the average of $v_a$ and $v_b$, that is, $\frac{1}{2}(v_a + v_b)$.
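
As a quick numerical sanity check of the definitions above, the NumPy sketch below computes z and tests one candidate construction for part (b), $q = c\,(k_a + k_b)$ with a large scale c (the sizes m, d and the scale c are illustrative assumptions, and other constructions may work too):

```python
import numpy as np

m, d = 4, 4
K = np.eye(m, d)                 # orthonormal keys: k_i = e_i (assumed for the check)
V = np.random.randn(m, d)        # m arbitrary value vectors

def attend(q, K, V):
    scores = K @ q               # k_i^T q for each i
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()         # softmax attention weights
    return alpha @ V             # z = sum_i alpha_i v_i

a, b, c = 0, 1, 50.0             # indices of v_a, v_b; large scale c
q = c * (K[a] + K[b])            # candidate query for part (b)
z = attend(q, K, V)
print(np.allclose(z, 0.5 * (V[a] + V[b]), atol=1e-6))  # True
```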
3. (2 marks) The Variational Autoencoder represents the standard variational lower bound as the
combination of a reconstruction term and a regularization term. Starting with the variational lower
bound below, derive these two terms, specifying which is the reconstruction term and which is the
regularization term.
$$\mathcal{L}(q) = \int q(z \mid x)\,\log\!\left(\frac{p(x, z)}{q(z \mid x)}\right) dz$$
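
For reference, one route to the requested decomposition, sketched in LaTeX (the question still asks you to write the derivation out yourself), is to split $\log p(x, z) = \log p(x \mid z) + \log p(z)$ inside the integral:

```latex
\begin{align*}
\mathcal{L}(q)
  &= \int q(z \mid x)\,\log\frac{p(x \mid z)\,p(z)}{q(z \mid x)}\,dz \\
  &= \underbrace{\mathbb{E}_{q(z \mid x)}\!\left[\log p(x \mid z)\right]}_{\text{reconstruction}}
   \;-\; \underbrace{\mathrm{KL}\!\left(q(z \mid x)\,\middle\|\,p(z)\right)}_{\text{regularization}}
\end{align*}
```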
4. (2 + 1 + 1 = 4 marks) You may know the minimax problem of generative adversarial networks
(GANs). Let us take a simpler example to understand how difficult the GAN minimax problem is. Consider
the function $f(p, q) = pq$. What does $\min_p \max_q f(p, q)$ evaluate to? (Clue: the minimax
problem minimizes the maximum value possible.)
Let us start at the point k and evaluate the function for n steps by alternating gradient updates
(update first q and then p) with step size 1. Writing out the update step in terms of $p_t$, $q_t$,
$p_{t+1}$, $q_{t+1}$ will be helpful.
(a) As you iterate with each step, enter the values for $(p_t, q_t)$ in the table below. (Assume
k = (1, 1) and n = 6.)

    t      0   1   2   3   4   5   6
    q_t    1
    p_t    1
(b) By using the above approach, is it possible to reach the optimal value? Why or why not?
(c) What is the equilibrium point?
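
A small sketch of the alternating iteration may help with checking your table entries; it assumes gradient ascent on q followed by gradient descent on p, with the p-update using the freshly updated q (per "update first q and then p"). Since $\partial f/\partial q = p$ and $\partial f/\partial p = q$:

```python
# Alternating gradient on f(p, q) = p * q, step size 1, start k = (1, 1).
p, q = 1.0, 1.0
print(f"t=0: p={p:+.0f}, q={q:+.0f}")
for t in range(1, 7):
    q = q + p        # q_{t+1} = q_t + p_t       (ascent step for the max player)
    p = p - q        # p_{t+1} = p_t - q_{t+1}   (descent step for the min player)
    print(f"t={t}: p={p:+.0f}, q={q:+.0f}")
# The iterates orbit around the equilibrium rather than converging to it,
# which is exactly the difficulty parts (b) and (c) ask about.
```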
2 Programming (35 marks)
• The programming questions are shared in “Assignment 5.zip”. Please follow the instructions in the
notebook. Turn in the notebook via Google Classroom once you finish your work.
• Please check the corresponding notebook for the marks breakdown of each question.
Question-1 : Image Captioning (15 marks)
Find the caption generated by the model when the input image (“image3.jpg”) is fed to
the encoder.
(a) Implementation of Encoder CNN module: 6 marks
(b) Implementation of Decoder LSTM module: 4 marks
(c) Input pre-processing, and loading the vocabulary and the parameters of the encoder and decoder
modules in order to perform inference on the given input image: 5 marks
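
For orientation, here is a minimal sketch of what such an encoder/decoder pair typically looks like; the actual module definitions, vocabulary and pretrained weights come from the assignment notebook, so every name and hyperparameter below (ResNet-18 backbone, greedy decoding, etc.) is an illustrative assumption:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    """CNN encoder: backbone features projected to the embedding size."""
    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet18(weights=None)   # backbone choice is an assumption
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop fc head
        self.linear = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        with torch.no_grad():                    # frozen backbone at inference time
            features = self.backbone(images)
        return self.linear(features.flatten(1))  # (batch, embed_size)

class DecoderLSTM(nn.Module):
    """LSTM decoder: image feature in, word ids out (greedy decoding)."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, vocab_size)

    def sample(self, features, max_len=20):
        inputs, states, ids = features.unsqueeze(1), None, []
        for _ in range(max_len):
            hiddens, states = self.lstm(inputs, states)
            predicted = self.linear(hiddens.squeeze(1)).argmax(dim=1)
            ids.append(predicted.item())         # assumes batch size 1
            inputs = self.embed(predicted).unsqueeze(1)
        return ids                               # map back to words via the vocabulary
```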
Question-2 : VAE (13 marks)
Given an input image x and the parameters of the encoder and decoder of a Variational AutoEncoder
(VAE), compute the variational lower bound on p(x), i.e. the probability of x.
Hint: $\log p_\theta(x) \geq \mathbb{E}_{q_\phi(z|x)}\!\left[\log p_\theta(x|z)\right] - \mathrm{KL}\!\left[q_\phi(z|x)\,\|\,p(z)\right]$
Note: Use the torch.exp() library function to compute exponentials.
(a) Define the Encoder, Decoder and VAE modules and load the given pretrained model weights: 5
marks
(b) Given an input image, do the forward pass through the encoder module, apply the reparameterization
trick to sample a latent from the encoder output, and finally take the forward pass through the
decoder to reconstruct the input image from the latent: 5 marks
(c) Compute the variational lower bound (ELBO): 3 marks
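
A hedged sketch of parts (b) and (c), assuming the encoder returns (mu, logvar) for $q_\phi(z \mid x)$ and the decoder outputs Bernoulli pixel probabilities; the real module interfaces are fixed by the notebook:

```python
import torch
import torch.nn.functional as F

def elbo(x, encoder, decoder):
    mu, logvar = encoder(x)                    # parameters of q_phi(z|x)
    std = torch.exp(0.5 * logvar)              # torch.exp, as the note suggests
    z = mu + std * torch.randn_like(std)       # reparameterization trick
    x_hat = decoder(z)                         # reconstruction of x from the latent
    # E_q[log p(x|z)], single-sample Monte Carlo estimate (Bernoulli likelihood)
    recon = -F.binary_cross_entropy(x_hat, x, reduction='sum')
    # KL(q_phi(z|x) || N(0, I)) in closed form for a diagonal Gaussian
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon - kl                          # lower bound on log p(x)
```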
Question-3 : Transformer (7 marks)
Implement a custom encoder layer using multi-head attention for the Transformer network, as given in
question 3 of the notebook, and report the validation loss on the given setup.
(a) Implement the custom encoder layers of the Transformer as asked in the question: 4 marks
(b) Implement the “forward” function of the custom encoder layers as asked in the question: 3 marks
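
For orientation, a minimal sketch of such a layer built around nn.MultiheadAttention is shown below; the exact layer specification and validation setup live in the notebook, so all hyperparameters here are assumptions:

```python
import torch.nn as nn

class CustomEncoderLayer(nn.Module):
    """One Transformer encoder block: self-attention + feed-forward,
    each wrapped with a residual connection and layer norm."""
    def __init__(self, d_model, nhead, dim_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead,
                                               dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, dim_ff), nn.ReLU(),
                                nn.Dropout(dropout), nn.Linear(dim_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        attn_out, _ = self.self_attn(x, x, x, attn_mask=mask)
        x = self.norm1(x + self.dropout(attn_out))    # attention sub-layer
        x = self.norm2(x + self.dropout(self.ff(x)))  # feed-forward sub-layer
        return x
```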
[Optional] Question-4 : Normalizing Flow (ungraded)
Compute the transformed probability density function (pdf) of y, i.e. $q_1(y)$, where y is obtained
from z via an invertible transformation f as shown below.
Let $z \sim q_0(z)$ where $q_0(z) = \mathcal{N}(z; 0, I)$. Let f(z) be an invertible transformation given by

$$y = f(z) = z + u\,h(w^\top z + b)$$
The pdf of y is given by $q_1(y)$.
Find the mean and standard deviation of $q_1(y)$, given $q_0(z)$, f and all the parameters
required to compute f(z).
(a) Implement the functions required to compute the transformed distribution $q_1(y)$: 2 marks
(b) Define a function to compute $y = f(z)$ and $\left|\det\left(\frac{df}{dz}\right)\right|$ as given in the question: 4 marks
(c) Finally, compute the transformed probability density function of y, i.e.
$q_1(y) = q_0 \,/\, \left|\det\left(\frac{df}{dz}\right)\right|$, as given in the question: 1 mark
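
A hedged sketch of the planar-flow computation, assuming h = tanh (so $h'(a) = 1 - \tanh^2(a)$) and using the matrix determinant lemma $\det(I + u\,h'(a)\,w^\top) = 1 + h'(a)\,w^\top u$; the actual parameter values u, w, b are supplied in the question, so the ones below are illustrative:

```python
import torch

def planar_flow(z, u, w, b):
    """Return y = z + u * tanh(w^T z + b) and |det(df/dz)|."""
    a = z @ w + b                        # w^T z + b, shape (batch,)
    y = z + u * torch.tanh(a).unsqueeze(-1)
    h_prime = 1.0 - torch.tanh(a) ** 2   # tanh'(a)
    det = 1.0 + h_prime * (w @ u)        # matrix determinant lemma
    return y, det.abs()

# Transformed density: q1(y) = q0(z) / |det(df/dz)|
z = torch.randn(5, 2)                                  # z ~ q0 = N(0, I)
u = torch.tensor([0.5, -0.3])                          # illustrative parameters
w, b = torch.tensor([1.0, 1.0]), 0.0
q0 = torch.distributions.MultivariateNormal(
        torch.zeros(2), torch.eye(2)).log_prob(z).exp()
y, det = planar_flow(z, u, w, b)
q1 = q0 / det
```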