CS698X Homework 3
Problem 1: This is Bayes Rule! (15 marks)
Given a bunch of observations x1, . . . , xN drawn i.i.d. from an observation/likelihood model p(x|θ), and
assuming a prior distribution p(θ) on the model parameters θ, prove that solving the following problem is
equivalent to the Bayes rule for finding the posterior distribution of θ:
\[
\arg\min_{q(\theta)} \; -\sum_{n=1}^{N} \int q(\theta) \log p(x_n \mid \theta)\, d\theta \;+\; \mathrm{KL}\big(q(\theta) \,\|\, p(\theta)\big)
\]
Also give a brief, intuitive explanation of the above objective function (maximum 1-2 sentences).
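As a sketch of the key identity behind the claim (writing x_{1:N} for all observations; filling in the steps rigorously is what the problem asks for), the objective can be rewritten as:

```latex
\begin{aligned}
J(q) &= -\sum_{n=1}^{N}\int q(\theta)\log p(x_n\mid\theta)\,d\theta
        + \mathrm{KL}\big(q(\theta)\,\|\,p(\theta)\big) \\
     &= \int q(\theta)\log\frac{q(\theta)}{p(\theta)\prod_{n=1}^{N} p(x_n\mid\theta)}\,d\theta \\
     &= \mathrm{KL}\big(q(\theta)\,\|\,p(\theta\mid x_{1:N})\big) - \log p(x_{1:N}),
\end{aligned}
```

so, since log p(x_{1:N}) does not depend on q, the objective is minimized exactly when q(θ) equals the Bayes posterior p(θ | x_{1:N}).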
Problem 2: Mean-Field VI for Sparse Bayesian Linear Regression (30 marks)
Assume N observations {(x_n, y_n)}_{n=1}^{N} generated from a regression model y_n ∼ N(y_n | w^T x_n, β^{-1}). Further assume a Gaussian prior on w with different component-wise precisions, i.e., p(w) = N(w | 0, diag(α_1^{-1}, …, α_D^{-1})).
Also assume gamma priors on the noise precision β and the prior's precisions {α_d}_{d=1}^{D}, i.e., β ∼ Gamma(β | a_0, b_0) and α_d ∼ Gamma(α_d | e_0, f_0), ∀d. We will use the following parametrization of the gamma:
\[
\mathrm{Gamma}(\eta \mid \tau_1, \tau_2) = \frac{\tau_2^{\tau_1}}{\Gamma(\tau_1)}\, \eta^{\tau_1 - 1} \exp(-\tau_2 \eta).
\]
Derive the mean-field variational inference algorithm for approximating the posterior distribution
q(w, β, α_1, …, α_D) = q(w) q(β) q(α_1) ⋯ q(α_D) ≈ p(w, β, α_1, …, α_D | y, X)
You may use the “recognition” method for inferring each q distribution, or explicitly write down the ELBO for
this model and take derivatives w.r.t. the variational parameters of each q distribution to estimate these.
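For reference, one common form of the resulting coordinate-ascent (CAVI) updates can be sketched as below. This is an illustrative sketch, not the required derivation: it assumes the standard result that q(w) is Gaussian and each q(α_d), q(β) is Gamma, and the hyperparameter defaults and synthetic data are made up for the demo.

```python
import numpy as np

def cavi_sparse_blr(X, y, a0=1e-2, b0=1e-2, e0=1e-2, f0=1e-2, iters=50):
    """Mean-field CAVI sketch for Bayesian linear regression with
    per-dimension Gamma priors on the weight precisions (ARD-style).
    Hyperparameter names follow the problem statement."""
    N, D = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    E_alpha = np.ones(D)   # E[alpha_d] under current q(alpha_d)
    E_beta = 1.0           # E[beta] under current q(beta)
    for _ in range(iters):
        # q(w) = N(mu, Sigma)
        Sigma = np.linalg.inv(E_beta * XtX + np.diag(E_alpha))
        mu = E_beta * Sigma @ Xty
        # q(alpha_d) = Gamma(e0 + 1/2, f0 + E[w_d^2]/2)
        E_w2 = mu**2 + np.diag(Sigma)
        E_alpha = (e0 + 0.5) / (f0 + 0.5 * E_w2)
        # q(beta) = Gamma(a0 + N/2, b0 + E[||y - X w||^2]/2)
        resid = np.sum((y - X @ mu) ** 2) + np.trace(XtX @ Sigma)
        E_beta = (a0 + 0.5 * N) / (b0 + 0.5 * resid)
    return mu, Sigma, E_alpha, E_beta

# illustrative synthetic check with sparse true weights
rng = np.random.default_rng(0)
w_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
X = rng.normal(size=(200, 5))
y = X @ w_true + 0.1 * rng.normal(size=200)
mu, Sigma, E_alpha, E_beta = cavi_sparse_blr(X, y)
```

On such data the posterior mean mu recovers the nonzero weights, while the inferred precisions E_alpha grow large for the zero weights, which is the sparsity-inducing behavior the per-dimension priors are meant to produce.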
Problem 3: Gibbs Sampling (20 marks)
Suppose we are given a bunch of count-valued observations x1, . . . , xN , assumed generated from the following
hierarchical model: p(x_n | λ_n) = Poisson(x_n | λ_n), p(λ_n | α, β) = Gamma(λ_n | α, β), n = 1, …, N, p(α | a, b) = Gamma(α | a, b), and p(β | c, d) = Gamma(β | c, d). Assume a, b, c, d to be fixed.
We would like to do Gibbs sampling for this model. To do so, derive the conditional posterior (CP) of each variable λ_1, …, λ_N, α, and β, given all the other variables. Are all CPs available in closed form?
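A sampler built on these conditionals might look like the sketch below. It assumes the λ_n and β conditionals come out as closed-form Gammas while the α conditional does not; a random-walk Metropolis step on log(α) is one common workaround for the non-conjugate coordinate (that workaround, and the step size 0.3, are choices of this sketch, not part of the problem).

```python
import numpy as np
from math import lgamma

def gibbs_poisson_gamma(x, a=1.0, b=1.0, c=1.0, d=1.0, n_iter=2000, seed=0):
    """Gibbs sampler sketch for the Poisson-Gamma hierarchy."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    N = len(x)
    lam, alpha, beta = np.ones(N), 1.0, 1.0
    samples = {"alpha": [], "beta": []}

    def log_cond_alpha(a_val):
        # unnormalized log CP of alpha given lam, beta (no closed form)
        if a_val <= 0:
            return -np.inf
        return (N * (a_val * np.log(beta) - lgamma(a_val))
                + (a_val - 1.0) * np.sum(np.log(lam))
                + (a - 1.0) * np.log(a_val) - b * a_val)

    for _ in range(n_iter):
        # CP of each lambda_n: Gamma(alpha + x_n, beta + 1)
        lam = rng.gamma(alpha + x, 1.0 / (beta + 1.0))
        # CP of beta: Gamma(c + N*alpha, d + sum_n lambda_n)
        beta = rng.gamma(c + N * alpha, 1.0 / (d + lam.sum()))
        # Metropolis step on log(alpha), with the Jacobian correction
        prop = alpha * np.exp(0.3 * rng.normal())
        log_acc = (log_cond_alpha(prop) - log_cond_alpha(alpha)
                   + np.log(prop) - np.log(alpha))
        if np.log(rng.uniform()) < log_acc:
            alpha = prop
        samples["alpha"].append(alpha)
        samples["beta"].append(beta)
    return samples

# illustrative run on synthetic counts with true rate 3
x_data = np.random.default_rng(1).poisson(3.0, size=100)
s = gibbs_poisson_gamma(x_data)
```

Note that NumPy's `Generator.gamma` takes a scale parameter, so the rates (β + 1 and d + Σλ_n) are inverted when sampling.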
Problem 4: Using Samples for Prediction (10 marks)
Consider a matrix factorization model for a partially observed N × M matrix R, where p(r_ij | u_i, v_j) = N(r_ij | u_i^T v_j, β^{-1}), and u_i and v_j denote the latent factors of the i-th row and j-th column of R, respectively. The posterior predictive distribution of each r_ij is defined as
\[
p(r_{ij} \mid R) = \int p(r_{ij} \mid u_i, v_j)\, p(u_i, v_j \mid R)\, du_i\, dv_j,
\]
which is in general intractable. Suppose we are given a set of S samples {U^(s), V^(s)}_{s=1}^{S} generated by a Gibbs sampler for this matrix factorization model, where U^(s) = {u_i^(s)}_{i=1}^{N} and V^(s) = {v_j^(s)}_{j=1}^{M}.
Given these samples, derive the expressions for the sample-based approximation of the mean (expectation) as well as the variance of any entry r_ij of the matrix R.
Hint: Note that we can write each r_ij as u_i^T v_j + ε_ij, where ε_ij ∼ N(ε_ij | 0, β^{-1}).
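The hint leads to a plug-in Monte Carlo estimator, which can be sketched as follows (the array layout, with samples stacked as (S, N, K) and (S, M, K) for some latent dimension K, is an assumption of this sketch):

```python
import numpy as np

def predict_rij(U_samples, V_samples, i, j, beta):
    """Monte Carlo approximation of E[r_ij | R] and Var[r_ij | R]
    from S Gibbs samples of the latent factors."""
    # u_i^(s)^T v_j^(s) for each sample s
    dots = np.einsum("sk,sk->s", U_samples[:, i, :], V_samples[:, j, :])
    mean = dots.mean()
    # total variance = observation noise 1/beta + variance of u_i^T v_j
    var = 1.0 / beta + dots.var()
    return mean, var

# tiny hand-checkable example: dots come out as [1, 2, 3, 4]
U = np.zeros((4, 3, 1))
V = np.ones((4, 3, 1))
U[:, 1, 0] = [1.0, 2.0, 3.0, 4.0]
m, v = predict_rij(U, V, 1, 2, beta=2.0)  # m = 2.5, v = 0.5 + 1.25 = 1.75
```

The decomposition of the variance into 1/β plus the sample variance of the dot products follows directly from the hint: the noise ε_ij is independent of (u_i, v_j).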
Problem 5: Rejection Sampling (25 marks)
Consider a distribution p(x) ∝ exp(sin(x)) for −π ≤ x ≤ π. Denote exp(sin(x)) as p̃(x). Suppose we want to use Rejection Sampling to sample from p(x), using a proposal distribution q(x) = N(x | 0, σ²). Find the expression for the optimal value of the constant M such that M q(x) ≥ p̃(x), as required in Rejection Sampling.
Using this value of M and a suitably chosen σ², draw 10,000 samples from p(x) and plot the resulting histogram of the samples. Submit your code in the form of a Python notebook.
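A minimal sketch of the sampler (σ = 2 is just one reasonable choice, and here the optimal M is found numerically on a grid rather than from the closed-form expression the problem asks you to derive):

```python
import numpy as np

def rejection_sample(n, sigma=2.0, seed=0):
    """Rejection sampling from p(x) ∝ exp(sin(x)) on [-pi, pi]
    with a N(0, sigma^2) proposal."""
    rng = np.random.default_rng(seed)
    p_tilde = lambda x: np.where(np.abs(x) <= np.pi, np.exp(np.sin(x)), 0.0)
    q = lambda x: np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    # optimal M = max over [-pi, pi] of p_tilde(x) / q(x), found numerically
    grid = np.linspace(-np.pi, np.pi, 100001)
    M = np.max(p_tilde(grid) / q(grid))
    out = []
    while len(out) < n:
        x = rng.normal(0.0, sigma, size=n)
        u = rng.uniform(size=n)
        out.extend(x[u * M * q(x) < p_tilde(x)].tolist())  # accept rule
    return np.array(out[:n])

samples = rejection_sample(10_000)
```

A histogram can then be produced with, e.g., matplotlib's `plt.hist(samples, bins=100, density=True)`; since exp(sin(x)) puts more mass on (0, π) than on (−π, 0), the histogram should be visibly skewed toward positive x.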