Homework Set One ECE 271B
Problem 1. Consider the hyperplane
w^T x + b = 0
of known parameters w and b.
a) Use Lagrangian optimization to find the point x0 in the plane closest to the origin, as a function of
w and b.
b) Use the result of a) to derive a geometric interpretation for the parameters w and b, when ||w|| = 1.
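As a quick MATLAB sanity check, the candidate point x0 = −b w / ||w||^2 (the closed form one typically obtains from the Lagrangian, quoted here only as an assumption to verify numerically, not as the requested derivation) can be tested against the constraint and against the distance |b| / ||w||:

% Numerical sanity check for Problem 1 (sketch); the closed form for x0 is an
% assumption to verify, not a substitute for the requested derivation.
w = [3; -1; 2];  b = 4;              % arbitrary hyperplane parameters
x0 = -b * w / norm(w)^2;             % candidate point of the plane closest to the origin
disp(w' * x0 + b);                   % should be ~0, i.e. x0 lies on the plane
disp([norm(x0), abs(b) / norm(w)]);  % its distance to the origin vs. |b| / ||w||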
Problem 2. The entropy of a probability density function (pdf) p(x) is defined as
h(x) = −∫ p(x) log p(x) dx.
a) Use Lagrangian optimization to find the pdf of maximal entropy among those whose expected value is µ and variance is σ^2.
b) Find the values of the Lagrange multipliers associated with the optimal solution.
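As a numerical cross-check of the classical result (with mean and variance fixed, the Gaussian has maximal differential entropy), the closed-form entropies of two same-variance densities can be compared in MATLAB; this is only an illustration, not the requested derivation:

% Closed-form differential entropies (in nats) of two unit-variance densities.
sigma2 = 1;
h_gauss   = 0.5 * log(2 * pi * exp(1) * sigma2);   % Gaussian entropy
b_lap     = sqrt(sigma2 / 2);                      % Laplace scale giving variance sigma2
h_laplace = 1 + log(2 * b_lap);                    % Laplace entropy
fprintf('Gaussian: %.4f  Laplace: %.4f\n', h_gauss, h_laplace);   % Gaussian is larger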
Problem 3. Extensive experimental evidence in the area of image compression has shown that the
Discrete Cosine Transform (DCT) of image patches is a very good approximation to their PCA. It is
also well known that all but one of the DCT coefficients (features) have zero mean, and only one has
non-zero mean. The latter is the so-called DC coefficient because it results from projecting the image
patch into the vector 1 = (1, 1, . . . , 1)T and, therefore, is proportional to the average (DC) value of the
patch. In this problem, we are going to explore the connection between the DCT and PCA to explain
this fact. For this, we are going to assume the following.
• An image patch is a collection of random variables X = {X1, . . . , X64} which are identically distributed,
PXi(x) = f(x), ∀ i ∈ {1, . . . , 64},
where f(x) is a common probability density function.
• The pixels in the image patch are correlated, according to the correlation coefficient
ρij = E[Xi Xj] / ( √(E[Xi^2]) √(E[Xj^2]) ).
This obviously implies that we do not have an iid sample.
a) Consider the PCA of X. Show that it is not affected by a change of variables of the type Z = X−µx,
where µx = E[X].
b) Given a), we can assume that X has zero mean. We will do so for the remainder of the problem.
Show that in the extreme of highly correlated pixel values, i.e. when
ρij → 1, ∀ i, j,
the vector 1 is the largest principal component. Since neighboring image pixels do tend to be highly
correlated, this helps explain why the DC coefficient is always present.
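A small MATLAB check of this limit, using an equicorrelated covariance model for the 64 pixels (a modeling assumption made only for this illustration), looks as follows:

% Sketch: for an equicorrelated 64-pixel patch, the principal component of
% largest variance aligns with the constant vector 1.
d = 64;  rho = 0.99;
Sigma = (1 - rho) * eye(d) + rho * ones(d);   % equicorrelated covariance, unit variances
[V, D] = eig(Sigma);
[~, imax] = max(diag(D));
phi1 = V(:, imax);                            % eigenvector of the largest eigenvalue
disp(abs(phi1' * ones(d, 1)) / sqrt(d));      % |cosine| between phi1 and 1; close to 1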
c) Let Φ be the matrix whose columns φi are the principal components and consider the set of coefficients (features) Z resulting from the projection of an arbitrary image patch X into these components, i.e.
z = Φ^T x.
Consider the DCT coefficients zi = φi^T x, noting that z1 = 1^T x is the DC coefficient. Show that
E[zi] = 0, ∀ i > 1,
i.e. that the remaining (AC) coefficients have zero mean.
Problem 4. Consider a classification problem with two Gaussian classes
PX|Y (x|i) = G(x, µi, Σi), i ∈ {1, 2}
and class probabilities
PY (1) = PY (2) = 1/2.
a) The random variable X is not Gaussian, but we can still compute its mean and covariance. Show
that
µx = E[X] = (1/2)[µ1 + µ2]
and
Σx = E[(X − µx)(X − µx)^T] = (1/2)[Σ1 + Σ2] + (1/4)(µ1 − µ2)(µ1 − µ2)^T.
b) In the remainder of this problem, we consider the 2-dimensional case with
µ1 = −µ2 = µ = (α, 0)^T,
where α > 0, and
Σ1 = Σ2 = Γ = diag(1, σ^2).
Using MATLAB, sample 1,000 points from the two Gaussian classes and make a plot of the sampled
points for each of the following conditions:
• condition A: α = 10, σ^2 = 2;
• condition B: α = 2, σ^2 = 10.
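A minimal MATLAB sketch of this sampling step (assuming mvnrnd from the Statistics Toolbox and 1,000 points per class) is:

% Sample and plot the two Gaussian classes for one condition (sketch).
alpha = 10;  sigma2 = 2;                 % condition A; use alpha = 2, sigma2 = 10 for B
mu    = [alpha, 0];
Gamma = diag([1, sigma2]);
n     = 1000;                            % points per class (an assumption)
X1 = mvnrnd( mu, Gamma, n);              % class 1 samples
X2 = mvnrnd(-mu, Gamma, n);              % class 2 samples
figure; hold on;
plot(X1(:,1), X1(:,2), 'b.');
plot(X2(:,1), X2(:,2), 'r.');
axis equal;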
c) Use MATLAB to perform a principal component analysis for the random variable X. Determine the
direction of the best 1-dimensional subspace, i.e. the transformation
z = φ^T x,
where φ is the principal component of largest variance. Hand in a plot containing both the datapoints
and this principal component for each of the conditions of b).
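One way to obtain φ in MATLAB (a sketch, reusing X1 and X2 from the sampling sketch above) is to take the leading eigenvector of the pooled sample covariance:

% Principal component of largest variance of the pooled data (sketch).
X  = [X1; X2];                           % pool the samples of both classes
Xc = X - mean(X, 1);                     % center
[V, D] = eig((Xc' * Xc) / size(Xc, 1));  % eigen-decomposition of the sample covariance
[~, imax] = max(diag(D));
phi = V(:, imax);                        % PCA direction of largest variance
t = linspace(-20, 20, 2);
plot(t * phi(1), t * phi(2), 'k-', 'LineWidth', 2);   % overlay on the scatter plot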
d) Repeat c), but now for the 1-dimensional subspace spanned by the linear discriminant φ′ produced by LDA.
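A matching sketch of the LDA direction, based on the standard two-class solution φ′ ∝ Sw^{-1}(µ1 − µ2) and the sample estimates from the sketches above:

% Two-class LDA direction for the sampled data (sketch).
m1 = mean(X1, 1)';   m2 = mean(X2, 1)';
Sw = cov(X1) + cov(X2);                  % within-class scatter, up to a scale factor
phi_lda = Sw \ (m1 - m2);
phi_lda = phi_lda / norm(phi_lda);       % unit-norm linear discriminant
t = linspace(-20, 20, 2);
plot(t * phi_lda(1), t * phi_lda(2), 'g-', 'LineWidth', 2);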
e) From the results above, is the PCA approach to dimensionality reduction always a good one from
the classification point of view? How does it compare to LDA?
f) We now pursue a theoretical explanation for the observations above. Using the results of a), derive the principal component φ and the linear discriminant φ′ as functions of the parameters α and σ^2.
From a classification point of view, under what conditions is it 1) better to use PCA, 2) better to use LDA, or 3) equivalent to use either of the two?
Problem 5. In this problem, we consider a dataset with faces from 6 people. These were divided into
two sets. A training set is provided in the file trainset.zip and a test set in the file testset.zip.
The original color images of 150 × 150 pixels were converted to 50 × 50 grayscale images, resulting in a vector space of 2,500 dimensions.
a) Using the training data, compute the PCA of the data and make a 4×4 plot showing the 16 principal
components of largest variance.
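A sketch of the plotting step, assuming the training images have already been loaded into an N × 2500 matrix Xtrain with one vectorized 50 × 50 image per row (the loading code and the exact reshape orientation depend on how the files are stored):

% Display the 16 principal components of largest variance as 50x50 images (sketch).
Xc = Xtrain - mean(Xtrain, 1);           % center the training faces
[~, ~, V] = svd(Xc, 'econ');             % columns of V are the principal components
figure;
for k = 1:16
    subplot(4, 4, k);
    imagesc(reshape(V(:, k), 50, 50));   % reshape orientation is an assumption
    colormap gray; axis image off;
end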
b) Consider the 15 different pairs of classes that can be derived from the 6 face classes. Perform an
LDA between the classes in each pair. Make a 4 × 4 plot containing the 15 linear discriminants (plus
one empty plot). (Note: You will likely find that the within-class scatter matrix is singular. To address this, use an RDA with regularization parameter γ = 1 instead of the LDA.)
c) For each image x in the training dataset, compute the projection vector
z = Φx,
where the rows of Φ are the 15 principal components of largest variance of the face training data. Note that this produces a dataset {zi} of 15-dimensional vectors. Use the PCA projection vectors zi of the training images to learn a mean µi and covariance Σi per class. Use these to implement the Gaussian classifier.
Compute the PCA projections of the test images and use the Gaussian classifier above to classify them.
Compute the average probability of classification error for each face class and the average error across
all classes.
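A sketch of this pipeline, reusing V from the eigenface sketch above and assuming label vectors ytrain and ytest (the variable names are assumptions; the decision rule is the one given in the note at the end of the assignment):

% PCA projection, per-class Gaussian fit, and classification of the test set (sketch).
Phi    = V(:, 1:15)';                    % 15 x 2500 projection matrix
Ztrain = Xtrain * Phi';                  % 15-dimensional training features
Ztest  = Xtest  * Phi';
classes = unique(ytrain);
for c = 1:numel(classes)
    idx       = (ytrain == classes(c));
    mus{c}    = mean(Ztrain(idx, :), 1);
    Sigmas{c} = cov(Ztrain(idx, :));
end
ypred = zeros(size(Ztest, 1), 1);
for i = 1:size(Ztest, 1)
    for c = 1:numel(classes)
        d = Ztest(i, :) - mus{c};
        score(c) = d * (Sigmas{c} \ d') + log(det(Sigmas{c}));
    end
    [~, cbest] = min(score);
    ypred(i) = classes(cbest);
end
err = mean(ypred ~= ytest(:));           % average error over all test images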
d) Repeat c) using LDA projections zi, instead of PCA projections. (Note: You will likely find that the within-class scatter matrix is singular. To address this, use an RDA with regularization parameter γ = 1 instead of the LDA.)
e) A popular alternative to RDA, when LDA requires the inversion of a singular scatter matrix, consists of first applying a PCA to an intermediate dimension k and then applying LDA to the resulting k-dimensional vectors. This approach is denoted PCA+LDA. Repeat d) using PCA+LDA with k = 30.
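The intermediate PCA step can be sketched as follows (again reusing V, Xtrain, and Xtest from the sketches above); the LDA/RDA and classification steps are then run on the 30-dimensional vectors exactly as in d):

% PCA+LDA, step 1: project to an intermediate k = 30 dimensional space (sketch).
k   = 30;
P30 = V(:, 1:k)';                        % 30 x 2500 PCA projection
Z30train = Xtrain * P30';                % 30-dimensional training vectors
Z30test  = Xtest  * P30';
% Step 2: compute the linear discriminants and run the Gaussian classifier of d)
% on Z30train / Z30test; in 30 dimensions the within-class scatter is usually
% no longer singular, so plain LDA can be used.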
Note: For a random variable X with multiple Gaussian classes of mean µi and covariance Σi, the Gaussian classifier has decision rule
i* = arg min_i (x − µi)^T Σi^{-1} (x − µi) + log |Σi|.
Given a training set {x1, . . . , xn} for class i, the class mean and covariance are estimated with
µi = (1/n) ∑_{k=1}^{n} xk
Σi = (1/n) ∑_{k=1}^{n} (xk − µi)(xk − µi)^T.
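A direct MATLAB transcription of this rule and of the estimates above (a sketch; the function and variable names are arbitrary choices):

% Per-class estimates matching the formulas above (note the 1/n normalization):
%   mus{i}    = mean(Xi, 1);             % Xi holds the training vectors of class i, one per row
%   Sigmas{i} = cov(Xi, 1);
% Decision rule for one test vector x (1 x d):
function istar = gaussian_classify(x, mus, Sigmas)
    for i = 1:numel(mus)
        d = x - mus{i};
        score(i) = d * (Sigmas{i} \ d') + log(det(Sigmas{i}));
    end
    [~, istar] = min(score);             % arg min over the classes
end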