ECE 271B Homework Set Four


Problem 1. In this problem, we consider some minimization problems with inequality constraints.
a) Consider a vector $x \in \mathbb{R}^n$. Find the optimal solution to the problem
$$\min_x \; \frac{1}{2}\|x\|^2 \quad \text{subject to} \quad \sum_i x_i \leq -3.$$
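This problem is small enough to spot-check numerically. The sketch below solves it with scipy (the dimension n = 4 is an arbitrary illustrative choice) and compares against the answer suggested by the KKT conditions, $x = -(3/n)\mathbf{1}$:

```python
import numpy as np
from scipy.optimize import minimize

n = 4  # arbitrary illustrative dimension

# scipy encodes inequality constraints as fun(x) >= 0,
# so sum_i x_i <= -3 becomes -3 - sum(x) >= 0.
cons = [{"type": "ineq", "fun": lambda x: -3.0 - np.sum(x)}]
res = minimize(lambda x: 0.5 * np.dot(x, x), x0=np.zeros(n), constraints=cons)

print(res.x)                    # numerically close to -(3/n) * ones(n)
print(-3.0 / n * np.ones(n))    # KKT prediction
```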
b) Prove the inequality
$$(x^T y)^2 \leq (x^T Q x)\,(y^T Q^{-1} y),$$
where $Q$ is a positive definite symmetric matrix. (Hint: you may want to consider the problem $\max_x \, y^T x$ subject to $x^T Q x \leq 1$.)
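This inequality is a generalized Cauchy-Schwarz inequality, and it too can be spot-checked numerically before proving it; a minimal sketch with randomly drawn $Q$, $x$, and $y$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)      # symmetric positive definite by construction

for _ in range(1000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    lhs = (x @ y) ** 2
    rhs = (x @ Q @ x) * (y @ np.linalg.solve(Q, y))
    assert lhs <= rhs + 1e-9     # (x^T y)^2 <= (x^T Q x)(y^T Q^{-1} y)
```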
Problem 2. In this problem, we study the topic of duality by considering some simple examples. In
all cases, the minimization problems are of the form
$$\min_{x \in X} f(x) \quad \text{subject to} \quad g(x) \leq 0.$$
We consider six problems, described by the table below.
Problem   $f(x)$                          $g(x)$               $X$
1         $x_1 - x_2$                     $x_1 + x_2 - 1$      $\{(x_1, x_2) \mid x_1 \geq 0,\; x_2 \geq 0\}$
2         $x$                             $x^2$                $\mathbb{R}$
3         $-x$                            $x - \tfrac{1}{2}$   $\{0, 1\}$
4         $-x$                            $x - \tfrac{1}{2}$   $[0, 1]$
5         $\tfrac{1}{2}(x_1^2 + x_2^2)$   $x_1 - 1$            $\mathbb{R}^2$
6         $|x_1| + x_2$                   $x_1$                $\{(x_1, x_2) \mid x_2 \geq 0\}$
For each of the six problems, answer the following.
1. Sketch the set of feasible solutions $S = \{(g(x), f(x)) \mid x \in X\}$ in $(g, f)$-space.
2. Determine if there exists a Lagrange multiplier. If so, what is the set of Lagrange multipliers? If
not, why is there no multiplier?
3. State the dual problem and sketch a plot of $q(\mu)$ as a function of $\mu$. Determine if there is a duality gap (a numerical sketch for one table entry follows this list).
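For question 3, recall that the dual function is $q(\mu) = \min_{x \in X}\,[f(x) + \mu g(x)]$. A minimal sketch for problem 3 of the table, where $X = \{0, 1\}$ reduces the minimization to a two-point comparison:

```python
import numpy as np
import matplotlib.pyplot as plt

# Table problem 3: f(x) = -x, g(x) = x - 1/2, X = {0, 1}.
# q(mu) = min over x in {0, 1} of f(x) + mu * g(x).
mus = np.linspace(0.0, 4.0, 400)
q = np.minimum(0.0 - 0.5 * mus,     # x = 0: f(0) + mu * g(0) = -mu/2
               -1.0 + 0.5 * mus)    # x = 1: f(1) + mu * g(1) = -1 + mu/2

plt.plot(mus, q)
plt.xlabel(r"$\mu$")
plt.ylabel(r"$q(\mu)$")
plt.show()
# The dual optimum is q(1) = -1/2, while the primal optimum is f(0) = 0,
# so this entry exhibits a duality gap of 1/2.
```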
Problem 3. In class, we have studied Vapnik’s SVM formulation, i.e. the search for the hyperplane
that solves the following optimization problem
$$\min_{w,\xi} \; \left( \frac{1}{2}\|w\|^2 + \frac{C}{n}\sum_{i=1}^{n}\xi_i \right)$$
subject to
$$y_i(\langle x_i, w \rangle + b) \geq 1 - \xi_i, \quad i = 1, \dots, n,$$
$$\xi_i \geq 0, \quad i = 1, \dots, n.$$
One limitation of this formulation is that there is no intuition for what the parameter C means, and it can therefore be difficult to find good values for it in practice. In this problem, we consider a slightly different, but more intuitive, formulation, based on the solution of the following problem
$$\min_{w,\xi,\rho,b} \; \left( \frac{1}{2}\|w\|^2 - \nu\rho + \frac{1}{n}\sum_{i=1}^{n}\xi_i \right)$$
subject to
$$y_i(\langle x_i, w \rangle + b) \geq \rho - \xi_i, \quad i = 1, \dots, n,$$
$$\xi_i \geq 0, \quad i = 1, \dots, n,$$
$$\rho \geq 0.$$
a) For this case, determine the dual problem and the form of the resulting decision function.
b) Given the dual solution, how would you determine the values of b and ρ?
c) Define the fraction of margin errors as
$$E_\rho = \frac{1}{n}\left|\{\, i \mid y_i g(x_i) < \rho \,\}\right|$$
and suppose that we solve the optimization problem on a dataset with the result that $\rho > 0$. Show that
1. $\nu$ is an upper bound on $E_\rho$;
2. $\nu$ is a lower bound on the fraction of vectors that are support vectors.
d) Show that, if the solution of the second problem leads to $\rho > 0$, then the first problem with $C$ set a priori to $1/\rho$ leads to the same decision function.
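This second formulation is the ν-SVM, and the bounds in c) can be checked empirically before proving them. A minimal sketch using scikit-learn's NuSVC (the toy data and the value of ν are arbitrary choices; the underlying libsvm solver rescales the solution so that the margin sits at 1, which is what lets us count margin errors against a threshold of 1):

```python
import numpy as np
from sklearn.svm import NuSVC

# Toy two-Gaussian data (an arbitrary illustration, not the course data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.hstack([-np.ones(100), np.ones(100)])

nu = 0.2
clf = NuSVC(nu=nu, kernel="linear").fit(X, y)

frac_sv = clf.support_.size / len(y)              # fraction of support vectors
margins = y * clf.decision_function(X)
frac_margin_err = np.mean(margins < 1 - 1e-6)     # margin errors after rescaling
print(f"margin errors {frac_margin_err:.2f} <= nu={nu} <= SVs {frac_sv:.2f}")
```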
Problem 4. In class, we have dealt mostly with classification problems. One alternative problem, quite frequent in practice, arises when only data from one class is available (there are no negative examples) but the goal is to detect outliers. This problem can be solved in at least two ways using SVM-related ideas. Both imply finding the region of support of the data, by finding a minimal boundary surface that encloses the data from the class.
The first strategy is to find the plane that separates, with largest margin, the training points from
the origin. It consists of solving the following optimization problem
$$\min_{w,\xi,\rho} \; \left( \frac{1}{2}\|w\|^2 + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho \right), \quad \nu \in (0, 1]$$
subject to
$$\langle \Phi(x_i), w \rangle \geq \rho - \xi_i, \quad i = 1, \dots, n,$$
$$\xi_i \geq 0, \quad i = 1, \dots, n.$$
The second strategy consists of finding the smallest ball that encloses the data points. This can be done by solving the following optimization problem
$$\min_{R,\xi,c} \; \left( R^2 + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i \right), \quad \nu \in (0, 1]$$
subject to
$$\|\Phi(x_i) - c\|^2 \leq R^2 + \xi_i, \quad i = 1, \dots, n,$$
$$\xi_i \geq 0, \quad i = 1, \dots, n.$$
a) Determine the dual problem associated with each of the strategies above as well as the resulting
decision function.
b) How would you recover the $\rho$ parameter in the case of the first strategy? Also, in this case, what happens when $\nu$ approaches zero?
c) Determine the set of kernels that make the two strategies equivalent.
d) In the case of c), compare the decision function with a Parzen density estimate
$$f(x) = \frac{1}{n}\sum_{i=1}^{n} \phi(x - x_i),$$
where $\phi$ is a non-negative function that integrates to one.
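The first strategy is the one implemented by scikit-learn's OneClassSVM, which can be used to build intuition for the role of ν before deriving the dual. A minimal sketch (the data and parameter values are arbitrary illustrations):

```python
import numpy as np
from sklearn.svm import OneClassSVM

# One-class toy data: a single Gaussian blob (arbitrary illustration).
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (200, 2))

oc = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.5).fit(X)

scores = oc.decision_function(X)   # negative score = predicted outlier
print("fraction flagged as outliers:", np.mean(scores < 0))
print("fraction of support vectors:", oc.support_.size / len(X))
```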
Problem 5. In this problem, we will use the SVM to, once again, classify digits. In all questions, you will use the training set contained in the directory MNISTtrain (60,000 examples) and the test set in the directory MNISTtest (10,000 examples). To reduce training time, we will only use 20,000 training examples. To read the data, you should use the script readMNIST.m (use readDigits=20000 for the training set and readDigits=10000 for the test set, with offset=0). This returns two matrices. The first (imgs) is a matrix with n ∈ {10000, 20000} rows, where each row is a 28 × 28 image of a digit, vectorized into a 784-dimensional row vector. The second (labels) is a matrix with n ∈ {10000, 20000} rows, where each row contains the class label for the corresponding image in imgs. Since there are 10 digit classes, we will learn 10 binary classifiers. Each classifier classifies one class against all others. For example, classifier 1 assigns label Y = 1 to the images of class 1 and label Y = −1 to images of all other classes. Download and install the libsvm package to learn the SVM classifiers.
a) In this problem, we will learn linear SVMs. Using libsvm, learn SVMs with values of the regularization constant C ∈ {2, 4, 8}. For each value of C and each digit, 1) report the test error, 2) report the number of support vectors, and 3) plot the three support vectors of largest Lagrange multiplier on each side of the boundary. For each classifier, report the overall classification error. Comment on the results.
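A minimal sketch of the one-vs-rest training loop, using libsvm's Python interface (readMNIST.m is a MATLAB script, so the imgs/labels arrays are assumed to have been loaded into numpy by an equivalent step; all variable names are placeholders):

```python
import numpy as np
from libsvm.svmutil import svm_train, svm_predict

def one_vs_rest_linear(imgs, labels, test_imgs, test_labels, C=2):
    """Train one linear SVM per digit; return per-digit test error and SV count."""
    results = {}
    for digit in range(10):
        # Relabel: +1 for the current digit, -1 for all other classes.
        y = np.where(labels == digit, 1, -1).tolist()
        model = svm_train(y, imgs.tolist(), f"-t 0 -c {C} -q")   # -t 0: linear kernel
        y_test = np.where(test_labels == digit, 1, -1).tolist()
        _, (acc, _, _), _ = svm_predict(y_test, test_imgs.tolist(), model, "-q")
        results[digit] = (100.0 - acc, model.get_nr_sv())
    return results
```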
b) For each binary classifier, make a plot of the cumulative distribution function (cdf) of the margins $y_i(w^T x_i + b)$ of all training examples. Comment on the results.
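One way to produce this plot, assuming the margins for one classifier have already been computed into a 1-D array (a sketch with matplotlib; the function name is a placeholder):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_margin_cdf(margins, digit):
    """Empirical cdf of the training margins y_i (w^T x_i + b) for one classifier."""
    m = np.sort(np.asarray(margins))
    cdf = np.arange(1, m.size + 1) / m.size
    plt.step(m, cdf, where="post")
    plt.xlabel("margin")
    plt.ylabel("empirical cdf")
    plt.title(f"digit {digit} vs. rest")
    plt.show()
```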
c) Repeat a) and b) for an SVM with a radial basis function kernel. Use the script grid.py, included in the libsvm package, to cross-validate the values of C and γ, and then use these two parameters for the classification. Compare the results with those of a) and b).
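grid.py automates a log-scale search over (C, γ). The equivalent manual loop relies on libsvm's built-in n-fold cross-validation, where passing '-v' makes svm_train return the cross-validation accuracy instead of a model (the search ranges below are arbitrary choices):

```python
from libsvm.svmutil import svm_train

def rbf_grid_search(y, x):
    """Pick (C, gamma) by 5-fold cross-validation accuracy, as grid.py does."""
    best = (-1.0, None, None)
    for log2c in range(-1, 6, 2):
        for log2g in range(-9, -2, 2):
            # -t 2: RBF kernel; -v 5: return 5-fold CV accuracy, not a model.
            acc = svm_train(y, x, f"-t 2 -c {2**log2c} -g {2**log2g} -v 5 -q")
            best = max(best, (acc, 2**log2c, 2**log2g))
    return best  # (cv accuracy, C, gamma)
```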