CS 446 / ECE 449 Homework 2 solved

Original price was: $35.00.Current price is: $30.00. $25.50

Category:

Description

5/5 - (6 votes)

Version History.
1. Initial version.
1. Support Vector Machines.
Recall that the dual problem of an SVM is
\[
\max_{\alpha \in C} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y^{(i)} y^{(j)} \kappa\big(x^{(i)}, x^{(j)}\big),
\]
where the domain is C = [0, ∞)^N = {α : α_i ≥ 0} for a hard-margin SVM and C = [0, C]^N = {α : 0 ≤ α_i ≤ C} for a soft-margin SVM.
Equivalently, we can frame the dual problem of the SVM as the minimization problem
\[
\min_{\alpha \in C} \; f(\alpha) := \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y^{(i)} y^{(j)} \kappa\big(x^{(i)}, x^{(j)}\big) - \sum_{i=1}^{N} \alpha_i.
\]
(a) Derive the dual problem and prove that the domain is C = [0, C]^N = {α : 0 ≤ α_i ≤ C} for a soft-margin SVM. We assume that the optimization objective for this soft-margin SVM is
\[
\min_{w, \xi} \; \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad 1 - \xi_i \le y^{(i)} w^\top \phi\big(x^{(i)}\big), \; \xi_i \ge 0, \; \forall i \in \{1, 2, \ldots, N\}.
\]
Hint: A sketch of the proof is briefly mentioned in the Lecture 6 slides. The purpose of this problem is to ensure that you understand the Lagrangian and are familiar with the basic notation.
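One standard way to begin the derivation is to introduce multipliers α_i ≥ 0 for the margin constraints and μ_i ≥ 0 for the constraints ξ_i ≥ 0, giving the Lagrangian
\[
L(w, \xi, \alpha, \mu) = \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{N} \xi_i + \sum_{i=1}^{N} \alpha_i \left(1 - \xi_i - y^{(i)} w^\top \phi\big(x^{(i)}\big)\right) - \sum_{i=1}^{N} \mu_i \xi_i.
\]
Setting the derivative with respect to ξ_i to zero gives C = α_i + μ_i, which together with μ_i ≥ 0 yields the upper bound α_i ≤ C.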
(b) Prove some theorems related to the margin of the SVM, which connect different variables in the SVM. In the questions below, we assume that the data are linearly separable and that all parameters are subject to the same conditions as in the dual problem:
\[
\max_{\alpha \in C} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y^{(i)} y^{(j)} \kappa\big(x^{(i)}, x^{(j)}\big).
\]
i. Denote the margin of the hard-margin SVM by ρ and the optimal primal solution by w∗. Prove that
\[
\frac{1}{\rho^2} = \|w^*\|_2^2.
\]
Hint: Think about the definition of the margin and its expression. Also think about complementary slackness.
ii. Prove the following equation under the same conditions as in the previous question:
\[
\frac{1}{\rho^2} = \sum_{i=1}^{N} \alpha_i.
\]
Hint: Use the conclusion from the previous question and the fact that ‖w∗‖₂² = (w∗)ᵀ w∗. Also recall what support vectors are and what complementary slackness says.
(c) Prove some conclusions about kernel methods.
i. For arbitrary 2D vectors x = [x_0, x_1]^T and z = [z_0, z_1]^T, we define a kernel κ(x, z) = (1 + x^T z)^2. Derive the expressions for φ(x) and φ(z). To keep your answer consistent with the standard solution, please note that φ(x) and φ(z) are both vectors of monomials.
ii. Show that the dual objective for the RBF kernel SVM is given by
\[
h(\alpha) = -\frac{1}{2} \alpha^\top A \alpha + \mathbf{1}^\top \alpha,
\]
where 1 is a vector of ones and A ∈ R^{N×N} is the matrix whose (i, j)-th entry is
\[
A_{ij} = y^{(i)} y^{(j)} \exp\left( -\frac{\big\| x^{(i)} - x^{(j)} \big\|_2^2}{2\sigma^2} \right).
\]
Solution.
2. Implementing Support Vector Machine
(a) Recall the dual problem of the SVM from the previous problem and the domains C = [0, ∞)^N = {α : α_i ≥ 0} for a hard-margin SVM and C = [0, C]^N = {α : 0 ≤ α_i ≤ C} for a soft-margin SVM. We can solve this dual problem by projected gradient descent, which starts from some α_0 ∈ C (e.g., 0) and updates as follows:
\[
\alpha_{t+1} = \Pi_C\big[\alpha_t - \eta \nabla f(\alpha_t)\big].
\]
Here Π_C[α] is the projection of α onto C, defined as the closest point to α in C:
\[
\Pi_C[\alpha] := \arg\min_{\alpha' \in C} \|\alpha' - \alpha\|_2.
\]
If C is convex, the projection is uniquely defined. With this information, prove that
\[
\big(\Pi_{[0,\infty)^n}[\alpha]\big)_i = \max\{\alpha_i, 0\}, \qquad \big(\Pi_{[0,C]^n}[\alpha]\big)_i = \min\{\max\{0, \alpha_i\}, C\}.
\]
Note: Include this question in the written submission.
Hint: Show that the i-th component of any α' ∈ C is at least as far from the i-th component of α as the i-th component of the projection is. Specifically, show that
\[
\big|\alpha'_i - \alpha_i\big| \ge \big|\max\{0, \alpha_i\} - \alpha_i\big| \quad \text{for } \alpha' \in [0, \infty)^n,
\]
and that
\[
\big|\alpha'_i - \alpha_i\big| \ge \big|\min\{\max\{0, \alpha_i\}, C\} - \alpha_i\big| \quad \text{for } \alpha' \in [0, C]^n.
\]
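In code, both projections reduce to a single clamp. The sketch below uses torch.clamp from the suggested library routines; the function name project and the convention c=None for the hard-margin case are assumptions for illustration.

import torch

def project(alpha, c=None):
    # Hard margin (c is None):  Pi_{[0, inf)^n}[alpha]_i = max(alpha_i, 0)
    # Soft margin (c is given): Pi_{[0, C]^n}[alpha]_i   = min(max(alpha_i, 0), C)
    return torch.clamp(alpha, min=0.0, max=c)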
(b) Implement an svm_solver(), using projected gradient descent formulated as above. Initialize your α to zeros. See the docstrings in hw2.py for details.
Remark: Consider using the .backward() function in PyTorch. However, you may then have to use in-place operations like clamp_(); otherwise the gradient information is destroyed.
Library routines: torch.outer, torch.clamp, torch.autograd.backward, torch.tensor(…, requires_grad=True), with torch.no_grad():, torch.Tensor.grad.zero_, torch.Tensor.detach.
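The exact signature of svm_solver() is fixed by the docstring in hw2.py; the following is only a sketch of the projected gradient loop under assumed arguments (x_train, y_train, lr, num_iters, kernel, c), using autograd for ∇f(α) and an in-place clamp inside torch.no_grad() for the projection.

import torch

def svm_solver_sketch(x_train, y_train, lr, num_iters, kernel, c=None):
    n = x_train.shape[0]
    # `kernel` is assumed to map two batches of inputs to an (n, n) Gram matrix;
    # y_train is assumed to be a float tensor of labels in {-1, +1}.
    A = torch.outer(y_train, y_train) * kernel(x_train, x_train)
    alpha = torch.zeros(n, requires_grad=True)
    for _ in range(num_iters):
        f = 0.5 * alpha @ A @ alpha - alpha.sum()   # dual objective f(alpha)
        f.backward()                                # populates alpha.grad
        with torch.no_grad():
            alpha -= lr * alpha.grad                # gradient step
            alpha.clamp_(min=0.0, max=c)            # project back onto C, in place
        alpha.grad.zero_()
    return alpha.detach()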
(c) Implement an svm_predictor(), using an optimal dual solution, the training set, and an input. See the docstrings in hw2.py for details.
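A sketch under the same assumed conventions: the score at a test point x is the kernel expansion Σ_i α_i y^(i) κ(x^(i), x), evaluated here for a whole batch of test points at once.

import torch

def svm_predictor_sketch(alpha, x_train, y_train, x_test, kernel):
    # kernel(x_train, x_test) is assumed to return an (n_train, n_test) matrix.
    K = kernel(x_train, x_test)
    # Score for each test point: sum_i alpha_i y_i kappa(x_i, x_test)
    return (alpha * y_train) @ K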
(d) On the area [−5, 5] × [−5, 5], plot the contour lines of the following kernel SVMs, trained on the XOR data. The kernels and the XOR data are provided in hw2_utils.py. A learning rate of 0.1 and 10000 steps should be enough. To draw the contour lines, you can use hw2_utils.svm_contour(). A rough sketch of the training-and-plotting loop is given after this list.
• The polynomial kernel with degree 2.
• The RBF kernel with σ = 1.
• The RBF kernel with σ = 2.
• The RBF kernel with σ = 4.
Include these four plots in your written submission.
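The sketch below shows one way the loop could look. The helper names xor_data, poly, rbf, and svm_contour, and all of their argument names, are assumptions based on the problem statement and must be checked against the docstrings in hw2_utils.py and hw2.py.

import hw2            # assumed to provide svm_solver and svm_predictor
import hw2_utils       # assumed to provide the kernels, the XOR data, and svm_contour

x_train, y_train = hw2_utils.xor_data()                 # assumed helper
kernels = [
    ("poly, degree 2", hw2_utils.poly(degree=2)),       # assumed signatures
    ("RBF, sigma = 1", hw2_utils.rbf(sigma=1.0)),
    ("RBF, sigma = 2", hw2_utils.rbf(sigma=2.0)),
    ("RBF, sigma = 4", hw2_utils.rbf(sigma=4.0)),
]
for name, kernel in kernels:
    alpha = hw2.svm_solver(x_train, y_train, lr=0.1, num_iters=10000, kernel=kernel)
    # svm_contour is assumed to take a function mapping test points to scores,
    # plus the plotting window [-5, 5] x [-5, 5].
    pred = lambda x_test: hw2.svm_predictor(alpha, x_train, y_train, x_test, kernel=kernel)
    hw2_utils.svm_contour(pred, xmin=-5, xmax=5, ymin=-5, ymax=5)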
Solution.