ECE M146 Homework 4
1. In this section, you will use the k-NN classifier to predict whether a person survives
on the Titanic. You will be using the same data set provided in HW3.
Unlike what we did in the last homework, we are going to build the k-NN classifier from
first principles. You may not use fitcknn (sklearn.neighbors.KNeighborsClassifier
for Python) in this problem, as those built-in functions will give incorrect answers
under the tie-breaking rules defined below.
The k-NN classifier classifies a data point with feature xtest based on a training set by
performing the following procedure:
• Compute the distance from xtest to the features of all training points. We will use
the Euclidean distance in this problem.
• Find the k nearest neighbors of this point.
• Classify this point as the majority class of its k nearest neighbors.
We use the following two rules to handle ties:
(a) Let dk be the distance of the k-th nearest neighbor of xtest. If there are multiple
training points at distance dk from xtest, choose the points with the smallest
indices to be included in the k nearest neighbors. For example, let k = 3; if x9 is
at distance 1 from xtest, and x1, x3 and x4 are at distance 2 from xtest, then the
3 nearest neighbors of xtest are x1, x3 and x9. Note that dk = 2 in this example.
(b) For even k, among all k nearest neighbors of a data point, if the number of points
from class 0 is the same as the number of points from class 1, classify this data
point as ytie deterministically.
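The procedure and both tie-breaking rules can be sketched in Python roughly as follows. This is a minimal illustration, not the required solution; the function name knn_predict and the 0/1 label convention are assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k, y_tie=1):
    """Classify x_test by majority vote of its k nearest training points.

    Rule (a): a stable sort on distance keeps equal-distance points in
    index order, so the smallest indices enter the k nearest neighbors.
    Rule (b): an even split between classes 0 and 1 returns y_tie.
    """
    d = np.linalg.norm(X_train - x_test, axis=1)   # Euclidean distances
    nn = np.argsort(d, kind='stable')[:k]          # k nearest; index breaks ties
    ones = int(y_train[nn].sum())                  # votes for class 1
    zeros = k - ones                               # votes for class 0
    if ones == zeros:                              # possible only for even k
        return y_tie
    return 1 if ones > zeros else 0
```

The stable argsort is what implements rule (a): among points at the same distance, the one with the smaller training index comes first.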
Answer the following questions:
(a) With ytie = 1, implement the k-NN algorithm. Find and plot the training and
testing error for k = 1, 2, · · · , 15.
(b) With ytie = 0, implement the k-NN algorithm. Find and plot the training and
testing error for k = 1, 2, · · · , 15.
(c) Comment on the performance of the k-NN classifier in (a) and (b). How does
larger k affect the training and testing error? How does even or odd k affect
the performance of the k-NN classifier in (a) and (b), respectively? Are they
contradictory to each other? Explain why. Hint: you may use your result from
HW3 Q6 (a) to get some intuition.
2. In this section, you will consider a k-nearest neighbor classifier using the Euclidean
distance metric on a binary classification task. We assign the class of the test point
to be the class of the majority of its k nearest neighbors.
Consider the following two datasets:
Table 1: KNN Example 1
Table 2: KNN Example 2
(a) For k ∈ {1, 3, 5, 7}, what values of k minimize leave-one-out cross-validation error
for each dataset? What is the resulting validation error?
(b) In general, what are the drawbacks of using too big a k? What about too small
a k? To see this, calculate the leave-one-out cross-validation error for the dataset
in Figure 1 using k = 1 and k = 13.
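Leave-one-out cross-validation for k-NN can be sketched as below. The helper name loocv_error is illustrative; labels are assumed to be in {−1, +1} as in this problem, and distance ties are again broken by training index.

```python
import numpy as np

def loocv_error(X, y, k):
    """Leave-one-out CV error of a k-NN majority-vote classifier.

    Each point is held out in turn; its k nearest neighbors among the
    remaining points vote, and the sign of the vote sum decides (labels
    are assumed to be -1/+1).
    """
    n = len(y)
    errors = 0
    for i in range(n):
        mask = np.arange(n) != i                   # hold point i out
        d = np.linalg.norm(X[mask] - X[i], axis=1)
        nn = np.argsort(d, kind='stable')[:k]      # k nearest of the rest
        pred = 1 if y[mask][nn].sum() > 0 else -1  # majority vote
        errors += int(pred != y[i])
    return errors / n
```

On a toy set of two tight pairs, k = 1 predicts each point from its pair-mate and makes no errors, while a too-large k lets the far cluster outvote the near one.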
3. In this problem, we will derive the least-squares solution for multi-class classification.
Consider a general classification problem with K classes, with a 1-of-K binary encoding
scheme (defined below) for the target vector t, t ∈ R^K. Suppose we are given a training
data set {xn, tn}, n = 1, · · · , N, where xn ∈ R^D. In the 1-of-K binary encoding scheme,
tn has its k-th element equal to 1 and all other elements equal to 0 if the n-th data point
is in class k. We can use the following linear model to describe each class:

yk(x) = wk^T x + wk0,

where k = 1, · · · , K. We can conveniently group these together using vector notation,
so that

y(x) = W̃^T x̃,

where W̃ is a matrix whose k-th column is the (D + 1)-dimensional vector
w̃k = [wk0, wk^T]^T, and x̃ is the corresponding augmented input vector [1, x^T]^T. For
each new input with feature x, we assign it to the class for which the output
yk = w̃k^T x̃ is largest. Define a matrix T whose n-th row is the vector tn^T, and likewise
a matrix X̃ whose n-th row is x̃n^T. The sum-of-squares error function can then be
written as

J(W̃) = (1/2) Tr{ (X̃W̃ − T)^T (X̃W̃ − T) }.

Find the closed-form solution of W̃ that minimizes the objective function J(W̃). Hint:
you may use the following two matrix derivatives involving the trace:
∂/∂Z Tr(AZ) = A^T and ∂/∂Z Tr(Z^T A Z) = (A^T + A) Z.
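Whatever closed form you derive can be checked numerically: np.linalg.lstsq minimizes the same sum-of-squares objective column by column, and any minimizer must satisfy the first-order condition X̃^T(X̃W̃ − T) = 0. A minimal sketch (the helper names are illustrative):

```python
import numpy as np

def lsq_multiclass_fit(X, T):
    """Least-squares fit of the 1-of-K linear model.

    X is an N x D feature matrix, T is the N x K one-of-K target matrix.
    A column of ones is prepended to form X_tilde; lstsq then minimizes
    ||X_tilde @ W - T||_F^2, the objective J up to the factor 1/2.
    """
    X_tilde = np.hstack([np.ones((X.shape[0], 1)), X])
    W_tilde, *_ = np.linalg.lstsq(X_tilde, T, rcond=None)
    return W_tilde                                  # shape (D+1) x K

def lsq_multiclass_predict(X, W_tilde):
    """Assign each input to the class k with the largest output y_k."""
    X_tilde = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.argmax(X_tilde @ W_tilde, axis=1)
```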
4. You are given the following data set, which is comprised of x^(i) ∈ R^2 and
y^(i) ∈ {−1, 1}:

   i   x1^(i)   x2^(i)   y^(i)
   1   -3       9         1
   2   -2.5     6.25      1
   3    3       9         1
   4   -1.5     2.25     -1
   5    0       0        -1
   6    1       1        -1
(a) Plot the data. Is the data linearly separable?
(b) Look at the data and circle the support vectors by inspection. Find and plot the
maximum margin separating hyperplane.
(c) Find the αi, w and b in

h(x) = sign( Σ_{n∈S} αn y^(n) x^T x^(n) + b ) = sign( w^T x + b ),

where S is the index set of all support vectors. Do this by solving the dual form
of the quadratic program. How are w and b related to your solution in part (b)?
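Once the dual variables are known, w and b follow from the support vectors alone, since w = Σ_{n∈S} αn y^(n) x^(n) and complementary slackness gives b = y^(s) − w^T x^(s) for any support vector s. A minimal sketch (the helper recover_w_b and its argument layout are assumptions):

```python
import numpy as np

def recover_w_b(alphas, X_sv, y_sv):
    """Recover (w, b) from the dual solution of a hard-margin SVM.

    alphas, X_sv, y_sv cover only the support vectors (index set S).
    At the exact optimum every support vector yields the same b.
    """
    w = (alphas * y_sv) @ X_sv        # w = sum_n alpha_n y_n x_n
    b = y_sv[0] - w @ X_sv[0]         # y_s (w^T x_s + b) = 1 at an SV
    return w, b
```

For the trivial 1-D pair x = −1 (y = −1) and x = +1 (y = +1), the dual optimum is α1 = α2 = 0.5, which recovers w = 1 and b = 0.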

5. In this exercise, we will use MATLAB to solve both the primal and the dual problem of
SVM. In Data.csv, the first two columns contain feature vectors x^(i) ∈ R^2 and the last
column contains the label y^(i) ∈ {−1, 1}. We will use CVX as the optimization solver
in this problem. For help with CVX, refer to the CVX Users’ Guide. Attach your code
for submission. For Python users, feel free to use the following libraries: math, csv,
numpy, matplotlib and cvxpy.
(a) Visualization Use different colors to plot data with different labels in the 2-D
feature space. Is the data linearly separable?
(b) The Primal Problem Use CVX to solve the primal problem of this form:

min_{w,b} (1/2) ||w||^2
s.t. y^(i) (w^T x^(i) + b) ≥ 1, i = 1, · · · , m.

Report w and b. Plot the hyperplane defined by w and b.
(c) The Dual Problem Use CVX to solve the dual problem of this form:

max_α W(α) = Σ_{i=1}^{m} αi − (1/2) Σ_{i,j=1}^{m} y^(i) y^(j) αi αj ⟨x^(i), x^(j)⟩
s.t. 0 ≤ αi, i = 1, · · · , m,
Σ_{i=1}^{m} αi y^(i) = 0.

Use the resulting α to identify the support vectors on the plot. Report your nonzero
αi’s. How many support vectors do you have? Circle those support vectors.
Note: The latter part of W(α) is in quadratic form, i.e., α^T P α. To use CVX,
first find P and then use quad form(a,P). For Python users, you will need to add
a small number to the diagonal of the P matrix to make cvxpy work, i.e., run the
following code before using cvxpy: “P += 1e-13 * numpy.eye(31)”, where 31 is
the total number of data points. Also, treat a number as 0 if it is less than 1e-9.