EE 559 Homework 6

1. We perform Parzen Window density estimation, using the trapezoidal window function
given (in unnormalized form) in the figure below: (15 pts)
[Figure: trapezoidal window function φ(u) versus u; vertical axis up to 1, horizontal axis marked at −h, −h/2, 0, h/2, h.]
Choose h = 1. Assume that we have the following data
Dω1 = {0, 2, 5}
Dω2 = {4, 7}
(a) Sketch or plot the Parzen window estimates of the pdfs p(x|ω1) and p(x|ω2).
Please label pertinent values on both axes.
(b) Estimate the prior probabilities based on frequency of occurrence of the prototypes
in each class.
(c) Use the estimates you have developed above to find the decision boundaries and
regions for a Bayes minimum-error classifier based on Parzen windows. Only the
part of feature space where at least one density is nonzero needs to be classified.
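The following is a minimal numerical sketch (not required by the problem) of how a Parzen-window density estimate can be evaluated in Python. The trapezoid shape (flat at 1 on [−h/2, h/2], falling linearly to 0 at ±h) and its normalizing area of 3h/2 are assumptions read off the figure's tick marks; the data and h = 1 come from the problem statement.

```python
import numpy as np

def trapezoid_window(u, h=1.0):
    """Assumed unnormalized window: 1 on [-h/2, h/2], linear down to 0 at |u| = h."""
    u = np.abs(u)
    return np.where(u <= h / 2, 1.0, np.where(u <= h, 2.0 * (1.0 - u / h), 0.0))

def parzen_estimate(x, data, h=1.0):
    """Average of normalized windows centered at each training sample.
    The assumed trapezoid has area 3h/2, so dividing by that makes each window a pdf."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    diffs = x[:, None] - np.asarray(data, dtype=float)[None, :]
    return (trapezoid_window(diffs, h) / (1.5 * h)).mean(axis=1)

# Data and window width from the problem statement
D1, D2 = [0, 2, 5], [4, 7]
xs = np.linspace(-2, 9, 1101)
p1, p2 = parzen_estimate(xs, D1), parzen_estimate(xs, D2)
print(p1.max(), p2.max())   # peak values of the two density estimates
```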
2. We perform k-nearest neighbor density estimation, using k = 2. Assume that you are
given the following training set for a 2-class problem with one feature: (20 pts)
Dω1 = {2, 5}
Dω2 = {4, 7}
(a) Sketch or plot the k-nearest neighbor estimates of the pdfs p(x|ω1) and p(x|ω2).
Please label pertinent values on both axes. Also give the density estimates
algebraically, for each region in feature space.
(b) Estimate the prior probabilities based on frequency of occurrence of the prototypes
in each class.
(c) Use the estimates you have developed above to find the decision boundaries
and regions for a Bayes minimum-error classifier based on k-nearest neighbors.
(d) Derive a classifier based on using KNN as a discriminative technique that
estimates p(ωi|x) directly using nearest neighbors, and compare it to the classifier
you obtained in 2c. If there are ties, break them in favor of ω2.
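As an illustration (not required by the problem), here is a short Python sketch of the 1-D k-NN density estimate p̂(x) = k / (n · 2 r_k(x)), where r_k(x) is the distance from x to its k-th nearest training sample, together with the discriminative k-NN rule of part 2(d). The tie-breaking in favor of ω2 follows the problem statement; everything else is a plain textbook formulation rather than a prescribed implementation.

```python
import numpy as np

def knn_density(x, data, k=2):
    """1-D k-NN density estimate: p_hat(x) = k / (n * V), with V = 2 * r_k(x),
    where r_k(x) is the distance from x to its k-th nearest training sample."""
    d = np.sort(np.abs(np.asarray(data, dtype=float) - x))
    return k / (len(data) * 2.0 * d[k - 1])

def knn_classify(x, D1, D2, k=2):
    """Discriminative k-NN (part 2d): majority class among the k nearest
    neighbors in the pooled training set; ties are broken in favor of w2."""
    pooled = [(abs(p - x), 1) for p in D1] + [(abs(p - x), 2) for p in D2]
    labels = [lab for _, lab in sorted(pooled)[:k]]
    return 2 if labels.count(2) >= labels.count(1) else 1

# Training data from the problem statement; the query points are arbitrary examples
D1, D2 = [2, 5], [4, 7]
for x in [3.0, 4.5, 6.0]:
    print(x, knn_density(x, D1), knn_density(x, D2), knn_classify(x, D1, D2))
```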
3. Consider the following training data set:
x1 = [1, 0]^T, z1 = −1
x2 = [0, 1]^T, z2 = −1
x3 = [0, −1]^T, z3 = −1
x4 = [−1, 0]^T, z4 = 1
x5 = [0, 2]^T, z5 = 1
x6 = [0, −2]^T, z6 = 1
x7 = [−2, 0]^T, z7 = 1
Use the following nonlinear transformation of the input vector x = [x1, x2]^T to the
transformed vector u = [ϕ1(x), ϕ2(x)]^T: ϕ1(x) = x2^2 − 2x1 + 3 and ϕ2(x) = x1^2 − 2x2 − 3.
What is the equation of the optimal separating “hyperplane” in the u space? (15 pts)
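A small sketch, assuming nothing beyond the problem statement, that applies ϕ1 and ϕ2 to the training points so the transformed data can be inspected (or plotted) in u-space before finding the separating hyperplane by hand:

```python
import numpy as np

# Training points and labels from the problem statement
X = np.array([[1, 0], [0, 1], [0, -1], [-1, 0], [0, 2], [0, -2], [-2, 0]], dtype=float)
z = np.array([-1, -1, -1, 1, 1, 1, 1])

def phi(x):
    """u = [phi1(x), phi2(x)] as defined in the problem."""
    x1, x2 = x
    return np.array([x2**2 - 2*x1 + 3, x1**2 - 2*x2 - 3])

U = np.array([phi(x) for x in X])
for u_i, z_i in zip(U, z):
    print(u_i, z_i)   # inspect or plot the transformed points before deriving the hyperplane
```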
4. Consider the following training data set: (25 pts)
x1 = [0, 0]^T, z1 = −1
x2 = [1, 0]^T, z2 = 1
x3 = [0, −1]^T, z3 = 1
x4 = [−1, 0]^T, z4 = 1
Note that in the following, you need to use the equations that describe w and that give
rise to the dual optimization problem.
(a) Write down the dual optimization problem for training a Support Vector Machine
with this data set using the polynomial kernel function
κ(xi, xj) = (xi^T xj + 1)^2
(b) Solve the optimization problem and find the optimal λi's using results about
quadratic forms, and check the results with Wolfram Alpha or any software package.
(c) Show that the equation of the decision boundary in a kernel SVM, w^T u + w0 = 0,
can be represented as g(x) = ∑_{i=1}^{N} λi zi κ(xi, x) + w0.
(d) We learned that for vectors that do not violate the margin¹ (i.e. zj(w^T uj + w0) −
1 > 0), the Lagrange multiplier is zero, i.e. λj = 0. On the other hand, for
vectors on the margin (zj(w^T uj + w0) − 1 = 0), λj ≠ 0. Show that, consequently,
one can find a vector xj for which λj ≠ 0 and calculate w0 as
w0 = 1/zj − ∑_{i=1}^{N} λi zi κ(xi, xj).

¹ For simplicity, consider a kernel SVM with hard margins, i.e. no slack variables.
(e) Sketch the decision boundary for this data set based on parts (4c) and (4d).
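Part 4(b) allows checking the result with any software package. One possible (hypothetical) way to do so is to minimize the negated dual objective numerically, e.g. with scipy.optimize.minimize under the constraints λi ≥ 0 and ∑ λi zi = 0; this is only a sanity check, not the required hand derivation.

```python
import numpy as np
from scipy.optimize import minimize

# Training data from problem 4
X = np.array([[0, 0], [1, 0], [0, -1], [-1, 0]], dtype=float)
z = np.array([-1.0, 1.0, 1.0, 1.0])

def kernel(a, b):
    """Polynomial kernel from part (a): (a^T b + 1)^2."""
    return (a @ b + 1.0) ** 2

K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
H = (z[:, None] * z[None, :]) * K          # H_ij = z_i z_j kappa(x_i, x_j)

def neg_dual(lam):
    # Dual objective is sum(lam) - 0.5 * lam^T H lam; negate it for a minimizer.
    return -(lam.sum() - 0.5 * lam @ H @ lam)

constraints = ({"type": "eq", "fun": lambda lam: lam @ z},)   # sum_i lam_i z_i = 0
bounds = [(0.0, None)] * len(z)                               # lam_i >= 0 (hard margin)
res = minimize(neg_dual, x0=np.ones(len(z)), bounds=bounds, constraints=constraints)
print(res.x)   # numerical lambda_i's, to sanity-check the hand calculation in (b)
```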
5. In the following figure, there are different SVMs with different decision boundaries. The
training data are labeled zi ∈ {−1, 1} and represented as circles and squares, respectively.
Support vectors are drawn as solid circles. Determine which of the scenarios described
below matches each of the six plots (note that one of the plots does not match any
scenario). Each scenario should be matched to a unique plot. Explain your reasoning for
matching each scenario to a plot. (10 pts)
(a) A soft-margin linear SVM with C = 0.02
(b) A soft-margin linear SVM with C = 20
(c) A hard-margin kernel SVM with κ(xi, xj) = xi^T xj + (xi^T xj)^2
(d) A hard-margin kernel SVM with κ(xi, xj) = exp(−5‖xi − xj‖^2)
(e) A hard-margin kernel SVM with κ(xi, xj) = exp(−(1/5)‖xi − xj‖^2)
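For intuition (not part of the assignment), the five configurations can be reproduced in scikit-learn and their boundaries compared. The toy data below is synthetic and only stands in for the figure's points, which are not given in the text, and the large C used for (c)-(e) is an assumption to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in for the figure's 2-D points (the actual data is not in the text).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.7, (20, 2)), rng.normal(1.0, 0.7, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

configs = {
    "(a) soft-margin linear, C=0.02": SVC(kernel="linear", C=0.02),
    "(b) soft-margin linear, C=20": SVC(kernel="linear", C=20),
    # A very large C approximates the hard-margin SVMs of (c)-(e).
    "(c) kernel x^T y + (x^T y)^2": SVC(kernel=lambda A, B: A @ B.T + (A @ B.T) ** 2, C=1e6),
    "(d) RBF, gamma = 5": SVC(kernel="rbf", gamma=5.0, C=1e6),
    "(e) RBF, gamma = 1/5": SVC(kernel="rbf", gamma=0.2, C=1e6),
}
for name, clf in configs.items():
    clf.fit(X, y)
    # Comparing the number of support vectors (and plotting decision_function on a
    # grid) helps tell smooth, wide-margin boundaries from tight, wiggly ones.
    print(name, clf.n_support_)
```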
6. Programming Part: Multi-class and Multi-Label Classification Using Support Vector Machines
(a) Download the Anuran Calls (MFCCs) Data Set from: https://archive.ics.
uci.edu/ml/datasets/Anuran+Calls+%28MFCCs). Choose 70% of the data randomly as the training set.
(b) Each instance has three labels: Families, Genus, and Species. Each of the labels
has multiple classes. We wish to solve a multi-class and multi-label problem.
One of the most important approaches to multi-label classification is to train a
classifier for each label. We first try this approach:
i. Research exact match and Hamming score/loss methods for evaluating multi-label classification and use them in evaluating the classifiers in this problem.
ii. Train an SVM for each of the labels, using Gaussian kernels and one-versus-all
classifiers. Determine the weight of the SVM penalty and the width of the
Gaussian kernel using 10-fold cross-validation.² You are welcome to try to
solve the problem with both normalized³ and raw attributes and report the
results. (A sketch of one possible implementation appears after this list.) (15 pts)
iii. Repeat 6(b)ii with L1-penalized SVMs.⁴ Remember to normalize the attributes. (A second sketch after this list covers this part and the next.) (10 pts)
iv. Repeat 6(b)iii by using SMOTE or any other method you know to remedy
class imbalance. Report your conclusions about the classifiers you trained.
(10 pts)
v. Extra Practice: Study the Classifier Chain method and apply it to the above
problem.
vi. Extra Practice: Research how confusion matrices, precision, recall, ROC,
and AUC are defined for multi-label classification and compute them for the
classifiers you trained above.
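For part 6(b)ii, the sketch below shows one possible implementation using scikit-learn. The file name Frogs_MFCCs.csv, the MFCCs_ column prefix, the label column names, and the parameter grids are assumptions (the grids loosely follow the rule of thumb in footnote 2), not requirements of the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Assumed file and column names for the Anuran Calls (MFCCs) data set.
df = pd.read_csv("Frogs_MFCCs.csv")
X = df.filter(like="MFCCs_").values
Y = df[["Family", "Genus", "Species"]]
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, train_size=0.7, random_state=0)

# Illustrative grids; log-spaced penalties and linear width increments
# roughly follow the footnote's rule of thumb.
param_grid = {"estimator__C": np.logspace(-2, 4, 7),
              "estimator__gamma": np.linspace(0.1, 2.0, 5)}

preds = {}
for label in Y_tr.columns:
    ovr = OneVsRestClassifier(SVC(kernel="rbf"))          # Gaussian kernel, one-vs-all
    grid = GridSearchCV(ovr, param_grid, cv=10, n_jobs=-1)
    grid.fit(X_tr, Y_tr[label])
    preds[label] = grid.predict(X_te)
    print(label, grid.best_params_)

P = pd.DataFrame(preds, index=Y_te.index)
exact_match = (P == Y_te).all(axis=1).mean()   # all three labels predicted correctly
hamming_score = (P == Y_te).values.mean()      # average per-label accuracy
print("exact match:", exact_match, "Hamming score:", hamming_score)
```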
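For parts 6(b)iii-iv, a companion sketch, again only one possible approach: an L1-penalized linear SVM (per footnote 4's convention of using a linear kernel with the L1 penalty) with standardized attributes, and SMOTE from imbalanced-learn applied inside the cross-validation pipeline so that resampling happens only on training folds. X_tr and Y_tr are assumed to come from the previous sketch.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline   # imblearn's Pipeline accepts resampling steps

# X_tr, Y_tr are assumed to come from the previous sketch.
for label in Y_tr.columns:
    pipe = Pipeline([
        ("scale", StandardScaler()),                    # normalize the attributes
        ("smote", SMOTE(random_state=0)),               # class-imbalance remedy for part iv
        ("svm", LinearSVC(penalty="l1", dual=False, max_iter=20000)),
    ])
    grid = GridSearchCV(pipe, {"svm__C": np.logspace(-3, 3, 7)}, cv=10, n_jobs=-1)
    grid.fit(X_tr, Y_tr[label])
    print(label, grid.best_params_, grid.best_score_)
```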
² How to choose parameter ranges for SVMs? One can use wide ranges for the parameters and a fine
grid (e.g. 1000 points) for cross-validation; however, this method may be computationally expensive. An
alternative way is to train the SVM with very large and very small parameters on the whole training data
and find very large and very small parameters for which the training accuracy is not below a threshold (e.g.,
70%). Then one can select a fixed number of parameters (e.g., 20) between those points for cross-validation.
For the penalty parameter, usually one has to consider increments in log(λ). For example, if one found that
the accuracy of a support vector machine will not be below 70% for λ = 10^−3 and λ = 10^6, one has to choose
log(λ) ∈ {−3, −2, . . . , 4, 5, 6}. For the Gaussian kernel parameter, one usually chooses linear increments, e.g.
σ ∈ {0.1, 0.2, . . . , 2}. When both σ and λ are to be chosen using cross-validation, combinations of very small
and very large λ's and σ's that keep the accuracy above a threshold (e.g. 70%) can be used to determine the
ranges for σ and λ. Please note that these are very rough rules of thumb, not general procedures.
³ It seems that this dataset is already normalized!
⁴ The convention is to use an L1 penalty with a linear kernel.