## Description

## Problem 1

The goal of this problem is to derive the parameters of a support vector

machine with a hard margin. We will consider the following toy dataset of 4 points: x(1) =

(1, 1)⊤, x(2) = (−1, −1)⊤, x(3) = (1, 0)⊤, and x(4) = (0, 1)⊤. x(1) is the only point from

the positive class, whereas x(2)

, x(3)

, and x(4) are from the negative class. Put another way,

y(1) = 1 and y(2) = y(3) = y(4) = −1.

1. (2 points) Plot the data and draw the maximum-margin hyperplane. By inspecting the

plot, give the equation of this hyperplane. You will use it to verify your answers at the

end of this problem.

2. (1 point) Which of the points are support vectors?

3. (1 point) Recall the optimization problem we defined for the SVM with a hard margin:

min w�,b

1

2

�w��2 s.t. ∀j(w� · x(j) + b)y(j) ≥ 1 (1)

Write down the Lagrangian L(w�, b, α�) corresponding to this problem, where α� is the

vector of dual variables.

4. (2 points) We can therefore rewrite Equation 1 as follows:

min w�,b

max

α� L(w�, b, α�)

Because Slater’s condition holds, this is equivalent to solving:

max

α� min w�,b

L(w�, b, α�) (2)

Given a fixed α�, solve the inner (minimization) problem. You should obtain two relationships taking the form of

A� = �

j

BjCjD�j

�

j

EjFj = G

5. (2 points) Simplify Equation 2 with the result of question 4. Show that it takes the

form of

max

α�

�

j

Aj − 1

2

�

j

�

k

BjCkDjEkF� · G�

Note that the letters used in this question are not related to the letters used in the

previous question.

6. (1 point) We now come back to our toy dataset. Because it has 4 trainings points, we

have j ∈ 1..4, i.e., four dual variables. For points that are not support vectors, recall

from class that the corresponding constraints in Equation 1 can be removed. Deduce the

value of one of the dual variable αj corresponding to one value of j (i.e., one constraint)

which we will write j∗ to not reveal the answer. Follow indexes introduced in the problem

statement above when describing the dataset.

7. (1 point) Deduce from the second expression found in question 4 (i.e.,

�

j EjFj = G),

a relationship between all remaining dual variables αj for j ∕= j∗

8. (3 points) Solving the optimization problem from question 5, and using question 7 in

addition, find values for all dual variables αj for j ∈ 1..4.

9. (1 point) Deduce the numerical values of the two components of w�.

10. (2 points) Through an analysis of constraints in Equation 1 which are tight, deduce the

numerical value of b. Hint: recall the role of support vectors here.

11. (0 points) Check that the hyperplane you found analytically is the same than the one

you found geometrically in the first question.

## Problem 2

– Binary linear classification vs. SVM For this problem, we will empirically compare the performance of a binary linear classifier to a SVM. Because the purpose

of this problem is to compare the performance of the two classifiers, you can use their implementation in sklearn. However, note that (a) you should understand how to implement

these classifiers from scratch and (b) you may be evaluated on this later in the course.

Consider the iris dataset included in sklearn. For all questions of this problem (except

the last question), we will only consider the first 100 entries of the dataset. With the help

of train test split from sklearn.model selection, split the dataset into a training set and a

test set.

For now, you can do so by setting the argument test size to 0.8 in train test split.

To ensure results below are comparable and reproducible, set the random state argument

of your train test split calls to 0: this will control the shuffling applied to the data before

applying the split.

1. (2 points) Implement a binary linear classifier on the first two dimensions (sepal length

and width) of the iris dataset and plot its decision boundary. (Hint: sklearn refers to

the binary linear classifier as a LogisticRegression, we will see why later in the course.)

2. (1 point) Report the accuracy of your binary linear classifier on both the training and

test sets.

3. (2 points) Implement a linear SVM classifier on the first two dimensions (sepal length

and width). Plot the decision boundary of the classifier and its margins.

4. (1 point) Circle the support vectors. Please justify how to identify them through the

duality theorem. (hint: KKT condition)

5. (1 point) Report the accuracy of your linear SVM classifier on both the training and

test sets.

6. (1 point) What is the value of the margin? Justify your answer.

7. (1 point) Which vector is orthogonal to the decision boundary?

8. (3 points) Split the iris dataset again in a training and test set, this time setting test size

to 0.4 when calling train test split. Train the SVM classifier again. Does the decision

boundary change? How about the test accuracy? Please justify why (hint: think about

the support vectors), and illustrate your argument with a new plot.

9. (1 point) Do the binary linear classifier and SVM have the same decision boundaries?

The comparison should be made for the SVM obtained with test size=0.8.

10. (3 points) Now consider all 150 entries in the iris dataset, and retrain the SVM. You

should find that the data points are not linearly separable. How can you deal with it?

Justify your answer and plot the decision boundary of your new proposed classifier. For

this question, use the SVM obtained with test size=0.4.

∗

∗ ∗