## Description

## Problem 1

Assume we collected a dataset $D = \{(x^{(i)}, t^{(i)})\}_{i \in 1..7}$ of $N = 7$ points (i.e., observations) with inputs $\{x^{(i)}\}_{i \in 1..7} = (1, 2, 3, 4, 5, 6, 7)$ and outputs $\{t^{(i)}\}_{i \in 1..7} = (6, 4, 2, 1, 3, 6, 10)$ for a regression problem with both scalar inputs and outputs.

1. (1 point) Draw a scatter plot of the dataset in spreadsheet software (e.g., Excel).

2. (6 points) Let us use a linear regression model $g_{w,b}(x) = wx + b$ to model this data. Write down the analytical expression of the least squares loss (covered in Video 6) of this model on dataset $D$. Your loss should take the form of

   $$\frac{1}{2N} \sum_{i \in 1..N} \left( A_i w^2 + B_i b^2 + C_i wb + D_i w + E_i b + F_i \right)$$

   where $A_i$, $B_i$, $C_i$, $D_i$, $E_i$, and $F_i$ are expressed only as a function of $x^{(i)}$ and $t^{(i)}$ or constants. Do not fill in any numerical values yet.

3. (4 points) Derive the analytical expressions of $w$ and $b$ by minimizing the mean squared loss from the previous question. Your expressions for parameters $w$ and $b$ should only depend on $A = \sum_i A_i$, $B = \sum_i B_i$, $C = \sum_i C_i$, $D = \sum_i D_i$, and $E = \sum_i E_i$. Do not fill in any numerical values yet.

4. (2 points) Give approximate numerical values for w and b by plugging in numerical

values from the dataset D.

Introduction to ML – Fall 2023 Assignment 2 – Page 2 of 3 Sep 27

5. (0 points) Double-check your solution with the scatter plot from question 1: e.g., you can use Excel to find numerical values of $w$ and $b$. You do not need to hand in anything here; this is just for you to verify that you obtained the correct solution in the previous questions.
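As an alternative to Excel, this optional check can also be scripted. A minimal NumPy sketch: `np.polyfit` with degree 1 fits a least squares line, returning its slope and intercept, which correspond to $w$ and $b$.

```python
# Optional self-check for Problem 1: fit a degree-1 polynomial (a line)
# to the dataset; np.polyfit returns (slope, intercept), i.e. (w, b).
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
t = np.array([6, 4, 2, 1, 3, 6, 10], dtype=float)

w, b = np.polyfit(x, t, 1)
print(f"w = {w:.4f}, b = {b:.4f}")
```

Compare the printed values against the analytical answer from question 4.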

## Problem 2

The goal of this problem is to revisit Problem 1, but to solve it with a different technique known as the method of least squares. This will serve as a “warm-up” to Problem 3. In the rest of this problem, any reference to a dataset refers to the dataset described in Problem 1.

1. (1 point) Verify that one can rewrite the linear regression model $g_{w,b}(x) = wx + b$ in the simpler form of

   $$g_{\vec{w}}(\vec{x}) = \vec{x} \vec{w}$$

   if one assumes each input $\vec{x}$ is a two-dimensional row vector such that a point in our dataset is now $\vec{x}^{(i)} = (x^{(i)}, 1)$, where $x^{(i)}$ is the scalar input described in Problem 1. Write the components of the new column vector $\vec{w}$ as a function of $w$ and $b$ from Problem 1.

2. (4 points) Derive analytically $\nabla_{\vec{w}} \| X \vec{w} - \vec{t} \|^2$, where $X$ is an $N \times 2$ matrix such that each row of $X$ is a vector $\vec{x}^{(i)}$ described in the previous question, and $\vec{t} = \{t^{(i)}\}_{i \in 1..7}$.

3. (1 point) Conclude that the model’s weight value $\vec{w}^*$ which minimizes the least squares loss (covered in Video 6) must satisfy

   $$2 X^\top X \vec{w}^* - 2 X^\top \vec{t} = 0$$

4. (1 point) Assuming that $X^\top X$ is invertible, derive analytically the value of $\vec{w}^*$.

5. (0 points) Using NumPy, implement the solution you found in the previous question and verify that you obtain the same results for $w$ and $b$ as in Problem 1. You do not need to hand in anything here; this is just a way for you to verify that you obtained the correct solution in the previous questions.
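A minimal sketch of that NumPy check, solving the normal equations $X^\top X \vec{w}^* = X^\top \vec{t}$ from question 3 (using `np.linalg.solve` rather than forming an explicit inverse):

```python
# Optional self-check for Problem 2: solve the normal equations
# X^T X w* = X^T t, where each row of X is (x_i, 1).
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
t = np.array([6, 4, 2, 1, 3, 6, 10], dtype=float)

# Stack the inputs with a column of ones; the second component of w*
# then plays the role of the bias b from Problem 1.
X = np.column_stack([x, np.ones_like(x)])

w_star = np.linalg.solve(X.T @ X, X.T @ t)
print("w, b =", w_star)  # should match the values found in Problem 1
```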

## Problem 3

Let us now assume that $D$ is a dataset with $d$ features per input and $N > 0$ inputs. We have $D = \{((x^{(i)}_j)_{j \in 1..d}, t^{(i)})\}_{i \in 1..N}$. In other words, each $\vec{x}^{(i)}$ is a column vector with $d$ components indexed by $j$ such that $x^{(i)}_j$ is the $j$th component of $\vec{x}^{(i)}$. The output $t^{(i)}$ remains a scalar (real value).

Let us assume for simplicity that we have a simplified linear regression model, as presented in Question 1 of Problem 2. We would like to train a regularized linear regression model, where the mean squared loss is augmented with an $\ell_2$ regularization penalty $\frac{1}{2} \|\vec{w}\|_2^2$ on the weight parameter $\vec{w}$:

$$\varepsilon(\vec{w}, D) = \frac{1}{2N} \sum_{i \in 1..N} \left( g_{\vec{w}}(\vec{x}^{(i)}) - t^{(i)} \right)^2 + \frac{\lambda}{2} \|\vec{w}\|_2^2$$

where $\lambda > 0$ is a hyperparameter that controls how much importance is given to the penalty.


1. (3 points) Let $A = \sum_{i \in 1..N} \vec{x}^{(i)} \vec{x}^{(i)\top}$. Give a simple analytical expression for the components of $A$.
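As a numerical aid (not a substitute for the requested analytical expression), one can check in NumPy that this sum of outer products coincides with $X^\top X$ for the matrix $X$ whose $i$th row is $\vec{x}^{(i)\top}$; the shapes and values below are arbitrary choices for the demo.

```python
# Numerical sanity check: sum_i x_i x_i^T equals X^T X when row i of X
# is x_i transposed (random data, shapes chosen only for illustration).
import numpy as np

rng = np.random.default_rng(0)
N, d = 5, 3
X = rng.standard_normal((N, d))  # row i holds the components of x^(i)

A_outer = sum(np.outer(X[i], X[i]) for i in range(N))
assert np.allclose(A_outer, X.T @ X)
```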

2. (6 points) Let us write $\vec{b} = \sum_{i \in 1..N} t^{(i)} \vec{x}^{(i)}$. Prove that the following holds:

   $$\nabla \varepsilon(\vec{w}, D) = \frac{1}{N} \left( A \vec{w} - \vec{b} \right) + \lambda \vec{w}$$

3. (2 points) Write down the matrix equation that $\vec{w}^*$ should satisfy, where:

   $$\vec{w}^* = \arg\min_{\vec{w}} \varepsilon(\vec{w}, D)$$

   Your equation should only involve $A$, $\vec{b}$, $\lambda$, $N$, and $\vec{w}^*$.

4. (3 points) Prove that all eigenvalues of A are non-negative.

5. (3 points) Demonstrate that the matrix $A + \lambda N I_d$ is invertible by proving that none of its eigenvalues are zero. Here, $I_d$ is the identity matrix of dimension $d$.

6. (2 points) Using the invertibility of the matrix $A + \lambda N I_d$, solve the equation stated in question 3 and deduce an analytical solution for $\vec{w}^*$. You’ve obtained a linear regression model regularized with an $\ell_2$ penalty.
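Assuming the derivation through questions 2–6 yields a closed form of the shape $\vec{w}^* = (A + \lambda N I_d)^{-1} \vec{b}$, it can be sanity-checked numerically: the solution should zero the regularized gradient from question 2. The data, dimensions, and $\lambda$ below are illustrative assumptions.

```python
# Sketch of the closed-form ridge solution on synthetic data:
# solve (A + lambda * N * I_d) w* = b, with A = X^T X and b = X^T t.
import numpy as np

rng = np.random.default_rng(1)
N, d, lam = 50, 4, 0.1
X = rng.standard_normal((N, d))
t = rng.standard_normal(N)

A = X.T @ X  # sum of outer products x_i x_i^T
b = X.T @ t  # sum of t_i x_i
w_star = np.linalg.solve(A + lam * N * np.eye(d), b)

# w* should zero the regularized gradient (1/N)(A w - b) + lam * w.
grad = (A @ w_star - b) / N + lam * w_star
assert np.allclose(grad, 0.0)
```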

∗ ∗ ∗