Description
1. [10 points] In Module 2, we gave the normal equation (i.e., closed-form solution) for linear regression using MSE as the cost function. Prove that the closed-form solution for Ridge Regression is

   w = (λI + XᵀX)⁻¹ Xᵀ Y,

where I is the identity matrix, X = (x⁽¹⁾, x⁽²⁾, …, x⁽ᵐ⁾)ᵀ is the input data matrix, x⁽ⁱ⁾ = (1, x₁, x₂, …, xₙ) is the i-th data sample, and Y = (y⁽¹⁾, y⁽²⁾, …, y⁽ᵐ⁾). Assume the hypothesis function h_w(x) = w₀ + w₁x₁ + w₂x₂ + ⋯ + wₙxₙ, and y⁽ⁱ⁾ is the measurement of h_w(x) for the i-th training sample. The cost function of Ridge Regression is

   E(w) = Σᵢ₌₁ᵐ (wᵀx⁽ⁱ⁾ − y⁽ⁱ⁾)² + λ Σⱼ₌₁ⁿ wⱼ².
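Before writing the proof, it can help to confirm the stated closed form numerically. The sketch below (NumPy, synthetic data) checks that w = (λI + XᵀX)⁻¹XᵀY zeroes the gradient of E(w); it is a sanity check only, not the requested derivation. Note one assumption: the penalty in E(w) as stated sums over j = 1, …, n (excluding w₀), while the closed form penalizes every coefficient; the check uses the all-coefficient version to match the formula.

```python
import numpy as np

# Sanity check (not a proof): the closed-form ridge solution
# w = (lambda*I + X^T X)^{-1} X^T y should zero the gradient of
# E(w) = sum_i (w^T x_i - y_i)^2 + lambda * sum_j w_j^2
# (all coefficients penalized, as implied by the lambda*I term).
rng = np.random.default_rng(0)
m, n = 50, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # bias column of 1s
y = rng.normal(size=m)
lam = 0.5

# Closed-form solution (solve is preferred over explicitly inverting)
w = np.linalg.solve(lam * np.eye(n + 1) + X.T @ X, X.T @ y)

# Gradient of E(w): 2 X^T (Xw - y) + 2 lambda w
grad = 2 * X.T @ (X @ w - y) + 2 * lam * w
print(np.allclose(grad, 0, atol=1e-8))  # expected: True
```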
2. [10 points] Assume we have K different classes in a multi-class Softmax Regression model. The posterior probability is

   p̂ₖ = σ(s(x))ₖ = exp(sₖ(x)) / Σⱼ₌₁ᴷ exp(sⱼ(x))   for k = 1, 2, …, K,

where sₖ(x) = θₖᵀx, the input x is an n-dimensional vector, and K is the total number of classes.
1) To learn this Softmax Regression model, how many parameters do we need to estimate? What are these parameters?
2) Consider the cross-entropy cost function J(Θ) over m training samples {(x⁽ⁱ⁾, y⁽ⁱ⁾)}, i = 1, 2, …, m, given below. Derive the gradient of J(Θ) with respect to θₖ.
   J(Θ) = −(1/m) Σᵢ₌₁ᵐ Σₖ₌₁ᴷ yₖ⁽ⁱ⁾ log(p̂ₖ⁽ⁱ⁾),

where yₖ⁽ⁱ⁾ = 1 if the i-th instance belongs to class k, and 0 otherwise.
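As a concrete illustration of the posterior defined above (not the requested gradient derivation), the sketch below evaluates p̂ = σ(s(x)) for a toy Θ and x. The max-subtraction is a standard numerical-stability trick; it cancels in the ratio and does not change the result. The stacking of the θₖ as rows of a K×n matrix is an implementation choice, not part of the problem statement.

```python
import numpy as np

def softmax_posterior(Theta, x):
    """Posterior p_hat_k = exp(s_k(x)) / sum_j exp(s_j(x)), s_k(x) = theta_k^T x.
    Theta: (K, n) matrix with theta_k as rows; x: (n,) input vector."""
    s = Theta @ x          # scores s_k(x), k = 1..K
    s = s - s.max()        # stability shift; cancels in the ratio
    e = np.exp(s)
    return e / e.sum()     # shape (K,), entries sum to 1

# Toy example: K = 3 classes, n = 2 features (values are arbitrary)
Theta = np.array([[1.0, -2.0], [0.5, 0.5], [-1.0, 1.0]])
x = np.array([0.3, 0.7])
p = softmax_posterior(Theta, x)
print(p.sum())  # expected: 1.0 (up to floating point)
```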
3. [44 points] Write a program to find the coefficients for a linear regression model for the dataset
provided (data2.txt). Assume a linear model: y = w0 + w1*x. You need to
1) Plot the data (i.e., x-axis for the 1st column, y-axis for the 2nd column),
and use Python to implement the following methods to find the coefficients:
2) Normal equation, and
3) Gradient Descent using batch AND stochastic modes respectively:
a) Split dataset into 80% for training and 20% for testing.
b) Plot MSE vs. iteration for each mode on both the training set and the testing set; compare the batch and stochastic modes (with discussion) in terms of accuracy (on the testing set) and speed of convergence. (You need to determine an appropriate termination condition, e.g., when the cost function falls below a threshold and/or after a given number of iterations.)
c) Plot MSE of the testing set vs. learning rate (using 0.001, 0.002, 0.003, 0.004, 0.005,
0.006, 0.007, 0.008, 0.009, 0.01) and determine the best learning rate.
Please implement the algorithms yourself; do NOT use the fit() function of any library.
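A minimal skeleton for parts 2) and 3) in batch mode is sketched below on synthetic data, since the exact format of data2.txt is not specified here; the train/test split, stochastic mode, per-iteration MSE logging, and learning-rate sweep are left to the assignment. If the file is two comma-separated columns, something like np.loadtxt("data2.txt", delimiter=",") could replace the synthetic block (an assumption about the file format).

```python
import numpy as np

# Synthetic stand-in for data2.txt: y = 3 + 2x + noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=100)

X = np.column_stack([np.ones_like(x), x])   # design matrix [1, x] for y = w0 + w1*x

# 2) Normal equation: w = (X^T X)^{-1} X^T y (via solve, not explicit inverse)
w_ne = np.linalg.solve(X.T @ X, X.T @ y)

# 3) Batch gradient descent on MSE = (1/m) * sum (Xw - y)^2
def batch_gd(X, y, lr=0.01, max_iters=5000, tol=1e-10):
    m = len(y)
    w = np.zeros(X.shape[1])
    prev_mse = np.inf
    for _ in range(max_iters):
        err = X @ w - y
        w -= lr * (2.0 / m) * (X.T @ err)   # full-batch gradient step
        mse = np.mean((X @ w - y) ** 2)
        if abs(prev_mse - mse) < tol:       # termination condition (MSE plateau)
            break
        prev_mse = mse
    return w

w_gd = batch_gd(X, y)
print(np.allclose(w_ne, w_gd, atol=1e-2))  # expected: True
```

The two estimates agreeing is a useful self-check before moving on to the stochastic mode and the learning-rate study.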




