Description
1. [6 points] Prove Bayes' Theorem. Briefly explain why it is useful for machine learning problems, i.e., by converting the posterior probability into the likelihood and the prior probability.
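For reference, the statement to be proved can be written compactly as follows (the symbols θ for parameters and D for data are illustrative names, not from the assignment):

```latex
% Bayes' Theorem: for events A and B with P(B) > 0,
\[
  P(A \mid B) \;=\; \frac{P(B \mid A)\,P(A)}{P(B)} .
\]
% It follows from the definition of conditional probability:
% P(A \mid B)\,P(B) = P(A \cap B) = P(B \mid A)\,P(A).
% In machine-learning notation, with parameters \theta and data D:
\[
  \underbrace{P(\theta \mid D)}_{\text{posterior}}
    = \frac{\underbrace{P(D \mid \theta)}_{\text{likelihood}}\,
            \underbrace{P(\theta)}_{\text{prior}}}{P(D)} .
\]
```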
2. [10 points] In Lecture 3-1, we gave the normal equation (i.e., closed-form solution) for linear regression using MSE as the cost function. Prove that the closed-form solution for Ridge Regression is w = (λI + XᵀX)⁻¹ Xᵀ y, where I is the identity matrix, X = (x⁽¹⁾, x⁽²⁾, …, x⁽ᵐ⁾)ᵀ is the input data matrix, x⁽ⁱ⁾ = (1, x₁, x₂, …, xₙ) is the i-th data sample, and y = (y⁽¹⁾, y⁽²⁾, …, y⁽ᵐ⁾). Assume the hypothesis function h_w(x) = w₀ + w₁x₁ + w₂x₂ + ⋯ + wₙxₙ, and y⁽ⁱ⁾ is the measurement of h_w(x) for the i-th training sample. The cost function of Ridge Regression is E(w) = MSE(w) + (λ/2) Σⱼ₌₁ⁿ wⱼ². [Hint: refer to the proof of the normal equation for linear regression. Note: please use the following rectified definition of MSE in your proof: MSE(w) = (1/2) Σᵢ₌₁ᵐ (wᵀ x⁽ⁱ⁾ − y⁽ⁱ⁾)².]
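As a sanity check before writing the proof, the claimed closed form can be verified numerically: at the minimizer of E(w), the gradient Xᵀ(Xw − y) + λw should vanish. This is a minimal sketch with made-up data; note it follows the stated λI form, which penalizes every component of w including the bias w₀.

```python
import numpy as np

# Sketch: numerically sanity-check the Ridge closed form
#   w* = (lam*I + X^T X)^{-1} X^T y
# against the cost E(w) = (1/2) sum_i (w^T x_i - y_i)^2 + (lam/2) ||w||^2.
# Data and lambda below are arbitrary illustration values.

rng = np.random.default_rng(0)
m, n = 50, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # prepend bias column
y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + 0.1 * rng.normal(size=m)
lam = 0.5

w_star = np.linalg.solve(lam * np.eye(n + 1) + X.T @ X, X.T @ y)

# Gradient of E at w*: X^T (X w - y) + lam * w, which should vanish
# at the minimizer up to floating-point error.
grad = X.T @ (X @ w_star - y) + lam * w_star
print(np.max(np.abs(grad)))
```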
3. [10 points] Recall the multi-class Softmax Regression model on page 16 of Lecture 3-3. Assume we have K different classes. The posterior probability is ŷₖ = σ(s(x))ₖ = exp(sₖ(x)) / Σⱼ₌₁ᴷ exp(sⱼ(x)) for k = 1, 2, …, K, where sₖ(x) = θₖᵀ x, and the input x is an n-dimensional vector.
1) To learn this Softmax Regression model, how many parameters do we need to estimate? What are these parameters?
2) Consider the cross-entropy cost function J(Θ) (see page 16 of Lecture 3-3) of m training samples {(xᵢ, yᵢ)}, i = 1, 2, …, m. Derive the gradient of J(Θ) with respect to θₖ, as shown on page 17 of Lecture 3-3.
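A useful way to check a derived gradient is against finite differences. The sketch below implements the softmax posterior above and the standard cross-entropy gradient ∂J/∂θₖ = (1/m) Σᵢ (ŷₖ⁽ⁱ⁾ − 1{yᵢ = k}) xᵢ; the 1/m averaging is an assumption here, so adjust it if the lecture's J(Θ) sums rather than averages.

```python
import numpy as np

def softmax(S):
    # S: (m, K) score matrix with rows s(x_i); subtract the row max for stability
    Z = np.exp(S - S.max(axis=1, keepdims=True))
    return Z / Z.sum(axis=1, keepdims=True)

def cost(Theta, X, Y):
    # Cross-entropy: Theta is (n, K), X is (m, n), Y is (m, K) one-hot labels
    P = softmax(X @ Theta)
    return -np.mean(np.sum(Y * np.log(P), axis=1))

def grad(Theta, X, Y):
    # Claimed gradient: dJ/dtheta_k = (1/m) sum_i (yhat_k^(i) - 1{y_i=k}) x_i
    return X.T @ (softmax(X @ Theta) - Y) / X.shape[0]

rng = np.random.default_rng(1)
m, n, K = 20, 4, 3
X = rng.normal(size=(m, n))
Y = np.eye(K)[rng.integers(0, K, size=m)]
Theta = rng.normal(size=(n, K))

# Finite-difference check of one entry of the gradient
eps = 1e-6
T1, T2 = Theta.copy(), Theta.copy()
T1[2, 1] += eps
T2[2, 1] -= eps
fd = (cost(T1, X, Y) - cost(T2, X, Y)) / (2 * eps)
print(abs(fd - grad(Theta, X, Y)[2, 1]))  # should be tiny
```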
Programming Problem:
4. [44 points] In this problem, we write a program to find the coefficients for a linear regression model
for the dataset provided (data2.txt). Assume a linear model: y = w0 + w1*x. You need to
1) Plot the data (i.e., x-axis for the 1st column, y-axis for the 2nd column),
and use Python to implement the following methods to find the coefficients:
2) Normal equation, and
3) Gradient Descent using batch AND stochastic modes respectively:
a) Determine an appropriate termination condition (e.g., when cost function is less than a
threshold, and/or after a given number of iterations).
b) Plot the cost function vs. iterations for each mode; compare and discuss the batch and stochastic modes in terms of accuracy and speed of convergence.
c) Choose the best learning rate. For example, you can plot the cost function vs. the learning rate to determine it.
Please implement the algorithms yourself and do NOT use the fit() function of any library.
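The three required solvers can be sketched as follows. This is a minimal, self-contained outline, not a complete solution: it uses synthetic data so it runs on its own, whereas the assignment should load data2.txt (e.g. `data = np.loadtxt("data2.txt")`, assuming a two-column format), and the learning rate, iteration counts, and termination threshold below are example choices to be tuned in parts a)–c). Plotting is omitted.

```python
import numpy as np

# Synthetic stand-in for data2.txt: y = 3 + 2x plus noise.
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=100)

X = np.column_stack([np.ones_like(x), x])  # design matrix with bias column
m = len(y)

def cost(w):
    # MSE cost J(w) = (1/2m) * sum (Xw - y)^2
    return np.sum((X @ w - y) ** 2) / (2 * m)

# 2) Normal equation: w = (X^T X)^{-1} X^T y
w_ne = np.linalg.solve(X.T @ X, X.T @ y)

# 3, batch mode: full-gradient step each iteration; terminate when the
# cost stops improving by more than tol (one possible stopping rule).
w_b = np.zeros(2)
lr, tol = 0.01, 1e-10
prev = np.inf
for it in range(50000):
    w_b -= lr * X.T @ (X @ w_b - y) / m
    c = cost(w_b)
    if abs(prev - c) < tol:
        break
    prev = c

# 3, stochastic mode: one randomly ordered sample per update.
w_s = np.zeros(2)
for epoch in range(200):
    for i in rng.permutation(m):
        w_s -= lr * (X[i] @ w_s - y[i]) * X[i]

print(w_ne, w_b, w_s)  # all three should land near (3.0, 2.0)
```

With a constant learning rate, the stochastic estimate keeps fluctuating around the optimum while the batch estimate converges smoothly; that contrast is exactly what part b) asks you to discuss from the cost-vs-iterations plots.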