ECE-GY 6143 Homework 3

1. (10 points) Suppose we wish to learn a regularized least squares model:
L(w) = \frac{1}{2} \sum_{i=1}^{n} \left( y_i - \langle w, x_i \rangle \right)^2 + \lambda R(w)
where R(w) is a regularization function to be determined. Suggest good choices for R(w) if
the following criteria need to be achieved (there are no unique correct answers) and justify
your choice in a sentence or two:
a. All parameters w are free to be determined.
b. w should be sparse (i.e., only a few coefficients of w are nonzero).
c. The coefficients of w should be small in magnitude on average.
d. For most indices j, wj should be equal to wj−1.
e. w should have no negative-valued coefficients.
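For reference, a few standard regularizers that are often matched to criteria like these; this is a suggestive list written in LaTeX, not an answer key, and other choices can be equally valid:

R(w) = 0                                  (no penalty: w is unconstrained)
R(w) = \|w\|_1 = \sum_j |w_j|             (promotes sparsity)
R(w) = \tfrac{1}{2}\|w\|_2^2              (shrinks coefficients on average)
R(w) = \sum_{j \ge 2} |w_j - w_{j-1}|     (total variation: neighboring entries equal)
R(w) = 0 \text{ if all } w_j \ge 0, \; +\infty \text{ otherwise}   (hard nonnegativity constraint)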

 

2. (10 points) The Boston Housing Dataset was collected by the US Census Service and
consists of 14 urban quality-of-life variables, the last being the median house price for
a given town. Code for loading the dataset is provided at the end of this assignment. Implement
a linear regression model with ridge regularization that predicts median house prices from the
other variables. Use 10-fold cross-validation on 80-20 train-test splits and report the final R^2
values you obtain. (You may want to preprocess your data to the range [0, 1] in order
to get meaningful results.)
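A minimal sketch of one possible pipeline, assuming the boston DataFrame produced by the loading code at the end of this assignment; the value alpha=1.0 is an arbitrary placeholder to tune:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import MinMaxScaler

# Features: all columns except the target MEDV; scale to [0, 1] as suggested.
X = MinMaxScaler().fit_transform(boston.drop(columns='MEDV').values)
y = boston['MEDV'].values

# 80-20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 10-fold cross-validation on the training set, scored by R^2.
ridge = Ridge(alpha=1.0)  # placeholder regularization strength
cv_scores = cross_val_score(ridge, X_train, y_train, cv=10, scoring='r2')
print('CV R^2: mean %.3f' % cv_scores.mean())

# Final R^2 on the held-out test set.
ridge.fit(X_train, y_train)
print('Test R^2: %.3f' % ridge.score(X_test, y_test))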

 

3. (10 points) In class, we discussed the lasso objective, where the regularizer was chosen to
be the \ell_1-norm. Here, we will derive an analytical closed-form expression for the minimizer
of a slightly simpler problem. Suppose x is a d-dimensional input and w is a d-dimensional
variable. Show that the minimizer of the loss function:

L(w) = \frac{1}{2} \|x - w\|_2^2 + \lambda \|w\|_1
is given by:
w^*_i =
\begin{cases}
x_i - \lambda & \text{if } x_i > \lambda, \\
x_i + \lambda & \text{if } x_i < -\lambda, \\
0 & \text{otherwise.}
\end{cases}
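Since the objective decomposes into independent scalar problems, one per coordinate, the formula (known as soft thresholding) can be checked coordinate-wise. Below is a small numerical sanity check of the claimed minimizer; the test values of x and lambda are arbitrary:

import numpy as np

def soft_threshold(x, lam):
    # Claimed closed-form minimizer, applied coordinate-wise.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

lam = 0.7
x = np.array([1.5, -0.3, -2.0, 0.7])

# Brute-force minimization of each scalar objective on a fine grid.
grid = np.linspace(-3.0, 3.0, 600001)
for xi, wi in zip(x, soft_threshold(x, lam)):
    losses = 0.5 * (xi - grid) ** 2 + lam * np.abs(grid)
    w_brute = grid[np.argmin(losses)]
    assert abs(w_brute - wi) < 1e-4, (xi, w_brute, wi)
print('soft-thresholding formula matches the brute-force minimizer')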

 

4. (20 points) In this problem, we will implement logistic regression trained with GD/SGD and
validate on synthetic training data.
a. Suppose that the data dimension d equals 2. Generate two clusters of data points with
100 points each (so that the total data size is n = 200), by sampling from Gaussian
distributions centered at (0.5, 0.5) and (−0.5, −0.5). Call the data points x_i, and label
them as y_i = ±1 depending on which cluster they originated from. Choose the variance
of the Gaussian to be small enough so that the data points are sufficiently well separated.
Plot the data points on the 2D plane to confirm that this is the case.
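One possible way to generate and plot the data (a sketch; the variance 0.05 is an assumed value, small enough for clear separation):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
sigma = np.sqrt(0.05)  # assumed variance of 0.05 per coordinate

# 100 points per cluster, centered at (0.5, 0.5) and (-0.5, -0.5).
X_pos = rng.normal(loc=0.5, scale=sigma, size=(100, 2))
X_neg = rng.normal(loc=-0.5, scale=sigma, size=(100, 2))
X = np.vstack([X_pos, X_neg])
y = np.hstack([np.ones(100), -np.ones(100)])  # labels y_i = +1 / -1

plt.scatter(X_pos[:, 0], X_pos[:, 1], label='y = +1')
plt.scatter(X_neg[:, 0], X_neg[:, 1], label='y = -1')
plt.legend()
plt.show()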

b. (Derive your own GD routines; do not use sklearn functions here.) Train a logistic
regression model that tries to minimize:

L(w) = -\sum_{i=1}^{n} \left[ y_i \log \frac{1}{1 + e^{-\langle w, x_i \rangle}} + (1 - y_i) \log \frac{e^{-\langle w, x_i \rangle}}{1 + e^{-\langle w, x_i \rangle}} \right]

using Gradient Descent (GD). Plot the decay of the training loss as a function of the
number of iterations.
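A minimal GD sketch under the setup from part a (it reuses X and y from the sketch above; since the loss is written for 0/1 labels, the ±1 labels are remapped first, and the step size 0.01 is an assumed value):

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(w, X, y01):
    p = sigmoid(X @ w)
    eps = 1e-12  # guard against log(0)
    return -np.sum(y01 * np.log(p + eps) + (1 - y01) * np.log(1 - p + eps))

y01 = (y + 1) / 2   # remap labels from {-1, +1} to {0, 1}
w = np.zeros(2)
lr = 0.01           # assumed step size
gd_losses = []
for t in range(500):
    grad = X.T @ (sigmoid(X @ w) - y01)  # gradient of the loss above
    w = w - lr * grad
    gd_losses.append(logistic_loss(w, X, y01))
w_gd = w  # final GD model, used again in part d

plt.plot(gd_losses)
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.show()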

 

c. Train the same logistic regression model, but this time using Stochastic Gradient Descent
(SGD). Demonstrate that SGD exhibits a slower rate of convergence than GD but is faster
per iteration, and that it does not suffer in terms of final quality. You may have to
experiment with the step size and mini-batch size to get reasonable answers.
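A matching SGD sketch (the mini-batch size 10 and step size 0.05 are assumed values; sigmoid, logistic_loss, X, and y01 come from the GD sketch above):

rng = np.random.default_rng(1)
w = np.zeros(2)
lr, batch = 0.05, 10  # assumed step size and mini-batch size
sgd_losses = []
for epoch in range(100):
    order = rng.permutation(len(y01))  # shuffle, then sweep in mini-batches
    for start in range(0, len(order), batch):
        b = order[start:start + batch]
        grad = X[b].T @ (sigmoid(X[b] @ w) - y01[b])
        w = w - lr * grad
    sgd_losses.append(logistic_loss(w, X, y01))
w_sgd = w  # final SGD model, used again in part d

plt.plot(sgd_losses)
plt.xlabel('epoch')
plt.ylabel('training loss')
plt.show()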

 

d. Overlay the original plot of data points on the 2D plane with the two (final) models
obtained in parts b and c, to visualize the correctness of your implementations.
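One way to draw the overlay (a sketch; w_gd and w_sgd are the final weight vectors saved in the sketches for parts b and c, and since those models carry no bias term, each decision boundary is the line through the origin where <w, x> = 0):

xs = np.linspace(-1.5, 1.5, 100)
plt.scatter(X[:, 0], X[:, 1], c=y01)
for w_final, name in [(w_gd, 'GD'), (w_sgd, 'SGD')]:
    # <w, x> = 0  implies  x2 = -(w1 / w2) * x1
    plt.plot(xs, -(w_final[0] / w_final[1]) * xs, label=name)
plt.legend()
plt.show()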

 

5. (optional) How much time (in hours) did you spend working on this assignment?
# Code for loading the Boston Housing Dataset (problem 2).
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this snippet requires an older scikit-learn release.
import pandas as pd
import numpy as np
from sklearn.datasets import load_boston

boston_dataset = load_boston()
boston = pd.DataFrame(boston_dataset.data,
                      columns=boston_dataset.feature_names)
boston['MEDV'] = boston_dataset.target
boston.head()