STAT 4001 Data Mining and Statistical Learning: Homework 2

1. (15 marks) Ridge regression vs. least squares

Given data $(y_i, x_i)_{i=1,\dots,n}$ with $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where $\epsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$ and $(x_i)_{i=1,\dots,n}$ is known and fixed.

Least squares estimate: $(\hat\beta_0^{LS}, \hat\beta_1^{LS}) = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2$.

Ridge regression: $(\hat\beta_0^{Ridge}, \hat\beta_1^{Ridge}) = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 + \lambda \beta_1^2$.

(a) Show that the least squares estimate is unbiased by showing $E(\hat\beta_0^{LS}) = \beta_0$ and $E(\hat\beta_1^{LS}) = \beta_1$.
(b) Show that the ridge regression estimate is biased by calculating $E(\hat\beta_0^{Ridge})$ and $E(\hat\beta_1^{Ridge})$.

Hint: You may directly use some derivations in the lecture.

2. (20 marks) Invariance of linear regression to scaling, but not of ridge regression without standardization

(a) Consider the following data set: $y = (2.2, 3.3, 3.8)$, $x = (1, 2, 3)$. Fit $y = \beta_0 + \beta_1 x$.
  i. Calculate the least squares parameter estimates $\hat\beta_0^{LS}, \hat\beta_1^{LS}$ and the ridge regression parameter estimates $\hat\beta_0^{Ridge}, \hat\beta_1^{Ridge}$ with $\lambda = 1$.
  ii. Calculate $\hat y^{LS}$ and $\hat y^{Ridge}$ for $x$.
(b) Consider the data set in (a) with $x' = 10x$, i.e. $x' = (10, 20, 30)$.
  i. Calculate the least squares parameter estimates $\hat\beta_0^{L}, \hat\beta_1^{L}$ and the ridge regression parameter estimates $\hat\beta_0^{R}, \hat\beta_1^{R}$ with $\lambda = 1$.
  ii. Compare $\hat\beta_0^{R}$ and $\hat\beta_1^{R}$ in 2b(i) with $\hat\beta_0^{Ridge}$ and $\hat\beta_1^{Ridge}/10$ in 2a(i), which are without scaling; likewise, compare $\hat\beta_0^{L}$ and $\hat\beta_1^{L}$ in 2b(i) with $\hat\beta_0^{LS}$ and $\hat\beta_1^{LS}/10$ in 2a(i) for least squares.
  iii. Calculate $\hat y^{L}$ and $\hat y^{R}$ for $x'$, and compare with 2a(ii).

(Note: You will see that scaling $x$ has an effect on $\hat y$ for ridge regression, but not for least squares.)

Hint: You may directly use some derivations in the lecture.

3. (15 marks) Cyclic coordinate descent for LASSO

Given $f(\beta_j) = a\beta_j^2 - 2b\beta_j + \lambda|\beta_j|$, where $a > 0$ and $\lambda > 0$, show that when $b < -\frac{\lambda}{2} < 0$, $\hat\beta_j = \frac{2b + \lambda}{2a}$ minimizes $f(\beta_j)$.

4. (35 marks) Variance and bias for linear regression vs. ridge regression

Fit the data with the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where $\epsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$.
Least squares parameter estimates: $\hat\beta_0^{LS} = \bar y - \hat\beta_1^{LS} \bar x$ and $\hat\beta_1^{LS} = \frac{\sum_{i=1}^n (x_i - \bar x) y_i}{\sum_{i=1}^n (x_i - \bar x)^2}$.

Ridge regression parameter estimates: $\hat\beta_0^{Ridge} = \bar y - \hat\beta_1^{Ridge} \bar x$ and $\hat\beta_1^{Ridge} = \frac{\sum_{i=1}^n (x_i - \bar x) y_i}{\sum_{i=1}^n (x_i - \bar x)^2 + \lambda}$.

For a new point $x_0$, calculate the bias and variance for
(a) linear regression;
(b) ridge regression;
where $\text{bias}^2 = [\beta_0 + \beta_1 x_0 - E(\hat y_0)]^2$ and $\text{variance} = E[\hat y_0 - E(\hat y_0)]^2$.

(Note: You will see that, compared with linear regression, the bias² for ridge is larger but the variance is smaller. In high-dimensional settings, ridge regression (and the lasso) will be better because of the smaller variance.)

Hint: You may directly use some derivations in the lecture, together with $Var(A + B) = Var(A) + Var(B) + 2\,Cov(A, B)$ and $Var(\alpha A) = \alpha^2 Var(A)$, where $\alpha$ is a scalar.

5. (15 marks) R code exercise

(a) Use the rnorm() function to generate a predictor $X$ of length $n = 100$ with $\mu_X = 0$, $\sigma_X = 1$, as well as a noise vector $\epsilon$ of length $n = 100$ with $\mu_\epsilon = 0$, $\sigma_\epsilon = 0.1$.
(b) Generate a response vector $Y$ of length $n = 100$ according to the model $Y = 1 + X + X^2 + X^3 + \epsilon$.
(c) Fit a lasso model to the simulated data, using $X, X^2, \dots, X^{10}$ as predictors. Use cross-validation to select the optimal value of $\lambda$. Create a plot of the cross-validation error (i.e. mean-squared error vs. $\log(\lambda)$) as a function of $\lambda$. Report the resulting coefficient estimates.
(d) Now re-generate a response vector $Y$ according to the new model $Y = 1 + X^7 + \epsilon$. Again, re-fit a lasso model using $X, X^2, \dots, X^{10}$ as predictors. Use cross-validation to select the optimal value of $\lambda$. Create a plot of the cross-validation error (i.e. mean-squared error vs. $\log(\lambda)$) as a function of $\lambda$. Report the resulting coefficient estimates.

(Note: You will see that when the true data-generating model is sparser, cross-validation tends to select a sparser model.)

Hint: You may refer to the tutorial notes ‘Tutorial05’.

– End –
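The scaling behaviour in Problem 2 is quick to check numerically. The sketch below (plain Python; `simple_fit` and `predict` are helper names introduced here, not part of the assignment) uses the closed-form simple-regression estimates stated in Problem 4, which penalize only the slope, to fit both models on $x$ and on $x' = 10x$ and compare the fitted values:

```python
# Closed-form estimates for y = b0 + b1*x with an optional ridge
# penalty lam * b1^2 (the simple-regression formulas from Problem 4).
def simple_fit(x, y, lam=0.0):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    # Slope: centered cross-product over (centered sum of squares + lam).
    b1 = (sum((xi - xbar) * yi for xi, yi in zip(x, y))
          / (sum((xi - xbar) ** 2 for xi in x) + lam))
    b0 = ybar - b1 * xbar  # the intercept is not penalized
    return b0, b1

def predict(b0, b1, x):
    return [b0 + b1 * xi for xi in x]

y = [2.2, 3.3, 3.8]
x = [1, 2, 3]
xs = [10 * xi for xi in x]  # x' = 10x

ls = predict(*simple_fit(x, y), x)
ls_scaled = predict(*simple_fit(xs, y), xs)
ridge = predict(*simple_fit(x, y, lam=1.0), x)
ridge_scaled = predict(*simple_fit(xs, y, lam=1.0), xs)

print(ls)            # least squares fitted values on x
print(ls_scaled)     # identical: LS is invariant to scaling x
print(ridge)         # ridge (lambda = 1) fitted values on x
print(ridge_scaled)  # different: ridge is not scale-invariant
```

With these numbers, least squares gives $\hat\beta_1^{LS} = 0.8$, $\hat\beta_0^{LS} = 1.5$ and the same fitted values $(2.3, 3.1, 3.9)$ on both scales, while the ridge fitted values move from roughly $(2.57, 3.10, 3.63)$ to roughly $(2.30, 3.10, 3.90)$ after scaling, because $\lambda = 1$ is negligible next to $\sum_i (x'_i - \bar x')^2 = 200$.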