STAT 239 Homework 4

Data Analysis 1. Bagging Linear Predictors
The purpose of this exercise is to apply bagging to linear predictors.
Let us recall the forward stepwise procedure. Consider training data {(x_i, y_i)}_{i=1}^n, where x = (x_1, ..., x_p), and let b_i(x) be a linear predictor of x that depends on only i of the p predictors. It is constructed as follows. If the model b_{i-1}(x) is the linear predictor given by

b_{i-1}(x) = a_0 + a_1 x_{j_1} + ... + a_{i-1} x_{j_{i-1}},

the model b_i(x) is obtained by adding to b_{i-1}(x) the best variable chosen among the variables x not in {x_{j_1}, ..., x_{j_{i-1}}}. The best variable is the variable x_k such that the RSS of the linear regression of y on {x_{j_1}, ..., x_{j_{i-1}}, x_k} is smallest. [You can use a criterion other than RSS, if you wish, as we discussed when we considered model selection procedures in linear regression.] Let {b_1, ..., b_p} be the sequence of models built according to this procedure (forward stepwise selection). Generally one then selects the best of these p models as the single model thought to fit the data best. This procedure (like similar variable selection methods such as best subset selection and backward stepwise selection) is very unstable, since variables compete to enter the models, and small changes in the training data will result in very different models.
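As a concrete illustration, here is a minimal sketch of forward stepwise selection by RSS in R; the helper name forward_stepwise and its return format (a list of selected-variable index sets, one per model size) are our own choices, not part of the assignment.

# Forward stepwise selection by RSS.
# X: n x p predictor matrix, y: response vector.
# Returns a list whose i-th element holds the indices of the i
# predictors selected for the size-i model b_i.
forward_stepwise = function(X, y) {
  p = ncol(X)
  selected = integer(0)
  path = vector("list", p)
  for (i in 1:p) {
    candidates = setdiff(1:p, selected)
    # RSS of y regressed on the current set plus each candidate variable
    rss = sapply(candidates, function(k) {
      fit = lm(y ~ X[, c(selected, k), drop = FALSE])
      sum(residuals(fit)^2)
    })
    selected = c(selected, candidates[which.min(rss)])
    path[[i]] = selected
  }
  path
}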
We want to see the advantages of bagging such predictors and compare the accuracy of the models obtained by forward stepwise selection with that of the models obtained by bagging such linear predictors. The comparison is to be carried out via simulations. You may want to follow these steps:
1) Simulate the training data T. Assume p = 30, n = 60. Draw (X_1, ..., X_p) ~ N_p(0, Σ), a multivariate normal with Σ_{ij} = ρ^{|i-j|}. Then simulate Y from the model Y_i = β_0 + Σ_{j=1}^p X_{ij} β_j + ε_i, with ε_i ~ N(0, 1). The following code will work as a building block for the simulation for an arbitrary choice of non-zero coefficients β, but you should choose a different vector of coefficients. Namely, you should consider the (two) cases in which 5 and 25 of the regression coefficients are non-zero (choose the values of the non-zero coefficients yourselves). You may also want to consider a different value of ρ among {0.3, 0.4, 0.6, 0.7, 0.8} (one only).
library(MASS)
n = 60; p = 30
rho = 0.6
betazero = -1.2
# Example coefficient vector: 7 non-zero coefficients, padded with zeros
beta = c(-1, 1.3, 4, 1.2, 5, -2, 0.34)
beta = c(beta, rep(0, p - length(beta)))
# Covariance matrix with Sigma[i, j] = rho^|i - j|
Sigma = matrix(0, nrow = p, ncol = p)
for (i in 1:p) {
  for (j in 1:i) {
    Sigma[i, j] = rho^abs(i - j)
    Sigma[j, i] = Sigma[i, j]
  }
}
X = mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma)
Y = betazero + X %*% beta + rnorm(n, 0, 1)
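For the two cases the assignment asks for, β can be built the same way; the non-zero values below are illustrative placeholders only (pick your own).

# 5 non-zero coefficients, then zeros
beta5 = c(2, -1.5, 1, 3, -2, rep(0, p - 5))
# 25 non-zero coefficients drawn once (keep beta fixed across the
# 300 replications of step 5), then zeros
beta25 = c(runif(25, min = -3, max = 3), rep(0, p - 25))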
2) Compute by forward stepwise selection the sequence {b_1(x, T), ..., b_p(x, T)} and their mean-squared prediction errors {e_1, ..., e_p}.

3) Draw B = 50 bootstrap samples T_b and for each determine by forward stepwise selection the set of models {b_1(x, T_b), ..., b_p(x, T_b)}.

4) Consider the bagged sequence {b̄_1(x), ..., b̄_p(x)}, where b̄_i(x) = (1/B) Σ_{b=1}^B b_i(x, T_b), and the prediction errors {e_1^{bag}, ..., e_p^{bag}}. (A sketch of steps 2 to 4 is given after the note below.)

5) Repeat steps 1 to 4, 300 times, average the errors, and plot the curves of the resulting mean errors as a function of the number of predictors in the model. (In step 1, you need to draw new X's and new errors to get the new Y, but Σ, β, p, n should not change.)

6) Explain the curves. Are the errors equal? Is the minimum achieved at the same point? Is one of the two error curves always lower than the other? If not, why not?
Notice that since these data are simulated, you can simulate an additional
data set as your validation set to compute your error.
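Here is a minimal sketch of steps 2 to 4, assuming the forward_stepwise helper above, the training data (X, Y) from step 1, and a simulated validation set; all helper names are our own choices.

# Validation data from the same model (Sigma, beta, betazero fixed)
nval = 1000
Xval = mvrnorm(n = nval, mu = rep(0, p), Sigma = Sigma)
Yval = as.vector(betazero + Xval %*% beta + rnorm(nval, 0, 1))

# Predictions of the size-i model fit on (Xtr, ytr), evaluated at Xnew
pred_stepwise = function(vars, Xtr, ytr, Xnew) {
  fit = lm(ytr ~ Xtr[, vars, drop = FALSE])
  cbind(1, Xnew[, vars, drop = FALSE]) %*% coef(fit)
}

# Step 2: stepwise on the original training data
path = forward_stepwise(X, Y)
err = sapply(1:p, function(i)
  mean((Yval - pred_stepwise(path[[i]], X, Y, Xval))^2))

# Steps 3-4: average the size-i predictions over B bootstrap fits
B = 50
pred_bag = matrix(0, nrow = nval, ncol = p)
for (b in 1:B) {
  idx = sample(n, replace = TRUE)
  path_b = forward_stepwise(X[idx, ], Y[idx])
  for (i in 1:p)
    pred_bag[, i] = pred_bag[, i] +
      pred_stepwise(path_b[[i]], X[idx, ], Y[idx], Xval) / B
}
err_bag = colMeans((Yval - pred_bag)^2)

Step 5 then wraps the data generation and the two error computations in a loop over 300 replications and averages err and err_bag across replications before plotting.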
Data Analysis 2. Support Vector Machine and RF
Consider the attached simulated data. The file train.txt contains the training
data (p = 21, N = 300, the last column being the class label) and the file test.txt
contains the testing data (N = 200). Compare the predictive performance of
Random Forest, SVM, boosted trees and, if you wish, the Lasso on these data.
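A minimal sketch of one way to set up this comparison, assuming whitespace-separated files without a header and the class label in the last column; the package choices (randomForest, e1071, gbm) are ours, not prescribed by the assignment.

library(randomForest)
library(e1071)
library(gbm)

train = read.table("train.txt")
test  = read.table("test.txt")
p = ncol(train) - 1
train[, p + 1] = factor(train[, p + 1])
test[, p + 1]  = factor(test[, p + 1], levels = levels(train[, p + 1]))
names(train)[p + 1] = "y"
names(test)[p + 1]  = "y"

# Random forest
rf = randomForest(y ~ ., data = train)
mean(predict(rf, test) != test$y)        # test misclassification rate

# Support vector machine (radial kernel; tune cost/gamma in practice)
sv = svm(y ~ ., data = train, kernel = "radial")
mean(predict(sv, test) != test$y)

# Boosted trees (gbm's "bernoulli" loss wants a 0/1 numeric response)
tr = data.frame(train[, 1:p], y01 = as.numeric(train$y) - 1)
bt = gbm(y01 ~ ., data = tr, distribution = "bernoulli",
         n.trees = 1000, interaction.depth = 2, shrinkage = 0.01)
phat = predict(bt, newdata = test[, 1:p], n.trees = 1000, type = "response")
mean((phat > 0.5) != (as.numeric(test$y) - 1))

# Optional lasso route: glmnet::cv.glmnet(as.matrix(train[, 1:p]),
#                                         train$y, family = "binomial")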
Data Analysis 3. Clustering
For the data set train.txt, consider clustering the X's using a) K-means (you can choose K = 2, since you already know there are 2 classes), and b) hierarchical clustering with the following distances: Euclidean, correlation-based distance, and the dissimilarity derived from the RF proximity measure obtained in the previous exercise. Comment on your findings.
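A minimal sketch, reusing train and p from the previous exercise; the forest is refit here with proximity = TRUE, since proximities are not stored by default.

X = as.matrix(train[, 1:p])

# a) K-means with K = 2
km = kmeans(X, centers = 2, nstart = 20)

# b) Hierarchical clustering, Euclidean distance
hc_euc = hclust(dist(X))

# b) Hierarchical clustering, correlation-based distance between observations
hc_cor = hclust(as.dist(1 - cor(t(X))))

# b) Hierarchical clustering with 1 - RF proximity as dissimilarity
rf2 = randomForest(y ~ ., data = train, proximity = TRUE)
hc_rf = hclust(as.dist(1 - rf2$proximity))

# Compare the 2-cluster assignments with the known labels
table(km$cluster, train$y)
table(cutree(hc_euc, k = 2), train$y)
table(cutree(hc_cor, k = 2), train$y)
table(cutree(hc_rf, k = 2), train$y)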
Theoretical Questions
1) Exercise 12.1 of ESL page 455
2) Exercise 12.2 of ESL page 455