Description
Data Analysis
The goal of this exercise is to learn to use the Lasso, ridge regression, PCR and
PLS and, as a byproduct, cross-validation. Use the data to examine the relation
between the level of prostate-specific antigen (lpsa) and a number of clinical
measures in men who were about to receive a radical prostatectomy: lcavol (log
cancer volume), lweight (log prostate weight), age (in years), lbph (log of the
amount of benign prostatic hyperplasia), svi (seminal vesicle invasion), lcp (log
of capsular penetration), gleason (Gleason score, a numeric vector), pgg45
(percent of Gleason scores 4 or 5).
Finally, the data set contains a column called train, a logical vector. Take the
rows for which train=T as your training set and fit the Lasso, ridge regression
(RR), PCR and OLS on it. Use the remaining rows as a validation set to estimate
the prediction error.
The exercise consists in reproducing Figures 3.7, 3.8 and 3.10 and Table 3.3 of
ELS. Please submit only the tables and plots you obtain, together with an ASCII
file containing the code you used. Since there is randomness in this exercise,
fix the seed with set.seed(a number) so that your results are reproducible.
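The splitting and seeding steps described above can be sketched as follows. This is only an illustration in plain Python (an assumption: the set.seed hint suggests the course itself uses R); the helper names split_by_flag and kfold_indices and the ten toy rows are hypothetical and are not part of the prostate data.

```python
import random

def split_by_flag(rows, train_flags):
    """Return (training rows, validation rows) according to a logical flag."""
    train = [r for r, f in zip(rows, train_flags) if f]
    valid = [r for r, f in zip(rows, train_flags) if not f]
    return train, valid

def kfold_indices(n, k, seed):
    """Shuffle 0..n-1 with a fixed seed and deal the indices into k folds."""
    rng = random.Random(seed)  # fixed seed: the same folds on every run
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Toy stand-in for the data set: 10 hypothetical rows, 7 flagged as training.
rows = list(range(10))
train_flags = [True, True, False, True, True, False, True, True, False, True]

train_rows, valid_rows = split_by_flag(rows, train_flags)

# Cross-validation folds are built from the training rows only; the
# validation rows are kept aside for the final prediction-error estimate.
folds = kfold_indices(len(train_rows), k=3, seed=1)
print(train_rows, valid_rows, folds)
```

Because the random generator is seeded explicitly, rerunning the script gives the same folds, which is the role set.seed plays in the submitted code.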
Theoretical Questions
The first two questions essentially ask you to fill in some (temporary) omissions
in the third set of notes. See the file notes3.pdf, where I highlighted where the
missing parts should be inserted. To answer the third question, you may want
to look at page 4 of notes2.pdf, the "Another Example" section.
1) Compute the mean squared error for ridge regression in the coordinate
system of the eigen-directions of the predictor space (for simplicity, just
assume the case of maximum rank), that is, where the matrix $X^T X$ is
diagonal, $X^T X = \mathrm{diag}(d_1^2, \dots, d_p^2)$. Assume that the training
predictors are not random (that is, average only over the error distribution).
In particular, show that the variance of the estimator
$\hat{F}(x) = x^T \hat{\beta}$ at the point $x$ is given by
$$\mathrm{Var}(x) = \sigma^2 \sum_{i=1}^{p} x_i^2 \, \frac{1}{d_i^2} \, \frac{d_i^4}{(d_i^2 + \lambda)^2}$$
and the bias is given by
$$B(x) = \sum_{i=1}^{p} x_i \beta_i \left( \frac{d_i^2}{d_i^2 + \lambda} - 1 \right)$$
(You can ignore $\hat{\beta}_0$ by assuming $Y$ has been centered.)
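Before attempting the derivation, the two formulas can be checked numerically. The sketch below is a Monte Carlo check in plain Python under stated assumptions: the design is the toy choice X = diag(d_1, ..., d_p) (so that X^T X is diagonal with entries d_i^2), the errors are Gaussian, and all numerical values and function names are hypothetical. It is a sanity check, not a proof.

```python
import random

def ridge_variance(x, d, lam, sigma):
    """sigma^2 * sum_i x_i^2 * (1/d_i^2) * d_i^4 / (d_i^2 + lam)^2."""
    return sigma ** 2 * sum(
        xi ** 2 * (1 / di ** 2) * di ** 4 / (di ** 2 + lam) ** 2
        for xi, di in zip(x, d)
    )

def ridge_bias(x, beta, d, lam):
    """sum_i x_i * beta_i * (d_i^2 / (d_i^2 + lam) - 1)."""
    return sum(
        xi * bi * (di ** 2 / (di ** 2 + lam) - 1)
        for xi, bi, di in zip(x, beta, d)
    )

def simulate(x, beta, d, lam, sigma, n_rep, seed):
    """Monte Carlo mean and variance of F_hat(x) for the design X = diag(d)."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_rep):
        # Row i of X is d_i * e_i, so y_i = d_i * beta_i + eps_i.
        y = [di * bi + rng.gauss(0.0, sigma) for di, bi in zip(d, beta)]
        # With X^T X diagonal, ridge gives beta_hat_i = d_i * y_i / (d_i^2 + lam).
        b_hat = [di * yi / (di ** 2 + lam) for di, yi in zip(d, y)]
        preds.append(sum(xi * bi for xi, bi in zip(x, b_hat)))
    mean = sum(preds) / n_rep
    var = sum((f - mean) ** 2 for f in preds) / (n_rep - 1)
    return mean, var

# Hypothetical toy values, chosen only for illustration.
x = [1.0, 1.0, 1.0]
beta = [1.0, -2.0, 0.5]
d = [2.0, 1.0, 0.5]
lam, sigma = 1.0, 1.0

true_F = sum(xi * bi for xi, bi in zip(x, beta))
mc_mean, mc_var = simulate(x, beta, d, lam, sigma, n_rep=50000, seed=0)
mc_bias = mc_mean - true_F

print("variance: formula", ridge_variance(x, d, lam, sigma), "simulated", mc_var)
print("bias:     formula", ridge_bias(x, beta, d, lam), "simulated", mc_bias)
```

The simulated values should agree with the formulas up to Monte Carlo noise; setting lam to 0 recovers the unbiased OLS case.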
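2) Show that in the case of orthogonal regressors, $X^T X = I_p$, the Lasso
estimates are given by
$$\hat{\beta}_i = \mathrm{sign}(\hat{\beta}_i^{OLS}) \cdot \max\{|\hat{\beta}_i^{OLS}| - \lambda, 0\} = \mathrm{sign}(\hat{\beta}_i^{OLS}) \left( |\hat{\beta}_i^{OLS}| - \lambda \right)_+$$
where $(x)_+$ is the positive part of $x$.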
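The soft-thresholding formula can be checked against a brute-force minimization. The sketch below assumes the Lasso criterion is scaled as (1/2)||y - X beta||^2 + lambda ||beta||_1, which is an assumption about the notes; with a different scaling the threshold changes by a constant factor. Under orthogonal regressors this criterion separates into one-dimensional problems of the form 0.5 * (b - b_ols)^2 + lambda * |b|, which is what the grid search minimizes. The function names are hypothetical.

```python
import math

def soft_threshold(b_ols, lam):
    """sign(b_ols) * max(|b_ols| - lam, 0), the formula stated in question 2."""
    return math.copysign(max(abs(b_ols) - lam, 0.0), b_ols)

def grid_lasso(b_ols, lam, half_width=5.0, step=0.001):
    """Brute-force minimizer of 0.5 * (b - b_ols)**2 + lam * |b| on a grid."""
    n = int(round(half_width / step))
    grid = [k * step for k in range(-n, n + 1)]
    return min(grid, key=lambda b: 0.5 * (b - b_ols) ** 2 + lam * abs(b))

lam = 1.0
for b_ols in (3.0, 0.5, -0.5, -2.25):
    # The two columns should agree up to the grid resolution.
    print(b_ols, soft_threshold(b_ols, lam), grid_lasso(b_ols, lam))
```

Coefficients whose OLS estimate is smaller than lambda in absolute value are set exactly to zero, while the others are shrunk toward zero by lambda.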
3) This exercise is similar to the worked-out example in the second set
of lectures (page 4). The only difference is that now you have to consider
other losses. Choose only one of the two losses.
Consider the training set $T = \{(y_i, x_i)\}_{i=1,\dots,n}$, where for
simplicity we assume $x_i$ to be fixed and $y_i \in \{-1, 1\}$ to be a binary
random variable with $p_i = P(y_i = 1)$. Given a training set, we assume our
estimate for each $p_i$ will be $\hat{p}_i = (1 + a y_i)/2$, where
$0 \le a \le 1$ is a parameter that controls the degree of fit to the training
data. Larger values provide a closer fit. We want to compute the training and
test error using one of the following two losses: the exponential loss
$$L_1(y, F) = \exp(-yF)$$
and the squared error
$$L_2(y, F) = (y - F)^2$$
a) For the loss of your choice, $F$ is defined as the population minimizer
of the corresponding population risk:
$$F = \arg\min_F E[L(y, F)]$$
Show that
$$F_1 = \frac{1}{2} \log\left(\frac{p}{1 - p}\right)$$
using $L_1$ and
$$F_2 = 2p - 1$$
using $L_2$.
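b) Show now that the training error $\hat{R}$ (the average loss on the training
data) and the test error $R$ (average population risk) are
$$\hat{R}_1 = \left(\frac{1 - a}{1 + a}\right)^{1/2}, \qquad R_1 = (1 - \bar{e}) \left(\frac{1 - a}{1 + a}\right)^{1/2} + \bar{e} \left(\frac{1 + a}{1 - a}\right)^{1/2}$$
and
$$\hat{R}_2 = (1 - a)^2, \qquad R_2 = (1 - a)^2 + 4 \bar{e} a$$
where $\bar{e} = \frac{2}{n} \sum_{i=1}^{n} p_i (1 - p_i)$.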
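The results of parts a) and b) can be verified numerically before deriving them. The sketch below is a plain-Python check under stated assumptions: the probabilities p_i and the value of a are hypothetical toy values (with a < 1 so that the exponential-loss fit stays finite), the population minimizer is located by a simple grid search, and the test error is computed exactly by enumerating the two possible values of each training label and of an independent new label at the same x_i.

```python
import math

def exp_loss(y, F):
    return math.exp(-y * F)

def sq_loss(y, F):
    return (y - F) ** 2

def F1(p):
    """Claimed population minimizer for the exponential loss."""
    return 0.5 * math.log(p / (1 - p))

def F2(p):
    """Claimed population minimizer for the squared error."""
    return 2 * p - 1

def pop_risk(loss, p, F):
    """E[L(y, F)] for y in {-1, +1} with P(y = 1) = p."""
    return p * loss(1, F) + (1 - p) * loss(-1, F)

def grid_minimizer(loss, p, lo=-3.0, hi=3.0, step=0.001):
    """Brute-force arg min of the population risk over a grid of F values."""
    n = int(round((hi - lo) / step))
    grid = [lo + k * step for k in range(n + 1)]
    return min(grid, key=lambda F: pop_risk(loss, p, F))

def errors(loss, minimizer, probs, a):
    """Exact (training error, test error), averaged over the training labels."""
    train = test = 0.0
    for p in probs:
        for y, w in ((1, p), (-1, 1 - p)):  # training label and its probability
            F_hat = minimizer((1 + a * y) / 2)  # fit obtained from p_hat_i
            train += w * loss(y, F_hat)
            test += w * pop_risk(loss, p, F_hat)  # independent new label
    return train / len(probs), test / len(probs)

# Hypothetical toy values, chosen only for illustration.
probs, a = [0.1, 0.5, 0.8], 0.5

e_bar = 2 / len(probs) * sum(p * (1 - p) for p in probs)
r = math.sqrt((1 - a) / (1 + a))
closed_1 = (r, (1 - e_bar) * r + e_bar / r)
closed_2 = ((1 - a) ** 2, (1 - a) ** 2 + 4 * e_bar * a)

exact_1 = errors(exp_loss, F1, probs, a)
exact_2 = errors(sq_loss, F2, probs, a)

print("part a:", grid_minimizer(exp_loss, 0.8), F1(0.8),
      grid_minimizer(sq_loss, 0.8), F2(0.8))
print("part b, L1:", exact_1, closed_1)
print("part b, L2:", exact_2, closed_2)
```

In both cases the training error decreases as a grows, while the test error carries an extra term proportional to the average label noise, which is the point of the exercise.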



