STAT 3008 Applied Regression Analysis Assignment #4

Problem 1 [30 points]: Consider the simple linear regression y_i = β_0 + β_1 x_i + e_i, with E(e_i) = 0 and Var(e_i) = σ² for i = 1, 2, …, n.
(a) [11 points] By simplifying the Hat Matrix H = X(X′X)⁻¹X′, show that
h_ii = 1/n + (x_i − x̄)²/SXX
for i = 1, 2, …, n.
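As a starting point for part (a) (a sketch of the setup, not a full solution), the relevant matrices for simple linear regression can be written out explicitly; the identity Σ(x_i − x̄)² = Σx_i² − n·x̄² is the step that connects det(X′X) to SXX:

```latex
X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix},
\qquad
X'X = \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix},
\qquad
(X'X)^{-1} = \frac{1}{n\,\mathrm{SXX}}
\begin{pmatrix} \sum_i x_i^2 & -n\bar{x} \\ -n\bar{x} & n \end{pmatrix},
```

where SXX = Σ(x_i − x̄)² and h_ii is the (i, i) entry of H = X(X′X)⁻¹X′, i.e. the i-th row of X times (X′X)⁻¹ times that row transposed.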
[Parts (b) and (c)] Suppose x_n is a leverage point, with x_n = a + (n − 1)δ, but x_i = a − δ for i = 1, 2, …, n − 1, for some constants a and δ ≠ 0.
(b) [6 points] Show that hnn = 1.
(c) [5 points] Compute hii as a function of n for i = 1,2, …, n-1.
(d) [8 points] Suppose n = 2m + 1, with x_1 = x_2 = ⋯ = x_m = a + δ, x_{m+1} = x_{m+2} = ⋯ = x_{2m} = a − δ, and x_{2m+1} = a. Evaluate h_ii as a function of n for i = 1, 2, …, n.
*Note: Results for parts (b) and (c) should be consistent with the H in Ch7 page 9; results from parts (b) and (d) should provide the upper and lower bounds for Property #5 on page 7.
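For reference, the standard leverage bounds the note appeals to (for a mean function that includes an intercept; assumed to match Property #5 in the course notes) are:

```latex
\frac{1}{n} \;\le\; h_{ii} \;\le\; 1,
\qquad
\sum_{i=1}^{n} h_{ii} = \operatorname{tr}(H) = p
\quad (p = 2 \text{ for simple linear regression}).
```

Parts (b) and (d) should produce cases attaining the upper bound (h_nn = 1) and the lower bound (h_ii = 1/n), respectively.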
Problem 2 [23 points]: Suppose we want to explain Tension by Sulfur in the dataset
“baeskel.txt” using a simple linear regression,
library(car); library(alr3); x<-baeskel$Sulfur; y<-baeskel$Tension
(a) [4 points] Draw a scatterplot of the data using the “plot” function in R. Does the plot
suggest a linear relationship between the two variables?
(b) [5 points] Suppose a simple linear regression y_i = β_0 + β_1 x_i + e_i is fitted to the data. What is the regression equation based on the OLS estimates, and what is its R²?
(c) [7 points] Generate the 4 residual plots based on the “plot” function (as in Ch7 page 30).
Comment on the null plot assumption of the residuals.
(d) [7 points] Generate the table of influence diagnostics using the “influence.measures”
function in R. What conclusion can you draw from each of the following measures?
(i) DFFITS (ii) DFBETAS (iii) Cook’s Distance (iv) Leverage
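For reference, the usual definitions behind two of these measures (with p mean-function parameters, r_i the standardized residual and t_i the externally studentized residual; cutoff conventions vary by textbook, so check them against the course notes):

```latex
D_i = \frac{r_i^2}{p}\cdot\frac{h_{ii}}{1-h_{ii}}
\quad \text{(Cook's distance)},
\qquad
\mathrm{DFFITS}_i = t_i\sqrt{\frac{h_{ii}}{1-h_{ii}}},
```

with the leverage h_ii itself typically flagged when it is large relative to its average value p/n.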
Problem 3 [47 points]: The data set “stopping” in alr3 contains hypothetical data to explain the distance (in feet) required to stop an automobile, based on its speed (miles per hour) right before the brake is applied.
library(alr3); x<-stopping$Speed; y<-stopping$Distance; plot(x,y)
(a) [5 points] Suppose a quadratic regression is fitted to the data:
(Model Q) y_i = β_0 + β_1 x_i + β_2 x_i² + e_i, with E(e_i) = 0 and Var(e_i) = σ².
What are the OLS estimates β̂_0, β̂_1 and β̂_2, and the RSS of the model?
(b) [8 points] Suppose Model Q is the full model. Using the “stepAIC” function (Ch6 p.26), show that the parsimonious model based on AIC and forward selection is
(Model P) y_i = β_0* + β_2* x_i² + e_i*, with E(e_i*) = 0 and Var(e_i*) = σ*².
What are the OLS estimates β̂_0* and β̂_2*, and the RSS of the model?
(c) [8 points] Use the “plot” function to obtain the residual plots (as in Ch7 p30) for Model P.
Which of the null plot assumptions (i.e. constant mean, constant variance, and no separated points) is invalid based on the plots? Explain.
[Parts (d) to (e)] Suppose we apply the scaled power transform ψ_s(x, λ) = (x^λ − 1)/λ to x, where λ = 1.5, 2.0 and 2.5. Consider the regression model with mean function
(Model λ) E(y | X = x) = β_0 + β_1 ψ_s(x, λ)
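Two textbook properties of the scaled power transform are worth recalling here (standard facts, not part of the assignment statement): for fixed λ ≠ 0 it is an affine function of x^λ, and it is continuous in λ at 0:

```latex
\psi_s(x,\lambda) = \frac{x^{\lambda}-1}{\lambda}
= \frac{1}{\lambda}\,x^{\lambda} - \frac{1}{\lambda},
\qquad
\lim_{\lambda \to 0} \psi_s(x,\lambda) = \log x .
```

Because an affine change of the regressor is absorbed by the OLS intercept and slope, regressing y on ψ_s(x, λ) and regressing y on x^λ give the same fitted values, and hence the same RSS.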
(d) [7 points] In the original scatterplot (i.e. x vs y), draw the fitted curves for Model λ with λ = 1.5, 2.0 and 2.5, based on Approach #1 on Ch8 page 11.
(e) [10 points] Compute the RSS of the 3 models (λ = 1.5, 2.0, 2.5). (i) Show that λ = 2.0 is the best model among the 3, and (ii) explain why RSS(Model λ = 2.0) = RSS(Model P).
(f) [9 points] Suppose a simple linear regression is fitted to the transformed data based on the power transform ψ(u, λ) = u^λ as follows:
ψ(y, λ) = β_0 + β_1 ψ(x, λ) + e.
For each of λ = 0.2, 0.4, 0.67 and 1.0, draw a scatterplot of (ψ(x, λ), ψ(y, λ)) with the corresponding fitted regression line included. Which λ yields the smallest number of leverage points?
– End of the Assignment –