STAT 3008 Applied Regression Analysis Assignment #3 solved


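Problem 1 [25 points]: Consider the scatterplot below with data {(xi, yi), i = 1, 2, …, 24}:
Suppose a quadratic regression is fitted to the data, with mean function
$$E(Y \mid X = x) = \beta_0 + \beta_1 x + \beta_2 x^2$$
In matrix form, Y = Xβ + e, that is,
$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_{24} \end{pmatrix}
=
\begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_{24} & x_{24}^2 \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}
+
\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_{24} \end{pmatrix}
$$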
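Given that x̄ = 5.833333, ȳ = 56.6275,
$$Y'Y = \sum_{i=1}^{24} y_i^2 = 89882.2642,$$
$$
X'X = \begin{pmatrix} 24 & 140 & 1164 \\ 140 & 1164 & 10568 \\ 1164 & 10568 & 98268 \end{pmatrix},
\quad
(X'X)^{-1} = \begin{pmatrix} 0.450020 & -0.242619 & 0.020761 \\ -0.242619 & 0.167181 & -0.015105 \\ 0.020761 & -0.015105 & 0.001389 \end{pmatrix},
\quad
X'Y = \begin{pmatrix} 1359.06 \\ 6322.83 \\ 47452.65 \end{pmatrix}
$$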
(a) [6 points] Compute the OLS estimates of β0, β1 and β2.
(b) [5 points] Compute the RSS and show that σ̂ = 13.99 (12.65 is also acceptable).
(c) [5 points] Let x* be the value of x that maximizes the response y. What is the
point estimate of x*?
(d) [6 points] Construct an ANOVA table to test whether β2 = β1 = 0 [not β0 = β1 = 0].
(No testing procedure is required; the ANOVA table alone suffices.)
(e) [3 points] Suppose x is an experimental EV. Before the experiment is performed to obtain
the response y, it is known that the optimal value of x lies somewhere in the middle of
the interval [1.0, 10.0]. Briefly comment on whether the current choice of EV values
{xi, i = 1, 2, …, 24} is reasonable.
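Parts (a) to (d) can be worked from the summary matrices alone, without the raw data. The R sketch below is one way to organise that arithmetic. It assumes the printed matrices follow the column order (1, x, x^2), and that the off-diagonal entries of the printed inverse alternate in sign, which is the only sign pattern for which X'X times the inverse is close to the identity. All variable names are illustrative.

```r
# Summary quantities from the problem statement.
n    <- 24
ybar <- 56.6275
YtY  <- 89882.2642
XtX  <- matrix(c(  24,   140,  1164,
                  140,  1164, 10568,
                 1164, 10568, 98268), nrow = 3, byrow = TRUE)
# ASSUMPTION: minus signs restored on the alternating entries of the inverse.
XtX_inv <- matrix(c( 0.450020, -0.242619,  0.020761,
                    -0.242619,  0.167181, -0.015105,
                     0.020761, -0.015105,  0.001389), nrow = 3, byrow = TRUE)
XtY <- c(1359.06, 6322.83, 47452.65)

# Sanity check on the sign assumption: should be close to the 3x3 identity.
identity_error <- max(abs(XtX %*% XtX_inv - diag(3)))

# (a) OLS estimates: beta_hat = (X'X)^{-1} X'Y
beta_hat <- drop(XtX_inv %*% XtY)

# (b) RSS = Y'Y - beta_hat' X'Y, with n - 3 residual degrees of freedom
RSS       <- YtY - sum(beta_hat * XtY)
sigma_hat <- sqrt(RSS / (n - 3))

# (c) Vertex of the fitted parabola (a maximum when beta_hat[3] < 0)
x_star <- -beta_hat[2] / (2 * beta_hat[3])

# (d) ANOVA pieces for H0: beta1 = beta2 = 0
SYY    <- YtY - n * ybar^2        # total sum of squares about the mean
SSreg  <- SYY - RSS               # regression sum of squares, 2 df
F_stat <- (SSreg / 2) / (RSS / (n - 3))

print(list(beta_hat = beta_hat, RSS = RSS, sigma_hat = sigma_hat,
           x_star = x_star, SSreg = SSreg, F_stat = F_stat))
```

Because the inverse is printed to only six decimals, the RSS obtained this way is sensitive to rounding, which may be why part (b) accepts two values of σ̂.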
Problem 2 [33 points]: The data ‘salary’ from the alr3 library contains the salary and other
characteristics of all faculty members in a small Midwestern college in the early 1980s. Below
is a description of selected variables in the data file:
Variable   Notation   Description
Sex        S          1 = Female, 0 = Male
Rank       R          1 = Assistant Professor, 2 = Associate Professor, 3 = Full Professor
Year       X          Number of years in current rank
Salary     Y          Annual salary (in US$)
library(car); library(alr3)
S <- salary$Sex; R <- salary$Rank; X <- salary$Year; Y <- salary$Salary
Let U2 and U3 be the dummy variables for Associate Professor and Full Professor
respectively.
(a) [8 points] Assuming that the effect of the number of years in current rank (Year, X) is the
same across sexes and ranks, we construct a linear model that explains salary by the
3 other variables:
(Model 1)
$$E(Y \mid S = s, R = j, X = x) = \beta_0 + \beta_1 s + \gamma x + \sum_{j=2}^{3} \left( \beta_{0j} U_j + \beta_{1j} U_j s \right)$$
Compute the OLS estimates of the parameters
$(\beta_0, \beta_1, \gamma, \beta_{02}, \beta_{03}, \beta_{12}, \beta_{13}, \sigma^2)$.
(b) [2 points] Suppose Mary received an offer as Assistant Professor from that college in
early 1980s right after she received her PhD. Estimate the annual salary (in US$) offered
by the college to her.
(c) [3 points] What is the RSS of the model in part (a)?
(d) [8 points] Construct an ANOVA table for the hypotheses on whether Rank is important in
explaining Salary. That is,
H0: $E(Y \mid S = s, R = j, X = x) = \beta_0 + \beta_1 s + \gamma x$ vs
H1: $E(Y \mid S = s, R = j, X = x) = \beta_0 + \beta_1 s + \gamma x + \sum_{j=2}^{3} \left( \beta_{0j} U_j + \beta_{1j} U_j s \right)$
(e) [3 points] What are the (I) decision and (II) conclusion you would draw from the results
in part (d)?
[Parts (f) to (i)] Suppose ANOVA is used to test whether the salaries of male and female
faculty members are the same across all 3 ranks in Model 1.
(f) [1 point] What is the mean function for H0?
(g) [1 point] What is the mean function for H1?
(h) [4 points] Construct the corresponding ANOVA table.
(i) [3 points] What are the (I) decision and (II) conclusion you would draw from the results
in part (h)?
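The nested-model comparison asked for in part (d) can be sketched in R as follows. Because the alr3 package may not be installed, the sketch runs on simulated data with the same variable names (S, R, X, Y, U2, U3); the numbers it produces are therefore NOT the answers for the real salary data, and every coefficient used to simulate Y is made up for illustration.

```r
set.seed(3)
n <- 52
# Simulated stand-in for the salary data (hypothetical values, not alr3::salary).
d <- data.frame(S = rep(c(0, 1), length.out = n),   # 1 = Female, 0 = Male
                R = rep(1:3,     length.out = n),   # rank 1, 2, 3
                X = rep(0:12,    length.out = n))   # years in current rank
d$U2 <- as.numeric(d$R == 2)   # dummy for Associate Professor
d$U3 <- as.numeric(d$R == 3)   # dummy for Full Professor
d$Y  <- 16000 + 500 * d$S + 350 * d$X + 4500 * d$U2 + 9500 * d$U3 +
        rnorm(n, sd = 1500)

# H0 model: no rank terms.  H1 model: Model 1 with rank dummies and rank-by-sex terms.
m0 <- lm(Y ~ S + X, data = d)
m1 <- lm(Y ~ S + X + U2 + U3 + U2:S + U3:S, data = d)

# ANOVA table comparing the nested models.
tab <- anova(m0, m1)
print(tab)

# The same F statistic computed by hand from the two residual sums of squares.
RSS0 <- deviance(m0); RSS1 <- deviance(m1)
df_num   <- m0$df.residual - m1$df.residual
F_manual <- ((RSS0 - RSS1) / df_num) / (RSS1 / m1$df.residual)

# Fitted salary for a newly hired female Assistant Professor (X = 0), as in part (b).
offer <- predict(m1, newdata = data.frame(S = 1, X = 0, U2 = 0, U3 = 0))
```

With the real data, the same calls would be made on the vectors defined in the problem statement; the hypotheses in parts (f) to (i) lead to a comparison of the same form with a different reduced model.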
Problem 3 [21 points] (Modified from Final Exam 2014-15 Term2): Consider a multiple
linear regression with 4 terms:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + e$$
The table below shows the AIC and BIC for models with different subsets of terms.
(For instance, AIC = -121.8 for Model 3: $y = \beta_0 + \beta_2 x_2 + e$, and
BIC = -7304.6 for Model 16: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + e$.)
(a) [5 points] Implement the forward selection method using the AIC. Show in detail the
steps that lead to the parsimonious model.
(b) [10 points] Repeat part (a) using the backward selection method with the BIC.
(c) [2 points] What is the sample size n?
(d) [4 points] Do you think multicollinearity exists in Model 16? If so, identify the terms
that are highly collinear with each other.
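Forward selection from a table of criterion values is a sequence of lookups: start from the intercept-only model, add the single term that lowers the criterion most, and stop when no addition lowers it. The R sketch below illustrates this on simulated data, since the assignment's table is not reproduced here; the data, the `key` and `forward_select` helpers, and all values in `crit` are illustrative. It also shows how n can be recovered from one AIC/BIC pair under the convention AIC = n log(RSS/n) + 2p and BIC = n log(RSS/n) + p log(n), which is the one used by R's `extractAIC`; whether the assignment's table follows that convention is an assumption.

```r
set.seed(1)
n <- 50
# Simulated data (hypothetical); only x2 and x4 truly matter.
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 1 + 2 * d$x2 + 0.5 * d$x4 + rnorm(n)

terms_all <- paste0("x", 1:4)
# Label a subset of terms; "1" stands for the intercept-only model.
key <- function(v) if (length(v)) paste(sort(v), collapse = "+") else "1"

# Enumerate all 16 subsets and tabulate AIC, BIC and the parameter count p.
subsets <- list(character(0))
for (k in 1:4) subsets <- c(subsets, combn(terms_all, k, simplify = FALSE))
crit <- t(sapply(subsets, function(v) {
  rhs <- if (length(v)) v else "1"
  fit <- lm(reformulate(rhs, response = "y"), data = d)
  c(AIC = extractAIC(fit, k = 2)[2],
    BIC = extractAIC(fit, k = log(n))[2],
    p   = length(v) + 1)
}))
rownames(crit) <- sapply(subsets, key)

# Forward selection using only lookups in the table.
forward_select <- function(crit_col) {
  current <- character(0)
  repeat {
    candidates <- setdiff(terms_all, current)
    if (!length(candidates)) break
    vals <- sapply(candidates, function(term) crit[key(c(current, term)), crit_col])
    if (min(vals) >= crit[key(current), crit_col]) break   # no improvement: stop
    current <- c(current, candidates[which.min(vals)])
  }
  sort(current)
}
selected <- forward_select("AIC")

# Under the stated convention, BIC - AIC = p * (log(n) - 2), so n can be recovered.
n_recovered <- exp(2 + (crit[, "BIC"] - crit[, "AIC"]) / crit[, "p"])
```

Backward selection is the mirror image: start from the full model and drop the term whose removal lowers the criterion most, stopping when no removal lowers it.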
Problem 4 [21 points]: Consider the Berkeley Guidance Study data mentioned in Section 4.1.
Suppose we want to model the height of girls at age 18 by 6 other variables measured at age 2
and age 9 (x1 to x6 in the R code below):
library(alr3)
y<-BGSgirls$HT18 # height at age 18 (in cm)
x1<-BGSgirls$WT2 # weight at age 2 (in kg)
x2<-BGSgirls$HT2 # height at age 2 (in cm)
x3<-BGSgirls$WT9 # weight at age 9 (in kg)
x4<-BGSgirls$HT9 # height at age 9 (in cm)
x5<-BGSgirls$LG9 # leg circumference at age 9 (in cm)
x6<-BGSgirls$ST9 # strength at age 9 (in kg)
(a) [8 points] Using the stepAIC function in R (similar to the examples in Ch6 p26 and p32),
show that the parsimonious model based on AIC is the same regardless of the model
selection method (forward or backward).
(b) [8 points] Repeat part (a) based on BIC. How do the resulting parsimonious models differ
from the one in part (a)?
(c) [5 points] Note that leg circumference at age 9 (variable x5) is not included in the
parsimonious model in part (a) because of multicollinearity. What is the value of
variance inflation factor VIF5 in the full model (i.e. model with all the 6 terms)?
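The calls involved in parts (a) to (c) can be sketched as below. The sketch uses `MASS::stepAIC` with `k = 2` for AIC and `k = log(n)` for BIC, and computes the variance inflation factor by hand as 1 / (1 - R²), where R² comes from regressing x5 on the other five terms. It runs on simulated data in which x5 is deliberately built to be collinear with x3, so its output is NOT the answer for BGSgirls; the data-generating values and the `term_set` helper are illustrative.

```r
library(MASS)  # provides stepAIC

set.seed(2)
n <- 70
# Simulated stand-in for BGSgirls (hypothetical values, not the real data).
x1 <- rnorm(n, 13, 1.5)
x2 <- rnorm(n, 87, 3)
x3 <- rnorm(n, 31, 5)
x4 <- rnorm(n, 135, 6)
x5 <- 10 + 0.55 * x3 + rnorm(n, sd = 0.8)   # built to be collinear with x3
x6 <- rnorm(n, 60, 12)
y  <- 30 + 1.0 * x4 - 0.3 * x3 + rnorm(n, sd = 3)
dat <- data.frame(y, x1, x2, x3, x4, x5, x6)

full  <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = dat)
null  <- lm(y ~ 1, data = dat)
upper <- ~ x1 + x2 + x3 + x4 + x5 + x6
term_set <- function(fit) sort(attr(terms(fit), "term.labels"))

# (a) AIC: backward from the full model, forward from the intercept-only model.
back_aic <- stepAIC(full, direction = "backward", trace = 0)
fwd_aic  <- stepAIC(null, scope = list(lower = ~ 1, upper = upper),
                    direction = "forward", trace = 0)

# (b) BIC: same calls with penalty k = log(n).
back_bic <- stepAIC(full, direction = "backward", k = log(n), trace = 0)
fwd_bic  <- stepAIC(null, scope = list(lower = ~ 1, upper = upper),
                    direction = "forward", k = log(n), trace = 0)

print(list(back_aic = term_set(back_aic), fwd_aic = term_set(fwd_aic),
           back_bic = term_set(back_bic), fwd_bic = term_set(fwd_bic)))

# (c) VIF for x5 in the full model: 1 / (1 - R^2) from regressing x5 on the rest.
r2_x5 <- summary(lm(x5 ~ x1 + x2 + x3 + x4 + x6, data = dat))$r.squared
vif5  <- 1 / (1 - r2_x5)
```

With the real data, `dat` would be replaced by a data frame built from the y and x1 to x6 vectors defined above; setting `trace = 1` prints each step, which is useful when the selection path has to be shown.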
– End of the Assignment –
Revised AIC and BIC values in the table: the AIC and BIC values should be negative
rather than positive as in the original assignment, since BIC > AIC for each model.