Description
Question 1:
In Homework #1 you conducted EDA for a sub-sample of K=300 girls from the Six Cities
study, specifically from Topeka. In this question we are going to return to these data and fit a
series of linear mixed models investigating the relationship between lung function and age. Towards
adjusting for height we are going to consider the response:
$$Y_{ki} = \frac{\mathrm{FEV}_{ki}}{\mathrm{height}_{ki}^{2}}.$$
With this response in mind, consider two linear mixed models with the following (induced) marginal
means:
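$$Y_{ki} = \beta_0 + \beta_1 \mathrm{Age}_{ki} \tag{2}$$
$$Y_{ki} = \beta_0 + \beta_1 \mathrm{Age}_{ki} + \beta_2 \mathrm{Age}_{ki}^{2} + \beta_3 \mathrm{Age}_{ki}^{3} \tag{3}$$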
The ultimate goal is to evaluate whether the more parsimonious model (2) can be adopted, or if
there is sufficient evidence in the data to suggest that the more complex model (3) is appropriate.
For the purposes of this question, restrict attention to those girls in the dataset with at least 5
measurements.
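The response construction and the restriction to girls with at least 5 measurements can be sketched in base R as below. This is a minimal sketch, not the course's own code: the column names `id`, `age`, `fev` and `height`, the helper `prepare_data()`, and the toy data frame are all assumptions made for illustration.

```r
# Hypothetical column names (not given in the assignment): id, age, fev, height.
prepare_data <- function(dat, min_obs = 5) {
  # Height-adjusted response: FEV divided by squared height.
  dat$y <- dat$fev / dat$height^2
  # Number of measurements per girl, aligned with the rows of dat.
  n_meas <- ave(rep(1, nrow(dat)), dat$id, FUN = sum)
  # Keep only rows belonging to girls with at least min_obs measurements.
  dat[n_meas >= min_obs, , drop = FALSE]
}

# Toy input, made up purely to exercise the function (not the Six Cities data).
toy <- data.frame(
  id     = c(rep(1, 5), rep(2, 3)),
  age    = c(8:12, 8:10),
  fev    = c(1.5, 1.7, 1.9, 2.2, 2.4, 1.4, 1.6, 1.8),
  height = c(1.30, 1.35, 1.41, 1.47, 1.52, 1.28, 1.33, 1.39)
)
kept <- prepare_data(toy)
```

Here girl 2 has only 3 measurements, so all of her rows are dropped and `kept` contains the 5 rows for girl 1.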
(a) Produce a figure of the response, $Y_{ki}$, as a function of age. On the figure, indicate the individual
trajectories for a random sample of 4 girls.
(b) Use the lme() function to fit model (3) with the following dependence structures:
(i) independent, homoskedastic errors
(ii) random intercepts plus independent, homoskedastic errors
(iii) random intercepts/slopes plus independent, homoskedastic errors
(iv) random intercepts plus auto-regressive errors
(v) random intercepts plus exponential spatial errors
(vi) random intercepts plus exponential spatial errors and independent, homoskedastic errors
(vii) random intercepts plus independent, heteroskedastic errors
(viii) random intercepts/slopes plus independent, heteroskedastic errors
Note: for (vii) and (viii), consider heteroskedasticity by age. Report the results in two succinct
tables. Specifically, as in the notes, produce a table reporting the log-likelihood and AIC,
commenting on which two models provide the best fit to the data. For these two models,
write out the models in full notation, and report point and standard error estimates from the
fits.
(c) Produce another figure of the response as a function of age (as in part (a)) but instead of
indicating individual trajectories, augment the figure with two fitted regression curves. The
first is the fitted regression curve (i.e. the induced marginal mean model) using the
best-fitting dependence structure from part (b). The second is the fitted regression curve using
the same best-fitting dependence structure but with the mean model given by (2).
(d) Using at most 3-4 sentences, provide an explanation/interpretation of the curve for model (3)
that you could use with a non-biostatistician collaborator.
(e) Using the output from the model fits in part (c), conduct a formal evaluation of whether
model (2) can be adopted in favor of model (3). Explain in detail how you do this and what
you conclude.
(f) Given the (somewhat artificial) choice between models (2) and (3), explain which model you
would advise a collaborator to adopt and why.
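As a rough guide to how the dependence structures in part (b) and the comparison in part (e) map onto the nlme package, the sketch below fits a few of them to simulated data. Several things here are assumptions rather than the course's own approach: the data are simulated, the column names `id`, `age` and `y` are hypothetical, structure (i) is fitted with `gls()` because `lme()` requires a random-effects term, and `varExp()` is only one of several ways to let the error variance depend on age.

```r
library(nlme)  # provides lme(), gls(), corAR1(), corExp(), varExp()

# Simulated stand-in data: every name and number below is made up.
set.seed(1)
n_girls <- 40
n_obs   <- 6
dat <- data.frame(
  id  = factor(rep(seq_len(n_girls), each = n_obs)),
  age = rep(seq(7, 17, by = 2), times = n_girls)
)
girl_effect <- rep(rnorm(n_girls, sd = 0.10), each = n_obs)
dat$y <- 0.8 + 0.02 * dat$age + girl_effect + rnorm(nrow(dat), sd = 0.05)

# Part (b): cubic mean model (3) under different dependence structures (REML).
# (i) has no random effects, so gls() is swapped in for lme() here.
fit_i   <- gls(y ~ age + I(age^2) + I(age^3), data = dat)
fit_ii  <- lme(y ~ age + I(age^2) + I(age^3), random = ~ 1 | id, data = dat)
fit_iv  <- lme(y ~ age + I(age^2) + I(age^3), random = ~ 1 | id,
               correlation = corAR1(form = ~ 1 | id), data = dat)
fit_vii <- lme(y ~ age + I(age^2) + I(age^3), random = ~ 1 | id,
               weights = varExp(form = ~ age), data = dat)
# Not fitted here; the arguments that would change are:
# (iii)  random = ~ age | id
# (v)    correlation = corExp(form = ~ age | id)
# (vi)   correlation = corExp(form = ~ age | id, nugget = TRUE)
# (viii) random = ~ age | id together with weights = varExp(form = ~ age)

# Log-likelihood and AIC table across the fitted structures.
fits <- list("(i)" = fit_i, "(ii)" = fit_ii, "(iv)" = fit_iv, "(vii)" = fit_vii)
fit_table <- data.frame(
  logLik = sapply(fits, function(f) as.numeric(logLik(f))),
  AIC    = sapply(fits, AIC)
)

# Part (e): likelihood ratio test of model (2) against model (3).
# The mean models differ, so both are refitted by maximum likelihood
# with the same dependence structure (random intercepts, for illustration).
fit2_ml <- lme(y ~ age, random = ~ 1 | id, data = dat, method = "ML")
fit3_ml <- lme(y ~ age + I(age^2) + I(age^3), random = ~ 1 | id,
               data = dat, method = "ML")
lrt <- anova(fit2_ml, fit3_ml)
```

The REML fits share the same mean model, so their log-likelihoods and AIC values are comparable across dependence structures. The comparison of models (2) and (3) uses ML fits instead, because REML likelihoods are not comparable between models with different fixed effects. The second row of `lrt` holds the test statistic (`L.Ratio`) and its p-value, referred to a chi-squared distribution with 2 degrees of freedom.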
P8157 Midterm