## Description

$$E[Y_{ki}] = \beta_0 + \beta_1 t_{ki} + \sum_{j=2}^{J} \beta_{2j} x_{k(j)} + \sum_{j=2}^{J} \beta_{3j} t_{ki} x_{k(j)}$$

where $x_{k(j)}$ is a dummy variable indicating membership in the $j$th category. Again using a compound symmetric correlation structure but only using restricted maximum likelihood,

BIO 245

Homework #2

## Question 1:

Continuing our investigations with the MACS data, the MACS-VL.RData dataset on the course

website has longitudinal information on CD4+ cell counts for K=225 MACS participants with

baseline viral load data. In this question we are going to consider the relationship between baseline

viral load and the rate of decline of CD4 count.

(a) Summarize the key variables using simple numerical and/or graphical summaries as relevant

to the scientific question of interest.

(b) Use appropriate exploratory methods to characterize the covariance structure of the data.

What structured covariance model(s) appear plausible/reasonable?

(c) Use the gls() command in the nlme library to fit the model:

$$E[Y_{ki}] = \beta_0 + \beta_1 t_{ki} + \beta_2 x_k + \beta_3 t_{ki} x_k$$

where $x_k$ is the (possibly transformed) baseline viral load and $t_{ki}$ is time since seroconversion

in months. Use compound symmetric correlation but consider both maximum likelihood and

restricted maximum likelihood for estimation. Present your results in a concise manner that

would be suitable for a journal and provide a precise interpretation of the estimates for the

mean model. Comment on whether there is a significant association between baseline viral

load and the rate of decline in CD4+ based on the estimates from this model.

(d) The model in part (c) restricts the analysis in that it is estimating a linear relationship

between (possibly-transformed) baseline viral load and CD4 count over time. As a way of

relaxing this restriction, consider categorizing baseline viral load. Given a categorization with

J levels, one alternative to the model in part (c) is

in which the slope for time depends on the value of the covariate. Specifically, while

the model in (c) assumes that $\gamma_1(x_k) = \beta_1 + \beta_3 x_k$, the model in (d) assumes a discrete function where $\gamma_1(x_k) = \beta_1 + \sum_{j=2}^{J} \beta_{3j} x_{k(j)}$.

Hence, while the model in (c) uses viral load in its continuous form but restricts the nature of the relationship (i.e. linearity), the model in (d) uses a categorical version of viral load but makes no assumptions regarding the functional form of how the rate of decline differs across the viral load categories.

Beyond these two special cases, allowing γ0(xk) and γ1(xk) to consider richer functional forms

than the linear form used in the model in (c) provides a more flexible description of how the

rate of decline differs for different values of baseline viral load.

With this in mind, use a

varying coefficient model for the rate of decline in CD4+ that characterizes how the rate

of decline depends on baseline viral load. I recommend that you use natural or restricted

cubic splines for the coefficient functions and simply choose two knots. Plot the estimated

coefficient function $\hat{\gamma}_1(x_k)$ with pointwise 95% confidence bands, and interpret specific values.

What does this plot suggest about the adequacy of the model in (c)?
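As a sketch of why a spline-based varying coefficient model can still be fit with the same linear-model tools, the coefficient functions can be expanded in a fixed basis. The basis functions $B_1, \ldots, B_L$ and the coefficients $\alpha_l$, $\theta_l$ below are notation introduced here for illustration only; they do not appear in the assignment.

$$\gamma_0(x_k) = \alpha_0 + \sum_{l=1}^{L} \alpha_l B_l(x_k), \qquad \gamma_1(x_k) = \theta_0 + \sum_{l=1}^{L} \theta_l B_l(x_k)$$

$$E[Y_{ki}] = \alpha_0 + \sum_{l=1}^{L} \alpha_l B_l(x_k) + \theta_0 t_{ki} + \sum_{l=1}^{L} \theta_l B_l(x_k)\, t_{ki}$$

Under this expansion the mean model is linear in the unknown coefficients, with the basis terms and their interactions with time as covariates. Taking $L = 1$ and $B_1(x_k) = x_k$ recovers the model in (c), and taking the $B_l$ to be the category dummies recovers the model in (d).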

fit the model and present your results in a concise manner that would be suitable for a journal.

Provide a precise interpretation of the estimates in this regression model, and comment on

whether there is a significant association between baseline viral load and the rate of decline

in CD4+ based on the estimates from this model.

(e) (optional) The models in parts (c) and (d) can be viewed as special cases of a ‘varying

coefficient’ model:

$$E[Y_{ki}] = \gamma_0(x_k) + \gamma_1(x_k)\, t_{ki},$$

$$(aI_m + b\,1_m 1_m^\top)^{-1} = \frac{1}{a}\left(I_m - \frac{b}{a + mb}\,1_m 1_m^\top\right)$$

for $a \neq 0$ and $a \neq -mb$, and:

$$|aI_m + b\,1_m 1_m^\top| = a^{m-1}(a + mb)$$
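A quick numerical sanity check of the inverse and determinant identities for the matrix $aI_m + b\,1_m 1_m^\top$ can be sketched as follows. This is a minimal illustration using exact rational arithmetic; the helper names and the values of `a`, `b`, and `m` are arbitrary choices made here, not part of the assignment, and a check at one set of values is not a proof.

```python
from fractions import Fraction


def compound_matrix(a, b, m):
    """Build a*I_m + b*1_m 1_m^T as a list of rows."""
    return [[(a if i == j else 0) + b for j in range(m)] for i in range(m)]


def claimed_inverse(a, b, m):
    """Build (1/a) * (I_m - b/(a + m*b) * 1_m 1_m^T)."""
    c = b / (a + m * b)
    return [[((1 if i == j else 0) - c) / a for j in range(m)] for i in range(m)]


def matmul(A, B):
    """Plain matrix product of two square lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def determinant(A):
    """Determinant by Gaussian elimination (exact when entries are Fractions)."""
    M = [row[:] for row in A]
    n = len(M)
    det = Fraction(1)
    for col in range(n):
        # Find a row with a non-zero entry in this column to use as the pivot.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return det


# Arbitrary illustrative values satisfying a != 0 and a != -m*b.
a, b, m = Fraction(3), Fraction(2), 4
A = compound_matrix(a, b, m)
identity = [[1 if i == j else 0 for j in range(m)] for i in range(m)]

print(matmul(A, claimed_inverse(a, b, m)) == identity)
print(determinant(A) == a ** (m - 1) * (a + m * b))
```

Using `Fraction` avoids floating-point tolerance questions, so the comparisons are exact.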

(a) Derive the likelihood and log-likelihood as a function of $(\mu, \sigma^2, \tau^2)$.

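(b) Show that the MLEs for $\mu$, $\sigma^2$, and $\tau^2$ are given by:

$$\hat{\mu} = \bar{Y}_{\cdot\cdot}, \qquad \hat{\sigma}^2 = \text{MSE}, \qquad \hat{\tau}^2 = \frac{(1 - 1/K)\,\text{MSA} - \text{MSE}}{n}$$

where $\text{MSA} = n \sum_{k} (\bar{Y}_{k\cdot} - \bar{Y}_{\cdot\cdot})^2/(K - 1)$ and $\text{MSE} = \sum_{k} \sum_{i} (Y_{ki} - \bar{Y}_{k\cdot})^2/[K(n - 1)]$. Hint: It may be helpful to write $\lambda = \sigma^2 + n\tau^2$.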

(c) Obtain the form for $\text{Var}[\hat{\mu}]$ and hence an estimate of this quantity.

(d) Find the REML estimators for $\sigma^2$ and $\tau^2$ by integrating $\mu$ out of the likelihood in part (a).

(e) In the one-way random effects model with balanced data, it can be shown that:

$$\frac{\text{MSA}/(\sigma^2 + n\tau^2)}{\text{MSE}/\sigma^2} \sim F_{K-1,\,K(n-1)}$$

where $F_{K-1,\,K(n-1)}$ denotes the F distribution with $K - 1$ and $K(n - 1)$ degrees of freedom. Hence explain why $F^{*} = \text{MSA}/\text{MSE}$ may be compared to an $F_{K-1,\,K(n-1)}$ distribution to test the hypothesis $H: \tau^2 = 0$.
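To make the quantities in parts (b) and (e) concrete, the following minimal sketch computes MSA, MSE, the estimates, and the ratio MSA/MSE for a small balanced data set. The data values and function names are made up here purely for illustration. The sketch takes $K$ as the number of units and $n$ as the number of replicates, and uses $\hat{\tau}^2 = [(1 - 1/K)\,\text{MSA} - \text{MSE}]/n$, which is the form consistent with the hint $\lambda = \sigma^2 + n\tau^2$.

```python
def one_way_summaries(data):
    """Return (grand_mean, MSA, MSE, K, n) for balanced data.

    data is a list of K lists, one per unit, each holding n replicates.
    """
    K = len(data)
    n = len(data[0])
    if any(len(row) != n for row in data):
        raise ValueError("data must be balanced: every unit needs n replicates")
    unit_means = [sum(row) / n for row in data]
    grand_mean = sum(unit_means) / K
    # Between-unit mean square: n * sum_k (Ybar_k. - Ybar_..)^2 / (K - 1)
    msa = n * sum((ybar - grand_mean) ** 2 for ybar in unit_means) / (K - 1)
    # Within-unit mean square: sum_k sum_i (Y_ki - Ybar_k.)^2 / [K (n - 1)]
    mse = sum((y - ybar) ** 2
              for row, ybar in zip(data, unit_means)
              for y in row) / (K * (n - 1))
    return grand_mean, msa, mse, K, n


def ml_estimates(data):
    """Closed-form estimates of (mu, sigma^2, tau^2) from part (b).

    The tau^2 expression is returned as is; it can be negative, in which
    case the closed form does not give a valid variance estimate.
    """
    grand_mean, msa, mse, K, n = one_way_summaries(data)
    tau2 = ((1 - 1 / K) * msa - mse) / n
    return {"mu": grand_mean, "sigma2": mse, "tau2": tau2}


def f_statistic(data):
    """Ratio MSA / MSE, to be compared with F on K-1 and K(n-1) df."""
    _, msa, mse, _, _ = one_way_summaries(data)
    return msa / mse


# Made-up toy data: K = 3 units with n = 3 replicates each.
toy = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
print(one_way_summaries(toy))
print(ml_estimates(toy))
print(f_statistic(toy))
```

The ratio would then be compared with an upper quantile of the $F_{K-1,\,K(n-1)}$ distribution obtained from tables or statistical software.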

## Question 2 (Optional):

Consider the one-way analysis of variance model:

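$$Y_{ki} = \mu + \gamma_k + \epsilon_{ki},$$

with $i = 1, \ldots, n$ replicates on $k = 1, \ldots, K$ units and

$$\gamma_k \sim \text{Normal}(0, \tau^2), \qquad \epsilon_{ki} \sim \text{Normal}(0, \sigma^2), \qquad \gamma_k \perp \epsilon_{ki}.$$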

The following may be useful: Let $I_m$ denote the $m \times m$ identity matrix and $1_m$ denote the $m \times 1$ vector of 1's. Then: