## Description

## Assignment A5.1 (8.1 in Textbook):

Components of variance: Consider the hierarchical model

$$\theta_1, \ldots, \theta_m \mid \mu, \tau^2 \sim \text{i.i.d. normal}(\mu, \tau^2),$$

$$y_{1,j}, \ldots, y_{n_j,j} \mid \theta_j, \sigma^2 \sim \text{i.i.d. normal}(\theta_j, \sigma^2).$$

For this problem, we will eventually compute the following:

$$\mathrm{Var}\left[y_{i,j} \mid \theta_j, \sigma^2\right], \quad \mathrm{Var}\left[\bar{y}_{\cdot,j} \mid \theta_j, \sigma^2\right], \quad \mathrm{Cov}\left[y_{i_1,j}, y_{i_2,j} \mid \theta_j, \sigma^2\right],$$

$$\mathrm{Var}\left[y_{i,j} \mid \mu, \tau^2\right], \quad \mathrm{Var}\left[\bar{y}_{\cdot,j} \mid \mu, \tau^2\right], \quad \mathrm{Cov}\left[y_{i_1,j}, y_{i_2,j} \mid \mu, \tau^2\right].$$

First, let's use our intuition to guess at the answers:

• Which do you think is bigger, $\mathrm{Var}\left[y_{i,j} \mid \theta_j, \sigma^2\right]$ or $\mathrm{Var}\left[y_{i,j} \mid \mu, \tau^2\right]$? To guide your intuition, you can interpret the first as the variability of the $Y$'s when sampling from a fixed group, and the second as the variability in first sampling a group, then sampling a unit from within the group.

• Do you think $\mathrm{Cov}\left[y_{i_1,j}, y_{i_2,j} \mid \theta_j, \sigma^2\right]$ is negative, positive, or zero? Answer the same for $\mathrm{Cov}\left[y_{i_1,j}, y_{i_2,j} \mid \mu, \tau^2\right]$. You may want to think about what $y_{i_2,j}$ tells you about $y_{i_1,j}$ if $\theta_j$ is known, and what it tells you when $\theta_j$ is unknown.

• Now compute each of the six quantities above and compare to your answers in a) and b).

• Now assume we have a prior $p(\mu)$ for $\mu$. Using Bayes' rule, show that

$$p\left(\mu \mid \theta_1, \ldots, \theta_m, \sigma^2, \tau^2, y_1, \ldots, y_m\right) = p\left(\mu \mid \theta_1, \ldots, \theta_m, \tau^2\right).$$

Interpret in words what this means.
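Once the six quantities are computed, they can be sanity-checked by simulation. The following is a minimal Monte Carlo sketch in Python (the values of $\mu$, $\tau^2$, and $\sigma^2$ are arbitrary illustrative choices, not part of the assignment): it draws a group mean and then two units from that group, so the empirical marginal variance and within-group covariance can be compared against the closed-form answers.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, tau2, sigma2 = 5.0, 4.0, 9.0  # arbitrary illustrative values
m = 200_000                       # number of simulated groups

# Draw a group mean theta_j, then two units y_{i1,j}, y_{i2,j} from that group.
theta = rng.normal(mu, np.sqrt(tau2), size=m)
y1 = rng.normal(theta, np.sqrt(sigma2))
y2 = rng.normal(theta, np.sqrt(sigma2))

# Conditional on theta: Var[y | theta, sigma2] = sigma2 and Cov = 0.
# Marginally:           Var[y | mu, tau2] = sigma2 + tau2 and Cov = tau2.
print(y1.var())              # close to sigma2 + tau2 = 13
print(np.cov(y1, y2)[0, 1])  # close to tau2 = 4
```

The simulation mirrors the intuition above: conditioning on the group removes the between-group component $\tau^2$ from both the variance and the covariance.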

## Assignment A5.2 (8.3 in Textbook):

Hierarchical modeling: The files school1.dat through school8.dat give weekly hours spent on

homework for students sampled from eight different schools. Obtain posterior distributions for

the true means for the eight different schools using a hierarchical normal model with the following

prior parameters:

$$\mu_0 = 7, \quad \gamma_0^2 = 5, \quad \tau_0^2 = 10, \quad \eta_0 = 2, \quad \sigma_0^2 = 15, \quad \nu_0 = 2.$$

• Run a Gibbs sampling algorithm to approximate the posterior distribution of $\{\theta, \sigma^2, \mu, \tau^2\}$. Assess the convergence of the Markov chain, and find the effective sample size for $\{\sigma^2, \mu, \tau^2\}$. Run the chain long enough so that the effective sample sizes are all above 1,000.

• Compute posterior means and 95% confidence regions for $\{\sigma^2, \mu, \tau^2\}$. Also, compare the posterior densities to the prior densities, and discuss what was learned from the data.


• Plot the posterior density of $R = \tau^2 / (\sigma^2 + \tau^2)$ and compare it to a plot of the prior density of $R$. Describe the evidence for between-school variation.

• Obtain the posterior probability that $\theta_7$ is smaller than $\theta_6$, as well as the posterior probability that $\theta_7$ is the smallest of all the $\theta$'s.

• Plot the sample averages $\bar{y}_1, \ldots, \bar{y}_8$ against the posterior expectations of $\theta_1, \ldots, \theta_8$, and describe the relationship. Also compute the sample mean of all observations and compare it to the posterior mean of $\mu$.
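The Gibbs sampler for this model cycles through the standard full conditionals: normal updates for each $\theta_j$ and for $\mu$, and inverse-gamma updates for $\sigma^2$ and $\tau^2$. Below is a minimal Python sketch using simulated placeholder data, since the school files are not reproduced here; for the actual assignment, replace `Y` with the contents of school1.dat through school8.dat.

```python
import numpy as np

rng = np.random.default_rng(1)

# Prior parameters from the assignment
mu0, g20 = 7.0, 5.0    # g20 = gamma_0^2
t20, eta0 = 10.0, 2.0  # t20 = tau_0^2
s20, nu0 = 15.0, 2.0   # s20 = sigma_0^2

# Placeholder data; replace with the eight school*.dat files.
Y = [rng.normal(7 + j, 3, size=25) for j in range(8)]
m = len(Y)
n = np.array([len(y) for y in Y])
ybar = np.array([y.mean() for y in Y])

# Starting values
theta = ybar.copy()
sigma2 = np.mean([y.var() for y in Y])
mu, tau2 = ybar.mean(), ybar.var()

S = 5000
samples = np.empty((S, m + 3))  # columns: theta_1..theta_8, sigma2, mu, tau2
for s in range(S):
    # theta_j | rest ~ normal (precision-weighted average of ybar_j and mu)
    v = 1.0 / (n / sigma2 + 1.0 / tau2)
    theta = rng.normal(v * (n * ybar / sigma2 + mu / tau2), np.sqrt(v))
    # sigma2 | rest ~ inverse-gamma, sampled as 1 / gamma
    ss = sum(((y - t) ** 2).sum() for y, t in zip(Y, theta))
    sigma2 = 1.0 / rng.gamma((nu0 + n.sum()) / 2, 2.0 / (nu0 * s20 + ss))
    # mu | rest ~ normal
    vm = 1.0 / (m / tau2 + 1.0 / g20)
    mu = rng.normal(vm * (m * theta.mean() / tau2 + mu0 / g20), np.sqrt(vm))
    # tau2 | rest ~ inverse-gamma
    st = ((theta - mu) ** 2).sum()
    tau2 = 1.0 / rng.gamma((eta0 + m) / 2, 2.0 / (eta0 * t20 + st))
    samples[s] = np.concatenate([theta, [sigma2, mu, tau2]])
```

Convergence diagnostics and effective sample sizes can then be computed from the columns of `samples` (e.g. with `coda::effectiveSize` in R or `arviz.ess` in Python), and the remaining parts of the exercise reduce to summarizing these draws.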

## Assignment A5.3 (9.3 in Textbook):

Crime: The file crime.dat contains crime rates and data on 15 explanatory variables for 47 U.S. states, in which both the crime rates and the explanatory variables have been centered and scaled to have variance 1. A description of the variables can be obtained by typing `library(MASS); ?UScrime` in R.

• Fit a regression model $y = X\beta + \epsilon$ using the g-prior with $g = n$, $\nu_0 = 2$ and $\sigma_0^2 = 1$. Obtain marginal posterior means and 95% confidence intervals for $\beta$, and compare to the least squares estimates. Describe the relationships between crime and the explanatory variables. Which variables seem strongly predictive of crime rates?

• Let's see how well regression models can predict crime rates based on the $X$-variables. Randomly divide the crime data roughly in half, into a training set $\{y_{tr}, X_{tr}\}$ and a test set $\{y_{te}, X_{te}\}$.

– Using only the training set, obtain least squares regression coefficients $\hat{\beta}_{ols}$. Obtain predicted values for the test data by computing $\hat{y}_{ols} = X_{te} \hat{\beta}_{ols}$. Plot $\hat{y}_{ols}$ versus $y_{te}$ and compute the prediction error $\frac{1}{n_{te}} \sum_i \left(y_{i,te} - \hat{y}_{i,ols}\right)^2$.

– Now obtain the posterior mean $\hat{\beta}_{Bayes} = \mathrm{E}\left[\beta \mid y_{tr}\right]$ using the g-prior described above and the training data only. Obtain predictions for the test set $\hat{y}_{Bayes} = X_{te} \hat{\beta}_{Bayes}$. Plot versus the test data, compute the prediction error, and compare to the OLS prediction error. Explain the results.

• Repeat the procedures in b) many times with different randomly generated test and training

sets. Compute the average prediction error for both the OLS and Bayesian methods.
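One convenient fact makes part b) short: with the g-prior centered at zero, the posterior mean of $\beta$ is the least squares estimate shrunk by the factor $g/(g+1)$. A minimal Python sketch with synthetic placeholder data (the real $X$ and $y$ come from crime.dat):

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder data standing in for crime.dat (already centered and scaled there)
n, p = 47, 15
X = rng.normal(size=(n, p))
beta_true = np.concatenate([np.ones(3), np.zeros(p - 3)])  # illustrative signal
y = X @ beta_true + rng.normal(size=n)

# Random split into roughly equal training and test sets
idx = rng.permutation(n)
tr, te = idx[: n // 2], idx[n // 2:]
Xtr, ytr, Xte, yte = X[tr], y[tr], X[te], y[te]

# OLS fit on the training data
beta_ols = np.linalg.lstsq(Xtr, ytr, rcond=None)[0]

# g-prior posterior mean with g = n_tr: OLS shrunk toward the prior mean of zero
g = len(ytr)
beta_bayes = (g / (g + 1)) * beta_ols

# Out-of-sample mean squared prediction errors
err_ols = np.mean((yte - Xte @ beta_ols) ** 2)
err_bayes = np.mean((yte - Xte @ beta_bayes) ** 2)
```

With $g = n_{tr}$ the shrinkage factor is close to 1, so a single split rarely separates the two methods clearly; averaging over many random splits, as the last part asks, is what makes the comparison meaningful.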

## Assignment A5.4 (10.2 in Textbook):

Nesting success: Younger male sparrows may or may not nest during a mating season, perhaps depending on their physical characteristics. Researchers have recorded the nesting success of 43 young male sparrows of the same age, as well as their wingspans, and the data appear in the file msparrownest.dat. Let $Y_i$ be the binary indicator that sparrow $i$ successfully nests, and let $x_i$ denote its wingspan. Our model for $Y_i$ is $\mathrm{logit}\, \Pr\left(Y_i = 1 \mid \alpha, \beta, x_i\right) = \alpha + \beta x_i$, where the logit function is given by $\mathrm{logit}\, \theta = \log[\theta / (1 - \theta)]$.

• Write out the joint sampling distribution $\prod_{i=1}^{n} p\left(y_i \mid \alpha, \beta, x_i\right)$ and simplify as much as possible.

• Formulate a prior probability distribution over $\alpha$ and $\beta$ by considering the range of $\Pr(Y = 1 \mid \alpha, \beta, x)$ as $x$ ranges over 10 to 15, the approximate range of the observed wingspans.


• Implement a Metropolis algorithm that approximates $p(\alpha, \beta \mid \boldsymbol{y}, \boldsymbol{x})$. Adjust the proposal distribution to achieve a reasonable acceptance rate, and run the algorithm long enough so that the effective sample size is at least 1,000 for each parameter.

• Compare the posterior densities of α and β to their prior densities.

• Using output from the Metropolis algorithm, come up with a way to make a confidence band for the following function $f_{\alpha\beta}(x)$ of wingspan:

$$f_{\alpha\beta}(x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}},$$

where $\alpha$ and $\beta$ are the parameters in your sampling model. Make a plot of such a band.
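A random-walk Metropolis sampler for this model needs only the log posterior up to an additive constant. The sketch below uses simulated placeholder data and assumed vague normal priors on $\alpha$ and $\beta$ (the real data come from msparrownest.dat, and the prior should follow from part b)); the proposal standard deviations are starting points to be tuned.

```python
import numpy as np

rng = np.random.default_rng(3)

# Placeholder data standing in for msparrownest.dat
n = 43
x = rng.uniform(10, 15, size=n)     # wingspans in the observed range
alpha_true, beta_true = -10.0, 0.9  # hypothetical illustrative values
y = rng.binomial(1, 1 / (1 + np.exp(-(alpha_true + beta_true * x))))

def log_post(a, b):
    """Bernoulli-logit log likelihood plus assumed N(0, 10^2) priors."""
    eta = a + b * x
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))  # stable log(1 + e^eta)
    logprior = -(a ** 2 + b ** 2) / (2 * 10.0 ** 2)
    return loglik + logprior

S = 10_000
ab = np.array([0.0, 0.0])
lp = log_post(*ab)
chain = np.empty((S, 2))
accepted = 0
for s in range(S):
    prop = ab + rng.normal(0.0, [2.0, 0.2])   # proposal sds: tune these
    lp_prop = log_post(*prop)
    if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis accept/reject
        ab, lp = prop, lp_prop
        accepted += 1
    chain[s] = ab

rate = accepted / S  # adjust proposal sds until this is moderate
```

The confidence band for $f_{\alpha\beta}(x)$ then falls out of the chain: evaluate $e^{\alpha+\beta x}/(1+e^{\alpha+\beta x})$ at each retained $(\alpha, \beta)$ draw over a grid of $x$ values and take pointwise quantiles.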

Sheet 5 is due on Dec. 11. Submit your solutions before Dec. 11, 5:00 pm.
