# DDA 4010 Exercise Sheet 5 solution

## Assignment A5.1 (8.1 in Textbook):

Components of variance: Consider the hierarchical model where

$$\theta_1, \ldots, \theta_m \mid \mu, \tau^2 \sim \text{i.i.d. normal}(\mu, \tau^2),$$

$$y_{1,j}, \ldots, y_{n_j,j} \mid \theta_j, \sigma^2 \sim \text{i.i.d. normal}(\theta_j, \sigma^2).$$

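For this problem, we will eventually compute the following:

$$\operatorname{Var}[y_{i,j} \mid \theta_j, \sigma^2], \quad \operatorname{Var}[\bar{y}_{\cdot,j} \mid \theta_j, \sigma^2], \quad \operatorname{Cov}[y_{i_1,j}, y_{i_2,j} \mid \theta_j, \sigma^2],$$

$$\operatorname{Var}[y_{i,j} \mid \mu, \tau^2], \quad \operatorname{Var}[\bar{y}_{\cdot,j} \mid \mu, \tau^2], \quad \operatorname{Cov}[y_{i_1,j}, y_{i_2,j} \mid \mu, \tau^2].$$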

First, let's use our intuition to guess at the answers:
• Which do you think is bigger, $\operatorname{Var}[y_{i,j} \mid \theta_j, \sigma^2]$ or $\operatorname{Var}[y_{i,j} \mid \mu, \tau^2]$? You can interpret the first as the variability of the $Y$'s when sampling from a fixed group, and the second as the variability in first sampling a group, then sampling a unit from within the group.

• Do you think $\operatorname{Cov}[y_{i_1,j}, y_{i_2,j} \mid \theta_j, \sigma^2]$ is negative, positive, or zero? Answer the same for $\operatorname{Cov}[y_{i_1,j}, y_{i_2,j} \mid \mu, \tau^2]$. You may want to think about what $y_{i_2,j}$ tells you about $y_{i_1,j}$ if $\theta_j$ is known, and what it tells you when $\theta_j$ is unknown.

• Now compute each of the six quantities above and compare them to your answers to the two previous questions.
• Now assume we have a prior $p(\mu)$ for $\mu$. Using Bayes' rule, show that

$$p(\mu \mid \theta_1, \ldots, \theta_m, \sigma^2, \tau^2, y_1, \ldots, y_m) = p(\mu \mid \theta_1, \ldots, \theta_m, \tau^2).$$

Interpret in words what this means.
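Once the six quantities have been derived by hand, they can be sanity-checked by Monte Carlo. The sketch below uses only the Python standard library; the helper names (`mc_moments`, `sample_cov`) and the parameter values (`mu = 0`, `tau2 = 9`, `sigma2 = 4`, `n = 5`, `theta_fixed = 1.5`) are arbitrary illustrative choices, not part of the exercise. Passing `theta_fixed` holds $\theta_j$ fixed in every replication (conditioning on $\theta_j, \sigma^2$); omitting it redraws $\theta_j$ from normal$(\mu, \tau^2)$ in each replication (conditioning on $\mu, \tau^2$).

```python
import math
import random
import statistics


def sample_cov(a, b):
    """Sample covariance of two equal-length lists (n - 1 denominator)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def mc_moments(mu, tau2, sigma2, n, n_reps, theta_fixed=None, seed=0):
    """Monte Carlo estimates of Var[y_{1,j}], Var[ybar_j] and Cov[y_{1,j}, y_{2,j}].

    Each replication simulates one group j of size n. If theta_fixed is given,
    every replication uses that theta_j; otherwise theta_j ~ normal(mu, tau2)
    is redrawn for each replication.
    """
    rng = random.Random(seed)
    y1, y2, ybar = [], [], []
    for _ in range(n_reps):
        theta = theta_fixed if theta_fixed is not None else rng.gauss(mu, math.sqrt(tau2))
        ys = [rng.gauss(theta, math.sqrt(sigma2)) for _ in range(n)]
        y1.append(ys[0])
        y2.append(ys[1])
        ybar.append(sum(ys) / n)
    return statistics.variance(y1), statistics.variance(ybar), sample_cov(y1, y2)


# Hypothetical parameter values, chosen only for illustration.
mu, tau2, sigma2, n = 0.0, 9.0, 4.0, 5
given_theta = mc_moments(mu, tau2, sigma2, n, 50_000, theta_fixed=1.5)
given_mu_tau2 = mc_moments(mu, tau2, sigma2, n, 50_000)
print("given theta_j, sigma^2:", [round(v, 2) for v in given_theta])
print("given mu, tau^2:       ", [round(v, 2) for v in given_mu_tau2])
```

Plugging the same parameter values into your closed-form answers and comparing them with the printed estimates is a quick consistency check; the estimates carry Monte Carlo error that shrinks as `n_reps` grows.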

## Assignment A5.2 (8.3 in Textbook):

Hierarchical modeling: The files school1.dat through school8.dat give weekly hours spent on
homework for students sampled from eight different schools. Obtain posterior distributions for
the true means for the eight different schools using a hierarchical normal model with the following
prior parameters:
$\mu_0 = 7$, $\gamma_0^2 = 5$, $\tau_0^2 = 10$, $\eta_0 = 2$, $\sigma_0^2 = 15$, $\nu_0 = 2$.
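A minimal Gibbs-sampler sketch for this model is shown below, assuming the semi-conjugate prior $\mu \sim$ normal$(\mu_0, \gamma_0^2)$, $1/\tau^2 \sim$ gamma$(\eta_0/2, \eta_0\tau_0^2/2)$, $1/\sigma^2 \sim$ gamma$(\nu_0/2, \nu_0\sigma_0^2/2)$ with the parameter values above. It uses only the standard library. The function name `gibbs_hier_normal`, the starting values, the burn-in length, and the synthetic `groups` data are all hypothetical stand-ins; the real analysis would replace `groups` with the eight lists of observations from school1.dat through school8.dat.

```python
import math
import random


def gibbs_hier_normal(groups, n_iter, mu0=7.0, g20=5.0, t20=10.0,
                      eta0=2.0, s20=15.0, nu0=2.0, seed=1):
    """Gibbs sampler for the hierarchical normal model (semi-conjugate priors).
    Returns a list of (theta, sigma2, mu, tau2) tuples, one per iteration."""
    rng = random.Random(seed)
    m = len(groups)
    n = [len(g) for g in groups]
    ybar = [sum(g) / len(g) for g in groups]
    # Starting values: group means, pooled within-group variance, spread of the means.
    theta = ybar[:]
    sigma2 = sum(sum((y - yb) ** 2 for y in g) for g, yb in zip(groups, ybar)) / (sum(n) - m)
    mu = sum(theta) / m
    tau2 = sum((t - mu) ** 2 for t in theta) / (m - 1)
    draws = []
    for _ in range(n_iter):
        # theta_j | rest ~ normal, a precision-weighted mix of ybar_j and mu.
        for j in range(m):
            prec = n[j] / sigma2 + 1 / tau2
            theta[j] = rng.gauss((n[j] * ybar[j] / sigma2 + mu / tau2) / prec, math.sqrt(1 / prec))
        # 1/sigma2 | rest ~ gamma(shape, rate); random.gammavariate takes scale = 1/rate.
        ss = sum((y - theta[j]) ** 2 for j in range(m) for y in groups[j])
        sigma2 = 1 / rng.gammavariate((nu0 + sum(n)) / 2, 2 / (nu0 * s20 + ss))
        # mu | rest ~ normal.
        prec = m / tau2 + 1 / g20
        mu = rng.gauss((sum(theta) / tau2 + mu0 / g20) / prec, math.sqrt(1 / prec))
        # 1/tau2 | rest ~ gamma(shape, rate).
        st = sum((t - mu) ** 2 for t in theta)
        tau2 = 1 / rng.gammavariate((eta0 + m) / 2, 2 / (eta0 * t20 + st))
        draws.append((theta[:], sigma2, mu, tau2))
    return draws


# Hypothetical stand-in data: 8 groups of 20 values each.
data_rng = random.Random(42)
true_means = [5, 6, 7, 8, 9, 10, 6, 7]
groups = [[data_rng.gauss(mj, 4) for _ in range(20)] for mj in true_means]

draws = gibbs_hier_normal(groups, n_iter=5000)[500:]  # drop a burn-in period
post_theta = [sum(d[0][j] for d in draws) / len(draws) for j in range(8)]
post_mu = sum(d[2] for d in draws) / len(draws)
print("posterior means of theta:", [round(t, 2) for t in post_theta])
print("posterior mean of mu:", round(post_mu, 2))
```

Each scalar chain (`d[1]`, `d[2]`, `d[3]`) can then be passed to a trace plot and an effective-sample-size routine; if any effective sample size is below 1,000, increasing `n_iter` is the simplest fix.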
• Run a Gibbs sampling algorithm to approximate the posterior distribution of $\{\boldsymbol{\theta}, \sigma^2, \mu, \tau^2\}$. Assess the convergence of the Markov chain, and find the effective sample size for $\{\sigma^2, \mu, \tau^2\}$. Run the chain long enough so that the effective sample sizes are all above 1,000.
• Compute posterior means and 95% confidence regions for $\{\sigma^2, \mu, \tau^2\}$. Also, compare the posterior densities to the prior densities, and discuss what was learned from the data.
• Plot the posterior density of $R = \frac{\tau^2}{\sigma^2 + \tau^2}$ and compare it to a plot of the prior density of $R$. Describe the evidence for between-school variation.
• Obtain the posterior probability that $\theta_7$ is smaller than $\theta_6$, as well as the posterior probability that $\theta_7$ is the smallest of all the $\theta$'s.
• Plot the sample averages $\bar{y}_1, \ldots, \bar{y}_8$ against the posterior expectations of $\theta_1, \ldots, \theta_8$, and describe the relationship. Also compute the sample mean of all observations and compare it to the posterior mean of $\mu$.
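Given MCMC output stored as plain Python lists (one list per scalar parameter, and a list of 8-vectors for $\theta$), the summaries requested above reduce to a few small helpers. The sketch below is a hypothetical standard-library version: `ess` uses a simple rule that sums lag autocorrelations until the first non-positive one, which is not the spectral estimator behind R's `coda::effectiveSize`, so its numbers will differ somewhat. The independent toy draws and the AR(1) chain only demonstrate the helpers and stand in for real sampler output. For $R$, apply `summarize` to `[t2 / (s2 + t2) for s2, t2 in zip(sigma2_draws, tau2_draws)]`.

```python
import random


def autocorr(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x)
    return sum((x[i] - m) * (x[i + lag] - m) for i in range(n - lag)) / var


def ess(x):
    """Effective sample size n / (1 + 2 * sum_k rho_k), truncating the sum
    at the first non-positive autocorrelation."""
    n = len(x)
    total = 0.0
    for lag in range(1, n // 2):
        r = autocorr(x, lag)
        if r <= 0:
            break
        total += r
    return n / (1 + 2 * total)


def quantile(x, q):
    """Empirical q-quantile, rounding down to the nearest order statistic."""
    s = sorted(x)
    return s[int(q * (len(s) - 1))]


def summarize(x):
    """Posterior mean and quantile-based 95% interval."""
    return sum(x) / len(x), quantile(x, 0.025), quantile(x, 0.975)


def prob_less(thetas, i, j):
    """Monte Carlo estimate of Pr(theta_i < theta_j); thetas is a list of draws."""
    return sum(t[i] < t[j] for t in thetas) / len(thetas)


def prob_smallest(thetas, i):
    """Monte Carlo estimate of Pr(theta_i is the smallest of all the thetas)."""
    return sum(t[i] == min(t) for t in thetas) / len(thetas)


# Toy demonstration with independent (hypothetical) draws in place of MCMC output.
rng = random.Random(0)
thetas = [[rng.gauss(mj, 1) for mj in (7, 7, 7, 7, 7, 8, 6, 7)] for _ in range(4000)]
x_iid = [t[0] for t in thetas]
print("ESS of independent draws:", round(ess(x_iid)))
# theta_7 and theta_6 are indices 6 and 5 with 0-based indexing.
print("Pr(theta_7 < theta_6):", prob_less(thetas, 6, 5))
print("Pr(theta_7 smallest):", prob_smallest(thetas, 6))

# A strongly autocorrelated AR(1) chain (phi = 0.9) has a much smaller ESS.
ar_chain = [0.0]
for _ in range(4999):
    ar_chain.append(0.9 * ar_chain[-1] + rng.gauss(0, 1))
print("ESS of AR(1) chain:", round(ess(ar_chain)))
```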

## Assignment A5.3 (9.3 in Textbook):

Crime: The file crime.dat contains crime rates and data on 15 explanatory variables for 47 U.S. states, in which both the crime rates and the explanatory variables have been centered and scaled to have variance 1. A description of the variables can be obtained by typing `library(MASS);?UScrime` in R.
• Fit a regression model $y = X\beta + \epsilon$ using the g-prior with $g = n$, $\nu_0 = 2$ and $\sigma_0^2 = 1$. Obtain marginal posterior means and 95% confidence intervals for $\beta$, and compare to the least squares estimates. Describe the relationships between crime and the explanatory variables. Which variables seem strongly predictive of crime rates?

• Let's see how well regression models can predict crime rates based on the X-variables. Randomly divide the crime data roughly in half, into a training set $\{y_{\text{tr}}, X_{\text{tr}}\}$ and a test set $\{y_{\text{te}}, X_{\text{te}}\}$.
– Using only the training set, obtain least squares regression coefficients $\hat{\beta}_{\text{ols}}$. Obtain predicted values for the test data by computing $\hat{y}_{\text{ols}} = X_{\text{te}}\hat{\beta}_{\text{ols}}$. Plot $\hat{y}_{\text{ols}}$ versus $y_{\text{te}}$ and compute the prediction error $\frac{1}{n_{\text{te}}}\sum_i (y_{i,\text{te}} - \hat{y}_{i,\text{ols}})^2$.

– Now obtain the posterior mean $\hat{\beta}_{\text{Bayes}} = E[\beta \mid y_{\text{tr}}]$ using the g-prior described above and the training data only. Obtain predictions for the test set, $\hat{y}_{\text{Bayes}} = X_{\text{te}}\hat{\beta}_{\text{Bayes}}$. Plot these predictions versus the test data, compute the prediction error, and compare it to the OLS prediction error. Explain the results.
• Repeat the procedures in the previous bullet many times with different randomly generated test and training sets. Compute the average prediction error for both the OLS and Bayesian methods.
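Under the g-prior with prior mean zero, the posterior mean of $\beta$ is $\frac{g}{g+1}\hat{\beta}_{\text{ols}}$, so the Bayes predictions are shrunken OLS predictions. The standard-library sketch below repeats the random half-split comparison using that fact. It assumes that "$g = n$" means the training-set size when fitting on the training half, and it does not compute credible intervals (those also need the posterior of $\sigma^2$). The helpers (`solve`, `ols`, `g_prior_mean`, `split_and_compare`) and the synthetic 47 × 15 data are hypothetical stand-ins for crime.dat.

```python
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """Least squares coefficients from the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def g_prior_mean(beta_ols, g):
    """Posterior mean of beta under the g-prior with prior mean 0: g/(g+1) * beta_ols."""
    return [g / (g + 1) * b for b in beta_ols]


def predict(X, beta):
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def pred_error(y, yhat):
    """Average squared prediction error (1/n_te) * sum (y_i - yhat_i)^2."""
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)


def split_and_compare(X, y, rng):
    """One random half split; returns (OLS error, Bayes error) on the test half."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    tr, te = idx[: len(y) // 2], idx[len(y) // 2:]
    Xtr, ytr = [X[i] for i in tr], [y[i] for i in tr]
    Xte, yte = [X[i] for i in te], [y[i] for i in te]
    b_ols = ols(Xtr, ytr)
    b_bayes = g_prior_mean(b_ols, g=len(ytr))  # assumption: g = training-set size
    return pred_error(yte, predict(Xte, b_ols)), pred_error(yte, predict(Xte, b_bayes))


# Hypothetical stand-in data with the same shape as crime.dat (47 x 15, no intercept).
rng = random.Random(3)
n, p = 47, 15
true_beta = [rng.gauss(0, 0.3) for _ in range(p)]
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0, 1) for row in X]

errors = [split_and_compare(X, y, rng) for _ in range(200)]
print("average OLS error:  ", sum(e[0] for e in errors) / len(errors))
print("average Bayes error:", sum(e[1] for e in errors) / len(errors))
```

With only about 23 training rows for 15 coefficients, the OLS fit is noisy, which is the setting in which a small amount of shrinkage can matter; on the real data the size of the gap is for you to report.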

## Assignment A5.4 (10.2 in Textbook):

Nesting success: Younger male sparrows may or may not nest during a mating season, perhaps depending on their physical characteristics. Researchers have recorded the nesting success of 43 young male sparrows of the same age, as well as their wingspan, and the data appear in the file msparrownest.dat. Let $Y_i$ be the binary indicator that sparrow $i$ successfully nests, and let $x_i$ denote its wingspan. Our model for $Y_i$ is $\operatorname{logit} \Pr(Y_i = 1 \mid \alpha, \beta, x_i) = \alpha + \beta x_i$, where the logit function is given by $\operatorname{logit} \theta = \log[\theta/(1 - \theta)]$.
• Write out the joint sampling distribution $\prod_{i=1}^{n} p(y_i \mid \alpha, \beta, x_i)$ and simplify as much as possible.
• Formulate a prior probability distribution over $\alpha$ and $\beta$ by considering the range of $\Pr(Y = 1 \mid \alpha, \beta, x)$ as $x$ ranges over 10 to 15, the approximate range of the observed wingspans.

• Implement a Metropolis algorithm that approximates $p(\alpha, \beta \mid y, x)$. Adjust the proposal distribution to achieve a reasonable acceptance rate, and run the algorithm long enough so that the effective sample size is at least 1,000 for each parameter.
• Compare the posterior densities of α and β to their prior densities.

• Using output from the Metropolis algorithm, come up with a way to make a confidence band for the following function $f_{\alpha\beta}(x)$ of wingspan:

$$f_{\alpha\beta}(x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}},$$

where $\alpha$ and $\beta$ are the parameters in your sampling model. Make a plot of such a band.
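The Metropolis and confidence-band steps can be sketched with the standard library as follows. The log-likelihood uses $\log p(y_i \mid \alpha, \beta, x_i) = y_i(\alpha + \beta x_i) - \log(1 + e^{\alpha + \beta x_i})$. The independent normal(0, 10²) and normal(0, 1²) priors on $\alpha$ and $\beta$, the proposal step sizes, the burn-in and thinning, and the simulated wingspan data are all hypothetical placeholders: the exercise asks you to build your own prior from the range of $\Pr(Y = 1 \mid \alpha, \beta, x)$ and to tune the proposal yourself. The band is pointwise: at each wingspan on a grid it takes the 2.5% and 97.5% quantiles of $f_{\alpha\beta}(x)$ over the posterior draws.

```python
import math
import random


def log1pexp(eta):
    """Numerically stable log(1 + e^eta)."""
    return eta + math.log1p(math.exp(-eta)) if eta > 0 else math.log1p(math.exp(eta))


def expit(eta):
    """Numerically stable e^eta / (1 + e^eta)."""
    if eta >= 0:
        return 1 / (1 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1 + z)


def log_post(alpha, beta, xs, ys, sd_alpha=10.0, sd_beta=1.0):
    """Log posterior up to a constant: logistic log-likelihood plus independent
    normal(0, sd^2) log priors (placeholder prior; substitute your own)."""
    loglik = sum(y * (alpha + beta * x) - log1pexp(alpha + beta * x) for x, y in zip(xs, ys))
    return loglik - 0.5 * (alpha / sd_alpha) ** 2 - 0.5 * (beta / sd_beta) ** 2


def metropolis(xs, ys, n_iter, step=(1.0, 0.08), seed=1):
    """Random-walk Metropolis with independent normal proposals.
    Returns the list of (alpha, beta) states and the acceptance rate."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0
    lp = log_post(a, b, xs, ys)
    states, accepted = [], 0
    for _ in range(n_iter):
        a_new, b_new = rng.gauss(a, step[0]), rng.gauss(b, step[1])
        lp_new = log_post(a_new, b_new, xs, ys)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            a, b, lp = a_new, b_new, lp_new
            accepted += 1
        states.append((a, b))
    return states, accepted / n_iter


def pointwise_band(states, grid, level=0.95):
    """Pointwise band for f(x) = expit(alpha + beta * x) from posterior draws."""
    band = []
    for x in grid:
        fs = sorted(expit(a + b * x) for a, b in states)
        lo = fs[int((1 - level) / 2 * (len(fs) - 1))]
        hi = fs[int((1 + level) / 2 * (len(fs) - 1))]
        band.append((x, lo, hi))
    return band


# Hypothetical stand-in data: 43 wingspans in [10, 15] with simulated outcomes.
data_rng = random.Random(7)
xs = [data_rng.uniform(10, 15) for _ in range(43)]
ys = [1 if data_rng.random() < expit(-12 + 1.0 * x) else 0 for x in xs]

states, acc = metropolis(xs, ys, n_iter=20000)
states = states[2000:]  # discard burn-in
print("acceptance rate:", round(acc, 3))
band = pointwise_band(states[::10], [10 + 0.5 * k for k in range(11)])
for x, lo, hi in band:
    print(f"x = {x:4.1f}: [{lo:.3f}, {hi:.3f}]")
```

When $x$ is not centered, $\alpha$ and $\beta$ tend to be strongly correlated in the posterior, so independent random-walk steps can mix slowly; centering the wingspans or using a correlated proposal are common remedies if the effective sample sizes come out low.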
Sheet 5 is due on Dec. 11th. Submit your solutions before Dec. 11th, 5:00 pm.