Description
ISyE 6420 Homework 1 Solved
Q1
A student answers a multiple choice examination with two questions that have four possible answers each.
Suppose that the probability that the student knows the answer to a question is 0.65 and the probability
that the student guesses is 0.35. If the student guesses, the probability of guessing the correct answer is 0.25.
The questions are independent, that is, knowing the answer on one question is not influenced by the other
question.
a) What is the probability that both questions will be answered correctly?
b) Suppose both questions were answered correctly. What is the probability that the student really knew
the correct answer to both questions?
c) How would you generalize the above from 2 to n questions, that is, what are answers to (a) and (b) if
the test has n independent questions? What happens to probabilities in (a) and (b) if n → ∞.
Q2
A circuit S consisting of seven independent elements E1, . . . , E7 is connected as shown:
The elements are operational during time interval T with probabilities
E1 E2 E3 E4 E5 E6 E7
Probability of working (p) 0.5 0.4 0.1 0.6 0.9 0.8 0.7
a) Find the probability that the circuit is operational during time interval T.
b) If the circuit was found operational at the time T, what is the probability that the element E6 was
operational.
Q3
A machine has four independent components, three of which fail with probability q = 1 − p, and one with
probability 0.5. The machine is operational as long as at least three components are working.
a) What is the probability that the machine will fail? Evaluate this probability for p = 0.75.
b) If the machine failed, what is the probability that the component which fails with probability 0.5 actually
failed?
c) Suppose that after the machine fails, a diagnostic test is performed to determine whether the component
which fails with probability 0.5 actually failed. The test is 90% accurate when this component fails
(sensitivity) and 80% accurate when it does not fail (specificity). Compute the posterior probability that
this component failed, given that the diagnostic test indicates a failure.
2
ISyE 6420 Homework 2 with Solution
Q1
Suppose data is generated from the model yi
| µ
iid∼ N(µ, 1) for i = 1, . . . , n. Consider a mixture normal
prior:
µ ∼ .5N(−1, 1) + .5N(1, 1)
that is,
p(µ) = .5
√
2π
e
− 1
2
(µ+1)2
+
.5
√
2π
e
− 1
2
(µ−1)2
Suppose we have observed y = 1. Find the posterior distribution of µ.
1. Start by using Bayes theorem: p(µ | y) ∝ p(y | µ)p(µ) = p(y | µ){.5ϕ(µ; −1, 1) + .5ϕ(µ; 1, 1)} and
simplify the two components using the following result.
ϕ
x; µ1, σ2
1
ϕ
x; µ2, σ2
2
= ϕ
x;
µ1/σ2
1 + µ2/σ2
2
1/σ2
1 + 1/σ2
2
,
1
1/σ2
1 + 1/σ2
2
ϕ
µ1 − µ2; 0, σ2
1 + σ
2
2
where ϕ
x; µ, σ2
is the density of a normal distribution with mean µ and variance σ
2
.
2. The posterior is going to be a mixture of two normal distributions. So you will only need to identify
the two mean and variance parameters as well as the weights for the two normal distributions.
Q2
Engineering system of type k-out-of-n is operational if at least k out of n components are operational.
Otherwise, the system fails. Suppose that a k-out-of-n system consists of n identical and independent
elements for which the lifetime has Weibull distribution with parameters r and λ. More precisely, if T is a
lifetime of a component,
P(T ≥ t) = e
−λtr
, t ≥ 0.
Time t is in units of months, and consequently, rate parameter λ is in units (month)
−1
. Parameter r is
dimensionless.
Assume that n = 10, k = 7, r = 1.3 and λ = 1/20.
1. Find the probability that a k-out-of-n system is still operational when checked at time t = 6.
2. At the check up at time t = 6 the system was found operational. What is the probability that at that
time exactly 7 components were operational?
3. At the check up at time t = 6, the system was found operational. What is the probability that the
system would still be operational at the time t = 9?
Hint: The probability that a k-out-of-n system is operational corresponds to the tail probability of binomial
distribution: P(X ≥ k), where X is the number of components working. You can do exact binomial
calculations or use binocdf in Octave/MATLAB (or dbinom in R, or scipy.stats.binom.cdf in Python).
Be careful with ≤ and <, because of the discrete nature of the binomial distribution. Part 2 is straightforward
Bayes formula. Part 3 is the total probability with hypotheses whose probabilities are obtained as in (b).
Q3
From the first page of Rand’s book A Million Random Digits with 100,000 Normal Deviates.
31060 10805 45571 82406 35303 42614 86799 07439 23403 09732
85269 77602 02051 65692 68665 74818 73053 85247 18623 88579
63573 32135 05325 47048 90553 57548 28468 28709 83491 25624
73796 45753 03529 64778 35808 34282 60935 20344 35273 88435
98520 17767 14905 68607 22109 40558 60970 93433 50500 73998
The second 50 five-digit numbers form the Rand’s “A Million Random Digits with 100,000 Normal Deviates”
book (shown above) are rescaled to [0, 1] (by dividing by 100,000 ) and then all numbers < 0.7 are retained.
We can consider the n = 35 retained numbers as a random sample from uniform U(0, 0.7) distribution.
0.3106 0.10805 0.45571 0.35303 0.42614 0.07439 0.23403
0.09732 0.02051 0.65692 0.68665 0.18623 0.63573 0.32135
0.05325 0.47048 0.57548 0.28468 0.28709 0.25624 0.45753
0.03529 0.64778 0.35808 0.34282 0.60935 0.20344 0.35273
0.17767 0.14905 0.68607 0.22109 0.40558 0.60970 0.50500
Pretend now that the threshold 0.7 is not known to us, that is, we are told that the sample is from uniform
U(0, θ) distribution, with θ to be estimated.
Let M be the maximum of the retained sample u1, . . . , u35, in our case M = 0.68665. The likelihood is
f (u1, . . . , u35 | θ) = Y
35
i=1
1
θ
1 (θ > ui) = θ
−351(θ > M)
where 1(A) is 1 if A is true, and 0 if A is false.
Assume noninformative (Jeffreys’) prior on θ,
π(θ) = 1
θ
1(θ > 0)
2
Figure 1: First page of RAND’s book.
Posterior depends on data via the maximum M and belongs to the Pareto family, Pa(c, α), with a density
αcα
θ
α+1 1(θ > c)
1. What are α and c?
2. Estimate θ in Bayesian fashion. Then calculate the 95% equi-tailed credible set. Is the true value of
parameter (0.7) in the credible set?
3. Plot the posterior PDF, adding marks for the regions bound by the above credible set, along with your
point estimate, for each plot.
4. Experiment by replacing the Jeffrey’s prior with increasingly informative Pareto priors. Start with
c < M and very small α. Report what happens to the posterior when varying the Pareto prior
parameters and compare them to the Jeffrey’s prior model.
3
ISyE 6420 Homework 3 Solution
Q1
Marietta Traffic Authority is concerned about the repeated accidents at the intersection of Canton and
Piedmont Roads. Bayes-inclined city-engineer would like to estimate the accident rate, even better, find a
credible set.
A well-known model for modeling the number of road accidents in a particular location/time window is
the Poisson distribution. Assume that X represents the number of accidents in a 3 month period at the
intersection of Canton and Piedmont Roads.
Assume that [X | θ] ∼ Poi(θ). Nothing is known a priori about θ, so it is reasonable to assume the Jeffreys
prior
π(θ) = 1
√
θ
1(0 < θ < ∞)
In the four most recent three-month periods the following realizations for X are observed: 1, 2, 0, and 2 .
(a) Compare the Bayes estimator for θ with the MLE (For Poisson, recall, ˆθMLE = X¯ ).
(b) Compute the 95% equitailed credible set.
(c) Compute (numerically) the 95% HPD credible set. Use an optimization method, not a sampling method.
(d) Numerically find the mode of the posterior, that is, MAP estimator of θ. Make sure it matches the result
of the known equation for the posterior mode.
(e) If you test the hypotheses
H0 : θ ≥ 1 vs H1 : θ < 1
based on the posterior, which hypothesis will be favored?
(f) Derive the posterior predictive distribution. Based on this, how many accidents do you predict for the
next year?
Q2
Waiting time. The waiting time for a bus at a given corner at a certain time of day is known to have a
U(0, θ) distribution. It is desired to test H0 : 0 ≤ θ ≤ 15 versus H1 : θ > 15. From other similar routes, it
is known that θ has a Pareto (5, 3) distribution. If waiting times of 10, 8, 10, 5, and 14 are observed at the
given corner, calculate the posterior odds ratio and the Bayes factor. Which hypothesis would you favor?
Note: the density of a Pareto distribution with parameters (c, α) is given by
αcα
θ
α+1 1(θ > c)
Q3
The Maxwell distribution with parameter α > 0, has a probability density function for x > 0 given by
p(x | α) = r
2
π
α
3/2x
2
exp
−
1
2
αx2
(a) Find the Jeffreys prior for α.
(b) Find a transformation of this parameter in which the corresponding prior is uniform. (Hint: See Section 5.7 of Engineering Biostatistics).
(c) Find the posterior distribution for n independent and identically distributed datapoints x1, . . . , xn from
the Maxwell distribution, assuming the Jeffreys prior on α from part (a).
2
ISyE 6420 Homework 4 Solution
Q1
Pairs (Xi
, Yi), i = 1, . . . , n consist of correlated standard normal random variables (mean 0, variance 1)
forming a sample from a bivariate normal MVN 2(0, Σ) distribution, with covariance matrix
Σ =
1 ρ
ρ 1
.
The density of (X, Y ) ∼ MVN 2(0, Σ) is1
f(x, y|ρ) = 1
2π
p
1 − ρ
2
exp
−
1
2(1 − ρ
2)
(x
2 − 2ρxy + y
2
)
,
with ρ as the only parameter. Take prior on ρ by assuming Jeffreys’ prior on Σ as π(Σ) = 1
|Σ|
3/2 =
1
(1−ρ2)
3/2
,
since the determinant of Σ is 1 − ρ
2
. Thus
π(ρ) = 1
(1 − ρ
2)
3/2
1(−1 ≤ ρ ≤ 1).
(a) If (Xi
, Yi), i = 1, . . . , n are observed, write down the likelihood for ρ. Write down the expression for the
posterior, up to the proportionality constant (that is, un-normalized posterior as the product of likelihood
and prior).
(b) Since the posterior for ρ is complicated, develop a Metropolis-Hastings algorithm to sample from the
posterior. Assume that n = 100 observed pairs (Xi
, Yi) gave the following summaries:
X
100
i=1
x
2
i = 113.5602,
X
100
i=1
y
2
i = 101.6489, and X
100
i=1
xiyi = 75.1491.
In forming a Metropolis-Hastings chain take the following proposal distribution for ρ: At step i generate a
candidate ρ
′
from the uniform U(ρi−1 − 0.1, ρi−1 + 0.1) distribution. Why does the proposal distribution
cancel in the acceptance ratio expression?
(c) Simulate 51000 samples from the posterior of ρ and discard the first 1000 samples (burn in). Plot two
figures: the histogram of ρs and the realizations of the chain for the last 1000 simulations (known as a trace
plot). What is the Bayes estimator and 90% equitailed credible set of ρ based on the simulated chain?
(d) Replace the proposal distribution from (b) by the uniform U(−1, 1) (independence proposal). Comment
on the results.
1See (6.1) on page 243 in http://statbook.gatech.edu.
Q2
Imagine your statistics professor made you watch him flip a coin one hundred times and record the results.
He then tells you that, at some point, he switched the coin. Both of the coins had different biases for the
probability of landing on heads.
He challenges you to use a Bayesian change point model to estimate at which flip he started using the second
coin. You should assume that there were exactly two coins used and that the change point was equally likely
to have happened at any flip.
The results of the coin flips can be found in flips.csv, where a value of 1 means heads and 0 means tails.
(a) Set up a Gibbs sampler for your model. Put Beta(2, 2) priors on the probability of each coin coming up
heads. What likelihood is appropriate?
(b) In your report, include a point estimate, the 94% HPD credible set, and a density plot for the probability
of each coin coming up heads and for the change point.
(c) The professor then says that he actually can’t remember if he switched the coin or not. Use the posterior
odds ratio for the change point to help evaluate whether the professor actually switched the coin. Based on
this information, was the coin likely switched or not?
Q3
In a study of mating calls in the gray tree frogs Hyla hrysoscelis and Hyla versicolor, Gerhart (1994)2
reports
that in a location in Lousiana the following data on the length of male advertisement calls have been collected:
Sample Average SD of
size duration duration
Hyla chrysoscelis 43 0.65 0.18
Hyla versicolor 12 0.54 0.14
The two species cannot be distinguished by external morphology, but H. chrysoscelis (Fig. 1) are diploids
while H. versicolor are tetraploids. The triploid crosses exhibit high mortality in larval stages, and if they
attain sexual maturity, they are sterile. Females responding to the mating calls try to avoid mismatches.
Assume that duration observations are normally distributed with means µ1 and µ2, and precisions τ1 and
τ2, for the two species respectively. For i = 1, 2, assume normal priors on µi
’s as N (0.6, 1) and gamma priors
on τi
’s as Ga(20, 0.5), where 0.5 is a rate hyperparameter.
(a) Based on observations and given priors, in the same loop construct two Gibbs samplers, one for (µ1, τ1)
and the other for (µ2, τ2).
(b) Form a sequence of differences µ1,j −µ2,j , j = 1, . . . , 11000, and after rejecting the initial 1000 differences,
from the remaining simulations estimate 95% equitailed credible set for µ1 − µ2. Does this set contain zero?
What can you say about the hypothesis H0 : µ1 = µ2 based on this credible set? Elaborate on whether the
2Gerhardt, H. C. (1994). Reproductive character displacement of female mate choice in the grey treefrog, Hyla chrysoscelis.
Anim. Behav., 47, 959–969.
2
Figure 1: Hyla chrysoscelis
length of call is a discriminatory characteristic.
Hint: When no raw data are given, that is, when data are summarized via sample size, sample mean, and
sample standard deviation, the following identity may be useful:
Xn
i=1
(yi − µ)
2 = (n − 1)s
2 + n(y − µ)
2
,
where n, y, and s are sample size, sample mean, and sample standard deviation, respectively.
3
ISyE 6420 Homework 5 Solution
Q1
The data in the file enzyme.csv gives the initial rate of reaction of an enzyme (y) and the substrate concentration (x). Consider the following nonlinear regression model:
y =
θ1x
θ2 + x
+ ϵ
where ϵi
iid∼ N
0, σ2
, θ1 > 0, and θ2 > 0. Assume noninformative priors for θ1, θ2, and σ
2
. You can choose
appropriate prior distributions.
(a) Plot the marginal posterior densities of θ1, θ2, and σ
2
. Use θ1 = 200, θ2 = 0.1, and σ
2 = 100
(equivalently, τ = 0.01) for initializing the MCMC chain.
(b) Provide evidence that your model has converged, whether it is a trace plot, lack of divergences, the
Gelman-Rubin statistic (Rhat), or something else.
(c) Compute 95% credible intervals, the mean, and the standard deviation for each of the three parameters.
(From now on, we will rarely specify which type of credible interval—you may use the default for your
chosen software.)
(d) Plot the posterior predictive distribution of y when x = 0.75 and provide the 95% credible intervals.
Q2
Walpole et al. (2007)1 provide data from a study on the effect of magnesium ammonium phosphate on the
height of chrysanthemums, which was conducted at George Mason University in order to determine a possible
optimum level of fertilization, based on the enhanced vertical growth response of the chrysanthemums. Forty
chrysanthemum seedlings were assigned to 4 groups, each containing 10 plants. Each was planted in a similar
pot containing a uniform growth medium. An increasing concentration of MgNH4PO4, measured in grams
per bushel, was added to each plant. The 4 groups of plants were grown under uniform conditions in a
greenhouse for a period of 4 weeks. The treatments and the respective changes in heights, measured in
centimeters, are given in the following table:
1Walpole, W. A., Myers, R. H., Myers, S. L., and Ye (2007). Probability and Statistics for Engineers and Scientists (9th
ed.). Pearson.
Treatment
50 g/bu 100 g/bu 200 g/bu 400 g/bu
13.2 16.0 7.8 21.0
12.4 12.6 14.4 14.8
12.8 14.8 20.0 19.1
17.2 13.0 15.8 15.8
13.0 14.0 17.0 18.0
14.0 23.6 27.0 26.0
14.2 14.0 19.6 21.1
21.6 17.0 18.0 22.0
15.0 22.2 20.2 25.0
20.0 24.4 23.2 18.2
Solve the problem as a one-way ANOVA. Use STZ constraints on treatment effects.
(a) Do different concentrations of MgNH4PO4 affect the average attained height of chrysanthemums? Look
at the 95% credible sets for the differences between treatment effects.
(b) Find the 95% credible set for the contrast µ1 − µ2 − µ3 + µ4.
(c) In a standard one-way ANOVA, we assume constant variance σ
2
for each group. If you relax that
assumption and put a prior on each group’s standard deviation (σi for i = 1, . . . , 4), do the results
from (a) and (b) change? Do the contrasts between the posterior distributions of each σi show that
they were significantly different?
Q3
The data set (available as wolves.csv) described below provides skull morphometric measurements on
wolves (Canis lupus L.) coming from two geographic locations: Rocky Mountain (0) and Arctic (1). The
original source of the data is from Jolicoeur (1959)2
, and many authors have subsequently used this data to
illustrate various multivariate statistical procedures.
The goal of Jolicoeur’s study was to determine how location and gender affect skull shape among wolf
populations. There were 9 predictor variables measured (see Table 1).
2Jolicoeur, P. (1959). Multivariate geographical variation in the wolf Canis lupus L. Evolution, 13(3), 283–299. Data here
are given in inches.
2
Table 1: Wolf skull morphometric data (in inches) from Jolicoeur (1959)3
.
Variable Description
location 0 = Rocky Mountain, 1 = Arctic
gender 0 = male, 1 = female
x1 Palatal length
x2 Postpalatal length
x3 Zygomatic width
x4 Palatal width (outside first upper molars)
x5 Palatal width (inside second upper molars)
x6 Width between postglenoid foramina
x7 Interorbital width
x8 Least width of the braincase
x9 Crown length of the first upper molar
(a) Try a frequentist logistic regression on the data (in Python, you can use the statsmodels package or
sklearn). What are the results?
(b) Set up a Bayesian logistic regression. Try at least three separate models, changing regression coefficient
variance to increasingly informative values for each. What do you observe in the results? How do they
differ from the frequentist model and from each other?
(c) Re-sample the model with only three predictors: gender, x3, and x7. Give an estimate and credible
interval of the probability that a female wolf with measures x3 = 5.28 and x7 = 1.78 comes from an
Arctic habitat.
3




