STAT 4006 Categorical Data Analysis Problem Sheet 4 solved

1. (Exercise 8.1 from Agresti (2013)) For Table 1, let Y = belief in existence of heaven, x1 = gender (1 =
females, 0 = males), and x2 = race (1 = blacks, 0 = whites). Table 2 shows the fit of the model
$\log(\pi_j/\pi_3) = \alpha_j + \beta_j^G x_1 + \beta_j^R x_2, \quad j = 1, 2,$
with s.e. in parentheses.
(a) Find the prediction equation for log(π1/π2).
(b) Using the “yes” and “no” response categories, interpret the conditional gender effect using a 95% confidence interval for an odds ratio.
(c) Find $\hat{\pi}_1 = \hat{P}(Y = \text{yes})$ for white females.
(d) Without calculating estimated probabilities, explain why the intercept estimates indicate that for white
males, $\hat{\pi}_1 > \hat{\pi}_2 > \hat{\pi}_3$. Use the intercept and gender estimates to show that the same ordering applies for
black females.
(e) Without calculating estimated probabilities, explain why the estimates in the gender row indicate that
$\hat{\pi}_1$ is higher for females than for males, for each race.
(f) For this fit, G2 = 0.69. Deleting the gender effect, G2 = 46.74. Conduct a likelihood-ratio test of whether
opinion is independent of gender, given race. Interpret.
                  Belief in Heaven
Race   Gender   Yes   Unsure   No
Black  Female    88       16    2
       Male      54       17    5
White  Female   397      141   24
       Male     235      189   39
Table 1: Belief in the Existence of Heaven Data
             Belief Categories for Logit
Parameter    Yes/No            Unsure/No
Intercept    1.785 (0.168)     1.554 (0.172)
Gender       1.044 (0.259)     0.254 (0.269)
Race         0.703 (0.411)    -0.106 (0.438)
Table 2: Heaven Data – Fitted Values
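
The arithmetic behind parts (a), (b), (c) and (f) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not an official solution: it uses a Wald interval with z = 1.96 for part (b), and it takes df = 2 in part (f) because deleting gender removes two parameters. All variable and function names are hypothetical.

```python
import math

# Estimates and one standard error copied from Table 2 (baseline category: "no").
est = {
    "yes":    {"intercept": 1.785, "gender": 1.044, "race": 0.703},
    "unsure": {"intercept": 1.554, "gender": 0.254, "race": -0.106},
}
se_gender_yes = 0.259

# (a) log(pi1/pi2) = log(pi1/pi3) - log(pi2/pi3), coefficient by coefficient.
yes_vs_unsure = {k: est["yes"][k] - est["unsure"][k] for k in est["yes"]}

# (b) 95% Wald interval for the conditional gender odds ratio (yes vs. no).
z = 1.96  # assumed normal quantile
b = est["yes"]["gender"]
or_ci = (math.exp(b - z * se_gender_yes), math.exp(b + z * se_gender_yes))

# (c) Fitted probabilities (yes, unsure, no) for gender x1 and race x2.
def fitted_probs(x1, x2):
    eta = [e["intercept"] + e["gender"] * x1 + e["race"] * x2
           for e in (est["yes"], est["unsure"])]
    denom = 1.0 + sum(math.exp(v) for v in eta)
    return [math.exp(eta[0]) / denom, math.exp(eta[1]) / denom, 1.0 / denom]

pi_white_female = fitted_probs(x1=1, x2=0)

# (f) Likelihood-ratio statistic for dropping gender (two parameters, df = 2).
# For a chi-squared variable with 2 df, the upper tail probability is exp(-x/2).
lr_stat = 46.74 - 0.69
lr_df = 2
lr_pvalue = math.exp(-lr_stat / 2)

print(yes_vs_unsure, or_ci, pi_white_female[0], lr_stat, lr_pvalue)
```

Under these assumptions the odds-ratio interval is roughly (1.71, 4.72) and the fitted probability of "yes" for white females is about 0.70; the interpretation asked for in each part is still left to the reader.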
2. (Exercise 8.37 from Agresti (2013)) Consider the logit model,
logit[P(Y ≤ j)] = αj + βjx,
not having proportional odds form.
(a) With continuous x taking values in (−∞, ∞), show that the model is improper in that cumulative
probabilities are misordered for a range of x values.
(b) When x is a binary indicator, explain why the model is proper but requires constraints on (αj + βj ) (as
well as the usual ordering constraint on {αj}) and is then equivalent to the saturated model.
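
A small numerical illustration may help with part (a). The sketch below uses made-up parameter values (they are assumptions, not taken from any data set) for a three-category response, so there are two cumulative logits. When β1 ≠ β2 the two linear predictors cross at x* = (α2 − α1)/(β1 − β2), and on one side of x* the fitted P(Y ≤ 1) exceeds P(Y ≤ 2).

```python
import math

def cum_prob(alpha, beta, x):
    """Inverse logit of alpha + beta * x, i.e. the fitted P(Y <= j)."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

# Hypothetical parameters with alpha1 < alpha2 but unequal slopes.
alpha = (-1.0, 1.0)
beta = (1.0, 0.5)

# The two linear predictors are equal at this x value.
x_cross = (alpha[1] - alpha[0]) / (beta[0] - beta[1])

for x in (0.0, x_cross, 5.0):
    p1 = cum_prob(alpha[0], beta[0], x)
    p2 = cum_prob(alpha[1], beta[1], x)
    status = "ordered" if p1 <= p2 else "misordered"
    print(f"x = {x:4.1f}  P(Y<=1) = {p1:.4f}  P(Y<=2) = {p2:.4f}  {status}")
```

With these values the cumulative probabilities are properly ordered below the crossing point and misordered above it, which is the behaviour the exercise asks you to establish in general.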
3. Nine different hierarchical loglinear models can be fit to a contingency table with three variables X,
Y and Z. List all models using the notation adopted in the class notes. For each model, state the
structure among X, Y and Z.
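
As a check on the count of nine, the models can be enumerated mechanically. The sketch below assumes the bracket notation used in Tables 4 and 5 of this sheet (the class-notes notation may differ) and assumes every model contains all three main effects. It only lists the model symbols; stating the association structure of each model is left as asked.

```python
from itertools import combinations

variables = ("X", "Y", "Z")
pairs = ["".join(p) for p in combinations(variables, 2)]  # XY, XZ, YZ

models = []
# Every subset of the three two-factor terms gives one hierarchical model.
for k in range(len(pairs) + 1):
    for chosen in combinations(pairs, k):
        # Variables not covered by a chosen pair enter as main effects only.
        covered = set("".join(chosen))
        singles = [v for v in variables if v not in covered]
        models.append("(" + ", ".join(list(chosen) + singles) + ")")

# The saturated model adds the three-factor term.
models.append("(XYZ)")

for m in models:
    print(m)
```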
4. For the saturated model with “Belief in Afterlife” data (Example 10.1 in the lecture notes), Table 3 reports
the $\{\lambda_{ij}^{XY}\}$ estimates. Show how to use the data in Table 3 to estimate the odds ratio.
Parameter                    df   Estimate   Std. Error
gender*belief females yes     1     0.1368       0.1705
gender*belief females no      0     0.0000       0.0000
gender*belief males yes       0     0.0000       0.0000
gender*belief males no        0     0.0000       0.0000
Table 3: Afterlife Data
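
One way to organise the calculation: in a 2×2 table the log odds ratio equals λ11 + λ22 − λ12 − λ21, and with the constraints shown in Table 3 only the first term is nonzero. The sketch below applies that identity; the Wald interval with z = 1.96 is an added assumption beyond what the question asks, and the names are hypothetical.

```python
import math

# Interaction estimates from Table 3, indexed by (gender, belief).
lam = {
    ("females", "yes"): 0.1368,
    ("females", "no"): 0.0,
    ("males", "yes"): 0.0,
    ("males", "no"): 0.0,
}
se_lam11 = 0.1705

# log(odds ratio) = lambda_11 + lambda_22 - lambda_12 - lambda_21
log_or = (lam[("females", "yes")] + lam[("males", "no")]
          - lam[("females", "no")] - lam[("males", "yes")])
odds_ratio = math.exp(log_or)

# Optional 95% Wald interval (assumed z = 1.96).
z = 1.96
or_ci = (math.exp(log_or - z * se_lam11), math.exp(log_or + z * se_lam11))

print(odds_ratio, or_ci)
```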
5. In a survey, 2,276 students in their final year of high school in a non-urban area near Dayton, Ohio
were asked whether they had ever used alcohol (A), cigarettes (C), or marijuana (M). The fitted values for several
loglinear models are shown in Table 10.6 from the Chapter 6 notes.
(a) Use AIC and BIC to select the best model based on the information given in Table 4.
Model          G2       df
(A,C,M)        1286.0   4
(AC,M)         843.8    3
(AM,C)         939.6    3
(CM,A)         534.2    3
(AM,CM)        187.8    2
(AC,AM)        497.4    2
(AC,CM)        92.0     2
(AC,AM,CM)     40.4     1
Table 4: Model Selection
(b) Write down the loglinear model you identified in (a). Also show clearly how you can derive the corresponding logit model, treating whether they had ever used cigarettes (C) as the response variable.
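
For part (a), the comparison can be sketched as follows. The sketch assumes the deviance-based forms AIC = G2 − 2·df and BIC = G2 − df·log(n) with n = 2276; these differ from −2 log L plus a penalty only by a constant across models fitted to the same table, so the ranking is the same, but the class notes may define the criteria differently. The G2 and df values are copied from Table 4 as printed.

```python
import math

n = 2276  # sample size stated in the problem

# (G2, residual df) copied from Table 4.
fits = {
    "(A,C,M)": (1286.0, 4),
    "(AC,M)": (843.8, 3),
    "(AM,C)": (939.6, 3),
    "(CM,A)": (534.2, 3),
    "(AM,CM)": (187.8, 2),
    "(AC,AM)": (497.4, 2),
    "(AC,CM)": (92.0, 2),
    "(AC,AM,CM)": (40.4, 1),
}

# Deviance-based criteria (assumed forms): smaller is better.
aic = {m: g2 - 2 * df for m, (g2, df) in fits.items()}
bic = {m: g2 - df * math.log(n) for m, (g2, df) in fits.items()}

best_aic = min(aic, key=aic.get)
best_bic = min(bic, key=bic.get)

for m in fits:
    print(f"{m:12s} AIC = {aic[m]:8.1f}  BIC = {bic[m]:8.1f}")
print("Smallest AIC:", best_aic)
print("Smallest BIC:", best_bic)
```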
6. The 1988 General Social Survey compiled by the National Opinion Research Center asked: “Do you support
or oppose the following measures to deal with AIDS? (1) Have the government pay all of the health care costs
of AIDS patients; (2) Develop a government information program to promote safe sex practices, such as the
use of condoms.” Table 5 summarizes fits of loglinear models about health care costs (H) and the information
program (I), classified also by the respondent’s gender (G).
(a) Explain why the model (GH, GI, HI) has one degree of freedom.
(b) Use Table 5 to test which interaction terms are significant using likelihood ratio tests. Which models
would you like to fit next?
Model           df   Deviance   p-value
(GH, GI)        2    11.67      0.0029
(GH, HI)        2    4.127      0.1270
(GI, HI)        2    2.383      0.3038
(GH, GI, HI)    1    0.3007     0.5834
Table 5: AIDS Survey Model Fits
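
The likelihood-ratio comparisons in part (b) amount to deviance differences against (GH, GI, HI), each on 2 − 1 = 1 degree of freedom. The sketch below is one possible way to compute them using only the standard library; it relies on the fact that the upper tail of a chi-squared variable with 1 df is erfc(sqrt(x/2)). The helper name and dictionary layout are hypothetical.

```python
import math

def chi2_sf_df1(x):
    """Upper tail probability of a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

# Deviances copied from Table 5.
deviance = {
    "(GH, GI)": 11.67,
    "(GH, HI)": 4.127,
    "(GI, HI)": 2.383,
    "(GH, GI, HI)": 0.3007,
}
full = deviance["(GH, GI, HI)"]

# Each two-term model omits exactly one interaction from (GH, GI, HI).
dropped_term = {"(GH, GI)": "HI", "(GH, HI)": "GI", "(GI, HI)": "GH"}

lr_tests = {}
for model, term in dropped_term.items():
    stat = deviance[model] - full          # LR statistic on 1 df
    lr_tests[term] = (stat, chi2_sf_df1(stat))

for term, (stat, p) in lr_tests.items():
    print(f"{term}: LR = {stat:.4f}, df = 1, p = {p:.4f}")
```

Which terms to call significant depends on the chosen level; under this sketch the HI test gives a very small p-value, the GI test sits close to 0.05, and the GH test is clearly larger.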