## Description

Use the following information for Questions 1 to 3 [5 marks each]

A one-way ANOVA was estimated to see if a single factor with four levels (denoted a, b,

c and d) had any effect on a certain response variable, Y . A balanced design was used,

and the conventional null hypothesis was rejected at a 5% significance level. A post-hoc

Tukey test was carried out with a 5% experiment-wise error rate and the results from the

Tukey test are presented in the following underlining diagram.

a c b d

3 4 5 6

## 1. The advantage of a balanced design is that it:

(a) allows the use of Q-Q plots to check for normality of the error random variables.

(b) ensures constant variance of the errors.

(c) ensures there are no outliers in the data.

(d) gives the highest power for the ANOVA.

(e) helps to reduce bias.

## 2. The sample mean of the response variable, y, is approximately equal to:

(a) 3.14

(b) 4.00

(c) 4.45

(d) 5.65

(e) 6.00

## 3. The Tukey test and underlining diagram indicate that the population mean of Y :

(a) has no significant differences between any levels of the factor.

(b) has no significant differences between levels b and c but has significant differences between (b and c) and all other levels of the factor.

(c) is significantly lower at level a than at all other levels of the factor.

(d) is significantly lower at level a than at level d.

(e) has significant differences between all levels of the factor.

2

Use the following information for Questions 4 and 5 [5 marks each]

Consider the random effects model for a one-way ANOVA with p = 5 groups,

Yij = µ + Ai + Eij

where Yij is the j

th observation in the i

th group, i = 1, 2, 3, 4, 5.

4. The number of random variables in the random effects model for a one-way ANOVA

with p = 5 groups is:

(a) 1

(b) 2

(c) 3

(d) 4

(e) 5

5. The number of components of variation in the random effects model for a one-way

ANOVA with p = 5 groups is:

(a) 1

(b) 2

(c) 3

(d) 4

(e) 5

## 6. [5 marks] In a linear multiple regression model the:

(a) errors are all assumed to be zero.

(b) errors are assumed to be independent of all the explanatory variables.

(c) residuals are spread evenly around the line of best fit in the Q-Q plot.

(d) response variable is assumed to be independent of all the explanatory variables.

(e) response variable is assumed to be independent of the error term.

3

Use the following information for Questions 7 and 8 [5 marks each]

Consider the complete model for an ANCOVA with response variable Y , one qualitative

factor with p = 3 levels, and one covariate, x:

Y = αi + βix + E,

for the factor at level i, i = 1, 2, 3.

A graph showing that complete model fitted to 85 observations from an unbalanced, observational study with data from 3 groups labelled A, B and C follows.

7. The relationship between the response variable and the covariate is:

(a) different, depending on the level of the factor, but always negative.

(b) not displayed in that graph.

(c) not significant, since the lines have different slopes.

(d) of no interest, since the covariate is always a nuisance variable in any ANCOVA.

(e) positive in every ANCOVA model by assumption, but sometimes it is estimated

to be negative due to sampling variability, as in this case.

8. Given there are n = 85 observations and p = 3 levels of the factor, the number of

independent parameters fitted in the complete ANCOVA model, as displayed in the

graph, is:

(a) 3 = p

(b) 4 = p + 1

(c) 5 = 2p − 1

(d) 6 = 2p

(e) 82 = n − p

4

## Section B: Written Answers [60 marks]

For Section B questions, you must write out your answers. Ensure you label your work,

to make it clear which part of each question you are answering.

9. [20 marks]

Four educational tests for 10-year-olds have been developed. They are meant to be

of the same standard, so that they may be used interchangeably. Forty 10-year-olds

were randomly selected, and ten were allocated randomly to each test. Their scores

out of 100 follow:

Test Y = Score out of 100 Mean

Test 1 64 81 53 73 50 67 69 53 53 52 61.5

Test 2 53 60 64 56 71 60 57 56 54 71 60.2

Test 3 52 54 69 51 61 61 60 74 45 59 58.6

Test 4 76 69 63 61 68 48 77 74 72 68 67.6

Relevant SAS output is on pages 6 and 7.

(a) Give the model equation and hypotheses for a one-way analysis of variance.

State the meaning of all the terms in the equation.

(b) State the assumptions of a one-way ANOVA. Do you think they are satisfied?

Justify your answer.

(c) Using a 5% significance level, give the test statistic, the degrees of freedom and

the p-value for the test, and your statistical conclusion plus interpretation of

the result.

(d) Failure to reject a null hypothesis may be caused by different groups having

equal population means. However, if there really is a difference of population

means, there may still be failure to reject H0. Give a statistical reason why

this may happen.

5

SAS Output for Q9

6

SAS Output for Q9 continued

7

10. [40 marks]

Sixty children (who were not colour-blind) aged 7 to 12 years were given the online

Farnsworth-Munsell 100 hue test, in which they sort blocks of colour into order

by hue from purple to magenta. Their scores are TES = total error score, with a

score of 0 indicating perfect sorting by colour. This is regarded as an exploratory

experiment, with possible predictors of the TES scores being Age (in years) and

Time (time taken to do the sorting, in seconds).

TES = total error score

Age = age in years

Time = time taken (seconds)

Age (yrs) Variable Observed value

7 TES 180 197 173 171 130 208 180 224 173 148

Time (sec) 228 183 263 212 259 223 311 256 217 260

8 TES 129 191 136 157 160 195 154 179 191 178

Time (sec) 341 336 355 308 313 220 283 191 278 373

9 TES 166 104 127 179 120 139 120 122 165 94

Time (sec) 351 207 375 227 347 295 225 372 343 392

10 TES 131 169 109 144 115 108 91 156 101 111

Time (sec) 206 299 389 212 380 264 320 202 351 265

11 TES 113 102 99 98 94 102 106 103 83 90

Time (sec) 212 186 384 347 217 342 189 209 345 383

12 TES 82 92 64 68 83 85 83 71 73 93

Time (sec) 228 247 299 317 370 259 376 261 286 261

SAS output from fitting six different regression models is on pages 9 to 14. A new

variable logTES = log(TES) was also defined in the data file.

(a) In general, for any data set, give the model equation for a multiple regression

with two predictors, X1 and X2. Define all the terms in the model.

(b) State the assumptions for a multiple regression model with two predictors, X1

and X2.

(c) For this specific data set, we have X1 = Age and X2 = Time. Why is the

analysis using Y = logTES preferable to the one using Y = TES? Support

your answer by referring to the SAS output.

(d) Using the logTES analysis, compare all the models, with one and two predictors, using t-tests and a 5% significance level. For each comparison, give

the hypotheses, the t statistic and the p-value. You may present these results

either in words or using a lattice diagram of the models.

(e) Interpret your results for all three models using the logTES analysis from part

(d). Include:

i. comments for each model on the meaning of the negative signs of the beta

(slope) estimates,

ii. a brief explanation of why Time is significant in the multiple regression

but not significant if it is the only predictor.

8

SAS Output for Q10

Linear Regression Results

The REG Procedure

Model: Linear_Regression_Model

Dependent Variable: TES

Number of Observations Read 60

Number of Observations Used 60

Analysis of Variance

Source DF Sum of

Squares

Mean

Square

F Value Pr > F

Model 2 73930 36965 84.11 <.0001

Error 57 25051 439.49956

Corrected Total 59 98982

Root MSE 20.96424 R-Square 0.7469

Dependent Mean 130.15000 Adj R-Sq 0.7380

Coeff Var 16.10776

Parameter Estimates

Variable DF Parameter

Estimate

Standard

Error

t Value Pr > |t|

Intercept 1 344.92450 18.38970 18.76 <.0001

Age 1 -19.82001 1.59749 -12.41 <.0001

Time 1 -0.09266 0.04241 -2.19 0.0330

9

SAS Output for Q10 continued

10

SAS Output for Q10 continued

Linear Regression Results

The REG Procedure

Model: Linear_Regression_Model

Dependent Variable: TES

Number of Observations Read 60

Number of Observations Used 60

Analysis of Variance

Source DF Sum of

Squares

Mean

Square

F Value Pr > F

Model 1 71832 71832 153.45 <.0001

Error 58 27150 468.10034

Corrected Total 59 98982

Root MSE 21.63563 R-Square 0.7257

Dependent Mean 130.15000 Adj R-Sq 0.7210

Coeff Var 16.62361

Parameter Estimates

Variable DF Parameter

Estimate

Standard

Error

t Value Pr > |t|

Intercept 1 322.62000 15.78631 20.44 <.0001

Age 1 -20.26000 1.63550 -12.39 <.0001

Linear Regression Results

The REG Procedure

Model: Linear_Regression_Model

Dependent Variable: TES

Number of Observations Read 60

Number of Observations Used 60

Analysis of Variance

Source DF Sum of

Squares

Mean

Square

F Value Pr > F

Model 1 6276.68814 6276.68814 3.93 0.0523

Error 58 92705 1598.36141

Corrected Total 59 98982

Root MSE 39.97951 R-Square 0.0634

Dependent Mean 130.15000 Adj R-Sq 0.0473

Coeff Var 30.71803

Parameter Estimates

Variable DF Parameter

Estimate

Standard

Error

t Value Pr > |t|

Intercept 1 175.59005 23.50406 7.47 <.0001

Time 1 -0.15897 0.08022 -1.98 0.0523

11

SAS Output for Q10 continued

Linear Regression Results

The REG Procedure

Model: Linear_Regression_Model

Dependent Variable: logTES

Number of Observations Read 60

Number of Observations Used 60

Analysis of Variance

Source DF Sum of

Squares

Mean

Square

F Value Pr > F

Model 2 4.67546 2.33773 95.48 <.0001

Error 57 1.39558 0.02448

Corrected Total 59 6.07103

Root MSE 0.15647 R-Square 0.7701

Dependent Mean 4.81833 Adj R-Sq 0.7621

Coeff Var 3.24745

Parameter Estimates

Variable DF Parameter

Estimate

Standard

Error

t Value Pr > |t|

Intercept 1 6.50972 0.13726 47.43 <.0001

Age 1 -0.15859 0.01192 -13.30 <.0001

Time 1 -0.00064657 0.00031650 -2.04 0.0457

12

SAS Output for Q10 continued

13

SAS Output for Q10 continued

Linear Regression Results

The REG Procedure

Model: Linear_Regression_Model

Dependent Variable: logTES

Number of Observations Read 60

Number of Observations Used 60

Analysis of Variance

Source DF Sum of

Squares

Mean

Square

F Value Pr > F

Model 1 4.57328 4.57328 177.10 <.0001

Error 58 1.49775 0.02582

Corrected Total 59 6.07103

Root MSE 0.16070 R-Square 0.7533

Dependent Mean 4.81833 Adj R-Sq 0.7490

Coeff Var 3.33510

Parameter Estimates

Variable DF Parameter

Estimate

Standard

Error

t Value Pr > |t|

Intercept 1 6.35408 0.11725 54.19 <.0001

Age 1 -0.16166 0.01215 -13.31 <.0001

Linear Regression Results

The REG Procedure

Model: Linear_Regression_Model

Dependent Variable: logTES

Number of Observations Read 60

Number of Observations Used 60

Analysis of Variance

Source DF Sum of

Squares

Mean

Square

F Value Pr > F

Model 1 0.34417 0.34417 3.49 0.0670

Error 58 5.72686 0.09874

Corrected Total 59 6.07103

Root MSE 0.31423 R-Square 0.0567

Dependent Mean 4.81833 Adj R-Sq 0.0404

Coeff Var 6.52150

Parameter Estimates

Variable DF Parameter

Estimate

Standard

Error

t Value Pr > |t|

Intercept 1 5.15482 0.18474 27.90 <.0001

Time 1 -0.00118 0.00063053 -1.87 0.0670

14