[Return to syllabus page]
## Statistics 200: Lab 9 (Friday 28 March 1997)

Today's tasks:

*Analysis of variance and diagnostics.*
The following data are taken from Box and Cox (1964), J. Royal
Statistical Society B.
They give the survival times in units of ten hours for experimental
animals given various combinations of poison and treatment. Each
poison (levels I, II, III) was administered to 4 animals for each
treatment (levels A, B. C, D). The four columns of the following
array correspond to the four treatments.

Poison I
.31 .82 .43 .45
.45 1.10 .45 .71
.46 .88 .63 .66
.43 .72 .76 .62
Poison II
.36 .92 .44 .56
.29 .61 .35 1.02
.40 .49 .31 .71
.23 1.24 .40 .38
Poison III
.22 .30 .23 .30
.21 .37 .25 .36
.18 .38 .24 .31
.23 .29 .22 .33

Remark: This is a famous example that is often used to demonstrate
the effect of transformations.
#### Problem 1

Create a data frame 'boxcox' with 48 rows and 3 columns, with the survival
time in the first column, a factor (with levels I, II, III) in the
second column indicating the poison, and a factor (with levels A, B, C,
D) in the third column indicating the treatment.

Hint: Save the data
to a text file. Strip out the Poison labels. Use scan() to get the
treatments into a vector. Use something like
factor(rep(LETTERS[1:4],12))

to get the factors. **Important:** Check that your
factors are lined up correctly with the survival times.
#### Problem 2

Try plot(), plot.factor(), interaction.plot() on the boxcox data.
Make sure you understand what is produced. What do you learn about
the dependence of survival times on treatment and poison?
#### Problem 2

Try
bc1<-aov(survival~poison+treatment,boxcox)

The aov() function fits an additive model by least squares. If you
(conceptually) rearrange survival into a three-way array SURVIVAL,
indexed by treatments, poisons, and animals, then aov() is effectively
finding vectors T.effect and P.effect to solve a least-squares problem:
Decompose
SURVIVAL[treat,poison,k]
= constant
+ T.effect[treat] + P.effect[poison]
+ residual[treat,poison,k]
for
treat = A, B, C, D
poison = I, II, III
k = animals 1,2,3,4
so as to minimize sum(residual^2).

Of course you don't actually have to construct SURVIVAL
explicitly--Splus does it all for you.
The additive fit would be perfect if all residuals were zero. The fit
is considered reasonable if there are no "obvious patterns" in the residuals.

#### Problem 3

Find the attributes of the bc1 object. Try print(bc1),
print.default(bc1), plot(bc1).
If you have heard of analysis of variance, try anova(bc1). (If you
haven't heard of analysis of variance, we can explain it to you.)
What do you see?
How good is the fit?
#### Problem 3

Modify the fit by changing the model formula to
survival~poison*treatment

The new formula has the effect of replacing
T.effect[treat] + P.effect[poison] by a matrix
TandP.effect[treat,poison]. That is, it fits the model with
*interactions*.
How good is the fit this time?

#### Problem 4

Repeat the model fitting with survival replaced by 1/survival.
Look at what you get.
#### Homework Problem

Prepare an informal report **of at most three pages** that you
could use to explain to Dean Brodhead why one should not blindly fit
models without looking at diagnostics.