## Statistics 200: Lab 9 (Friday 28 March 1997)

Analysis of variance and diagnostics.

The following data are taken from Box and Cox (1964), J. Royal Statistical Society B. They give the survival times in units of ten hours for experimental animals given various combinations of poison and treatment. Each poison (levels I, II, III) was administered to 4 animals for each treatment (levels A, B. C, D). The four columns of the following array correspond to the four treatments.

```Poison I
.31	.82	.43	.45
.45	1.10	.45	.71
.46	.88	.63	.66
.43	.72	.76	.62

Poison II
.36	.92	.44	.56
.29	.61	.35	1.02
.40	.49	.31	.71
.23	1.24	.40	.38

Poison III
.22	.30	.23	.30
.21	.37	.25	.36
.18	.38	.24	.31
.23	.29	.22	.33
```
Remark: This is a famous example that is often used to demonstrate the effect of transformations.

#### Problem 1

Create a data frame 'boxcox' with 48 rows and 3 columns, with the survival time in the first column, a factor (with levels I, II, III) in the second column indicating the poison, and a factor (with levels A, B, C, D) in the third column indicating the treatment.
Hint: Save the data to a text file. Strip out the Poison labels. Use scan() to get the treatments into a vector. Use something like
```factor(rep(LETTERS[1:4],12))
```
to get the factors. Important: Check that your factors are lined up correctly with the survival times.

#### Problem 2

Try plot(), plot.factor(), interaction.plot() on the boxcox data. Make sure you understand what is produced. What do you learn about the dependence of survival times on treatment and poison?

#### Problem 2

Try
```bc1<-aov(survival~poison+treatment,boxcox)
```
The aov() function fits an additive model by least squares. If you (conceptually) rearrange survival into a three-way array SURVIVAL, indexed by treatments, poisons, and animals, then aov() is effectively finding vectors T.effect and P.effect to solve a least-squares problem:
```Decompose

SURVIVAL[treat,poison,k]
=  constant
+ T.effect[treat] + P.effect[poison]
+ residual[treat,poison,k]
for
treat = A, B, C, D
poison = I, II, III
k = animals 1,2,3,4

so as to minimize sum(residual^2).
```
Of course you don't actually have to construct SURVIVAL explicitly--Splus does it all for you.

The additive fit would be perfect if all residuals were zero. The fit is considered reasonable if there are no "obvious patterns" in the residuals.

#### Problem 3

Find the attributes of the bc1 object. Try print(bc1), print.default(bc1), plot(bc1). If you have heard of analysis of variance, try anova(bc1). (If you haven't heard of analysis of variance, we can explain it to you.) What do you see? How good is the fit?

#### Problem 3

Modify the fit by changing the model formula to
```survival~poison*treatment
```
The new formula has the effect of replacing T.effect[treat] + P.effect[poison] by a matrix TandP.effect[treat,poison]. That is, it fits the model with interactions.

How good is the fit this time?

#### Problem 4

Repeat the model fitting with survival replaced by 1/survival. Look at what you get.

#### Homework Problem

Prepare an informal report of at most three pages that you could use to explain to Dean Brodhead why one should not blindly fit models without looking at diagnostics.