Homework 2 for Statistics 101-106 (Fall 98)

Due: Thursday 24 September

(0.1) Problem 2.60 from Moore & McCabe:

 
Table 2.10 presents four sets of data prepared by the statistician
Frank Anscombe to illustrate the dangers of calculating without first
plotting the data (Frank J. Anscombe, "Graphs in statistical analysis," The
American Statistician, 27 (1973), pp. 17-21).

(a) Without making scatterplots, find the correlation and the least-squares
regression line for all four data sets. What do you notice? Use the
regression line to predict y for x = 10.

(b) Make a scatterplot for each of the data sets and add the regression
line to each plot.

(c) In which of the four cases would you be willing to use the regression
line to describe the dependence of y on x? Explain your answer in each
case.

Anscombe constructed these data artificially to show the danger of calculating blindly without plotting. Add an extra part to the question:

(d) For each data set, make a plot of residuals against fitted values. Such plots are one standard way of identifying bad fits. Explain how these plots could help you with your answers to part (c).
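As a rough sketch of what goes into a residual plot, the Python below fits the least-squares line to one data set and returns the fitted values and residuals. The (x2, y2) columns are copied from Table 2.10; the function names fit_line and fitted_and_residuals are illustrative choices, not part of the assignment, and any statistics package could be used instead.

```python
# Data set 2 (x2, y2), copied from Table 2.10.
x2 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fitted_and_residuals(x, y):
    """Return the fitted values and the residuals (observed minus fitted)."""
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals


fitted, residuals = fitted_and_residuals(x2, y2)

# Print the points of the residual plot, ordered by fitted value.
for f, r in sorted(zip(fitted, residuals)):
    print(f"fitted = {f:6.3f}   residual = {r:7.3f}")
```

The (fitted, residual) pairs are the coordinates of the residual plot; they can be handed to any plotting tool, with a horizontal reference line drawn at zero.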

Here are the data from Table 2.10:

x1	y1	x2	y2	x3	y3	x4	y4
10	8.04	10	9.14	10	7.46	8	6.58
8	6.95	8	8.14	8	6.77	8	5.76
13	7.58	13	8.74	13	12.74	8	7.71
9	8.81	9	8.77	9	7.11	8	8.84
11	8.33	11	9.26	11	7.81	8	8.47
14	9.96	14	8.10	14	8.84	8	7.04
6	7.24	6	6.13	6	6.08	8	5.25
4	4.26	4	3.10	4	5.39	8	5.56
12	10.84	12	9.13	12	8.15	8	7.91
7	4.82	7	7.26	7	6.42	8	6.89
5	5.68	5	4.74	5	5.73	19	12.50
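One possible way to carry out the calculations in part (a) is sketched below in plain Python. The numbers are typed in from the table above; the helper name summarize and the dictionary layout are assumptions made for this illustration, and a calculator or statistics package would serve equally well.

```python
import math

# Table 2.10, entered column by column.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x1, x2 and x3 are identical
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 19]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]

data_sets = {1: (x123, y1), 2: (x123, y2), 3: (x123, y3), 4: (x4, y4)}


def summarize(x, y):
    """Return (correlation, intercept, slope) for the regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return r, intercept, slope


results = {}
for label, (x, y) in data_sets.items():
    r, intercept, slope = summarize(x, y)
    prediction = intercept + slope * 10  # predicted y at x = 10
    results[label] = (r, intercept, slope, prediction)
    print(f"set {label}: r = {r:.3f}, "
          f"line: y = {intercept:.3f} + {slope:.3f} x, "
          f"prediction at x = 10: {prediction:.3f}")
```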

(0.2) Page 6 of the handout for Lecture 2 shows four plots, illustrating the effect of standardizing the classwork and final exam scores before fitting the least-squares line. The four plots correspond to the following steps:

  1. subtract the mean from the final exam scores
  2. then divide final - mean(final) by its standard deviation
  3. then subtract the mean from the classwork scores
  4. then divide classwork - mean(classwork) by its standard deviation

After the four steps, both variables are standardized.

Carry out these four steps, calculating the equation of the least-squares line after each step. Explain the change in the coefficients (that is, the pair of constants that give the intercept and slope of the least-squares line) from each step to the next.
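The four steps can be sketched in Python as follows. The scores below are made up for illustration and are NOT the class data from the handout. The sketch also assumes that the final exam score is the response (y) and the classwork score is the explanatory variable (x), and that the sample standard deviation is used; the helper fit_line is a hypothetical name.

```python
from statistics import mean, stdev

# Made-up scores for illustration only; these are NOT the class data.
classwork = [62, 75, 81, 58, 90, 70, 85, 66]
final = [55, 70, 78, 60, 88, 65, 80, 72]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


mean_final, sd_final = mean(final), stdev(final)
mean_classwork, sd_classwork = mean(classwork), stdev(classwork)

x, y = classwork, final
stages = [("original scores", x, y)]

# Step 1: subtract the mean from the final exam scores.
y = [v - mean_final for v in y]
stages.append(("1. final centered", x, y))

# Step 2: divide the centered final scores by their standard deviation.
y = [v / sd_final for v in y]
stages.append(("2. final standardized", x, y))

# Step 3: subtract the mean from the classwork scores.
x = [v - mean_classwork for v in x]
stages.append(("3. classwork centered", x, y))

# Step 4: divide the centered classwork scores by their standard deviation.
x = [v / sd_classwork for v in x]
stages.append(("4. classwork standardized", x, y))

# Refit the line after every step and collect (label, intercept, slope).
coefficients = [(label,) + fit_line(xs, ys) for label, xs, ys in stages]
for label, intercept, slope in coefficients:
    print(f"{label:<28} intercept = {intercept:9.4f}   slope = {slope:8.4f}")
```

Printing the coefficients after every step, rather than only at the end, makes it possible to compare each line with the one before it.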


Each Section will assign extra problems.