(0.1) Problem 2.60 from Moore & McCabe:

Table 2.10 presents four sets of data prepared by the statistician Frank Anscombe to illustrate the dangers of calculating without first plotting the data (Frank J. Anscombe, "Graphs in statistical analysis," The American Statistician, 27 (1973), pp. 17-21).

(a) Without making scatterplots, find the correlation and the least-squares regression line for all four data sets. What do you notice? Use the regression line to predict y for x = 10.

(b) Make a scatterplot for each of the data sets and add the regression line to each plot.

(c) In which of the four cases would you be willing to use the regression line to describe the dependence of y on x? Explain your answer in each case.

These data were created artificially by Anscombe to make an important point about the dangers of blind calculation without plotting.

Add an extra part to the question:

(d) For each data set, make a plot of residuals against fitted values. Such plots are a standard way of identifying a poor fit. Explain how these plots could help you with your answers to part (c).

Here are the data from Table 2.10:

x1   y1      x2   y2     x3   y3      x4   y4
10    8.04   10   9.14   10    7.46    8    6.58
 8    6.95    8   8.14    8    6.77    8    5.76
13    7.58   13   8.74   13   12.74    8    7.71
 9    8.81    9   8.77    9    7.11    8    8.84
11    8.33   11   9.26   11    7.81    8    8.47
14    9.96   14   8.10   14    8.84    8    7.04
 6    7.24    6   6.13    6    6.08    8    5.25
 4    4.26    4   3.10    4    5.39    8    5.56
12   10.84   12   9.13   12    8.15    8    7.91
 7    4.82    7   7.26    7    6.42    8    6.89
 5    5.68    5   4.74    5    5.73   19   12.50
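If you want to check your hand calculations for parts (a) and (d), here is a minimal sketch using only the Python standard library. It transcribes Table 2.10 and computes, for each data set, the correlation r, the least-squares line y-hat = a + bx, the prediction at x = 10, and the fitted values and residuals. The names `DATA`, `fit` and `residuals` are illustrative choices, not part of the assignment. The sketch does not draw the scatterplots or residual plots; those still need a plotting tool of your choice.

```python
from math import sqrt

# Table 2.10 (Anscombe's data). Data sets 1-3 share the same x values;
# x4/y4 keep the table's row order.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
DATA = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 19],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]),
}


def fit(x, y):
    """Return (r, a, b): the correlation and the least-squares line y-hat = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = ybar - b * xbar        # intercept: the line passes through (xbar, ybar)
    r = sxy / sqrt(sxx * syy)  # correlation
    return r, a, b


def residuals(x, y, a, b):
    """Return the fitted values and residuals (observed minus fitted) for part (d)."""
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, resid


if __name__ == "__main__":
    for k, (x, y) in DATA.items():
        r, a, b = fit(x, y)
        print(f"Data set {k}: r = {r:.3f}, y-hat = {a:.2f} + {b:.3f}x, "
              f"prediction at x = 10: {a + 10 * b:.2f}")
        # Residuals listed in order of fitted value, i.e. the points of a
        # residual-versus-fitted plot read from left to right.
        fitted, resid = residuals(x, y, a, b)
        for f, e in sorted(zip(fitted, resid)):
            print(f"    fitted {f:6.2f}   residual {e:+6.2f}")
```

Running the script prints one summary line per data set followed by its residuals. Compare the summary lines across the four data sets, then look at how the residual columns differ.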

(0.2) Page 6 of the handout for Lecture 2 shows four plots, illustrating the effect of standardizing the classwork and final exam scores before fitting the least squares line. The four steps correspond to:

- subtract the mean from the final exam scores
- then divide the centered final exam scores, final - mean(final), by their standard deviation
- then subtract the mean from the classwork scores
- then divide the centered classwork scores, classwork - mean(classwork), by their standard deviation
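In symbols, write x for the classwork score and y for the final exam score. This assumes, as the order of the steps suggests, that the final exam score is the response. The steps above transform the variables as in the first line below. The second line gives the usual formulas for the least-squares slope and intercept, where r, the means, and the standard deviations are those of whichever versions of x and y are in use at that step:

$$
y \;\to\; y - \bar{y} \;\to\; \frac{y - \bar{y}}{s_y},
\qquad
x \;\to\; x - \bar{x} \;\to\; \frac{x - \bar{x}}{s_x}
$$

$$
\hat{y} = a + b\,x, \qquad b = r\,\frac{s_y}{s_x}, \qquad a = \bar{y} - b\,\bar{x}
$$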

Carry out these four steps, calculating the equation of the least squares line at each step. Explain the change in the coefficients (that is, the pair of constants that give the intercept and slope of the least squares line) from each step to the next.
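The sketch below, in standard-library Python, fits the least-squares line to the raw scores and again after each of the four steps, so you can check the coefficients you calculate. It assumes the final exam score is the response and classwork is the explanatory variable, and it uses the sample standard deviation (divisor n - 1). The function names and the scores in the usage block are hypothetical placeholders, not the Lecture 2 data. Substitute the real classwork and final exam scores to reproduce the handout.

```python
from statistics import mean, stdev


def ls_line(x, y):
    """Least-squares intercept a and slope b for the line y-hat = a + b*x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    return ybar - b * xbar, b


def standardizing_steps(classwork, final):
    """Fit final on classwork before and after each of the four steps.

    Returns a list of (label, intercept, slope). Assumes final is the response.
    """
    x, y = list(classwork), list(final)
    xbar, sx = mean(x), stdev(x)   # stdev: sample standard deviation (n - 1 divisor)
    ybar, sy = mean(y), stdev(y)
    steps = [("raw scores", x, y)]
    y = [v - ybar for v in y]
    steps.append(("1: final - mean(final)", x, y))
    y = [v / sy for v in y]
    steps.append(("2: divide by sd(final)", x, y))
    x = [v - xbar for v in x]
    steps.append(("3: classwork - mean(classwork)", x, y))
    x = [v / sx for v in x]
    steps.append(("4: divide by sd(classwork)", x, y))
    return [(label, *ls_line(xs, ys)) for label, xs, ys in steps]


if __name__ == "__main__":
    # HYPOTHETICAL placeholder scores, not the Lecture 2 data.
    classwork = [62, 75, 80, 55, 90, 70, 85, 68]
    final = [58, 72, 85, 50, 88, 65, 80, 70]
    for label, a, b in standardizing_steps(classwork, final):
        print(f"{label:32s} intercept = {a:8.3f}   slope = {b:.4f}")
```

Each step changes only one variable, so comparing consecutive output lines isolates the effect of that one transformation on the intercept and the slope.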

Each Section will assign extra problems.