(0.1) Problem 2.60 from Moore & McCabe:
Table 2.10 presents four sets of data prepared by the statistician Frank Anscombe to illustrate the dangers of calculating without first plotting the data (Frank J. Anscombe, "Graphs in statistical analysis," The American Statistician, 27 (1973), pp. 17-21). The data were created artificially to make exactly this point about blind calculation.
(a) Without making scatterplots, find the correlation and the least-squares regression line for each of the four data sets. What do you notice? Use the regression line to predict y for x = 10.
(b) Make a scatterplot for each of the data sets and add the regression line to each plot.
(c) In which of the four cases would you be willing to use the regression line to describe the dependence of y on x? Explain your answer in each case.
(d) For each dataset, make a plot of residuals against fitted values. Such plots are one standard way of identifying bad fits. Explain how these plots could help you with your answers to part (c).
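As a reminder (a standard definition, not stated in the problem text): if the fitted line is $\hat{y} = a + bx$, then the residual for observation $i$ is

$$e_i = y_i - \hat{y}_i = y_i - (a + b x_i),$$

and a residual plot graphs $e_i$ against the fitted value $\hat{y}_i$. A well-behaved fit shows an unstructured horizontal band of residuals around zero.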
Here are the data from Table 2.10:
x1    y1      x2    y2      x3    y3      x4    y4
10    8.04    10    9.14    10    7.46     8    6.58
 8    6.95     8    8.14     8    6.77     8    5.76
13    7.58    13    8.74    13   12.74     8    7.71
 9    8.81     9    8.77     9    7.11     8    8.84
11    8.33    11    9.26    11    7.81     8    8.47
14    9.96    14    8.10    14    8.84     8    7.04
 6    7.24     6    6.13     6    6.08     8    5.25
 4    4.26     4    3.10     4    5.39     8    5.56
12   10.84    12    9.13    12    8.15     8    7.91
 7    4.82     7    7.26     7    6.42     8    6.89
 5    5.68     5    4.74     5    5.73    19   12.50
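A minimal sketch of parts (a) and (d) in Python, assuming NumPy is available (the variable names, such as `quartet`, are my own, not from the text):

```python
import numpy as np

# Anscombe's four data sets from Table 2.10.
# Sets 1-3 share the same x values; set 4 has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    1: (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8] * 10 + [19],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]),
}

for k, (x, y) in quartet.items():
    x, y = np.asarray(x, float), np.asarray(y, float)
    r = np.corrcoef(x, y)[0, 1]                    # correlation
    slope = r * y.std(ddof=1) / x.std(ddof=1)      # least-squares slope b = r * sy/sx
    intercept = y.mean() - slope * x.mean()        # intercept a = ybar - b * xbar
    fitted = intercept + slope * x                 # fitted values for part (d)
    resid = y - fitted                             # residuals for part (d)
    print(f"set {k}: r = {r:.3f}, yhat = {intercept:.2f} + {slope:.3f} x, "
          f"prediction at x = 10: {intercept + slope * 10:.2f}")
```

Running this reveals Anscombe's point for part (a): all four sets give essentially the same summary, r of about 0.816 and a fitted line of about yhat = 3.00 + 0.500 x, even though the scatterplots in part (b) look completely different. Plotting `resid` against `fitted` for each set gives the residual plots asked for in part (d).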
(0.2) Page 6 of the handout for Lecture 2 shows four plots, illustrating the effect of standardizing the classwork and final exam scores before fitting the least-squares line. The four steps correspond to the four plots on that page.
Carry out these four steps, calculating the equation of the least squares line at each step. Explain the change in the coefficients (that is, the pair of constants that give the intercept and slope of the least squares line) from each step to the next.