Data structures. Regression and model fitting. Manipulation of lm objects.
42.8 40.0 37.0 63.5 93.5 49.5 37.5 35.5 34.5 39.5 30.0 36.0 45.5 52.0 43.0 38.5 17.0 28.0 43.0 38.5 37.0 22.5 8.5 20.0 37.0 33.0 33.5 23.5 9.5 30.5 33.0 21.0 38.5 58.0 79.0 47.0Task: Enter the data into Splus (there are two ways that we have available)
EITHER:
Copy the data to a file, then use read.table() to create a data frame called cath (short for catheter), with the column names height, weight, dist.
OR:
Attach the ricedata section of the library to your searchpath and look for the dataframe cath (if you can't remember how, check. the class FAQ on libraries
Use plot() and pairs() to get a rough idea of how the variables are related. What do you see?
lm(dist~height,cath)Look at what gets written to the screen. Do you understand what it is saying? Be prepared to explain what is returned to either BM or JH.
The function lm() fits a linear model by least squares. The second argument of the function, cath, tells lm that the variables dist and height are components of the dataframe cath.
The notation "dist~height" means that dist is to be predicted by a linear function of height: that is, the lm() function is finding constants c1 and c2 for which the sum of squared residuals
is as small as possible.![]()
Note the presence of the "intercept" term c1; Splus includes it, by default.
The "model formula"
dist ~ height -1would force Splus to omit the intercept term.
Splus calls the minimizing constants coefficients. The corresponding value c1+c2*height is called the vector of fitted values. The remainder, dist - c1 -c2*height, is called the vector of residuals (or residual values).
Try
lm(dist~height,cath) reg1 <- lm(dist~height,cath) print(reg1)What does the output tell you about the lm object reg1?
Look at the attributes of reg1. Try to figure out the meaning of as many of the components of reg1 as you can. In particular, try to figure out how the "residual standard error" is calculated. Hint: ?lm ?lm.object ?print.lm ?summary (If you have studied regression before you might even try to figure out what anova(reg1) does.)
plot(reg1)then try
plot(cath$height,cath$dist) abline(reg1)What happens? What is being plotted in each case? Where is Splus finding the necessary information?
residuals/(residual.standard.error * sqrt(1-hat))Write a function that will accept an lm object as its argument and then draw a plot of standardized residuals versus fitted values.
dist ~ height + weighttells Splus to predict dist as a linear combination c1 + c2*height + c3*weight. Fit such a model, saving the output in an object reg2. Does the added predictor variable weight improve the fit much?
lm(Mileage ~ Weight)Figure out how to get rid of the problem with missing values. (?lm) Look at the output that is produced when you finally get lm() to work. What do you learn and see?