height in inches   weight in pounds   distance in cm (to pulmonary artery)
      42.8               40.0               37.0
      63.5               93.5               49.5
      37.5               35.5               34.5
      39.5               30.0               36.0
      45.5               52.0               43.0
      38.5               17.0               28.0
      43.0               38.5               37.0
      22.5                8.5               20.0
      37.0               33.0               33.5
      23.5                9.5               30.5
      33.0               21.0               38.5
      58.0               79.0               47.0

EITHER:
Use plot() and pairs() to get a rough idea of how the variables are related. What do you see?
    lm(dist~height,cath)
Look at what gets written to the screen. You will soon understand what it is saying. Be prepared to explain to either BM or DP what is returned.
The function lm() fits a linear model by least squares. The second
argument, cath, tells lm that the variables dist and height are
components of the dataframe cath. The notation "dist~height" means
that dist is to be predicted by a linear function of height: that is,
the lm() function is finding constants c1 and c2 for which the sum of
squared residuals
    sum_i (dist_i - c1 - c2*height_i)^2
is as small as possible. Note the presence of the "intercept" term c1; Splus includes it by default. The "model formula"
    dist ~ height - 1
would force Splus to omit the intercept term.
Splus calls the minimizing constants coefficients. The corresponding values c1 + c2*height form the vector of fitted values. The remainder, dist - c1 - c2*height, is called the vector of residuals (or residual values).
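The definitions above can be checked numerically. The following is a minimal sketch, written for R and assumed to behave the same way in Splus. It rebuilds cath from the table at the top of the handout (the column names height, weight and dist are assumed to match the real dataframe), and the helper rss() is a hypothetical name introduced only for this illustration.

```r
# Catheter data copied from the table; column names are an assumption.
cath <- data.frame(
  height = c(42.8, 63.5, 37.5, 39.5, 45.5, 38.5, 43.0, 22.5, 37.0, 23.5, 33.0, 58.0),
  weight = c(40.0, 93.5, 35.5, 30.0, 52.0, 17.0, 38.5,  8.5, 33.0,  9.5, 21.0, 79.0),
  dist   = c(37.0, 49.5, 34.5, 36.0, 43.0, 28.0, 37.0, 20.0, 33.5, 30.5, 38.5, 47.0)
)

reg1 <- lm(dist ~ height, cath)
cc <- coef(reg1)                       # cc[1] is c1 (intercept), cc[2] is c2 (slope)

# Fitted values and residuals computed directly from the definitions.
fit <- cc[1] + cc[2] * cath$height
res <- cath$dist - fit

# These should agree with what lm() stored in the fitted object.
same.fit <- all.equal(unname(fitted(reg1)), unname(fit))
same.res <- all.equal(unname(residuals(reg1)), unname(res))

# Hypothetical helper: sum of squared residuals for any candidate (c1, c2).
rss <- function(c1, c2) sum((cath$dist - c1 - c2 * cath$height)^2)
rss.min   <- rss(cc[1], cc[2])         # value at the least squares solution
rss.moved <- rss(cc[1] + 0.1, cc[2])   # value after nudging the intercept

# Dropping the intercept leaves a single coefficient (the slope).
reg0 <- lm(dist ~ height - 1, cath)
n.coef0 <- length(coef(reg0))
```

Moving either constant away from the fitted coefficients can only increase the sum of squared residuals, so rss.moved should come out larger than rss.min.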
Try
    lm(dist~height,cath)
    reg1 <- lm(dist~height,cath)
    print(reg1)
What does the output tell you about the lm object reg1?
Look at the attributes of reg1. Try to figure out the meaning of as many of the components of reg1 as you can. In particular, try to figure out how the "residual standard error" is calculated. Hint:
    ?lm
    ?lm.object
    ?print.lm
    ?summary
(If you have studied regression before you might even try to figure out what anova(reg1) does.)
Try
    plot(reg1)
then try
    plot(cath$height,cath$dist)
    abline(reg1)
What happens? What is being plotted in each case? Where is Splus finding the necessary information?
The standardized residuals are defined as
    residuals/(residual.standard.error * sqrt(1-hat))
Write a function that will accept an lm object as its argument and then draw a plot of standardized residuals versus fitted values.
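As a starting point, the ingredients of the formula above can be pulled out of a fitted model. This is a minimal sketch for R, assumed to carry over to Splus, and it stops short of the plotting function the exercise asks for. It assumes that the residual standard error is available as summary(reg1)$sigma and the hat values as lm.influence(reg1)$hat; the data frame is rebuilt from the table so that the sketch runs on its own, and the names s, h and std.res are chosen only for illustration.

```r
# Height and distance columns copied from the table; names are an assumption.
cath <- data.frame(
  height = c(42.8, 63.5, 37.5, 39.5, 45.5, 38.5, 43.0, 22.5, 37.0, 23.5, 33.0, 58.0),
  dist   = c(37.0, 49.5, 34.5, 36.0, 43.0, 28.0, 37.0, 20.0, 33.5, 30.5, 38.5, 47.0)
)
reg1 <- lm(dist ~ height, cath)

s <- summary(reg1)$sigma        # residual standard error
h <- lm.influence(reg1)$hat     # diagonal elements of the hat matrix

# Standardized residuals, term by term as in the formula above.
std.res <- residuals(reg1) / (s * sqrt(1 - h))

# R only: the built-in rstandard() should return the same numbers.
check <- all.equal(unname(std.res), unname(rstandard(reg1)))
```

Each hat value lies between 0 and 1, so dividing by sqrt(1 - h) inflates the residuals of observations with unusual heights relative to the others.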
The model formula
    dist ~ height + weight
tells Splus to predict dist as a linear combination c1 + c2*height + c3*weight. Fit such a model, saving the output in an object reg2. Does the added predictor variable weight improve the fit much?
    lm(Mileage ~ Weight)
Figure out how to get rid of the problem with missing values. (?lm) Look at the output that is produced when you finally get lm() to work. What do you learn and see?