Since the observed values for $y$ vary about their means $\mu_y$,
the statistical model includes a term for this variation. In words, the model is expressed
as DATA = FIT + RESIDUAL, where the "FIT" term represents the expression
$\beta_0 + \beta_1 x$.
The "RESIDUAL" term represents the deviations of the observed values $y$ from their
means $\mu_y$, which are normally distributed with mean
0 and variance $\sigma^2$. The notation for the model deviations is
$\varepsilon$.
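The DATA = FIT + RESIDUAL decomposition can be made concrete by simulating from the model. The following is a minimal sketch; the parameter values (`beta0`, `beta1`, `sigma`) and the `x` values are hypothetical and chosen only for illustration.

```python
import random


def simulate(x_values, beta0, beta1, sigma, seed=0):
    """Simulate y = beta0 + beta1 * x + eps, with eps ~ N(0, sigma^2).

    Returns three parallel lists: the FIT term, the RESIDUAL term,
    and the resulting DATA values.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    fit = [beta0 + beta1 * x for x in x_values]
    residual = [rng.gauss(0.0, sigma) for _ in x_values]
    data = [f + e for f, e in zip(fit, residual)]
    return fit, residual, data


# Hypothetical parameter values, not taken from the text.
x_values = [0, 2, 4, 6, 8, 10]
fit, residual, data = simulate(x_values, beta0=60.0, beta1=-2.5, sigma=9.0)

for x, f, e, y in zip(x_values, fit, residual, data):
    print(f"x={x:>2}  FIT={f:6.2f}  RESIDUAL={e:7.2f}  DATA={y:6.2f}")
```

Each simulated observation is the point on the true line plus one normal deviation, which is exactly the structure the regression model assumes.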

In formal terms, the model for linear regression is the following:

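Given $n$ pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, the observed response is

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \ldots, n,$$

where the deviations $\varepsilon_i$ are independent and normally distributed with mean 0 and variance $\sigma^2$. The parameters $\beta_0$, $\beta_1$, and $\sigma$ are unknown and must be estimated from the data.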

In the least-squares model, the best-fitting line for the observed data is calculated by
minimizing the sum of the squares of the
vertical deviations from each data point to the line (if a point lies on the fitted line exactly,
then its vertical deviation is 0). Because the deviations are first squared, then summed, there
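are no cancellations between positive and negative values. The least-squares estimates
$b_0$ and $b_1$ of the intercept and slope are

$$b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1\bar{x}.$$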
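As a sketch of the least-squares calculation, the code below uses the standard closed-form solution: the slope is the sum of cross-deviations divided by the sum of squared deviations in *x*, and the intercept makes the line pass through the point of means. The five data points are a hypothetical toy example, not the cereal data.

```python
def least_squares(xs, ys):
    """Return (b0, b1) minimizing the sum of squared vertical deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical toy data, chosen so the arithmetic is easy to check by hand.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = least_squares(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # b0 = 2.20, b1 = 0.60
print(f"sum of residuals = {sum(residuals):.2e}")  # zero up to rounding
```

The residuals of a least-squares line with an intercept always sum to zero, which gives a quick sanity check on any implementation.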

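The computed values for $b_0$ and $b_1$ are unbiased estimators of $\beta_0$ and $\beta_1$, and are normally distributed with standard deviations that may be estimated from the data.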

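The values fit by the equation $b_0 + b_1 x_i$ are denoted $\hat{y}_i$, and the residuals $e_i$ are equal to $y_i - \hat{y}_i$, the difference between the observed and fitted values. The sum of the residuals is equal to zero.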

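The variance $\sigma^2$ may be estimated by $s^2 = \dfrac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n - 2}$, where $\hat{y}_i = b_0 + b_1 x_i$ is the fitted value; this estimate is also known as the mean-squared error (or MSE).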
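The MSE calculation can be sketched in a few lines: sum the squared residuals and divide by *n* - 2, the degrees of freedom left after estimating two coefficients. The data and the coefficients `b0 = 2.2`, `b1 = 0.6` (the least-squares fit to that data) are a hypothetical toy example.

```python
import math


def mse_and_s(xs, ys, b0, b1):
    """Return (s2, s): the mean-squared error and its square root."""
    n = len(xs)
    # Sum of squared residuals about the fitted line.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)  # two parameters estimated, so n - 2 degrees of freedom
    return s2, math.sqrt(s2)


# Hypothetical toy data with its least-squares coefficients.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

s2, s = mse_and_s(xs, ys, b0=2.2, b1=0.6)
print(f"MSE = {s2:.3f}, s = {s:.3f}")  # MSE = 0.800, s = 0.894
```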

The estimate of the standard error $s$ is the square root of the MSE.

Using the MINITAB "REGRESS" command with "sugar" as an explanatory variable and "rating" as the dependent variable gives the following result:

Regression Analysis

The regression equation is Rating = 59.3 - 2.40 Sugars

A plot of the data with the regression line added is shown to the right:

After fitting the regression line, it is important to investigate the residuals to determine whether or not they appear to fit the assumption of a normal distribution. A plot of the residuals

The MINITAB output provides a great deal of information. Under the equation for the
regression line, the output provides the least-squares estimates for the constant $b_0$
and the slope $b_1$, together with their estimated standard deviations in the "StDev" column.

| Predictor | Coef | StDev | T | P |
|---|---|---|---|---|
| Constant | 59.284 | 1.948 | 30.43 | 0.000 |
| Sugars | -2.4008 | 0.2373 | -10.12 | 0.000 |

S = 9.196, R-Sq = 57.7%, R-Sq(adj) = 57.1%

**The test statistic $t$ is equal to $b_1/s_{b_1}$,
the slope parameter estimate
divided by its standard deviation. Under the null hypothesis that the slope $\beta_1$ is 0, this value follows a $t(n-2)$ distribution.**

In the example above, the slope parameter estimate is -2.4008 with standard deviation 0.2373. The test statistic is t = -2.4008/0.2373 = -10.12, provided in the "T" column of the MINITAB output. For a two-sided test, the probability of interest is $2P(T > |-10.12|)$ for a $t(n-2)$ distribution, which the "P" column of the output reports as 0.000 to three decimal places.
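The test statistic and its two-sided tail probability can be sketched with the standard library alone. The helper names `t_pdf` and `two_sided_p` are illustrative, and the tail area is approximated by Simpson's rule rather than taken from a statistics package. The sample size is not stated in this section, so the degrees of freedom tried below are hypothetical; for a statistic this large the conclusion does not depend on them.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p(t, df, steps=2000):
    """Approximate 2 * P(T > |t|) using Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)


# Slope estimate and its standard deviation from the MINITAB output.
b1, se_b1 = -2.4008, 0.2373
t_stat = b1 / se_b1
print(f"t = {t_stat:.2f}")  # t = -10.12

# ASSUMPTION: n is not given in the text, so several df values are tried.
for df in (30, 75, 200):
    print(f"df = {df:>3}: two-sided p = {two_sided_p(t_stat, df):.1e}")
```

In every case the probability is far below 0.001, in line with the 0.000 shown in the "P" column.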

In the example above, a 95% confidence interval for the slope parameter $\beta_1$, using the critical value $t^* = 2.000$,
is computed to be $-2.4008 \pm 2.000 \times 0.2373 = (-2.4008 - 0.4746,\ -2.4008 + 0.4746)$
$= (-2.8754, -1.9262)$.
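The interval arithmetic above can be reproduced with a small helper. The function name `slope_ci` is illustrative; the inputs are the slope estimate, its standard deviation, and the critical value 2.000 used in the text.

```python
def slope_ci(b1, se_b1, t_star):
    """Confidence interval b1 +/- t_star * se_b1 for the slope."""
    margin = t_star * se_b1
    return b1 - margin, b1 + margin


low, high = slope_ci(b1=-2.4008, se_b1=0.2373, t_star=2.000)
print(f"({low:.4f}, {high:.4f})")  # (-2.8754, -1.9262)
```

Because the interval lies entirely below zero, it agrees with the significance test: a slope of 0 is not a plausible value at the 95% level.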

The value for "S" printed in the MINITAB output provides the estimate $s$ for the standard deviation
$\sigma$, and the "R-Sq" value is the square of the correlation $r$,
written as a percentage. This indicates that 57.7% of the variability in the cereal ratings
may be explained by the "sugars" variable.

The MINITAB "BRIEF 3" command expands the output provided by the "REGRESS" command to include
the observed values of $x$ and $y$, the fitted values
$\hat{y}$, the standard deviation of the fitted values (StDev Fit), the residual values,
and the standardized residual values. The table below shows this output for the first 10
observations.
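The relationship between these columns can be sketched using the coefficients and "S" from the regression output and the first three observations of the MINITAB listing for this example. The standardized-residual formula used here, the residual divided by the square root of S² minus (StDev Fit)², is the usual internally studentized form; it is an assumption, since the text does not state how MINITAB computes that column. Small differences in the last digit come from the rounded inputs.

```python
import math

# Coefficients and S from the MINITAB regression output.
B0, B1, S = 59.284, -2.4008, 9.196


def row_stats(sugars, rating, sd_fit):
    """Return (fit, residual, standardized residual) for one observation."""
    fit = B0 + B1 * sugars
    resid = rating - fit
    # ASSUMPTION: internally studentized residual; not spelled out in the text.
    st_resid = resid / math.sqrt(S ** 2 - sd_fit ** 2)
    return fit, resid, st_resid


# (Sugars, Rating, StDev Fit) for observations 1-3 of the listing.
rows = [
    (6.0, 68.40, 1.07),
    (8.0, 33.98, 1.08),
    (5.0, 59.43, 1.14),
]

for sugars, rating, sd_fit in rows:
    fit, resid, st_resid = row_stats(sugars, rating, sd_fit)
    print(f"Sugars={sugars:4.1f}  Fit={fit:6.2f}  "
          f"Residual={resid:6.2f}  St Resid={st_resid:5.2f}")
```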

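| Obs | Sugars | Rating | Fit | StDev Fit | Residual | St Resid |
|---|---|---|---|---|---|---|
| 1 | 6.0 | 68.40 | 44.88 | 1.07 | 23.52 | 2.58R |
| 2 | 8.0 | 33.98 | 40.08 | 1.08 | -6.09 | -0.67 |
| 3 | 5.0 | 59.43 | 47.28 | 1.14 | 12.15 | 1.33 |
| 4 | 0.0 | 93.70 | 59.28 | 1.95 | 34.42 | 3.83R |
| 5 | 8.0 | 34.38 | 40.08 | 1.08 | -5.69 | -0.62 |
| 6 | 10.0 | 29.51 | 35.28 | 1.28 | -5.77 | -0.63 |
| 7 | 14.0 | 33.17 | 25.67 | 1.98 | 7.50 | 0.84 |
| 8 | 8.0 | 37.04 | 40.08 | 1.08 | -3.04 | -0.33 |
| 9 | 6.0 | 49.12 | 44.88 | 1.07 | 4.24 | 0.46 |
| 10 | 5.0 | 53.31 | 47.28 | 1.14 | 6.03 | 0.66 |

An "R" marks an observation with a large standardized residual.

To compute a confidence interval for the mean response at a given value of $x$, first choose a critical value $t^*$ from the appropriate $t(n-2)$ distribution. The interval is then $\hat{y} \pm t^* s_{\hat{y}}$, where $\hat{y}$ is the fitted value and $s_{\hat{y}}$ is its standard deviation ("StDev Fit").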

The value

*Note: The standard error associated with
a prediction interval is larger than the standard deviation for the mean response, since
the standard error for a predicted value must account for the added variability of an individual observation about its mean.*

The MINITAB "PREDICT" subcommand computes the predicted response variable and provides 95% confidence limits. Suppose we are interested in predicting the rating for a cereal with a sugar level of 5.5. MINITAB produces the following output:

| Fit | StDev Fit | 95.0% CI | 95.0% PI |
|---|---|---|---|
| 46.08 | 1.10 | (43.89, 48.27) | (27.63, 64.53) |

The fitted value 46.08 is simply the value computed when 5.5 is substituted into the equation for the regression line: 59.28 - (5.5 × 2.40) = 59.28 - 13.20 = 46.08. The value given in the 95.0% CI column is the confidence interval for the mean response, while the value given in the 95.0% PI column is the prediction interval for a future observation.
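Both intervals can be reproduced approximately from the printed quantities. The sketch below assumes the standard forms: fit ± t\* × (StDev Fit) for the mean response, and fit ± t\* × sqrt(S² + (StDev Fit)²) for a future observation. The critical value 1.992 is an assumption not given in the text; it was chosen because it is consistent with the half-width of the reported confidence interval (about 2.19 / 1.10).

```python
import math


def intervals(fit, sd_fit, s, t_star):
    """Return (confidence interval, prediction interval) around a fitted value."""
    ci_margin = t_star * sd_fit
    # A future observation adds its own variance s^2 to that of the fit.
    pi_margin = t_star * math.sqrt(s ** 2 + sd_fit ** 2)
    ci = (fit - ci_margin, fit + ci_margin)
    pred = (fit - pi_margin, fit + pi_margin)
    return ci, pred


T_STAR = 1.992  # ASSUMPTION: not stated in the text (see lead-in)

# Fit, StDev Fit, and S as printed in the MINITAB output.
ci, pred = intervals(fit=46.08, sd_fit=1.10, s=9.196, t_star=T_STAR)

print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI: ({pred[0]:.2f}, {pred[1]:.2f})")
```

The prediction interval is much wider because S = 9.196 dominates the standard deviation of the fit, 1.10, under the square root.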

For additional tests and a continuation of this example, see ANOVA for Regression.