# Correlation

The strength of the linear association between two variables is quantified by the *correlation
coefficient*.

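Given a set of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, the correlation
coefficient $r$ is computed as

$$
r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)
$$

where $\bar{x}$ and $\bar{y}$ are the sample means and $s_x$ and $s_y$ are the sample standard
deviations of the two variables.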

The correlation coefficient always takes a value between -1 and 1, with 1 or -1 indicating perfect
correlation (all points lie along a straight line in this case). A positive correlation
indicates a positive association between the variables (increasing values in one variable correspond
to increasing values in the other variable), while a negative correlation indicates a negative
association between the variables (increasing values in one variable correspond to decreasing values
in the other variable). A correlation value close to 0 indicates little or no linear association
between the variables.
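
As a minimal sketch of the computation, the function below takes the average product of the standardized values, using the sample standard deviation and a divisor of n - 1. The name `correlation` and the toy data are illustrative assumptions, not part of the original text.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation coefficient of paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    # Sum the products of standardized values, then divide by n - 1.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Points on a rising straight line, then on a falling one (made-up data).
r_up = correlation([1, 2, 3, 4], [2, 4, 6, 8])
r_down = correlation([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)
```

The two calls illustrate the extremes described above: points on a rising line give a value of 1 and points on a falling line give -1, up to floating-point rounding. The function fails with a division by zero if either variable is constant, since the correlation is undefined in that case.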

Since the formula for calculating the correlation coefficient standardizes the variables, changes
in scale or units of measurement will not affect its value. For this reason, the correlation
coefficient is often more useful than a graphical depiction in determining the strength of the
association between two variables.
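
The invariance to units can be checked numerically. The sketch below uses made-up height and weight values, converts them from inches and pounds to centimeters and kilograms, and compares the two correlations. The helper `correlation` and all data are assumptions for illustration.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation coefficient of paired observations."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Hypothetical measurements in inches and pounds.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [115, 120, 140, 155, 160, 180]

# The same measurements in centimeters and kilograms.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]

r_original = correlation(heights_in, weights_lb)
r_rescaled = correlation(heights_cm, weights_kg)
print(r_original, r_rescaled)
```

Both values agree up to rounding error, because standardizing each variable removes its unit of measurement.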

## Correlation in Linear Regression

The square of the correlation coefficient, *r²*, is a useful value in linear regression.
This value represents the fraction of the variation in one variable that may be explained by the
other variable. Thus, if a correlation of 0.8 is observed between two variables (say, height and
weight), then a linear regression model attempting to explain *either* variable
in terms of the other variable will account for 64% of the variability in the variable being explained.
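The correlation coefficient also relates directly to the regression line *Y = a + bX*
for any two variables, where the slope is $b = r \, s_Y / s_X$ and $s_X$ and $s_Y$ are the standard
deviations of the two variables.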
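
To illustrate the *r²* interpretation, the sketch below fits a least-squares line in each direction and compares the explained fraction of variation, computed as 1 - SSE/SST, with the squared correlation. The function names and the data are hypothetical.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation coefficient of paired observations."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


def explained_fraction(xs, ys):
    """Fraction of the variation in ys explained by a least-squares line on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Least-squares slope and intercept.
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # residual variation
    sst = sum((y - y_bar) ** 2 for y in ys)  # total variation
    return 1 - sse / sst


# Made-up data with a moderately strong positive association.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]

r = correlation(xs, ys)
print(r ** 2, explained_fraction(xs, ys), explained_fraction(ys, xs))
```

All three printed values agree up to rounding, which reflects the point that the explained fraction is the same whichever variable is treated as the response.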

Because the least-squares regression line always passes through the point of means
$(\bar{x}, \bar{y})$, the intercept is $a = \bar{y} - b\bar{x}$. The regression line may therefore
be entirely described by the means, standard deviations, and correlation of the two variables
under investigation.
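
As a sketch of this idea, the code below builds the line from the five summary statistics alone, using the standard identities slope = r · s_y / s_x and intercept = ȳ - slope · x̄, and compares the slope with one fitted directly from the data. The function `line_from_summary` and the data are illustrative assumptions.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation coefficient of paired observations."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


def line_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Intercept and slope of the least-squares line from summary statistics."""
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # line passes through (x_bar, y_bar)
    return a, b


# Made-up paired observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]

x_bar, y_bar = mean(xs), mean(ys)
a, b = line_from_summary(x_bar, y_bar, stdev(xs), stdev(ys), correlation(xs, ys))

# Slope fitted directly from the data, for comparison.
b_direct = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
print(a, b, b_direct)
```

The slope from the summary statistics matches the directly fitted slope up to rounding, and evaluating the line at the mean of x returns the mean of y.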

For some good examples of studies using correlation, see the
correlation index in the Statlib Data and Story Library (DASL).