For example, suppose the random variable X records a randomly selected student's score on a national test, where the population distribution for the score is normal with mean 70 and standard deviation 5 (N(70,5)). Given a simple random sample (SRS) of 200 students, the distribution of the sample mean score has mean 70 and standard deviation 5/sqrt(200) = 5/14.14 = 0.35.
This result follows from the fact that any linear combination of independent normal random variables is also normally distributed. This means that for two independent normal random variables X and Y and any constants a and b, aX + bY will be normally distributed. In the case of the sample mean, the linear combination is x̄ = (1/n)*(X1 + X2 + ... + Xn).
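The standard deviation of the sample mean in the test-score example can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later and only the standard library; the seed, the number of replicates, and all variable names are illustrative choices, not part of the original example.

```python
import math
import random
import statistics

MU, SIGMA, N = 70, 5, 200  # population N(70, 5), SRS of 200 students

# Theoretical standard deviation of the sample mean: sigma / sqrt(n)
theoretical_sd = SIGMA / math.sqrt(N)

# Simulation check: draw many samples of size N and record each sample mean.
# The seed and replicate count are arbitrary assumptions.
rng = random.Random(0)
REPS = 2000
sample_means = [
    statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(REPS)
]
simulated_mean = statistics.fmean(sample_means)
simulated_sd = statistics.stdev(sample_means)

print(f"theoretical sd of sample mean: {theoretical_sd:.3f}")
print(f"simulated mean: {simulated_mean:.2f}, simulated sd: {simulated_sd:.3f}")
```

The simulated standard deviation should land close to 5/sqrt(200), with small differences due to sampling variation.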
For example, consider the distributions of yearly average test scores on a
national test in two areas of the country. In the first area, the yearly
average test score X is normally distributed with mean 70 and standard
deviation 5. In the second area, the yearly average test score Y is
normally distributed with mean 65 and standard deviation 8. Assuming X and Y
are independent, the difference X - Y between the two areas is normally
distributed, with mean 70 - 65 = 5 and variance 5² + 8² = 25 + 64 = 89. The
standard deviation is the square root of the variance, 9.43. The probability
that the first area will have a higher average score than the second may be
calculated as follows:
P(X > Y) = P(X - Y > 0)
= P(((X - Y) - 5)/9.43 > (0 - 5)/9.43)
= P(Z > -0.53) = 1 - P(Z < -0.53) = 1 - 0.2981 = 0.7019.
The first area will have a higher average score than the second area about 70% of the time.
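The same calculation can be reproduced without a normal table. This is a small sketch using `statistics.NormalDist` from the Python standard library (3.8 or later); the function name `prob_x_exceeds_y` is a hypothetical helper introduced here, and it assumes the two yearly averages are independent.

```python
import math
from statistics import NormalDist

def prob_x_exceeds_y(mu_x, sd_x, mu_y, sd_y):
    """P(X > Y) for independent normal X and Y.

    X - Y is normal with mean mu_x - mu_y and
    variance sd_x**2 + sd_y**2 (variances add under independence).
    """
    diff = NormalDist(mu_x - mu_y, math.sqrt(sd_x**2 + sd_y**2))
    return 1 - diff.cdf(0)  # P(X - Y > 0)

p = prob_x_exceeds_y(70, 5, 65, 8)
print(f"P(X > Y) = {p:.4f}")
```

The result agrees with the table-based value of about 0.70; any difference in the last digits comes from rounding z to -0.53 in the hand calculation.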
A formal statement of the Central Limit Theorem is the following:
If x̄ is the mean of a random sample X1, X2, ... , Xn of size n from a distribution with a finite mean μ and a finite positive variance σ², then the distribution of W = (x̄ - μ)/(σ/sqrt(n)) is N(0,1) in the limit as n approaches infinity.
This means that, for large n, the variable x̄ is approximately distributed N(μ, σ/sqrt(n)).
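The theorem can be illustrated by simulation with a population that is clearly not normal. The sketch below assumes an exponential population with rate 1 (mean 1, standard deviation 1); that choice, the sample size, the seed, and the names are all illustrative assumptions. It standardizes each sample mean as (sample mean - mean)/(sd/sqrt(n)) and checks how closely the results behave like N(0,1).

```python
import math
import random
import statistics

rng = random.Random(1)      # arbitrary seed for reproducibility
MU = SIGMA = 1.0            # exponential with rate 1 has mean 1 and sd 1
N, REPS = 50, 5000          # sample size and number of simulated samples

def standardized_mean(n):
    """Draw one sample of size n and return its standardized mean W."""
    xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    return (xbar - MU) / (SIGMA / math.sqrt(n))

w = [standardized_mean(N) for _ in range(REPS)]

# For a standard normal variable, about 95% of values fall within +/- 1.96.
share_within = sum(abs(v) < 1.96 for v in w) / REPS

print(f"mean of W: {statistics.fmean(w):.3f}")
print(f"sd of W:   {statistics.stdev(w):.3f}")
print(f"share with |W| < 1.96: {share_within:.3f}")
```

With a skewed population such as this one, the agreement with N(0,1) is approximate at n = 50 and improves as n grows.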
One well-known application of this theorem is the normal approximation to the binomial distribution.
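As a rough illustration of that approximation, the sketch below compares an exact binomial probability with its normal counterpart, using mean n*p and standard deviation sqrt(n*p*(1-p)). The values n = 100, p = 0.5, k = 55 and the function names are hypothetical, and the continuity correction (evaluating the normal curve at k + 0.5) is a common refinement added here, not something stated in the text above.

```python
import math
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), with continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)

n, p, k = 100, 0.5, 55  # illustrative values only
exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)

print(f"exact binomial: {exact:.4f}")
print(f"normal approx:  {approx:.4f}")
```

The two values should be close for these inputs; the approximation generally worsens when n is small or p is near 0 or 1.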
Descriptive Statistics

Variable        N      Mean    Median   Tr Mean     StDev   SE Mean
C101           50   0.49478   0.49436   0.49450   0.02548   0.00360

Variable      Min       Max        Q1        Q3
C101      0.43233   0.55343   0.47443   0.51216

The mean 0.49 is nearly equal to the population mean 0.5. The theoretical value for the standard deviation is the population standard deviation divided by the square root of the size of the sample (which is 10 in this case), approximately 0.3/10 = 0.03. The calculated value for this sample is 0.025. To evaluate the normality of the sample mean data, I used the "NSCORES" and "PLOT" commands to create a normal quantile plot of the data, shown below.
The plot indicates that the data follow an approximately normal distribution, lying close to a diagonal line through the main body of the points.