Comparison of Two Means
In many cases, a researcher is interested in gathering information about
two populations in order to compare them. As in statistical inference for
one population parameter, confidence intervals and tests of significance
are useful statistical tools for studying the difference between two
population parameters.
Confidence Interval for the Difference Between Two Means
A confidence interval for the difference between two means specifies a
range of values within which the difference between the means of the
two populations may lie. These intervals may be calculated by, for example,
a producer who wishes to estimate the difference in mean daily output
from two machines; a medical researcher who wishes to estimate the
difference in mean response by patients who are receiving two
different drugs; etc.
The confidence interval for the difference between two means contains
all the values of $\mu_1 - \mu_2$ (the difference between the two
population means) which would not be rejected in the two-sided hypothesis
test of $H_0: \mu_1 = \mu_2$ against $H_a: \mu_1 \neq \mu_2$, i.e.
$H_0: \mu_1 - \mu_2 = 0$ against $H_a: \mu_1 - \mu_2 \neq 0$.
If the confidence interval includes 0 we can say that there is no significant difference
between the means of the two populations, at a given level of confidence.
(Definition taken from Valerie J. Easton and John H. McColl's
Statistics Glossary v1.1)
Tests of Significance for Two Unknown Means and Known Standard Deviations
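Given samples of size n1 and n2 from two normal populations with
unknown means $\mu_1$ and $\mu_2$ and known standard deviations
$\sigma_1$ and $\sigma_2$, the test statistic comparing the means is known as
the two-sample z statistic

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the sample means. This statistic
has the standard normal distribution (N(0,1)).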
The null hypothesis always assumes that the means are equal, while the alternative hypothesis
may be one-sided or two-sided.
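As a rough illustration of the z procedure, the sketch below computes the two-sample z statistic under H0: mu1 - mu2 = 0 and the corresponding P-value from the standard normal distribution. The function name `two_sample_z`, its parameters, and the input numbers are all made up for this example; they do not come from the text.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, alternative="two-sided"):
    """Two-sample z statistic and P-value for H0: mu1 - mu2 = 0.

    sigma1 and sigma2 are the *known* population standard deviations.
    """
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (mean1 - mean2) / standard_error
    std_normal = NormalDist()  # N(0, 1)
    if alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "greater":  # Ha: mu1 - mu2 > 0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":  # Ha: mu1 - mu2 < 0
        p_value = std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, p_value


# Hypothetical inputs, for illustration only.
z, p = two_sample_z(mean1=10.0, mean2=9.0, sigma1=2.0, sigma2=2.0, n1=50, n2=50)
print(f"z = {z:.3f}, two-sided P = {p:.4f}")
```

The `alternative` argument mirrors the choice between a one-sided and a two-sided alternative hypothesis.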
Tests of Significance for Two Unknown Means and Unknown Standard Deviations
In general, the population standard deviations are not known, and are estimated by the
calculated values s1 and s2. In this case, the test statistic
is defined by the two-sample t statistic

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}.$$
Although the two-sample statistic does not exactly follow the t distribution (since
two standard deviations are estimated in the statistic), conservative P-values may be
obtained using the t(k) distribution where k represents the smaller of
n1-1 and n2-1. Another option is to estimate the degrees of
freedom via a calculation from the data, which is the general method used by statistical software
such as MINITAB.
The confidence interval for the difference in means $\mu_1 - \mu_2$
is given by

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{s_1^2/n_1 + s_2^2/n_2}$$

where t* is the upper (1-C)/2 critical value for the t distribution
with k degrees of freedom (with k equal to either the smaller of n1-1 and
n2-1 or the calculated degrees of freedom).
Example
The dataset "Normal Body Temperature, Gender, and Heart Rate" contains 130 observations of
body temperature, along with the gender of each individual and his or her heart rate. In the
dataset, the first column gives body temperature and the second column gives the value "1" (male)
or "2" (female) to describe the gender of each subject. Using the
MINITAB "DESCRIBE" command with the "BY" subcommand to separate the two genders provides the
following information:
Descriptive Statistics

Variable  C2   N    Mean    Median  Tr Mean  StDev  SE Mean
C1        1    65   98.105  98.100  98.114   0.699  0.087
          2    65   98.394  98.400  98.390   0.743  0.092

Variable  C2   Min      Max      Q1      Q3
C1        1    96.300    99.500  97.600  98.600
          2    96.400   100.800  98.000  98.800
Is there a significant difference between the mean body temperatures for men and women?
To test $H_0: \mu_1 - \mu_2 = 0$ against $H_a: \mu_1 - \mu_2 \neq 0$,
compute the test statistic (98.105 - 98.394)/(sqrt(0.699²/65 + 0.743²/65))
= -0.289/0.127 = -2.276. Using the t(64) distribution, approximated in Table E in Moore
and McCabe by the t(60) distribution, we see that 2P(t>2.276) is between 0.02 and 0.04, indicating
a significant difference between the means at the 0.05 level (although not at the 0.01 level).
To compute a 95% confidence interval, we first note that the 0.025 critical value t*
for the t(60) distribution is 2.000, giving the interval (98.105 - 98.394) ±
2.000*0.127 = (-0.289 - 0.254, -0.289 + 0.254) = (-0.543, -0.035). The value 0 is not included in
the interval, again indicating a significant difference at the 0.05 level.
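The hand calculation can be checked with a short script. This is a minimal sketch that uses only the summary statistics printed in the MINITAB output and the table critical value t* = 2.000 quoted in the text; the variable names are chosen for this example.

```python
from math import sqrt

# Summary statistics from the MINITAB "DESCRIBE" output (1 = male, 2 = female).
n1, mean1, s1 = 65, 98.105, 0.699
n2, mean2, s2 = 65, 98.394, 0.743

diff = mean1 - mean2
standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
t_stat = diff / standard_error

# Upper 0.025 critical value for t(60), as quoted in the text (Table E).
t_star = 2.000
margin = t_star * standard_error
ci = (diff - margin, diff + margin)

print(f"SE = {standard_error:.4f}, t = {t_stat:.3f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because the script does not round the standard error to 0.127 before dividing, its results can differ slightly from the hand calculation.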
Performing this test in MINITAB using the "TWOT" command gives the results
Two Sample T-Test and Confidence Interval

Two sample T for C1
C2   N    Mean    StDev  SE Mean
1    65   98.105  0.699  0.087
2    65   98.394  0.743  0.092

95% CI for mu (1) - mu (2): ( -0.540, -0.039)
T-Test mu (1) = mu (2) (vs not =): T= -2.29 P=0.024 DF= 127
Although the MINITAB calculated degrees of freedom (127) are much higher than the conservative
estimate of 64, we see that the results are much the same.
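The text does not say how the degrees of freedom are calculated from the data. A common choice is the Welch-Satterthwaite approximation, and the sketch below assumes that formula. The function name `welch_df` is made up for this example, and the inputs are the rounded summary statistics from the output above rather than the raw data.

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


approx_df = welch_df(s1=0.699, n1=65, s2=0.743, n2=65)
conservative_df = min(65 - 1, 65 - 1)  # smaller of n1 - 1 and n2 - 1
print(f"approximate df = {approx_df:.1f}, conservative df = {conservative_df}")
```

Truncating the approximate value to a whole number gives the 127 degrees of freedom reported in the output.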
Data source: Data presented in Mackowiak, P.A., Wasserman, S.S., and Levine, M.M. (1992),
"A Critical Appraisal of 98.6 Degrees F, the Upper Limit of the Normal Body Temperature, and
Other Legacies of Carl Reinhold August Wunderlich," Journal of the American Medical
Association, 268, 1578-1580. Dataset available through the
JSE Dataset Archive.
Pooled t Procedures
If it is reasonable to assume that two populations have the same standard deviation, then an
alternative procedure known as the pooled t procedure may be used instead of the
general two-sample t procedure. Since only one standard deviation is to be estimated in
this case, the resulting test statistic will exactly follow a t distribution with
n1 + n2 - 2 degrees of freedom. The pooled estimator of
the variance

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

is used in the pooled two-sample
t statistic

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{1/n_1 + 1/n_2}}$$

which has a t(n1 + n2 - 2) distribution.
Example
In the body temperature example above, the sample standard deviations for the male and female
subjects are reasonably close. Using the MINITAB subcommand "POOLED" with the two-sample
t test gives the following results:
Two Sample T-Test and Confidence Interval

Two sample T for C1
C2   N    Mean    StDev  SE Mean
1    65   98.105  0.699  0.087
2    65   98.394  0.743  0.092

95% CI for mu (1) - mu (2): ( -0.540, -0.039)
T-Test mu (1) = mu (2) (vs not =): T= -2.29 P=0.024 DF= 128
Both use Pooled StDev = 0.721
The results are nearly identical to those of the general two-sample t procedure in this
case; only the degrees of freedom differ (128 rather than 127).
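The pooled quantities can be reproduced from the summary statistics. This is a minimal sketch of the pooled t calculation under H0: mu1 - mu2 = 0; the function name `pooled_t` is made up for this example, and the inputs are the rounded values from the output above.

```python
from math import sqrt


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Pooled two-sample t statistic for H0: mu1 - mu2 = 0."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    pooled_sd = sqrt(pooled_var)
    t_stat = (mean1 - mean2) / (pooled_sd * sqrt(1 / n1 + 1 / n2))
    return t_stat, pooled_sd, df


t_stat, pooled_sd, df = pooled_t(98.105, 0.699, 65, 98.394, 0.743, 65)
print(f"pooled SD = {pooled_sd:.3f}, t = {t_stat:.2f}, df = {df}")
```

When the two sample sizes are equal, as they are here, the pooled and unpooled standard errors are algebraically the same. The two procedures then differ only in their degrees of freedom, which helps explain why the outputs agree so closely.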