Every test of significance begins with a null hypothesis H0. H0 represents a theory that has been put forward, either because it is believed to be true or because it is to be used as a basis for argument, but has not been proved. For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug. We would write H0: there is no difference between the two drugs on average.
The alternative hypothesis, Ha, is a statement of what a statistical hypothesis test is set up to establish. For example, in a clinical trial of a new drug, the alternative hypothesis might be that the new drug has a different effect, on average, compared to that of the current drug. We would write Ha: the two drugs have different effects, on average. The alternative hypothesis might also be that the new drug is better, on average, than the current drug. In this case we would write Ha: the new drug is better than the current drug, on average.
The final conclusion once the test has been carried out is always given in terms of the null hypothesis. We either "reject H0 in favor of Ha" or "do not reject H0"; we never conclude "reject Ha", or even "accept Ha".
If we conclude "do not reject H0", this does not necessarily mean that the null hypothesis is true; it only suggests that there is not sufficient evidence against H0 in favor of Ha. Rejecting the null hypothesis, then, suggests that the alternative hypothesis may be true.
(Definitions taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)
Hypotheses are always stated in terms of a population parameter, such as the mean μ. An alternative hypothesis may be one-sided or two-sided. A one-sided hypothesis claims that a parameter is either larger or smaller than the value given by the null hypothesis. A two-sided hypothesis claims that a parameter is simply not equal to the value given by the null hypothesis; the direction does not matter.
Hypotheses for a one-sided test for a population mean take the following form:
H0: μ = k
Ha: μ > k
or
H0: μ = k
Ha: μ < k.
Hypotheses for a two-sided test for a population mean take the following form:
H0: μ = k
Ha: μ ≠ k.
A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data. (Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)
The null hypothesis H0 claims that there is no difference between the mean score for female students and the mean for the entire population, so that μ = 70. The alternative hypothesis claims that the mean for female students is higher than the entire student population mean, so that μ > 70.
The test statistic follows the standard normal distribution (with mean = 0 and standard deviation = 1). The test statistic z is used to compute the P-value for the standard normal distribution, the probability that a value at least as extreme as the test statistic would be observed under the null hypothesis. Given the null hypothesis that the population mean μ is equal to a given value μ0, the P-values for testing H0 against each of the possible alternative hypotheses are:
P(Z > z) for Ha: μ > μ0
P(Z < z) for Ha: μ < μ0
2P(Z > |z|) for Ha: μ ≠ μ0.
The probability is doubled for the two-sided test, since the two-sided alternative hypothesis considers the possibility of observing extreme values on either tail of the normal distribution.
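The three P-value rules above can be sketched in Python using only the standard library. This is a minimal illustration, not part of the original analysis; the function name z_p_value and the labels "greater", "less" and "two-sided" are assumptions chosen for this example.

```python
from statistics import NormalDist


def z_p_value(z, alternative):
    """P-value of a z statistic under the standard normal distribution.

    alternative (illustrative labels, not from the source text):
      "greater"   for Ha: mu > mu0  -> P(Z > z)
      "less"      for Ha: mu < mu0  -> P(Z < z)
      "two-sided" for Ha: mu != mu0 -> 2 P(Z > |z|)
    """
    cdf = NormalDist().cdf  # standard normal CDF, P(Z < z)
    if alternative == "greater":
        return 1.0 - cdf(z)
    if alternative == "less":
        return cdf(z)
    if alternative == "two-sided":
        return 2.0 * (1.0 - cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")
```

For the same z, the two-sided P-value is twice the smaller of the two one-sided P-values, which is the doubling described above.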
In a one-sided test, α corresponds to the critical value z* such that P(Z > z*) = α. For example, if the desired significance level for a result is 0.05, the corresponding value for z must be greater than or equal to z* = 1.645 (or less than or equal to -1.645 for a one-sided alternative claiming that the mean is less than the value given by the null hypothesis). For a two-sided test, we require 2P(Z > z*) = α, so the critical value z* corresponds to the α/2 significance level. To achieve a significance level of 0.05 for a two-sided test, the absolute value of the test statistic (|z|) must be greater than or equal to the critical value 1.96 (which corresponds to the level 0.025 for a one-sided test).
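The critical values quoted above can be recovered from the inverse of the standard normal distribution function. The sketch below assumes a helper named critical_value (a name introduced here for illustration) and uses the standard library's NormalDist.inv_cdf.

```python
from statistics import NormalDist


def critical_value(alpha, two_sided=False):
    """Critical value z* for significance level alpha.

    One-sided: P(Z > z*) = alpha.
    Two-sided: 2 P(Z > z*) = alpha, so each tail has area alpha / 2.
    """
    tail_area = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1.0 - tail_area)


one_sided = critical_value(0.05)                   # close to 1.645
two_sided = critical_value(0.05, two_sided=True)   # close to 1.96
print(round(one_sided, 3), round(two_sided, 3))
```

By symmetry, the critical value for a one-sided alternative in the lower tail is the negative of the one-sided value.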
Another interpretation of the significance level α, based in decision theory, is that α corresponds to the value for which one chooses to reject or not reject the null hypothesis H0. In the above example, the value 0.0082 would result in rejection of the null hypothesis at the 0.01 level. This rejection could be a mistake: if the null hypothesis were in fact true, a test statistic this extreme would still be observed with probability less than 0.01. In decision theory, rejecting a null hypothesis that is true is known as a Type I error. The probability of a Type I error is equal to the significance level α, and the probability of not rejecting the null hypothesis when it is in fact true (a correct decision) is equal to 1 - α. To minimize the probability of Type I error, the significance level is generally chosen to be small.
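The link between the significance level and the Type I error rate can be illustrated by simulation. The sketch below is an assumption-laden toy example, not taken from the source: it draws samples from a standard normal population (so H0: mean = 0 is true and the standard deviation 1 is known), runs an upper one-sided z test on each sample, and reports how often H0 is rejected. The function name, sample size, trial count and seed are all hypothetical choices.

```python
import random
from statistics import NormalDist, fmean


def type_i_error_rate(alpha=0.05, n=25, trials=20000, seed=0):
    """Fraction of simulated samples in which a true H0 is rejected."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1.0 - alpha)  # one-sided critical value
    rejections = 0
    for _ in range(trials):
        # H0 is true here: the population mean really is 0, sigma is 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (fmean(sample) - 0.0) / (1.0 / n ** 0.5)
        if z >= z_star:
            rejections += 1  # a Type I error
    return rejections / trials


rate = type_i_error_rate()
print(f"estimated Type I error rate: {rate:.3f}")
```

The estimated rate should fall close to the chosen alpha, up to simulation noise.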
Since the pharmaceutical company is interested in any difference from the mean recovery time for all individuals, the alternative hypothesis Ha is two-sided: μ ≠ 30. The test statistic is calculated to be z = (28.5 - 30)/(8/sqrt(100)) = -1.5/0.8 = -1.875. The P-value for this statistic is 2P(Z > 1.875) = 2(1 - P(Z < 1.875)) = 2(1 - 0.9693) = 2(0.0307) = 0.0614, using the table value for z = 1.87. This is not significant at the 0.05 level, although it is significant at the 0.1 level.
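The arithmetic in this example can be checked with a few lines of Python. The variable names are introduced here for illustration; the numbers are those given in the text. Because the code evaluates the normal distribution at z = 1.875 exactly rather than rounding to a table entry, the P-value it prints may differ from the hand calculation in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 28.5  # observed mean recovery time
mu_0 = 30.0         # mean recovery time under H0
sigma = 8.0         # standard deviation used in the text
n = 100             # sample size

# z = (sample mean - hypothesized mean) / (sigma / sqrt(n))
z = (sample_mean - mu_0) / (sigma / sqrt(n))

# Two-sided alternative: double the upper-tail area beyond |z|.
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}")
print(f"two-sided P-value = {p_value:.4f}")
print("significant at 0.05:", p_value < 0.05)
print("significant at 0.10:", p_value < 0.10)
```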
For claims about a population mean from a population with a normal distribution or for any sample with large sample size n (for which the sample mean will follow a normal distribution by the Central Limit Theorem) with unknown standard deviation, the appropriate significance test is known as the t-test, where the test statistic is defined as t = (x̄ - μ0)/(s/sqrt(n)). Here x̄ is the sample mean, s is the sample standard deviation, and μ0 is the value of the population mean under the null hypothesis.
The test statistic follows the t distribution with n-1 degrees of freedom. The test statistic t is used to compute the P-value for the t distribution, the probability that a value at least as extreme as the test statistic would be observed under the null hypothesis.
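As a minimal sketch, the one-sample t statistic and its degrees of freedom can be computed from raw data as follows. The function name t_statistic and the sample values are hypothetical and serve only to show the calculation: the sample mean minus the hypothesized mean, divided by the estimated standard error s/sqrt(n).

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, mu_0):
    """Return (t, degrees of freedom) for a one-sample t test."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    t = (x_bar - mu_0) / (s / sqrt(n))
    return t, n - 1


# Hypothetical measurements, for illustration only.
sample = [98.2, 98.6, 97.9, 98.4, 98.0, 98.7, 98.1, 98.3]
t, df = t_statistic(sample, mu_0=98.6)
print(f"t = {t:.2f} with {df} degrees of freedom")
```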
The dataset "Normal Body Temperature, Gender, and Heart Rate" contains 130 observations of body temperature, along with the gender of each individual and his or her heart rate. Using the MINITAB "DESCRIBE" command provides the following information:
Descriptive Statistics

Variable        N     Mean   Median  Tr Mean    StDev  SE Mean
TEMP          130   98.249   98.300   98.253    0.733    0.064

Variable      Min      Max       Q1       Q3
TEMP       96.300  100.800   97.800   98.700

Since the normal body temperature is generally assumed to be 98.6 degrees Fahrenheit, one can use the data to test the following one-sided hypothesis:
H0: μ = 98.6 vs
Ha: μ < 98.6.
The t test statistic is equal to (98.249 - 98.6)/0.064 = -0.351/0.064 = -5.48. By symmetry, P(t < -5.48) = P(t > 5.48). The t distribution with 129 degrees of freedom may be approximated by the t distribution with 100 degrees of freedom (found in Table E in Moore and McCabe), where P(t > 5.48) is less than 0.0005. This result is significant at the 0.01 level and beyond, indicating that the null hypothesis can be rejected with confidence.
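Instead of a printed table, the tail probability can be approximated numerically. The sketch below is not the method used in the text: it integrates the t density with Simpson's rule, using only the Python standard library, and the function names t_pdf and t_upper_tail are introduced here for illustration. It uses the exact 129 degrees of freedom rather than the 100 degrees of freedom from the table.

```python
from math import exp, lgamma, log, pi


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    if t < 0:
        raise ValueError("t must be non-negative; use symmetry for t < 0")
    if t == 0:
        return 0.5
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Area from 0 to t is total * h / 3; the upper half has total area 0.5.
    return max(0.0, 0.5 - total * h / 3)


t_stat = (98.249 - 98.6) / 0.064
# Lower-tail alternative: P(t < -5.48) = P(t > 5.48) by symmetry.
p_value = t_upper_tail(abs(t_stat), df=129)
print(f"t = {t_stat:.2f}, one-sided P-value = {p_value:.2e}")
```

The result should agree with the table-based conclusion that the P-value is far below 0.0005.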
To perform this t-test in MINITAB, the "TTEST" command with the "ALTERNATIVE" subcommand may be applied as follows:
MTB > ttest mu = 98.6 c1;
SUBC > alt= -1.

T-Test of the Mean

Test of mu = 98.6000 vs mu < 98.6000

Variable     N      Mean    StDev   SE Mean        T         P
TEMP       130   98.2492   0.7332    0.0643    -5.45    0.0000

These results represent the exact calculations for the t(129) distribution. The T value differs slightly from the hand calculation above because MINITAB uses the unrounded mean and standard error.
Data source: Data presented in Mackowiak, P.A., Wasserman, S.S., and Levine, M.M. (1992), "A Critical Appraisal of 98.6 Degrees F, the Upper Limit of the Normal Body Temperature, and Other Legacies of Carl Reinhold August Wunderlich," Journal of the American Medical Association, 268, 1578-1580. Dataset available through the JSE Dataset Archive.
Analysis of data from a matched pairs experiment compares the two measurements by subtracting one from the other and basing test hypotheses upon the differences. Usually, the null hypothesis H0 assumes that the mean of these differences is equal to 0, while the alternative hypothesis Ha claims that the mean of the differences is not equal to zero (the alternative hypothesis may be one- or two-sided, depending on the experiment). Using the differences between the paired measurements as single observations, the standard t procedures with n-1 degrees of freedom are followed as above.
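The matched pairs procedure reduces to a one-sample t test on the differences, which the following sketch makes explicit. The function name paired_t_statistic and the two short lists of measurements are hypothetical and are not the data analyzed in this section.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second, mu_0=0.0):
    """Return (t, degrees of freedom) for a matched pairs t test.

    Each pair is reduced to a single difference, first - second, and
    the differences are treated as one sample with H0: mean = mu_0.
    """
    if len(first) != len(second):
        raise ValueError("the two lists must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = (mean(diffs) - mu_0) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical paired measurements, for illustration only.
condition_a = [12.0, 15.0, 11.0, 14.0, 13.0]
condition_b = [10.0, 14.0, 12.0, 11.0, 12.0]
t, df = paired_t_statistic(condition_a, condition_b)
print(f"t = {t:.3f} with {df} degrees of freedom")
```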
In MINITAB, subtracting the air-filled measurement from the helium-filled measurement for each trial and applying the "DESCRIBE" command to the resulting differences gives the following results:
Descriptive Statistics

Variable        N     Mean   Median  Tr Mean    StDev  SE Mean
Hel. - Air     39     0.46     1.00     0.40     6.87     1.10

Variable      Min      Max       Q1       Q3
Hel. - Air -14.00    17.00    -2.00     4.00

Using MINITAB to perform a t-test of the null hypothesis H0: μ = 0 vs Ha: μ > 0 gives the following analysis:

T-Test of the Mean

Test of mu = 0.00 vs mu > 0.00

Variable     N     Mean    StDev  SE Mean       T        P
Hel. - A    39     0.46     6.87     1.10    0.42     0.34

The P-value of 0.34 indicates that this result is not significant at any commonly used significance level. A 95% confidence interval for the mean difference in measurements, based on the t distribution with 38 degrees of freedom, is (-1.76, 2.69), computed using the MINITAB "TINTERVAL" command.
Data source: Lafferty, M.B. (1993), "OSU scientists get a kick out of sports controversy," The Columbus Dispatch (November 21, 1993), B7. Dataset available through the Statlib Data and Story Library (DASL).
To perform a sign test on matched pairs data, take the difference between the two measurements in each pair and count the number of non-zero differences n. Of these, count the number of positive differences X. Under the null hypothesis that positive and negative differences are equally likely, X follows a B(n,1/2) distribution. Determine the probability of observing a count at least as extreme as X under this distribution, and use this probability as the P-value for the null hypothesis.
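The sign test can be sketched with exact binomial probabilities from the standard library. The function name and the "greater", "less" and "two-sided" labels are assumptions made for this example. The code takes the P-value to be a tail probability (a count at least as extreme as X), in line with the definition of a P-value used throughout this section, and doubles the smaller tail for a two-sided alternative.

```python
from math import comb


def sign_test_p_value(differences, alternative="two-sided"):
    """Exact sign test P-value based on the B(n, 1/2) distribution."""
    nonzero = [d for d in differences if d != 0]  # zero differences are dropped
    n = len(nonzero)
    x = sum(1 for d in nonzero if d > 0)  # number of positive differences

    def pmf(k):
        # P(X = k) for X ~ B(n, 1/2)
        return comb(n, k) / 2 ** n

    upper = sum(pmf(k) for k in range(x, n + 1))  # P(X >= x)
    lower = sum(pmf(k) for k in range(0, x + 1))  # P(X <= x)
    if alternative == "greater":
        return upper
    if alternative == "less":
        return lower
    if alternative == "two-sided":
        return min(1.0, 2 * min(upper, lower))
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical differences: six non-zero values, five of them positive.
example = [1, 2, 3, 4, 5, 0, -1]
print(sign_test_p_value(example, "greater"))
```

With six non-zero differences and five positives, the upper-tail probability is (6 + 1)/64, since there are six ways to get exactly five positives and one way to get six.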