Hypothesis testing can be a difficult subject to introduce to students who are taking introductory courses in probability and statistics. Several articles in this journal have suggested ways of overcoming the problems which students can face (Reeves and Brewer, 1980; Johnson, 1981; Birnbaum, 1982). One of the prime difficulties is that of getting over to students the essential asymmetry of a hypothesis test, when conducted according to Neyman-Pearson principles as a decision procedure. There is an important difference in logical status between the null or test hypothesis (H0) and the alternative hypothesis (H1), which is mirrored in the logically different test outcomes "reject H0" and "do not reject H0" (see Reeves and Brewer, 1980).
A full understanding of this asymmetry and its implications can only be achieved if the notion of the "power" of a test (or an equivalent concept) is introduced. Confidence intervals carry with them an indication of the "sensitivity" of the accompanying inference, in the form of their width, but hypothesis tests do not, at least not explicitly, unless power calculations are carried out. However, power calculations tend to be difficult except for the simplest statistical models, and even here they are commonly tedious. These presumably are the reasons why, despite the importance of the power concept, elementary statistics texts typically dismiss the subject in at most two or three pages, with an illustrative power calculation and a diagram of a power curve.
Since many of the models for inference treated in introductory statistics courses involve Gaussian sampling distributions, the construction and presentation of the associated power curves can be much simplified by the use of normal probability graph paper. For a one-sided hypothesis test based on a Gaussian sampling distribution (textbooks usually present power calculations for one-sided rather than two-sided tests because of their rather greater simplicity), the power curve takes the form of a Gaussian cumulative distribution that extends from a probability value of α (the size of the test) to a probability value of 1. Hence the power curve will be a straight line when plotted on normal probability graph paper.
Figure 1 shows the customary power curve, plotted on ordinary arithmetic graph paper, for the very simple case of testing a population mean μ when the population variance σ² is known, and the sampling distribution of the sample mean m for a sample of size n is N(μ, σ²/n). The test is one-sided, size 0.05, with:

H0: μ = 75 against H1: μ > 75

and σ² = 50, n = 35. On the diagram, μ0 represents the null hypothesis value of μ, for which the power of the test is at a minimum, and mc (= 75 + 1.65√(50/35) ≈ 77) represents the "critical" value of the sample mean, beyond which H0 is rejected at the 5% level. In order to draw this diagram at all accurately, it is necessary to calculate the power for a series of values of μ under the alternative hypothesis H1.
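The series of power values needed to draw Figure 1 can be produced with a few lines of code. The following sketch uses only Python's standard library and adopts the values given above (σ² = 50, n = 35, μ0 = 75, α = 0.05), taking H1 to be the one-sided alternative μ > μ0.

```python
from statistics import NormalDist

# One-sided test of H0: mu = 75 against H1: mu > 75,
# size alpha = 0.05, with sigma^2 = 50 and n = 35 (values from the text).
z = NormalDist()                       # standard normal distribution
alpha, mu0 = 0.05, 75.0
se = (50 / 35) ** 0.5                  # standard error of the sample mean
mc = mu0 + z.inv_cdf(1 - alpha) * se   # critical value; about 77, as in the text

def power(mu):
    """P(sample mean exceeds mc) when the true mean is mu."""
    return z.cdf((mu - mc) / se)

# The series of power values needed to draw Figure 1 accurately:
for mu in (75, 76, 77, 78, 79, 80):
    print(f"mu = {mu}: power = {power(mu):.3f}")
```

(Note that `inv_cdf(0.95)` returns 1.645 rather than the rounded 1.65 used in the hand calculation, so `mc` comes out just under 77.)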
On normal probability paper, the plot is a straight line, as shown in Figure 2. Two points only suffice to locate the line, and if they are chosen wisely, little calculation is involved. For example, power = α when μ = μ0, and power = 1/2 when μ under the alternative hypothesis coincides with mc, the critical test value (these two points are marked with crosses on Figure 2).
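Both anchor points, and the fact that joining them with a straight edge reproduces the whole curve, can be checked numerically. A sketch in standard-library Python, with the same illustrative values (σ² = 50, n = 35, μ0 = 75, α = 0.05):

```python
from statistics import NormalDist

z = NormalDist()
alpha, mu0 = 0.05, 75.0
se = (50 / 35) ** 0.5                  # sigma / sqrt(n), values from the text
mc = mu0 + z.inv_cdf(1 - alpha) * se   # critical sample-mean value

def power(mu):
    # One-sided power at true mean mu
    return z.cdf((mu - mc) / se)

# Anchor point 1: at mu = mu0 the power equals the size of the test, alpha.
# Anchor point 2: at mu = mc the power equals 1/2.
print(power(mu0), power(mc))           # approximately 0.05 and 0.5

# The vertical scale of normal probability paper is the probit scale,
# so the plotted quantity is inv_cdf(power(mu)) = (mu - mc)/se: a straight
# line in mu with slope 1/se, passing through the two anchor points.
probits = [z.inv_cdf(power(mu)) for mu in (75, 76, 77, 78)]
```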
The great ease of constructing power curves on normal probability paper (so long as the sampling distribution is Gaussian) makes it relatively simple and quick to investigate the effects on power of different sample sizes and different sizes of test (see Figure 3). The case of a two-sided test is not quite so convenient, but even here straight line constructions can be useful. For a two-sided test of size α, a construction based on two one-sided tests each of size α/2 will give a good approximation of the power except at values of μ which are very close to μ0 (see Figure 4). In this region the straight line constructions underestimate the power by a factor of up to 2: at μ = μ0 itself each one-sided line gives power α/2, whereas the exact power of the two-sided test is α.
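The size of this underestimate is easy to confirm numerically. The sketch below assumes the straight-line construction reads off, at each μ, the larger of the two one-sided powers; with the illustrative values used throughout (σ² = 50, n = 35, μ0 = 75, α = 0.05) the construction is accurate away from μ0 but gives α/2 rather than α at μ0 itself.

```python
from statistics import NormalDist

# Two-sided test of size alpha, approximated by two one-sided tests
# each of size alpha/2; values sigma^2 = 50, n = 35, mu0 = 75 from the text.
z = NormalDist()
alpha, mu0 = 0.05, 75.0
se = (50 / 35) ** 0.5
c = z.inv_cdf(1 - alpha / 2) * se      # half-width of the rejection region

def exact_power(mu):
    # P(reject) = P(mean above upper cutoff) + P(mean below lower cutoff)
    return z.cdf((mu - mu0 - c) / se) + z.cdf((mu0 - c - mu) / se)

def line_power(mu):
    # Straight-line construction: the larger of the two one-sided powers
    return max(z.cdf((mu - mu0 - c) / se), z.cdf((mu0 - c - mu) / se))

# Near mu0 the construction is low by a factor approaching 2;
# a few standard errors away the two curves agree closely.
print(exact_power(mu0), line_power(mu0))   # approximately 0.05 vs 0.025
print(exact_power(80.0), line_power(80.0))
```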
The University College at Buckingham
Birnbaum, I. (1982). Interpreting statistical significance. Teaching Statistics, 4(1), 24-26.
Johnson, L. W. (1981). Teaching hypothesis testing as a six step process. Teaching Statistics, 3(2), 47-49.
Reeves, C. A. and Brewer, I. K. (1980). Hypothesis testing and proof by contradiction: an analogy. Teaching Statistics, 2(2), 57-59.