Learning from One’s Mistakes
A Lesson in Standard Errors
R. D. LEE

Not all practical work turns out as expected. The good teacher should be prepared to help his pupils learn from mistakes.

This note concerns an exercise that was given to a group of 30 A-level biology teachers to illustrate the concept of Standard Error. The mistakes implicit in the statement of the exercise, explicit once the exercise had been attempted, serve to highlight the effect of sampling without replacement in small populations and lead subsequently to consideration of the relative importance of sample and population size.

Exercise
(a) Find the mean and standard deviation of the numbers 1, 2, 3, 4, 5 and 6.
(b) Divide yourselves into groups of six; in each group each person should take a different sample of five numbers from the six in (a) and find the mean $\bar{x}$ of those five numbers.
(c) As a group tabulate the six different values of $\bar{x}$ found by the members of your group.
(d) Find the mean and standard deviation of the values of $\bar{x}$ tabulated in (c) and verify that
        (i) mean in part (d) = mean in part (a) and
        (ii) standard deviation in part (d) = [standard deviation in part (a)]/$\sqrt{5}$

This exercise was designed to illustrate, with as little arithmetic as possible, a series of important steps necessary to the understanding of standard errors, namely that the mean $\bar{x}$ varies according to which sample is chosen, and therefore gives rise to a new distribution, the sampling distribution or distribution of $\bar{x}$ as tabulated in (c), different from the original distribution. This distribution of $\bar{x}$ itself has a mean and standard deviation, and it is this standard deviation which is called the Standard Error. The relationships between the mean and standard deviation of the distribution of $\bar{x}$ and the mean and standard deviation of the original set of numbers 1 to 6, namely $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = s/\sqrt{n}$, were to be illustrated by part (d).

When the group had finished its work we looked at the answers.
(a) Mean 3.5, Standard Deviation 1.87.
(b) and (c)
Sample                 $\bar{x}$
1, 2, 3, 4, 5          3.0
1, 2, 3, 4, 6          3.2
1, 2, 3, 5, 6          3.4
1, 2, 4, 5, 6          3.6
1, 3, 4, 5, 6          3.8
2, 3, 4, 5, 6          4.0

(d) Mean of the values of $\bar{x}$ = 3.5 = Mean in part (a)

Standard Deviation of the values of $\bar{x}$ = 0.37, which does not equal [Standard Deviation in part (a)]/$\sqrt{5}$ = 1.87/$\sqrt{5}$ = 0.84
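The tabulation in parts (b) to (d) can be reproduced with a short Python sketch. It is an illustration rather than part of the original exercise: the variable names are illustrative, and it assumes that both standard deviations use the n - 1 divisor, which is the convention that reproduces the quoted 1.87 and 0.37.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev

population = [1, 2, 3, 4, 5, 6]
n = 5  # sample size used in the exercise

# Part (a): mean and standard deviation (n - 1 divisor, an assumption).
pop_mean = mean(population)
s = stdev(population)

# Parts (b) and (c): every sample of five distinct numbers and its mean.
samples = list(combinations(population, n))
sample_means = [sum(sample) / n for sample in samples]

# Part (d): mean and standard deviation of the six sample means.
mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)

for sample, x_bar in zip(samples, sample_means):
    print(sample, x_bar)
print("mean of sample means:", round(mean_of_means, 2))
print("sd of sample means:  ", round(sd_of_means, 2))
print("s / sqrt(n):         ", round(s / sqrt(n), 2))
```

The last two printed values differ by more than a factor of two, which is the discrepancy the group encountered.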

Why then was $\sigma_{\bar{x}}$ not the same as $s/\sqrt{n}$? The relationship $\sigma_{\bar{x}} = s/\sqrt{n}$ is in any case obviously incorrect here, for if we extended the sample from 5 to 6 numbers, thus giving only one possible sample and hence no variation from sample to sample, we would have $\sigma_{\bar{x}} = 0$, not $s/\sqrt{6}$. Yet the relationship $\sigma_{\bar{x}} = s/\sqrt{n}$ is so basic to the concept of Standard Error; where can the error be? The only thing unusual about our exercise was that we had sampled from a small population, the numbers 1 to 6, so that we could look at all possible samples!

The Error Explained

The mistake is either blindingly obvious or totally invisible depending on which way you happen to be looking at the problem. In my case I had to re-examine the proof of $\sigma_{\bar{x}} = s/\sqrt{n}$ to find my error. This fundamental relationship applies when $\sigma_{\bar{x}}$ refers to the standard deviation of the means of all possible independent random samples of size n. Thus in choosing the n elements of the sample the choices must be independent. With our population consisting of the numbers 1 to 6 this implies sampling with replacement, so that there are not simply 6 different samples of size 5; numbers may be repeated within a sample.
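The effect of sampling with replacement can be checked by brute force. The sketch below is illustrative only: it enumerates all 6^5 = 7776 equally likely ordered samples of size 5 drawn with replacement and compares the standard deviation of their means with the population standard deviation divided by the square root of n. One assumption should be flagged: the identity is exact when both standard deviations are taken over the full set of values with divisor N (giving about 1.71 for the numbers 1 to 6), not with the n - 1 divisor that gives 1.87.

```python
from itertools import product
from math import sqrt
from statistics import pstdev

population = [1, 2, 3, 4, 5, 6]
n = 5  # sample size

# All ordered samples of size n drawn WITH replacement (6**5 of them),
# each equally likely under independent random choices.
means_with_replacement = [
    sum(sample) / n for sample in product(population, repeat=n)
]

# Population-style standard deviations (divisor N), an assumption
# made here so that the comparison is exact.
sigma = pstdev(population)
sd_of_means = pstdev(means_with_replacement)

print("number of samples:", len(means_with_replacement))
print("sd of sample means:", round(sd_of_means, 4))
print("sigma / sqrt(n):   ", round(sigma / sqrt(n), 4))
```

Here the two printed standard deviations agree, in contrast to the six-sample tabulation obtained without replacement.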

This may seem strange to biologists, for suppose I am taking a random sample of, say, kilometre squares in which to count the number of dead elms: I am hardly likely to allow a sample which includes the same square twice. So if in practice we sample without replacement and still use the relationship $s/\sqrt{n}$ to calculate the Standard Error, a relationship which applies to sampling with replacement, how great is the inaccuracy?

The Importance of Sample Size

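Let N be the population size and n the sample size. Then if $\bar{x}$ is the mean of a sample taken without replacement it can be shown that the variance of $\bar{x}$ is

$$\sigma_{\bar{x}}^2 = \frac{s^2}{n} \cdot \frac{N - n}{N - 1}$$

as opposed to $s^2/n$ for sampling with replacement. Since

$$\frac{N - n}{N - 1} = \frac{1 - n/N}{1 - 1/N},$$

if n/N is small, $s^2/n$ is a good approximation to the variance of $\bar{x}$.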

In our exercise, where N = 6 and n = 5, the formula gives the variance of $\bar{x}$ to be $s^2/25$, and so the standard deviation of $\bar{x}$ should be $s/5$ rather than $s/\sqrt{5}$ as we previously suggested. The calculations confirm this: 1.87/5 = 0.37. The "sampling fraction" n/N was in our case 5/6 and thus the approximation was poor.
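The factor of 1/25 can be verified exactly with rational arithmetic. The following sketch is an illustration under a stated assumption: both variances are computed with the same divisor convention (here the number of values, N for the population and 6 for the sample means). Using the n - 1 divisor for both, as in the figures 1.87 and 0.37, gives the same ratio.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4, 5, 6]
N = len(population)  # population size
n = 5                # sample size

mu = Fraction(sum(population), N)

# Variance of the original numbers (divisor N, an assumption).
pop_variance = sum((x - mu) ** 2 for x in population) / N

# Exact means of the six samples drawn without replacement.
sample_means = [Fraction(sum(c), n) for c in combinations(population, n)]
var_of_means = sum((m - mu) ** 2 for m in sample_means) / len(sample_means)

# Variance predicted by the formula for sampling without replacement.
predicted = pop_variance / n * Fraction(N - n, N - 1)

print("variance of sample means:", var_of_means)
print("formula prediction:      ", predicted)
print("ratio to the variance of the original numbers:", var_of_means / pop_variance)
```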

The importance of the standard error, or standard deviation of the distribution of $\bar{x}$, is that it is a measure of the accuracy of using the sample mean as an estimate of the population mean. From the formula

$$\sigma_{\bar{x}}^2 = \frac{s^2}{n} \cdot \frac{N - n}{N - 1}$$

it is clear that the sample size n is the principal influence as long as N is reasonably large. Thus suppose there are two populations with 1000 and 100000 individuals respectively and a sample of 10 is to be drawn from each. The factor 1/n = 1/10 is the same for each, whilst for the population of 1000, (N - n)/(N - 1) = 990/999 = 0.99, to 2 significant figures, and for N = 100000, (N - n)/(N - 1) = 99990/99999 = 1.00. So although the population is 100 times greater in the second case, the variance of the sample mean is increased by only 1 per cent.
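The comparison of the two populations can be sketched in a few lines of Python. The function name `finite_population_factor` is hypothetical, chosen only for this illustration; it simply evaluates (N - n)/(N - 1).

```python
def finite_population_factor(N: int, n: int) -> float:
    """Return (N - n)/(N - 1), the factor multiplying s^2/n
    when a sample of n is drawn without replacement from N."""
    return (N - n) / (N - 1)


n = 10  # same sample size for both populations
factor_small = finite_population_factor(1000, n)
factor_large = finite_population_factor(100000, n)

print("N = 1000:  ", round(factor_small, 2))
print("N = 100000:", round(factor_large, 2))

# Relative change in the variance of the sample mean between the two cases.
relative_increase = factor_large / factor_small - 1
print("relative increase in variance:", f"{relative_increase:.1%}")
```

Varying `n` instead of `N` in this sketch changes the 1/n factor in direct proportion, which is the sense in which sample size dominates.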

Thus it is the size of the sample which is important rather than the fraction of the population sampled. The intuitive notion that to sample a ten-acre field, for buttercups say, we ought to take ten times as large a sample as we would for a one-acre field is therefore false.
