*A full understanding of correlation requires an appreciation of bivariate
distributions, but increasingly rank correlation coefficients are being
used as a measure of agreement with pupils for whom such appreciation is
not possible. How can we justify the formula used?*

Although the formula for Spearman’s Coefficient of Rank Correlation
is being increasingly used in school courses in Geography and other subjects,
the justification for its use is rarely available. That the Spearman formula
is the result of finding the product moment correlation for the ranks,
although bestowing some credibility on the formula, is not helpful since
the product moment coefficient is not usually known at this level. The
Schools Council publication *Mathematics across the Curriculum* (Blackie)
remarks (p. 104) "Kendall’s coefficient has an advantage for teaching purposes
over Spearman’s, in that it is more easily explained as a reasonable measure".
Whatever the validity of that remark, Spearman’s coefficient is the one
which is commonly used and in this article I try to explain how its algebraic
structure arises.

Assuming that readers understand the principle of ranking, I propose
that any coefficient of rank correlation should do two things: indicate
the extent to which two sets of ranks differ (or agree), and be standardised
so that, like other measures of correlation, it takes values between -1
and +1.

**An Example**

Suppose we measure two characteristics *A* and *B* of eight
towns. Let *A* be the density of public houses and *B* the density
of places of worship, in each case given as the number per 10,000 of the
population.

Town | P | Q | R | S | T | U | V | W |

A | 41 | 36 | 26 | 45 | 48 | 35 | 51 | 43 |

B | 22 | 7 | 14 | 21 | 13 | 11 | 17 | 20 |

Ranking each characteristic from 1 (lowest density) to 8 (highest density) gives:

Town | P | Q | R | S | T | U | V | W |

Rank of A | 4 | 3 | 1 | 6 | 7 | 2 | 8 | 5 |

Rank of B | 8 | 1 | 4 | 7 | 3 | 2 | 5 | 6 |

Re-ordering the towns by their rank for *A* makes the two sets of ranks easier to compare:

Town | R | U | Q | P | W | S | T | V |

Rank of A | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |

Rank of B | 4 | 2 | 1 | 8 | 6 | 7 | 3 | 5 |

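Thus, writing *d* for the difference between a town’s two ranks, the sum of the squared differences is

$\sum d^2 = 3^2 + 0^2 + 2^2 + 4^2 + 1^2 + 1^2 + 4^2 + 3^2 = 56$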
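The ranking and the sum of squared rank differences for the eight towns can be checked with a few lines of Python. This is only a sketch: the names `ranks`, `density_a` and `density_b` are illustrative and do not come from the article, and the ranking helper assumes there are no tied values (which is true of this data).

```python
# Densities per 10,000 of the population, copied from the table of eight towns.
towns = ["P", "Q", "R", "S", "T", "U", "V", "W"]
density_a = [41, 36, 26, 45, 48, 35, 51, 43]  # public houses
density_b = [22, 7, 14, 21, 13, 11, 17, 20]   # places of worship


def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


rank_a = ranks(density_a)
rank_b = ranks(density_b)

# d is the difference between a town's two ranks; squaring removes the sign.
differences = [a - b for a, b in zip(rank_a, rank_b)]
sum_d_squared = sum(d * d for d in differences)

print(rank_a)         # [4, 3, 1, 6, 7, 2, 8, 5]
print(rank_b)         # [8, 1, 4, 7, 3, 2, 5, 6]
print(sum_d_squared)  # 56
```

The order in which the towns are listed does not affect the total, since each town contributes its own squared difference.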

(It is reasonable to ask whether other treatments of the differences
in ranks could provide a suitable coefficient, e.g. the sum of the absolute
values $\sum |d|$, but that is another article.)

In general this measure is small when there is a high agreement between
the ranks and only for complete agreement does it take its minimum value
(it is obvious that $\sum d^2$ cannot be negative and that 0 will be its smallest value).

Rank of A | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |

Rank of B | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |

Here every difference in ranks is 0, so $\sum d^2 = 0$.

This measure is large when there is a high disagreement between the ranks and only for complete disagreement does it take its maximum value (this is not obvious although intuitively reasonable).

Rank of A | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |

Rank of B | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |

Here $\sum d^2 = 7^2 + 5^2 + 3^2 + 1^2 + 1^2 + 3^2 + 5^2 + 7^2 = 168$.

Thus our coefficient does seem to discriminate between different degrees of agreement by taking values in the range 0 to 168.
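The claim that the maximum occurs only for complete disagreement is not proved here, but for eight items it can be checked by brute force. The sketch below tries every possible ordering of the ranks of B against ranks of A fixed at 1 to 8; the function name `sum_d_squared` is illustrative, and an exhaustive search like this is only practical for small numbers of items.

```python
from itertools import permutations

n = 8
rank_a = tuple(range(1, n + 1))  # ranks of A fixed as 1, 2, ..., 8


def sum_d_squared(rank_b):
    """Sum of squared differences between rank_a and a candidate rank_b."""
    return sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))


# Evaluate the measure for all 8! = 40320 possible rankings of B.
totals = {perm: sum_d_squared(perm) for perm in permutations(rank_a)}

smallest = min(totals.values())
largest = max(totals.values())
at_min = [perm for perm, total in totals.items() if total == smallest]
at_max = [perm for perm, total in totals.items() if total == largest]

print(smallest, at_min)  # 0 [(1, 2, 3, 4, 5, 6, 7, 8)]
print(largest, at_max)   # 168 [(8, 7, 6, 5, 4, 3, 2, 1)]
```

Each extreme is reached by exactly one ordering: identical ranks for the minimum and fully reversed ranks for the maximum.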

We can use this approach to generate Spearman’s formula for *n* pairs
of values but first we need to calculate the maximum value of $\sum d^2$
which, as we have seen, occurs when there is complete disagreement.

Rank of A | 1 | 2 | 3 | ... | ... | ... | n-1 | n |

Rank of B | n | n-1 | n-2 | ... | ... | ... | 2 | 1 |

For this arrangement $\sum d^2 = (n^3 - n)/3$, which gives 168 when $n = 8$.
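One way to see where the expression $(n^3 - n)/3$ comes from is sketched below. It assumes the standard results $\sum i = n(n+1)/2$ and $\sum i^2 = n(n+1)(2n+1)/6$, and the index $i$ and subscripted $d_i$ are notation introduced for this sketch only. With complete disagreement, the item ranked $i$ for A is ranked $n + 1 - i$ for B, so $d_i = i - (n + 1 - i) = 2i - n - 1$.

$$
\begin{aligned}
\sum_{i=1}^{n} d_i^2 &= \sum_{i=1}^{n} (2i - n - 1)^2 \\
&= 4\sum_{i=1}^{n} i^2 - 4(n+1)\sum_{i=1}^{n} i + n(n+1)^2 \\
&= \frac{2n(n+1)(2n+1)}{3} - 2n(n+1)^2 + n(n+1)^2 \\
&= \frac{n(n+1)\,[\,2(2n+1) - 3(n+1)\,]}{3} \\
&= \frac{n(n+1)(n-1)}{3} = \frac{n^3 - n}{3}
\end{aligned}
$$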

Standardisation takes place as follows:
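A sketch of one route from the range of $\sum d^2$ to a coefficient lying between -1 and +1 is given here. The symbol $r_s$ for the resulting coefficient is an assumption of this sketch; it is not used elsewhere in the article.

$$
\begin{aligned}
0 \;\le\; \sum d^2 \;&\le\; \frac{n^3 - n}{3} \\
0 \;\le\; \frac{3\sum d^2}{n^3 - n} \;&\le\; 1 && \text{(divide by the maximum)} \\
0 \;\le\; \frac{6\sum d^2}{n^3 - n} \;&\le\; 2 && \text{(double, so the range has width 2)} \\
-1 \;\le\; 1 - \frac{6\sum d^2}{n^3 - n} \;&\le\; 1 && \text{(subtract from 1)}
\end{aligned}
$$

$$
r_s = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6\sum d^2}{n(n^2 - 1)}
$$

Subtracting from 1 also reverses the direction of the scale, so that complete agreement ($\sum d^2 = 0$) gives $+1$ and complete disagreement gives $-1$. For the eight towns, the ranks in the table give $\sum d^2 = 56$, so $r_s = 1 - (6 \times 56)/504 = 1/3$.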
