Have you ever wanted to do a quick sanity check on a long list of numbers? It might be a budget, worldwide sales by country or product, or a marketing forecast. There is a cute little trick that can hint at whether the numbers were manufactured instead of real: Benford’s Law.
Benford’s Law, which is not really a “law of nature” but the result of more than 125 years of observation, states that the first digit of many real-life sets of numerical data is more likely to be a “1” than any other digit, and the probability gets successively smaller for “2” through “9”. Intuitively, one might expect the first digits to be evenly spread: about 11% for each possible first digit 1 through 9. (Zero doesn’t count as a first digit in this case.) The law holds even when the numbers in a set differ vastly in size, that is, in how many digits they have. In fact, the more orders of magnitude covered by the data, the more accurately Benford’s Law seems to apply.
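To put numbers on that, here is a minimal Python sketch. It assumes the usual statement of the law, which the text above does not spell out: the probability that the first digit is d equals log10(1 + 1/d). The name `benford` is just an illustrative choice.

```python
import math

# Expected share of each first digit under Benford's Law,
# using the standard formula P(d) = log10(1 + 1/d).
# The dictionary name "benford" is only illustrative.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}: {p:.1%}")
```

Under that formula the shares run from roughly 30.1% for a first digit of 1 down to about 4.6% for a 9, quite different from the even 11% one might guess.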
In other words, a list that spans numbers as small as 100,000 and as large as billions is likely to follow the law closely. For example, this chart shows how closely the populations of the 237 countries in the world (red bars) match Benford’s Law (the black dots).
The American astronomer Simon Newcomb published a paper in 1881 after noticing that the earlier pages of his logarithm tables were much more worn than the later ones, implying that he was looking up numbers starting with 1 and 2 more often than others. If you have no idea what I’m even talking about, check this out. He proposed the formula now known as Benford’s Law for the first and second digits. In 1938, physicist Frank Benford tested the theory on twenty different sets of numbers and was thus credited with the law. His data sets included the surface areas of 335 rivers, the sizes of 3,259 US populations, 1,800 molecular weights, and 308 numbers contained in an issue of Reader’s Digest.
Benford’s Law is not a universal rule, and it will not apply to sets of numbers that are restricted in value, like the phone numbers in Philadelphia (since almost all will start with 2, 4, or 6). A set of numbers that does not match Benford’s Law is not necessarily wrong, but might be worth a second look. Numbers that someone has manufactured are unlikely to match Benford’s Law.
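As a rough illustration of that “second look,” the sketch below tallies the first digits in a list and reports the largest gap between any observed share and the share predicted by the standard formula log10(1 + 1/d). The function names are hypothetical, and the maximum gap is only one simple yardstick I chose for the example, not a formal statistical test.

```python
import math
from collections import Counter

def first_digit(x):
    """Return the first nonzero digit of a nonzero number, ignoring its sign."""
    # Scientific notation puts the leading digit first, e.g. '3.259000000000000e+03'.
    return int(f"{abs(x):.15e}"[0])

def first_digit_shares(values):
    """Share of values starting with each digit 1..9 (zeros are skipped)."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

def max_gap_from_benford(values):
    """Largest absolute difference between observed and expected shares."""
    shares = first_digit_shares(values)
    return max(abs(shares[d] - math.log10(1 + 1 / d)) for d in range(1, 10))

# Illustrative inputs only: powers of two span many orders of magnitude,
# while the integers 100..999 are restricted in value and evenly spread.
powers_of_two = [2 ** n for n in range(300)]
three_digit_numbers = list(range(100, 1000))

print(max_gap_from_benford(powers_of_two))        # small gap
print(max_gap_from_benford(three_digit_numbers))  # much larger gap
```

A large gap only says the list deserves a closer look. As noted above, plenty of honest data sets are restricted in ways that keep them from following the law.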
Why does this law work? It has to do with how numbers are distributed on a logarithmic scale, which also explains why the wear on Simon Newcomb’s logarithm tables led to his initial discovery of the relationship.
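One way to make that concrete is a simplified argument, not a full proof. Assume the data are spread evenly on a logarithmic scale across one decade, say from 1 to 10. The numbers whose first digit is d then occupy the stretch from log10(d) to log10(d + 1), so their share of the decade is:

```latex
P(d) = \frac{\log_{10}(d+1) - \log_{10}(d)}{\log_{10}(10) - \log_{10}(1)}
     = \log_{10}\!\left(1 + \frac{1}{d}\right), \qquad d = 1, \dots, 9
```

The stretch from 1 to 2 is the widest on a log scale and the stretch from 9 to 10 is the narrowest, which is why 1 is the most common first digit and 9 the rarest.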
Some sequences do not obey Benford’s Law, including the square roots or reciprocals of successive whole numbers. It also does not apply to sequentially assigned numbers like check numbers. Numbers that result from mathematical combinations of other numbers, like quantity times price, are generally expected to follow it.
At various times, evidence based on Benford’s Law has been admitted in criminal cases at US local, state and federal levels. It has been cited as evidence of fraud in the 2009 Iranian elections, although experts tend to discount Benford’s Law as an indicator of election fraud.
Mark Nigrini, a well-known South African author of Forensic Analytics, has shown that Benford’s Law could be used in forensic accounting and auditing, which is how this post started.
The last word:
As I was talking about this post, my wife said that this law should also apply to the number of children in a family. In her genealogical research, it appeared to her that there are a lot of families with just a few children and, especially in the past, families with a large number of children, more than 9. I could not find any overall statistics to support or refute this claim; most government statistics talk about 1, 2, and “3 or more” children. However, I did find one family tree that had the statistics I wanted, covering 344 families with up to 15 children in a family.
Keep your sense of humor.