Benford’s Law!

In most cases, scientific discoveries start with a question about an observation, and this is true in the world of mathematics too. One interesting example is the story of Benford’s Law, or the law of first digits. This law is interesting because for some people it is pretty much obvious, but for most it is very counter-intuitive, as it is based on an observable fact rather than an analytical deduction. It is also one of those cases where a discovery is made, then forgotten, and then found again.

Benford’s law states that, in many data sets (or groups of numbers), values are more likely to start with low digits like 1 or 2 than with high digits like 8 or 9. The law gives a probability distribution stating exactly how likely it is for a certain digit to be the first in a given number (for example, the first digit of the value 203,543 is 2). It applies to a wide range of values, like country populations, the number of bytes in computer system files, bank account balances, numbers in nature, financial and trade figures, tax returns, weather data, experimental and cosmological data sets, etc.
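To make the distribution concrete, here is a minimal Python sketch. It assumes the usual statement of the law, where the probability of a first digit d is log10(1 + 1/d); the post itself does not spell out the formula, and the function names below are my own illustrative choices.

```python
import math

def first_digit(value):
    """Return the leading non-zero digit of a non-zero number."""
    # Drop the sign, then any leading zeros and decimal point: 0.0042 -> "42"
    return int(str(abs(value)).lstrip("0.")[0])

def benford_probability(digit):
    """Expected share of numbers whose first digit is `digit` (1 to 9),
    assuming the standard formula log10(1 + 1/d)."""
    return math.log10(1 + 1 / digit)

print("First digit of 203,543:", first_digit(203543))
for d in range(1, 10):
    print(d, f"{benford_probability(d):.1%}")
```

Under this formula, a first digit of 1 comes out at about 30.1% and a first digit of 9 at about 4.6%, and the nine probabilities add up to 1.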

 

The story.

The first to discover this phenomenon was Simon Newcomb, a famous American astronomer, who published a paper about it in a mathematical journal in 1881. But since the paper didn’t present all the mathematical background, it was practically forgotten.
More than half a century later, in 1938, Frank Benford, a General Electric physicist (it is interesting how this law has received far more attention from physics-related scientists than from mathematicians), rediscovered this phenomenon, in pretty much the same way as Newcomb did. The discovery came from the wear and grime observed on books of logarithm tables (the logarithm is the inverse operation of exponentiation, as subtraction is the inverse of addition). At that time, without today’s computers, logarithm tables were used to speed up the multiplication of big numbers, since these multiplications can be reduced to sums of logarithms. These books list the logarithms of values from 1 up to (but not including) 10, as any number can be reduced to one of these values. For example, the logarithm of 5123 can be obtained as the logarithm of 5.123 (taken from the book) plus 3, the base-10 logarithm of 1000.
Benford noticed that the pages at the beginning of these books, corresponding to the logarithms of numbers starting with low digits, showed more signs of wear and tear than the pages at the end, corresponding to numbers starting with high digits.
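For readers who never used a logarithm table, the arithmetic behind the 5123 example can be checked with a short Python sketch. The second factor (204) is just a number I picked for illustration, and `math.log10` stands in for the lookup in the book.

```python
import math

# A table only lists logs for values between 1 and 10,
# so 5123 is split into 5.123 * 1000.
table_value = math.log10(5.123)   # what the book would give you
power_of_ten = 3                  # log10(1000)
log_of_5123 = table_value + power_of_ten
print(log_of_5123, math.log10(5123))  # both values agree

# Multiplication becomes addition: log(a * b) = log(a) + log(b)
a, b = 5123, 204                  # 204 is an arbitrary illustrative factor
product_via_logs = 10 ** (math.log10(a) + math.log10(b))
print(round(product_via_logs), a * b)
```

The last line compares the product recovered from the sum of logarithms with the direct multiplication.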

Benford's probability

Benford’s probability distribution (wikipedia)

With this in mind, Benford ran an experiment with approximately 20,000 “naturally occurring” numbers from very different sources, ranging from square roots of integers and numerical constants to some unusual data sets, like the lengths of big rivers or all the numbers included in an edition of Reader’s Digest magazine. He compared the first-digit frequency for each data group and for the data set as a whole, and the probability distributions were similar: numbers starting with 1 appeared close to one-third of the time, more than six times as often as numbers starting with 9. The explanation for this distribution, although it might seem odd, is quite logical.

The best example I found to explain this distribution is shown in an excellent YouTube video, which describes a very generous (and totally hypothetical) bank paying a daily 10% interest rate. If I start with one dollar, I’ll have 1.10 after the first day, 1.21 after the second and 1.33 after the third, and my balance will keep starting with 1 until the eighth day, when it moves to 2.14. From there it climbs through the higher digits faster and faster, 2.36, 2.59, etc., until I reach 10.83, and from there the cycle starts again, now in the teens: 11.92, 13.11, 14.42, etc., where all these numbers start with 1. After several cycles through the hundreds or thousands, you will notice that the values starting with 1 occur around 30% of the time, regardless of how many times we repeat the exercise. The exercise can be repeated with raffle tickets or any other unrestricted sequence of numbers.
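The generous bank is easy to simulate. The following is a rough Python sketch of the exercise, assuming a starting balance of one dollar and 10% daily interest as in the example; the number of days (5,000) and the function name are arbitrary choices of mine.

```python
from collections import Counter

def leading_digit_shares(days, daily_rate=0.10, start=1.0):
    """Compound `start` daily and return how often each first digit appears."""
    balance = start
    counts = Counter()
    for _ in range(days):
        balance *= 1 + daily_rate
        # The balance is always above 1, so the first character
        # of its text form is the leading digit.
        counts[int(str(balance)[0])] += 1
    return {digit: counts[digit] / days for digit in range(1, 10)}

shares = leading_digit_shares(days=5000)
for digit, share in shares.items():
    print(digit, f"{share:.1%}")
```

Balances starting with 1 should come out close to 30% of the days, and the share should shrink steadily toward 9.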

 

Well, why the fuss?

Benford’s law is not just a mathematical curiosity. Because it is a demonstrable pattern in how numbers occur naturally, it is used to flag cases where the numbers are suspected of not being natural, in other words, for fraud detection. It is used to analyze data from different operations, like stock prices, loan data, tax returns, credit card transactions, customer account balances, inventory prices, etc. When the numbers in these records are manipulated, they tend to deviate from the probability distribution defined by Benford, and this deviation is an indicator that further analysis is needed. In several cases, this indicator has been the trigger that led to detecting fraud.
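As a rough illustration of how such a check could work, here is a Python sketch that compares the first digits of a data set with Benford’s distribution using a chi-square statistic. This is one common way to measure the deviation, not necessarily what auditors actually use, and both data sets below are made up for the example. The threshold of 15.51 is the standard 5% critical value for 8 degrees of freedom; a value above it is only a reason to look closer, not proof of fraud.

```python
import math
from collections import Counter

# Expected share of each first digit, assuming the formula log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
CRITICAL_5_PERCENT = 15.51  # chi-square critical value, 8 degrees of freedom

def first_digit(value):
    """Leading non-zero digit of a non-zero number."""
    return int(str(abs(value)).lstrip("0.")[0])

def chi_square_vs_benford(values):
    """Chi-square distance between observed first digits and Benford's law."""
    values = [v for v in values if v != 0]   # zero has no first digit
    observed = Counter(first_digit(v) for v in values)
    n = len(values)
    return sum((observed[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())

# Hypothetical data: compound growth (spans many orders of magnitude)
growing = [1.1 ** k for k in range(1, 1001)]
# Hypothetical data: amounts squeezed into a narrow range, 500 to 1499
squeezed = list(range(500, 1500))

for name, data in [("growing", growing), ("squeezed", squeezed)]:
    score = chi_square_vs_benford(data)
    verdict = "worth a closer look" if score > CRITICAL_5_PERCENT else "looks natural"
    print(f"{name}: chi-square = {score:.1f} ({verdict})")
```

The compound-growth data stays well below the threshold, while the squeezed data lands far above it.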

Fraud identification

Fraud identification using Benford’s Law (isaca.org)

Benford’s law has its limitations too. It works on natural and unrestricted data sets; therefore it will not work for small data sets (fewer than 500 values), for assigned values like phone numbers, or for sets with a fixed maximum or minimum, such as people’s heights. For many other cases, though, the law can be applied. You can review several examples of this distribution on the site Testing Benford’s law.

So, for those interested in “cooking the books”: you’d better not. Pay attention to this law, and remember, “first digits rule!!”

Regards, Alex – Science Kindle.

 
