Engee documentation
Notebook

Benford's Law

We are considering the implementation of Benford's law in the Julia programming language using the Fibonacci sequence.

Introduction

Benford's law is an empirical law that describes the distribution of the first digits in sets of numerical data from real life. According to this law, the number 1 occurs as the first significantly more often than other numbers (about 30%), and the number 9 is the least frequent (less than 5%). The law has wide application in the field of data audit, fraud detection and economic analysis.

In this example, we are implementing Benford's law in the Julia programming language. We will also create a sequence of Fibonacci numbers, apply the law to it, and then plot and compare the empirical data with the theoretical distribution.

Algorithm implementation

The Benford rule is given by the formula:

where - the first digit is from 1 to 9, and - the probability of this number being the first.

In [ ]:
"""
Calculates the probability of the first digit according to Benford's law.
"""
P(d) = log10(1 + 1 / d)
Out[0]:
P

benford(numbers) -- calculates the frequencies of the first digits in a set of numbers.

Returns an array of 9 elements, where the ith element indicates which fraction of numbers begin with the digit i.

In [ ]:
function benford(numbers)
    # Gets the first digit of the number n (the rightmost digit after the conversion)
    firstdigit(n) = last(digits(n))

    # Initialize the array of counters for each digit (1-9)
    counts = zeros(9)

    # For each number, we increase the corresponding counter.
    foreach(n -> counts[firstdigit(n)] += 1, numbers)

    # We normalize the frequencies so that their sum is equal to 1
    counts ./ sum(counts)
end
Out[0]:
benford

We define an iterator to generate a sequence of Fibonacci numbers. Base.iterate it is implemented in such a way that it is possible to work with an infinite stream using big.Int to prevent overflow.

In [ ]:
# Creating a type to denote the Fibonacci iterator
struct Fibonacci end

Implementation of the iteration protocol for the type Fibonacci

In [ ]:
Base.iterate(::Fibonacci, (a, b) = big.((0, 1))) = b, (b, a + b)

Generating the first 1000 numbers from the Fibonacci sequence

In [ ]:
sample = Iterators.take(Fibonacci(), 1000)
Out[0]:
Base.Iterators.Take{Fibonacci}(Fibonacci(), 1000)

We calculate the observed frequencies of the first digits and convert them to percentages.

In [ ]:
observed = benford(sample) .* 100
Out[0]:
9-element Vector{Float64}:
 30.099999999999998
 17.7
 12.5
  9.6
  8.0
  6.7
  5.6000000000000005
  5.3
  4.5

We calculate the theoretical values according to Benford's law and convert them to percentages.

In [ ]:
expected = P.(1:9) .* 100
Out[0]:
9-element Vector{Float64}:
 30.10299956639812
 17.609125905568124
 12.493873660829994
  9.691001300805642
  7.918124604762482
  6.694678963061322
  5.799194697768673
  5.115252244738129
  4.575749056067514

Creating a table for convenient data comparison

In [ ]:
table = Real[1:9 observed expected]
Out[0]:
9×3 Matrix{Real}:
 1  30.1  30.103
 2  17.7  17.6091
 3  12.5  12.4939
 4   9.6   9.691
 5   8.0   7.91812
 6   6.7   6.69468
 7   5.6   5.79919
 8   5.3   5.11525
 9   4.5   4.57575

We plot the bars for the observed data and the lines for the expected ones.

In [ ]:
using Plots

plot(
    [observed expected];
    title = "Benford's Law",
    seriestype = [:bar :line],
    linewidth = [0 5],
    xticks = 1:9,
    xlabel = "first digit",
    ylabel = "frequency %",
    label = ["1000 Fibonacci numbers" "P(d)=log10(1+1/d)"]
)
Out[0]:

We output the results to the console in a well-formatted form.

In [ ]:
using Printf

println("Benford's Law\nFrequency of first digit\nin 1000 Fibonacci numbers")
println("digit observed  expected")
foreach(i -> @printf("%3d%9.2f%%%9.2f%%\n", table[i, :]...), 1:9)
Benford's Law
Frequency of first digit
in 1000 Fibonacci numbers
digit observed  expected
  1    30.10%    30.10%
  2    17.70%    17.61%
  3    12.50%    12.49%
  4     9.60%     9.69%
  5     8.00%     7.92%
  6     6.70%     6.69%
  7     5.60%     5.80%
  8     5.30%     5.12%
  9     4.50%     4.58%

Conclusion

We have reviewed the implementation of Benford's law in the Julia language. We wrote a function to calculate the distribution of the first digits, applied it to a sample of the first 1000 Fibonacci numbers, and compared the data obtained with the theoretical values of the law. Visual graphs were also constructed, demonstrating the coincidence of the observed statistics with the mathematical law. Such methods can be used to verify the naturalness of the data or to identify possible distortions and fraud.

The example was developed using materials from Rosetta Code