Engee documentation
Notebook

Benford's Law

We are considering the implementation of Benford's law in the Julia programming language using the Fibonacci sequence.

Introduction

Benford's law is an empirical law that describes the distribution of the first digits in sets of numerical data from real life. According to this law, the number 1 occurs as the first significantly more often than other numbers (about 30%), and the number 9 is the least frequent (less than 5%). The law has wide application in the field of data audit, fraud detection and economic analysis.

In this example, we are implementing Benford's law in the Julia programming language. We will also create a sequence of Fibonacci numbers, apply the law to it, and then plot and compare the empirical data with the theoretical distribution.

Algorithm implementation

The Benford rule is given by the formula:

where - the first digit is from 1 to 9, and - the probability of this number being the first.

In [ ]:
"""
Вычисляет вероятность первой цифры по закону Бенфорда.
"""
P(d) = log10(1 + 1 / d)
Out[0]:
P

benford(numbers) -- calculates the frequencies of the first digits in a set of numbers.

Returns an array of 9 elements, where the ith element indicates which fraction of the numbers starts with the digit i.

In [ ]:
function benford(numbers)
    # Получает первую цифру числа n (самую правую цифру после обращения)
    firstdigit(n) = last(digits(n))

    # Инициализируем массив счетчиков для каждой цифры (1–9)
    counts = zeros(9)

    # Для каждого числа увеличиваем соответствующий счетчик
    foreach(n -> counts[firstdigit(n)] += 1, numbers)

    # Нормируем частоты, чтобы их сумма была равна 1
    counts ./ sum(counts)
end
Out[0]:
benford

We define an iterator to generate a sequence of Fibonacci numbers. Base.iterate it is implemented in such a way that it is possible to work with an infinite stream using big.Int to prevent overflow.

In [ ]:
# Создаем тип для обозначения итератора Фибоначчи
struct Fibonacci end

Implementation of the iteration protocol for the type Fibonacci

In [ ]:
Base.iterate(::Fibonacci, (a, b) = big.((0, 1))) = b, (b, a + b)

Generating the first 1000 numbers from the Fibonacci sequence

In [ ]:
sample = Iterators.take(Fibonacci(), 1000)
Out[0]:
Base.Iterators.Take{Fibonacci}(Fibonacci(), 1000)

We calculate the observed frequencies of the first digits and convert them to percentages.

In [ ]:
observed = benford(sample) .* 100
Out[0]:
9-element Vector{Float64}:
 30.099999999999998
 17.7
 12.5
  9.6
  8.0
  6.7
  5.6000000000000005
  5.3
  4.5

We calculate the theoretical values according to Benford's law and convert them to percentages.

In [ ]:
expected = P.(1:9) .* 100
Out[0]:
9-element Vector{Float64}:
 30.10299956639812
 17.609125905568124
 12.493873660829994
  9.691001300805642
  7.918124604762482
  6.694678963061322
  5.799194697768673
  5.115252244738129
  4.575749056067514

Creating a table for convenient data comparison

In [ ]:
table = Real[1:9 observed expected]
Out[0]:
9×3 Matrix{Real}:
 1  30.1  30.103
 2  17.7  17.6091
 3  12.5  12.4939
 4   9.6   9.691
 5   8.0   7.91812
 6   6.7   6.69468
 7   5.6   5.79919
 8   5.3   5.11525
 9   4.5   4.57575

We plot the bars for the observed data and the lines for the expected ones.

In [ ]:
using Plots

plot(
    [observed expected];
    title = "Benford's Law",
    seriestype = [:bar :line],
    linewidth = [0 5],
    xticks = 1:9,
    xlabel = "first digit",
    ylabel = "frequency %",
    label = ["1000 Fibonacci numbers" "P(d)=log10(1+1/d)"]
)
Out[0]:

We output the results to the console in a well-formatted form.

In [ ]:
using Printf

println("Benford's Law\nFrequency of first digit\nin 1000 Fibonacci numbers")
println("digit observed  expected")
foreach(i -> @printf("%3d%9.2f%%%9.2f%%\n", table[i, :]...), 1:9)
Benford's Law
Frequency of first digit
in 1000 Fibonacci numbers
digit observed  expected
  1    30.10%    30.10%
  2    17.70%    17.61%
  3    12.50%    12.49%
  4     9.60%     9.69%
  5     8.00%     7.92%
  6     6.70%     6.69%
  7     5.60%     5.80%
  8     5.30%     5.12%
  9     4.50%     4.58%

Conclusion

We have reviewed the implementation of Benford's law in the Julia language. We wrote a function to calculate the distribution of the first digits, applied it to a sample of the first 1000 Fibonacci numbers, and compared the data obtained with the theoretical values of the law. Visual graphs were also constructed, demonstrating the coincidence of the observed statistics with the mathematical law. Such methods can be used to verify the naturalness of the data or to identify possible distortions and fraud.

The example was developed using materials from Rosetta Code