Engee documentation
Notebook

Distribution of a random variable in different coordinate systems

The following topics will be covered in this example:

  • generating random numbers and arrays in Julia
  • use of probability distributions
  • the importance of choosing the right distribution of a random variable depending on the problem statement.

Default random number generation

function rand()

In order to generate a random number, you can use the function rand() without connecting libraries. It will return a random real number in the range of [0,1)

In [ ]:
Pkg.add(["Distributions"])
In [ ]:
rand()
Out[0]:
0.8968475595278809

If we need to create a matrix $A^{i\times j \times k}$ of a certain size with elements $a_{ijk}\in[0,1)$, we will use rand(i,j,k)

In [ ]:
rand(2,4,3)
Out[0]:
2×4×3 Array{Float64, 3}:
[:, :, 1] =
 0.105281  0.553249  0.571817  0.068993
 0.463789  0.510086  0.468581  0.599769

[:, :, 2] =
 0.845341  0.959642  0.488022  0.959579
 0.409156  0.49006   0.548231  0.899735

[:, :, 3] =
 0.957553  0.136052  0.885431  0.284849
 0.826566  0.92279   0.659209  0.384685

If we pass a tuple to rand():

rand((2,4,3)),

then (2,4,3) will be a set of elements from which a random element is selected

In [ ]:
rand((2,4,3))
Out[0]:
4
In [ ]:
rand(('a',3,"string",1+1im),2,3)
Out[0]:
2×3 Matrix{Any}:
   "string"  'a'    "string"
 1+1im       'a'  1+1im

You can also pass a dictionary as a set of elements:

In [ ]:
rand(Dict("pi" => 3.14, "e" => 2.71), 3)
Out[0]:
3-element Vector{Pair{String, Float64}}:
 "e" => 2.71
 "e" => 2.71
 "e" => 2.71

Visual distribution analysis

In order to visually assess which law the numbers are generated according to, let's use the library Plots. To do this, let's connect it using using Plots and then use the function histogram. By specifying gr() we will disable the possibility of interactive interaction with charts.

In [ ]:
using Plots             # подключаем библиотеку
gr()                    # отключаем интерактивное взаимодействие с графиком, т.к. точек много
histogram(rand(50000))  # строим гистограму 
title!("rand(50000)")   # подпись
Out[0]:

As we can see from the histogram, the distribution of random numbers generated using rand() is uniform. That is, the rand() function generates values with standard continuous uniform distribution (SNRD).

Normal distribution

It is also possible to generate random numbers with normal distribution in Julia without connecting other libraries.

The randn() function generates a normal distribution of numbers with mathematical expectation 0 and standard deviation 1. (i.e. the range of values will be $(-\infty,\infty$)

In [ ]:
histogram(randn(50000),bims=0.005) # bims - ширина стобца, в который попадает точка
title!("randn(50000)")
Out[0]:

Importance of choosing the correct distribution of a random variable

Let us consider such a problem:

Create a set of points inside a circle of radius R=1 and centre at (0,0), whose Cartesian coordinates are uniformly distributed along the abscissa and ordinate axes. To do this

  1. Create a set of points falling in a square with diagonal , passing through the vertices. (-1,1), (1,-1)
  2. Select the points that fall in the unit circle.

Step 1.

To solve this step, keep in mind that we want uniformly distributed values of X in the range [-1,1]. And rand() is SNRR (i.e., [0,1)).

SNRR has the scaling property: If the random variable $X \sim U[0,1]$ and $Y = a+(b-a)X$, then $Y \sim U[\min(a,b),\max(a,b)]$

In our case $a=-1$ and $b=1$ $\Rightarrow$

In [ ]:
POINTS_NUMBER = 3000
big_square_points = -1 .+ 2. .* rand(POINTS_NUMBER, 2)  # воспользовались свойством СНРР
Out[0]:
3000×2 Matrix{Float64}:
  0.0529451    -0.378398
 -0.926168     -0.595867
 -0.506066      0.586659
 -0.349772     -0.501738
 -0.199344     -0.742828
  0.869969      0.759997
  0.0879778    -0.111934
 -0.601932     -0.955655
  0.678965      0.0709128
 -0.873571     -0.671576
  0.161069      0.667479
  0.32568       0.848502
  0.14866      -0.0551039
  ⋮            
 -0.403644     -0.839549
 -0.103682     -0.11301
 -0.958197     -0.63964
  0.0131706     0.507728
  0.0394808     0.353285
 -0.948482     -0.794558
 -0.870387     -0.195976
  0.980847      0.716624
 -0.987589      0.0785435
  0.897117     -0.967298
  0.000805525  -0.625087
 -0.112197     -0.540335

In order not to use the same keywords every time when building charts, you can add PlotThemes and specify the necessary words "by default".

In [ ]:
Pkg.add("PlotThemes")
theme(:default, aspect_ratio=1, label = false, markersize=1)
scatter(big_square_points[:,1],big_square_points[:,2])  
   Resolving package versions...
  No Changes to `~/.project/Project.toml`
  No Changes to `~/.project/Manifest.toml`
Out[0]:

Step 2

We will go through all the points of our square and add to the array those coordinates that fall within the unit circle.

In [ ]:
circle_points_x = Float64[]
circle_points_y = Float64[]
R = 1
for (x, y) in eachrow(big_square_points) # (в каждой строке big_square_points 2 элемента: x и y)
    if x^2 + y^2 < R^2
        push!(circle_points_x, x)
        push!(circle_points_y, y)
    end
end
scatter(circle_points_x,circle_points_y)
Out[0]:

To solve our problem, a more natural approach would be to use polar coordinates instead of Cartesian coordinates.

Using polar coordinates

Then this problem will be reduced to one rather than two steps:

Step 1 Select $\rho \in [0,1]$ and $\alpha \in [0, 2\pi]$

In [ ]:
ρ = rand(POINTS_NUMBER)
α = 2π * rand(POINTS_NUMBER)
polar_original = scatter(ρ .* cos.(α), ρ .* sin.(α), aspect_ratio=1,markersize=1)  # перешли обратно в декартовы
Out[0]:

It is also possible to see our results at once, without switching to Cartesian coordinates. To do this, we disable aspect_ratio=1 and specify proj=:polar - polar display.

In [ ]:
theme(:default)
scatter(α, ρ , proj=:polar, markersize=1)
Out[0]:

We can notice that the density of points near the origin is higher than near the circle.

For illustration, let us cut out an inscribed square from the circle.

For this purpose we will use the function zip():

In [ ]:
for i in zip(1:5,'a':'e',["🟥","🟩","🟦"])
    println(i)
end
# 4,5 и 'd','e' не напечатаются, так как коллекция цветов закончилась на 3 шаге
(1, 'a', "🟥")
(2, 'b', "🟩")
(3, 'c', "🟦")

Visualisation of point density

In [ ]:
small_square_points_x = Float64[]
small_square_points_y = Float64[]

for (x, y) in zip(circle_points_x, circle_points_y)
    if abs(x)  2 / 2 && abs(y)  2 / 2
        push!(small_square_points_x, x)
        push!(small_square_points_y, y)
    end
end
In [ ]:
small_square_points_ρ = Float64[]
small_square_points_α = Float64[]

for (x, y) in zip(ρ .* cos.(α), ρ .* sin.(α))
    if abs(x)  2 / 2 && abs(y)  2 / 2
        push!(small_square_points_ρ, x)
        push!(small_square_points_α, y)
    end
end

In order to show several graphs on one canvas at once, it is necessary to use layout and specify the structure of their arrangement (in our case, the grid $2 \times 2$. With the help of histogram2d we will be able to see in which area the density of points is high.

In [ ]:
p1 = scatter(   small_square_points_x, small_square_points_y,
                markersize=1, legend = false, aspect_ratio=1)
p2 = scatter(   small_square_points_ρ, small_square_points_α,
                markersize=1, legend = false, aspect_ratio=1)
p3 = histogram2d(small_square_points_x,small_square_points_y,
                 title="плотность в декартовых координатах", aspect_ratio=1)
p4 = histogram2d(small_square_points_ρ,small_square_points_α,
                 title="плотность в полярных координатах",aspect_ratio=1)

plot(p1,p2,p3,p4,layout=(2,2),size=(950,950))
Out[0]:

Function repeat()

Consider the function repeat()

In [ ]:
repeat([1:3;],2)
Out[0]:
6-element Vector{Int64}:
 1
 2
 3
 1
 2
 3
In [ ]:
repeat([1:3;],1,2)
Out[0]:
3×2 Matrix{Int64}:
 1  1
 2  2
 3  3
In [ ]:
repeat([1:3;],2,3)
Out[0]:
6×3 Matrix{Int64}:
 1  1  1
 2  2  2
 3  3  3
 1  1  1
 2  2  2
 3  3  3
In [ ]:
repeat([1:3;],inner=2)
Out[0]:
6-element Vector{Int64}:
 1
 1
 2
 2
 3
 3

Comparison of polar and Cartesian coordinates

To understand why applying polar coordinates and a random uniform distribution over $\rho$ and $\alpha$ worked in this way rather than perhaps the expected way, consider the following picture. Let's take a fixed uniform distribution rather than a random one:

In [ ]:
lngt = 12
a = repeat(range(0, stop=2pi,length=lngt),1,lngt)
r = repeat(range(0, stop=1, length=lngt),1,lngt)'
scatter(r .* cos.(a), r .* sin.(a),aspect_ratio=1, markersize=3)
Out[0]:
In [ ]:
x = repeat(range(0, stop=1, length=lngt),1,lngt)'
y = repeat(range(0, stop=1,length=lngt),lngt)
scatter(x, y,aspect_ratio=1)
Out[0]:

As we can see, the density of the distribution of points in Cartesian coordinates is constant, and in polar coordinates your density near the origin.

To solve the problem we need to use such a distribution that will be more likely to put a point closer to the end of the radius vector than to the beginning. Suppose that such a distribution should be linear (the closer to the end, the more probable). It is called triangular.

Triangular distribution

To take advantage of the triangular distribution, let's plug in the library Distributions. This library has a large set of different distributions (in particular, one-dimensional ones). This set can be found in documentation.

In [ ]:
using Pkg   
Pkg.add("Distributions")
using Distributions
   Resolving package versions...
  No Changes to `~/.project/Project.toml`
  No Changes to `~/.project/Manifest.toml`

Now let's create a set of points with triangular distribution. For this purpose as the first argument of the function rand() pass TriangularDist(a,b,c) with parameters a=0,b=1,c=1. triangular_distribution_pmf.png

In [ ]:
histogram(rand(TriangularDist(0,1,1),POINTS_NUMBER))
Out[0]:

Having verified that the triangular distribution works as expected, let us check that its application will solve our problem:

In [ ]:
ρ_triang = rand(TriangularDist(0,1,1), POINTS_NUMBER)
α_triang = rand(Uniform(0,2π), POINTS_NUMBER)    # Uniform(0,2π) - равномерное распределение от 0 до 2π
polar_triang = scatter(ρ_triang .* cos.(α_triang), ρ_triang .* sin.(α_triang),aspect_ratio=1, markersize=1)
plot(polar_original, polar_triang,layout=(1,2))
Out[0]:

As an interesting fact, we show how without connecting the library Distributions we could solve this problem by adding only two symbols .√:

In [ ]:
ρ_sqrt = .√rand(POINTS_NUMBER)
α_sqrt = 2π * rand(POINTS_NUMBER)
scatter(α_sqrt, ρ_sqrt, proj=:polar, markersize=1)
Out[0]:

This is true because of the fact that:

Using the uniform distribution generator, we can get a triangular distribution generator: $$X = \begin{cases} a + \sqrt{U(b-a)(c-a)} & \text{ для } 0 < U < F(c) \\ & \\ b - \sqrt{(1-U)(b-a)(b-c)} & \text{ для } F(c) \le U < 1, \end{cases}$$ where $U$ is the uniformly distributed value and $F(c) = (c-a)/(b-a)$.

By substituting a=0,b=1,c=1, we get exactly .√rand()

Conclusion

This example has shown that the Julia language has both convenient built-in features and specialised packages that allow the use of domain-specific features by interfacing with the standard ones.

It also looked at how to generate arrays (repeat) and iterate over them (zip and eachrow)