Distribution of a random variable in different coordinate systems¶
The following topics will be covered in this example:
- generating random numbers and arrays in Julia
- use of probability distributions
- the importance of choosing the right distribution of a random variable depending on the problem statement.
Default random number generation¶
function rand()
¶
In order to generate a random number, you can use the function rand()
without connecting libraries. It will return a random real number in the range of [0,1)
Pkg.add(["Distributions"])
rand()
If we need to create a matrix $A^{i\times j \times k}$ of a certain size with elements $a_{ijk}\in[0,1)$, we will use rand(i,j,k)
rand(2,4,3)
If we pass a tuple to rand()
:
rand((2,4,3))
,
then (2,4,3)
will be a set of elements from which a random element is selected
rand((2,4,3))
rand(('a',3,"string",1+1im),2,3)
You can also pass a dictionary as a set of elements:
rand(Dict("pi" => 3.14, "e" => 2.71), 3)
Visual distribution analysis¶
In order to visually assess which law the numbers are generated according to, let's use the library Plots
. To do this, let's connect it using using Plots
and then use the function histogram
. By specifying gr()
we will disable the possibility of interactive interaction with charts.
using Plots # подключаем библиотеку
gr() # отключаем интерактивное взаимодействие с графиком, т.к. точек много
histogram(rand(50000)) # строим гистограму
title!("rand(50000)") # подпись
As we can see from the histogram, the distribution of random numbers generated using rand() is uniform. That is, the rand()
function generates values with standard continuous uniform distribution (SNRD).
Normal distribution¶
It is also possible to generate random numbers with normal distribution in Julia without connecting other libraries.
The randn() function generates a normal distribution of numbers with mathematical expectation 0
and standard deviation 1
. (i.e. the range of values will be $(-\infty,\infty$)
histogram(randn(50000),bims=0.005) # bims - ширина стобца, в который попадает точка
title!("randn(50000)")
Importance of choosing the correct distribution of a random variable¶
Let us consider such a problem:
Create a set of points inside a circle of radius R=1
and centre at (0,0)
, whose Cartesian coordinates are uniformly distributed along the abscissa and ordinate axes. To do this
- Create a set of points falling in a square with diagonal , passing through the vertices.
(-1,1), (1,-1)
- Select the points that fall in the unit circle.
Step 1.
To solve this step, keep in mind that we want uniformly distributed values of X
in the range [-1,1]
. And rand()
is SNRR (i.e., [0,1)
).
SNRR has the scaling property: If the random variable $X \sim U[0,1]$ and $Y = a+(b-a)X$, then $Y \sim U[\min(a,b),\max(a,b)]$
In our case $a=-1$ and $b=1$ $\Rightarrow$
POINTS_NUMBER = 3000
big_square_points = -1 .+ 2. .* rand(POINTS_NUMBER, 2) # воспользовались свойством СНРР
In order not to use the same keywords every time when building charts, you can add PlotThemes
and specify the necessary words "by default".
Pkg.add("PlotThemes")
theme(:default, aspect_ratio=1, label = false, markersize=1)
scatter(big_square_points[:,1],big_square_points[:,2])
Step 2
We will go through all the points of our square and add to the array those coordinates that fall within the unit circle.
circle_points_x = Float64[]
circle_points_y = Float64[]
R = 1
for (x, y) in eachrow(big_square_points) # (в каждой строке big_square_points 2 элемента: x и y)
if x^2 + y^2 < R^2
push!(circle_points_x, x)
push!(circle_points_y, y)
end
end
scatter(circle_points_x,circle_points_y)
To solve our problem, a more natural approach would be to use polar coordinates instead of Cartesian coordinates.
Using polar coordinates¶
Then this problem will be reduced to one rather than two steps:
Step 1 Select $\rho \in [0,1]$ and $\alpha \in [0, 2\pi]$
ρ = rand(POINTS_NUMBER)
α = 2π * rand(POINTS_NUMBER)
polar_original = scatter(ρ .* cos.(α), ρ .* sin.(α), aspect_ratio=1,markersize=1) # перешли обратно в декартовы
It is also possible to see our results at once, without switching to Cartesian coordinates. To do this, we disable aspect_ratio=1
and specify proj=:polar
- polar display.
theme(:default)
scatter(α, ρ , proj=:polar, markersize=1)
We can notice that the density of points near the origin is higher than near the circle.
For illustration, let us cut out an inscribed square from the circle.
For this purpose we will use the function zip()
:
for i in zip(1:5,'a':'e',["🟥","🟩","🟦"])
println(i)
end
# 4,5 и 'd','e' не напечатаются, так как коллекция цветов закончилась на 3 шаге
Visualisation of point density¶
small_square_points_x = Float64[]
small_square_points_y = Float64[]
for (x, y) in zip(circle_points_x, circle_points_y)
if abs(x) ≤ √2 / 2 && abs(y) ≤ √2 / 2
push!(small_square_points_x, x)
push!(small_square_points_y, y)
end
end
small_square_points_ρ = Float64[]
small_square_points_α = Float64[]
for (x, y) in zip(ρ .* cos.(α), ρ .* sin.(α))
if abs(x) ≤ √2 / 2 && abs(y) ≤ √2 / 2
push!(small_square_points_ρ, x)
push!(small_square_points_α, y)
end
end
In order to show several graphs on one canvas at once, it is necessary to use layout
and specify the structure of their arrangement (in our case, the grid $2 \times 2$. With the help of histogram2d
we will be able to see in which area the density of points is high.
p1 = scatter( small_square_points_x, small_square_points_y,
markersize=1, legend = false, aspect_ratio=1)
p2 = scatter( small_square_points_ρ, small_square_points_α,
markersize=1, legend = false, aspect_ratio=1)
p3 = histogram2d(small_square_points_x,small_square_points_y,
title="плотность в декартовых координатах", aspect_ratio=1)
p4 = histogram2d(small_square_points_ρ,small_square_points_α,
title="плотность в полярных координатах",aspect_ratio=1)
plot(p1,p2,p3,p4,layout=(2,2),size=(950,950))
Function repeat()
¶
Consider the function repeat()
repeat([1:3;],2)
repeat([1:3;],1,2)
repeat([1:3;],2,3)
repeat([1:3;],inner=2)
Comparison of polar and Cartesian coordinates¶
To understand why applying polar coordinates and a random uniform distribution over $\rho$ and $\alpha$ worked in this way rather than perhaps the expected way, consider the following picture. Let's take a fixed uniform distribution rather than a random one:
lngt = 12
a = repeat(range(0, stop=2pi,length=lngt),1,lngt)
r = repeat(range(0, stop=1, length=lngt),1,lngt)'
scatter(r .* cos.(a), r .* sin.(a),aspect_ratio=1, markersize=3)
x = repeat(range(0, stop=1, length=lngt),1,lngt)'
y = repeat(range(0, stop=1,length=lngt),lngt)
scatter(x, y,aspect_ratio=1)
As we can see, the density of the distribution of points in Cartesian coordinates is constant, and in polar coordinates your density near the origin.
To solve the problem we need to use such a distribution that will be more likely to put a point closer to the end of the radius vector than to the beginning. Suppose that such a distribution should be linear (the closer to the end, the more probable). It is called triangular.
Triangular distribution¶
To take advantage of the triangular distribution, let's plug in the library Distributions
. This library has a large set of different distributions (in particular, one-dimensional ones). This set can be found in documentation.
using Pkg
Pkg.add("Distributions")
using Distributions
Now let's create a set of points with triangular distribution. For this purpose as the first argument of the function rand()
pass TriangularDist(a,b,c)
with parameters a=0
,b=1
,c=1
.
histogram(rand(TriangularDist(0,1,1),POINTS_NUMBER))
Having verified that the triangular distribution works as expected, let us check that its application will solve our problem:
ρ_triang = rand(TriangularDist(0,1,1), POINTS_NUMBER)
α_triang = rand(Uniform(0,2π), POINTS_NUMBER) # Uniform(0,2π) - равномерное распределение от 0 до 2π
polar_triang = scatter(ρ_triang .* cos.(α_triang), ρ_triang .* sin.(α_triang),aspect_ratio=1, markersize=1)
plot(polar_original, polar_triang,layout=(1,2))
As an interesting fact, we show how without connecting the library Distributions
we could solve this problem by adding only two symbols .√
:
ρ_sqrt = .√rand(POINTS_NUMBER)
α_sqrt = 2π * rand(POINTS_NUMBER)
scatter(α_sqrt, ρ_sqrt, proj=:polar, markersize=1)
This is true because of the fact that:
Using the uniform distribution generator, we can get a triangular distribution generator: $$X = \begin{cases} a + \sqrt{U(b-a)(c-a)} & \text{ для } 0 < U < F(c) \\ & \\ b - \sqrt{(1-U)(b-a)(b-c)} & \text{ для } F(c) \le U < 1, \end{cases}$$ where $U$ is the uniformly distributed value and $F(c) = (c-a)/(b-a)$.
By substituting a=0
,b=1
,c=1
, we get exactly .√rand()
Conclusion¶
This example has shown that the Julia language has both convenient built-in features and specialised packages that allow the use of domain-specific features by interfacing with the standard ones.
It also looked at how to generate arrays (repeat
) and iterate over them (zip
and eachrow
)