
Averaging closely spaced points within tolerance limits

This example shows how to reduce the number of points in a plot by merging points that lie close to each other, within a given tolerance, into a single averaged point.

Initial data

We use the function from the file peaks.jl, which creates a smooth surface with several peaks over the square [-3,3] × [-3,3], and add some noise to the data along the Z axis. The point cloud then no longer lies on a single surface but is scattered around it.
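The file peaks.jl is not reproduced here. As a rough stand-in, assuming it follows the well-known `peaks` test surface from MATLAB, a function of this kind could look as follows. The name `peaks_sketch` is hypothetical and the actual file may differ.

```julia
# Hypothetical stand-in for the function defined in peaks.jl.
# Assumption: it matches the classic MATLAB "peaks" test surface.
peaks_sketch(x, y) =
    3 * (1 - x)^2 * exp(-x^2 - (y + 1)^2) -
    10 * (x / 5 - x^3 - y^5) * exp(-x^2 - y^2) -
    1 / 3 * exp(-(x + 1)^2 - y^2)

# Evaluate the surface at a few (x, y) pairs using broadcasting,
# the same way the notebook calls peaks.( xy[:,1], xy[:,2] ).
heights = peaks_sketch.([0.0, 1.0, -1.0], [0.0, 1.0, 2.0])
```

The function takes scalar arguments, so the dot syntax is what applies it to whole columns of coordinates.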

In [ ]:
import Pkg
Pkg.add(["Statistics", "GroupSlices"])
In [ ]:
using Plots
gr( format=:png ) # Note: from now on, all plots in the current
                  # working session will be rendered as PNG

include( "$(@__DIR__)/peaks.jl" );

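xy = rand(10000, 2) .* 6 .- 3;  # random (X, Y) points in the square [-3,3] × [-3,3]
z = peaks.( xy[:,1], xy[:,2] ) .+ 0.5 .- rand(10000,1);  # surface height plus uniform noise of amplitude 0.5
A = [xy z]  # 10000×3 matrix with columns X, Y, Z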

scatter( A[:,1], A[:,2], A[:,3], camera=(45, 20), ms=2, markerstrokewidth=0.1, leg=false )
Out[0]:
[Image: 3D scatter plot of the noisy point cloud]

Restoring a smooth surface

Within this point cloud, we look for points that can be merged according to the following criteria:

  • the points lie within roughly tol of each other,
  • the Z coordinate is ignored; only X and Y are compared.
In [ ]:
tol = 0.33;

To find points that belong to the same region, we coarsen the original data set: the X and Y coordinates of each point are multiplied by 1/tol and then rounded to the nearest integer.

In [ ]:
aA = round.( (1/tol) .* A[:,1:2], digits=0);
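As a small illustration of what this rounding does, here is a sketch on three hand-picked points (the values are made up for the example). Points that fall into the same grid cell of width tol get identical rounded coordinates.

```julia
tol = 0.33

# Three illustrative (X, Y) points: the first two are close together,
# the third is farther away along X.
pts = [0.10 0.10;
       0.15 0.12;
       0.50 0.10]

# Scale by 1/tol and round to the nearest integer: each row now holds
# the integer coordinates of the grid cell containing the point.
cells = round.( (1/tol) .* pts, digits=0 )
# rows 1 and 2 become [0.0 0.0], row 3 becomes [2.0 0.0]
```

Note that this is a grid-based approximation of "closer than tol": two points on opposite sides of a cell boundary, such as X = 0.16 and X = 0.17 here, end up in different cells even though they are very close.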

Our matrix now consists of a large number of groups of identical rows. How do we group them? We could compare the rows with each other element by element, but there is a shorter way.

This operation can be performed using the function groupslices from the library GroupSlices (to install it, if necessary, uncomment and run the next cell):

In [ ]:
#]add GroupSlices
In [ ]:
using GroupSlices
C = groupslices( aA, dims=1 );

For each row r of the matrix, the function groupslices returns the index of the first encountered row that is identical to r.
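To make this behaviour concrete, here is a hand-written sketch that produces the same kind of labels for a small matrix. The function `first_identical_row` is a hypothetical helper written for illustration only; it is not the library's implementation.

```julia
# Illustrative re-implementation of the labelling described above
# (hypothetical helper, not part of GroupSlices).
function first_identical_row(M::AbstractMatrix)
    seen = Dict{Vector{eltype(M)}, Int}()    # row contents => index of first occurrence
    labels = Vector{Int}(undef, size(M, 1))
    for i in axes(M, 1)
        # get! returns the stored index if the row was seen before,
        # otherwise it stores i and returns it
        labels[i] = get!(seen, M[i, :], i)
    end
    return labels
end

M = [0 0;
     2 0;
     0 0;
     2 0;
     1 1]

labels = first_identical_row(M)   # rows 1 and 3 share label 1, rows 2 and 4 share label 2
```

Each distinct label identifies one group, so the number of unique labels equals the number of distinct rows.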

Let's average the points in each subgroup.

In [ ]:
using Statistics
avgA = [ mean( A[C .== c, :], dims=1) for c in unique( C ) ]
avgA = vcat( avgA... ); # We got a vector of 1×3 matrices; concatenate them vertically, row by row
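The averaging step can be checked by hand on a tiny data set. The sketch below uses three made-up points and made-up group labels of the kind described above (rows 1 and 2 form one group, row 3 is on its own).

```julia
using Statistics

# Illustrative data: columns are X, Y, Z
A_small = [0.10 0.10 1.0;
           0.15 0.12 3.0;
           0.50 0.10 5.0]

# Illustrative group labels: index of the first row of each group
C_small = [1, 1, 3]

# One 1×3 row of column means per group, then stack the rows
avg_small = [ mean( A_small[C_small .== c, :], dims=1 ) for c in unique(C_small) ]
avg_small = reduce(vcat, avg_small)
# first row ≈ [0.125 0.11 2.0], second row is the lone third point
```

Here `reduce(vcat, ...)` is used in place of splatting; it gives the same result and avoids passing a very long argument list when there are many groups.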

Now we can plot the averaged points. Although the data set has not become a regular grid, averaging has greatly reduced the scatter along the Z axis while preserving the general shape of the original data.

In [ ]:
scatter( A[:,1], A[:,2], A[:,3], camera=(45, 20), ms=2, markerstrokewidth=0.1, leg=false )
scatter!( avgA[:,1], avgA[:,2], avgA[:,3], color=:red, ms=3, legend=false )
Out[0]:
[Image: 3D scatter plot of the original point cloud with the averaged points overlaid in red]

Individual regions can be seen by colouring them as follows:

In [ ]:
scatter( A[:,1], A[:,2], A[:,3], camera=(45, 20), ms=2, markerstrokewidth=0.1, leg=false, zcolor=C./maximum(C), c=:prism )
scatter!( avgA[:,1], avgA[:,2], avgA[:,3], color=:white, ms=3, legend=false, camera=(0,90) )
Out[0]:
[Image: top view of the point cloud coloured by region, with the averaged points overlaid in white]

Conclusion

We smoothed the noisy data by replacing each group of closely spaced points with their average. Points were grouped by their proximity along the X and Y axes, with the tolerance tol setting the size of each group.