Engee documentation
Notebook

Interpolation and extrapolation

Interpolation is a technique that allows you to supplement a data set with missing values calculated at points between existing points in the sample. Interpolation can be used to fill in gaps in the data, smooth outliers, predict the behaviour of a process, and so on. To estimate values outside the domain of definition of the experiment, extrapolation methods are used.

Unlike various kinds of regression or polynomial approximation, for interpolation it is desirable for the original data to be on a regular grid. That is, the function arguments must be specified via the data type Range. The function $y = F(x)$, which performs interpolation of the output data y defined on the grid x, we will call interpolating function or interpolant.

Multivariate interpolation on the grid

With enough data points, you can get a prediction at any intermediate point, including on a regular grid or on a random sample of points.

In [ ]:
Pkg.add(["Interpolations"])
In [ ]:
using Interpolations, Plots, Random
Random.seed!(12);
gr( fmt=:png, size=(500,400) );

Suppose our empirical data form a 4 by 4 matrix:

In [ ]:
xs = range( 1, 10, length=4 );
ys = range( 1, 10, length=4 );
zs = rand( length(xs), length(ys) );

By creating the interpolant we want (e.g., via a cubic spline), we can calculate its value at any point within the range of the input data:

In [ ]:
itp = cubic_spline_interpolation( (xs,ys), zs );
itp( 2, 2 )
Out[0]:
0.7444301350128467

This model will allow us to calculate the output values at any point within the argument space, bounded by the perimeter defined by the grid on which the empirical data are defined.

Let us plot the interpolation result on a new grid with a small step size between points.

In [ ]:
scatter( repeat(xs, inner=length(ys)),
         repeat(ys, outer=length(xs)),
         vec(reshape( zs, (1,:))), markercolor=:red, label="Данные" )

xs_new = range(1, 10, length=30)
ys_new = range(1, 10, length=30)
surface!( xs_new, ys_new, [itp(x,y) for x in xs_new, y in ys_new],
          colorbar = false, alpha = 0.5, c=:viridis )
Out[0]:
No description has been provided for this image

We used the command for linear interpolation: cubic_spline_interpolation and when creating it we passed two argument vectors and a grid of values as input. This function allows us to obtain forecasts for individual pairs of arguments or to create a whole matrix of forecasts.

In the same way you can perform piecewise linear interpolation (linear_interpolation) or nearest neighbour interpolation (constant_interpolation). Exactly the same actions can be realised with the commands ConstantInterpolation, LinearInterpolation and CubicSplineInterpolation. According to the documentation, they are left in the library for convenience.

In [ ]:
itp0 = constant_interpolation( (xs,ys), zs ); # Константа
itp1 = linear_interpolation( (xs,ys), zs ); # Линейная интерполяция

plot(
    (scatter( repeat(xs, inner=length(ys)), repeat(ys, outer=length(xs)), vec(reshape( zs, (1,:))), markercolor=:red );
     surface!( xs_new, ys_new, itp0(xs_new, ys_new), colorbar = false, alpha = 0.5, c=:viridis, legend=:none )
     ),
    (scatter( repeat(xs, inner=length(ys)), repeat(ys, outer=length(xs)), vec(reshape( zs, (1,:))), markercolor=:red );
     surface!( xs_new, ys_new, itp1(xs_new, ys_new), colorbar = false, alpha = 0.5, c=:viridis, legend=:none )
     ),
    size=(1000,400)
    )
Out[0]:
No description has been provided for this image

The type of the interpolation function is determined by the degree of the polynomial defining the spline: Constant, Linear, Quadratic, or Cubic. So far we have used the convenient, simplified syntax of these commands. But there is a more complex syntax:

In [ ]:
# Вычисляем интерполяцию квадратичным сплайном,
# указав граничное условие – отражение (Reflect)
# и интерпретируя точки на периферии как центроиды (OnCell)
# а также дополнительно масштабируем полученный интерполянт, который при таком синтаксисе поначалу имеет единичный диапазон
sitp4 = scale( interpolate(zs, BSpline(Quadratic(Reflect(OnCell())))), (xs, ys,))

# Линейная интерполяция по одной оси, ступенчатая – по другой оси
# аргумент Gridded означает, что точки на периметре интерпретируются как OnGrid()
itp5 = interpolate( (xs, ys), zs, (Gridded(Linear()),Gridded(Constant())) )

plot(
    (scatter( repeat(xs, inner=length(ys)), repeat(ys, outer=length(xs)), vec(reshape( zs, (1,:))), markercolor=:red );
     surface!( xs_new, ys_new, sitp4(xs_new, ys_new), colorbar = false, alpha = 0.5, c=:viridis, legend=:none )
    ),
    
    (scatter( repeat(xs, inner=length(ys)), repeat(ys, outer=length(xs)), vec(reshape( zs, (1,:))), markercolor=:red );
     surface!( xs_new, ys_new, itp5(xs_new, ys_new), colorbar = false, alpha = 0.5, c=:viridis, legend=:none )
    ),
    size=(1000,400)
)
Out[0]:
No description has been provided for this image

Not all combinations of conditions can produce results. For example, quadratic and cubic interpolation require boundary conditions from Flat, Line (same as Natural), Free, Periodic or Reflect. Points on the grid perimeter can be interpreted by the interpolation algorithm as boundary (OnGrid()) or as being halfway between the preceding row and a dummy next row of points (OnCell()).

Interpolation of data defined on an irregular grid can be performed using a piecewise linear spline and constants.

Information about additional parameters and interpolation modes can be found in the documentation.

Extrapolation

When we need to predict data outside the range of available experimental data, we need to add another hypothesis to the interpolation parameters - how we think the function behaves outside the available data, and here we have the options Throw (give an error), Flat, Line, Periodic and Reflect, or we can pass a constant as an argument, the value of which will be substituted during interpolation.

Let's demonstrate different methods of extrapolation on the example of a one-dimensional signal.

In [ ]:
x = 1:5;
y = [2,7,3,8,6];

Let's set different boundary conditions (boundary conditions) through the argument extrapolation_bc.

In [ ]:
new_x = range(-2, 8, length=100)

plot(
    plot(x, y, title="Исходные данные"),
    plot(new_x, cubic_spline_interpolation(x, y, extrapolation_bc=Line())(new_x), title="Line()"),
    plot(new_x, cubic_spline_interpolation(x, y, extrapolation_bc=Flat())(new_x), title="Flat()"),
    plot(new_x, cubic_spline_interpolation(x, y, extrapolation_bc=Periodic())(new_x), title="Periodic()"),
    plot(new_x, cubic_spline_interpolation(x, y, extrapolation_bc=Reflect())(new_x), title="Reflect()"),
    plot(new_x, cubic_spline_interpolation(x, y, extrapolation_bc=5)(new_x), title="fill(5)"),
    size=(1000,400), layout=(2,3)
)
Out[0]:
No description has been provided for this image

Of course, the same methods allow interpolation and extrapolation of higher dimensional data. For example, let us place 64 data points in 3-dimensional space and construct a smooth function based on them.

In [ ]:
a = range( 1, 10, length=4 );
b = range( 1, 10, length=4 );
c = range( 1, 10, length=4 );
d = rand( length(a), length(b), length(c) );

# 3D-интерполяция (легко записать в одну строку)
itp = extrapolate(
          scale(
              interpolate( d, BSpline(
                                  Quadratic(
                                      Reflect(OnCell())
                                  )
                              )
                ), (a,b,c,)
      ), Flat() )

# Задаем новую сетку
new_a = new_b = -2:0.1:13; new_c = 1:10;

vPlots = [ contour( new_a, new_b, itp(new_a, new_b, ic), colorbar = false, alpha = 0.5, c=:viridis, legend=:none, title="c = $ic", titlefont = font(8)) for ic in new_c ] 
plot!( vPlots..., layout=(2,:), size=(1000,400) )
Out[0]:
No description has been provided for this image

Conclusion

We have looked at several data interpolation methods that can be used with data of any dimension. The interpolation methods we have studied allow us to work with raw data defined on grids. For unordered data, it is better to use classical machine learning methods or advanced visualisation methods (e.g. Delaunay triangulation).

There are dozens of other packages in the Julia ecosystem for interpolating data in other problem settings, and they are listed on this page.