Engee documentation

Subarrays (SubArrays)

The Julia SubArray type is a container encoding a representation of the parent type. AbstractArray. This page describes some principles of designing and implementing subarrays (`SubArray').

One of the main design goals is to provide high performance for representations of arrays of types IndexLinear and IndexCartesian. Moreover, the IndexLinear array representations themselves should be of the IndexLinear type, as far as possible.

Replacing the index

Let’s consider the possibility of creating two-dimensional slices of a three-dimensional array.

julia> A = rand(2,3,4);

julia> S1 = view(A, :, 1, 2:3)
2×2 view(::Array{Float64, 3}, :, 1, 2:3) with eltype Float64:
 0.839622  0.711389
 0.967143  0.103929

julia> S2 = view(A, 1, :, 2:3)
3×2 view(::Array{Float64, 3}, 1, :, 2:3) with eltype Float64:
 0.839622  0.711389
 0.789764  0.806704
 0.566704  0.962715

view discards individual dimensions (those that are set using Int), so S1 and S2 are two-dimensional subarrays (SubArray). Therefore, the natural way to index them is to use S1[i,j]. To extract a value from the parent array A, you can use a natural approach and replace S1[i,j] with A[i,1,(2:3)[j]] and S2[i,j] with A[1,i,(2:3)[j]].

A key feature of subarray design is that this index replacement can be performed at no cost at runtime.

Designing a subarray

Type parameters and fields

The adopted strategy is expressed, first of all, in the definition of the type:

struct SubArray{T,N,P,I,L} <: AbstractArray{T,N}
    parent::P
    indices::I
    offset1::Int       # для линейного индексирования и указателя, допустимо только при L==true
    stride1::Int       # используется только для линейного индексирования
    ...
end

A subarray (SubArray') has five type parameters. The first two are the standard type and dimension of the element. The following is the type of the parent `AbstractArray'. The most commonly used is the fourth parameter, which is a tuple of index types for each dimension. The last one, `L, is provided only for the convenience of dispatching. This is a boolean value that determines whether the index types support fast linear indexing. More details about this later.

If in the example above A is an Array{Float64, 3}, then in our case S1 will have the value SubArray{Float64,2,Array{Float64,3},Tuple{Base.Slice{Base.OneTo{Int64}},Int64,UnitRange{Int64}},false}. Pay attention, in particular, to the tuple parameter, which stores the types of indexes used to create 'S1'. In a similar way:

julia> S1.indices
(Base.Slice(Base.OneTo(2)), 1, 2:3)

Storing these values allows you to replace indexes, and the presence of types encoded as parameters allows you to send them to efficient algorithms.

Index Conversion

To convert indexes, you need to perform different actions for different specific types of a subarray (SubArray). For example, for S1 you need to apply the indexes i,j to the first and third dimensions of the parent array, and for S2 you need to apply them to the second and third. The easiest way to perform indexing would be by analyzing types at runtime.

parentindices = Vector{Any}()
for thisindex in S.indices
    ...
    if isa(thisindex, Int)
        # Не использовать ни один из входных индексов
        push!(parentindices, thisindex)
    elseif isa(thisindex, AbstractVector)
        # Использовать входной индекс
        push!(parentindices, thisindex[inputindex[j]])
        j += 1
    elseif isa(thisindex, AbstractMatrix)
        # Использовать два входных индекса
        push!(parentindices, thisindex[inputindex[j], inputindex[j+1]])
        j += 2
    elseif ...
end
S.parent[parentindices...]

Unfortunately, this would be disastrous in terms of performance: each time an element is accessed, memory will be allocated, which will entail executing a large amount of poorly typed code.

The best approach is to dispatch into specific methods to handle each type of index being stored. This is exactly what the reindex' function does: it dispatches the type of the first stored index and uses the appropriate number of input indexes, and then performs recursion on the remaining indexes. In the case of `S1, it expands to the following format:

Base.reindex(S1, S1.indices, (i, j)) == (i, S1.indices[2], S1.indices[3][j])

for any pair of indexes (i,j) (except Cartesian indexes (`CartesianIndex') and their arrays, see below).

This is the essence of a subarray (SubArray). Indexing methods depend on the reindex function to implement such index conversion. But sometimes we can do without indirect actions and perform the conversion even faster.

Linear indexing

Linear indexing can be effectively implemented when the entire array has a single step that separates consecutive elements starting from a certain offset. This means that we can pre-calculate these values and represent linear indexing simply as addition and multiplication, completely eliminating the indirect effect of the `reindex' function and (more importantly) the slow calculation of Cartesian coordinates.

For the SubArray' types, the availability of efficient linear indexing is based solely on the types of indexes and does not depend on values such as the size of the parent array. You can find out if a given set of indexes supports fast linear indexing using the internal `Base' function.viewindexing.

julia> Base.viewindexing(S1.indices)
IndexCartesian()

julia> Base.viewindexing(S2.indices)
IndexLinear()

The result is calculated during the construction of the subarray (SubArray) and stored in an L type parameter as a boolean value that encodes support for fast linear indexing. While not strictly mandatory, this means that it is possible to define dispatching directly in the `SubArray'.{T,N,A,I,true}"without any intermediaries.

Since this calculation does not depend on the values of the execution time, it may skip some cases when the step of the array turns out to be uniform.

julia> A = reshape(1:4*2, 4, 2)
4×2 reshape(::UnitRange{Int64}, 4, 2) with eltype Int64:
 1  5
 2  6
 3  7
 4  8

julia> diff(A[2:2:4,:][:])
3-element Vector{Int64}:
 2
 2
 2

A view constructed as view(A, 2:2:4, :), It has a uniform array pitch, and therefore linear indexing can indeed be performed efficiently. However, success in this case depends on the size of the array: if the first dimension were odd,

julia> A = reshape(1:5*2, 5, 2)
5×2 reshape(::UnitRange{Int64}, 5, 2) with eltype Int64:
 1   6
 2   7
 3   8
 4   9
 5  10

julia> diff(A[2:2:4,:][:])
3-element Vector{Int64}:
 2
 3
 2

in A[2:2:4,:] there would be no uniform step, so we could not guarantee efficient linear indexing. Since this solution should be based solely on the types encoded in the parameters of the subarray (SubArray), S = view(A, 2:2:4, :) cannot implement efficient linear indexing.

A few details

  • Note that the Base.reindex function does not depend on the types of input indexes. It simply defines how and where the stored indexes should be reindexed. It supports not only integer indexes, but also non-scalar indexing. This means that representations of representations do not need two levels of indirect actions. They can simply recalculate the indexes into the original parent array.

  • Perhaps it is already clear enough that slice support means that the dimension set by the N parameter is not necessarily equal to the dimension of the parent array or the length of the tuple indices'. The user-specified indexes also do not necessarily match the entries in the `indexes tuple (for example, the second user-specified index may correspond to the third dimension of the parent array and the third element in the indexes tuple).

    Less obvious may be the fact that the dimension of the stored parent array should be equal to the number of effective indexes in the indexes tuple. Some examples:

    A = reshape(1:35, 5, 7) # Двумерный родительский массив
    S = view(A, 2:7)         # Одномерное представление, созданное с помощью линейного индексирования
    S = view(A, :, :, 1:1)   # Поддерживается добавление дополнительных индексов

    It is naive to believe that you can simply set S.parent = A and S.indexes = (:,:,1:1), but supporting such an expression makes the reindexing process much more difficult, especially for view representations. Not only do you need to dispatch the types of stored indexes, but you also need to check if a given index is the final one and merge all remaining stored indexes. It’s not an easy task. Moreover, it is slow because it implicitly depends on linear indexing.

    Fortunately, this is exactly the calculation that the ReshapedArray array performs, and if possible, it does it linearly. Therefore, the view ensures that the parent array has the appropriate dimension for the specified indexes, changing its shape if necessary. The internal constructor of `SubArray' enforces this invariant.

  • CartesianIndex' and its arrays make unpleasant adjustments to the `reindex scheme. Recall that the reindex function simply dispatches the types of stored indexes to determine how many passed indexes should be used and where. But the Cartesian index (CartesianIndex) does not maintain a one-to-one correspondence between the number of arguments passed and the number of dimensions they are indexed in. If we go back to the example above Base.reindex(S1, S1.indices, (i, j)), we will see that the extension is incorrect for i, j = CartesianIndex(), CartesianIndex(2,1)'. It should completely _ skip_ `CartesianIndex() and return the following.

    (CartesianIndex(2,1)[1], S1.indices[2], S1.indices[3][CartesianIndex(2,1)[2]])

    However, instead we get:

    (CartesianIndex(), S1.indices[2], S1.indices[3][CartesianIndex(2,1)])

    With the right approach, you will need to perform combined dispatching of both stored and transmitted indexes in all combinations of dimensions in a complex way. Therefore, the reindex function should never be called with CartesianIndex indexes. Fortunately, the scalar case is easy to handle by first decomposing the arguments of CartesianIndex' into simple integers. However, the CartesianIndex arrays cannot be so easily divided into independent parts. Before attempting to use the `reindex function, the view must make sure that there are no CartesianIndex arrays in the argument list. If there are any, it can simply completely eliminate the reindex calculation by constructing a nested subarray (SubArray) with two levels of indirection instead.