Bounds checking
Like many modern programming languages, Julia uses bounds checking to ensure program safety when accessing arrays. In tight inner loops or other performance-critical situations, you may want to skip these bounds checks to improve runtime performance. For example, to emit vectorized (SIMD) instructions, the loop body cannot contain branches, and thus cannot contain bounds checks. For this reason, Julia provides the @inbounds(...) macro, which tells the compiler to skip such bounds checks within a given block. User-defined array types can use the @boundscheck(...) macro for context-sensitive code selection.
Eliding bounds checks
The @boundscheck(...) macro marks blocks of code that perform bounds checking. When such blocks are inlined into an @inbounds(...) block, the compiler may remove them. The compiler removes a @boundscheck block only if it is inlined into the calling function. For example, you might write a sum method as follows.
function sum(A::AbstractArray)
    r = zero(eltype(A))
    for i in eachindex(A)
        @inbounds r += A[i]
    end
    return r
end
Suppose a custom array-like type MyArray defines the following method.
@inline getindex(A::MyArray, i::Real) = (@boundscheck checkbounds(A, i); A.data[to_index(i)])
Then, when getindex is inlined into sum, the call to checkbounds(A, i) will be elided. If your function contains multiple layers of inlining, only @boundscheck blocks at most one level of inlining deeper are eliminated. This rule prevents unintended changes in program behavior from code further up the stack.
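As a rough, self-contained sketch of the pattern above: the MyArray wrapper below (its data field and the my_sum name are assumptions made for illustration, not Base definitions) places its check in a @boundscheck block, so a caller that indexes under @inbounds may have the check elided once getindex is inlined.

```julia
# Hypothetical wrapper type; the `data` field is assumed for illustration.
struct MyArray{T} <: AbstractVector{T}
    data::Vector{T}
end

Base.size(A::MyArray) = size(A.data)
Base.IndexStyle(::Type{<:MyArray}) = IndexLinear()

# The check lives in a @boundscheck block, so the compiler may remove it
# when this method is inlined into a caller that uses @inbounds.
@inline function Base.getindex(A::MyArray, i::Int)
    @boundscheck checkbounds(A, i)
    return @inbounds A.data[i]
end

# Named my_sum (an illustrative name) to avoid shadowing Base.sum.
function my_sum(A::AbstractArray)
    r = zero(eltype(A))
    for i in eachindex(A)
        @inbounds r += A[i]
    end
    return r
end

A = MyArray([1, 2, 3])
total = my_sum(A)
```

Outside an @inbounds context, an out-of-range access such as A[4] still throws a BoundsError, because the check is only elided for callers that opt in.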
Caution!
It is easy to accidentally expose unsafe operations with @inbounds. You might be tempted to write the above example in the following form.
function sum(A::AbstractArray)
    r = zero(eltype(A))
    for i in 1:length(A)
        @inbounds r += A[i]
    end
    return r
end
This version quietly assumes 1-based indexing and therefore exposes unsafe memory access when used with OffsetArrays.
julia> using OffsetArrays
julia> sum(OffsetArray([1, 2, 3], -10))
9164911648 # inconsistent results or segfault
Although the original source of the error is 1:length(A), the use of @inbounds escalates the consequences from a bounds error to an unsafe memory access that is harder to detect and debug. It is often difficult or impossible to prove that a method using @inbounds is safe, so you must weigh the performance benefit against the risk of segfaults and silent misbehavior, especially in public-facing APIs.
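One way to keep such mistakes visible during development is to leave out @inbounds (or run Julia with --check-bounds=yes) so that a faulty index range fails loudly. The sketch below uses a hypothetical ShiftedVector type, defined here only so that the example needs no packages; checked_sum and generic_sum are likewise illustrative names.

```julia
# Hypothetical vector whose valid indices are (1:n) .+ offset; illustration only.
struct ShiftedVector{T} <: AbstractVector{T}
    data::Vector{T}
    offset::Int
end

Base.size(A::ShiftedVector) = size(A.data)
Base.axes(A::ShiftedVector) = ((1:length(A.data)) .+ A.offset,)
Base.IndexStyle(::Type{<:ShiftedVector}) = IndexLinear()

@inline function Base.getindex(A::ShiftedVector, i::Int)
    @boundscheck checkbounds(A, i)
    return @inbounds A.data[i - A.offset]
end

# Same loop as the unsafe version, but without @inbounds:
# the wrong assumption about indices surfaces as a BoundsError.
function checked_sum(A::AbstractArray)
    r = zero(eltype(A))
    for i in 1:length(A)
        r += A[i]
    end
    return r
end

# Iterating over eachindex(A) makes no assumption about the first index.
function generic_sum(A::AbstractArray)
    r = zero(eltype(A))
    for i in eachindex(A)
        r += A[i]
    end
    return r
end

S = ShiftedVector([1, 2, 3], -10)   # valid indices are -9:-7
good = generic_sum(S)
caught = try
    checked_sum(S)
    false
catch err
    err isa BoundsError
end
```

Once the loop is known to iterate over valid indices, @inbounds can be reintroduced around the indexing expression if profiling shows it matters.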
Propagating inbounds
In certain cases, for code-organization reasons, you may want more than one layer between the @inbounds and @boundscheck declarations. For example, the default getindex methods form a chain: getindex(A::AbstractArray, i::Real) calls getindex(IndexStyle(A), A, i), which calls _getindex(::IndexLinear, A, i).
To override the "one layer of inlining" rule, a function can be marked with the Base.@propagate_inbounds macro, which propagates an inbounds context (or an out-of-bounds context) through one additional layer of inlining.
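A minimal sketch of how this might look for a forwarding type, assuming a hypothetical Wrapper struct and a wrapper_sum helper (neither is part of Base): the forwarding getindex is marked with Base.@propagate_inbounds so that the caller's @inbounds context can reach the parent array's own check.

```julia
# Hypothetical wrapper that forwards indexing to a parent vector.
struct Wrapper{T,P<:AbstractVector{T}} <: AbstractVector{T}
    parent::P
end

Base.size(W::Wrapper) = size(W.parent)
Base.IndexStyle(::Type{<:Wrapper}) = IndexLinear()

# Per the one-layer rule, an @inbounds in the caller would otherwise stop at
# this method; the macro passes the caller's context on to the parent's getindex.
Base.@propagate_inbounds Base.getindex(W::Wrapper, i::Int) = W.parent[i]

function wrapper_sum(W::Wrapper)
    r = zero(eltype(W))
    for i in eachindex(W)
        @inbounds r += W[i]
    end
    return r
end

w = Wrapper([10, 20, 30])
result = wrapper_sum(w)
```

When called without @inbounds, W[i] is still checked by the parent's getindex, so an out-of-range index continues to throw a BoundsError.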
The bounds checking call hierarchy
The general hierarchy is as follows:
- checkbounds(A, I...), which calls
  - checkbounds(Bool, A, I...), which calls
    - checkbounds_indices(Bool, axes(A), I), which recursively calls
      - checkindex for each dimension.
Here A is the array, and I contains the requested indices. axes(A) returns a tuple of the permitted indices of A.
The function checkbounds(A, I...) throws an error if the indices are invalid, whereas checkbounds(Bool, A, I...) returns false in that case. The checkbounds_indices function discards any information about the array other than its axes tuple and performs a pure indices-versus-indices comparison: this allows relatively few compiled methods to serve a huge variety of array types. Indices are specified as tuples and are usually compared one-to-one, with individual dimensions handled by calling another important function, checkindex. Typically:
checkbounds_indices(Bool, (IA1, IA...), (I1, I...)) = checkindex(Bool, IA1, I1) &
    checkbounds_indices(Bool, IA, I)
Thus, checkindex checks a single dimension. All of these functions, including the unexported checkbounds_indices, have docstrings that can be accessed with ? in the REPL.
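The layers described above can be called directly to see how they differ. The following is only an illustrative sketch on an ordinary 3×4 matrix; the variable names are arbitrary, and checkbounds_indices is reached through the Base prefix because it is unexported.

```julia
A = reshape(collect(1:12), 3, 4)   # 3×4 matrix with axes (1:3, 1:4)

# The Bool forms report validity instead of throwing.
ok_inside  = checkbounds(Bool, A, 2, 3)
ok_outside = checkbounds(Bool, A, 4, 1)

# Works only on the axes tuple and the index tuple, not on the array itself.
ok_indices = Base.checkbounds_indices(Bool, axes(A), (2, 3))

# Checks a single index against a single dimension.
ok_dim1 = checkindex(Bool, axes(A, 1), 4)

# The form without Bool throws a BoundsError for invalid indices.
threw = try
    checkbounds(A, 4, 1)
    false
catch err
    err isa BoundsError
end
```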
If you need to customize bounds checking for a specific array type, you should specialize checkbounds(Bool, A, I...). However, in most cases you should be able to rely on checkbounds_indices, as long as you supply useful axes for your array type.
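As a hedged illustration of specializing on the array type, consider a hypothetical CircularVector in which every integer index wraps around and is therefore valid. The type, its data field, and the wrap-around rule are assumptions made for this sketch, which also assumes the underlying data is non-empty.

```julia
# Hypothetical array type whose integer indices wrap around; illustration only.
struct CircularVector{T} <: AbstractVector{T}
    data::Vector{T}
end

Base.size(A::CircularVector) = size(A.data)

# Specialize on the array type: any integer index is considered in bounds.
Base.checkbounds(::Type{Bool}, A::CircularVector, i::Integer) = true

# mod1 maps any integer onto 1:length(A.data) (assumes non-empty data).
Base.getindex(A::CircularVector, i::Int) = A.data[mod1(i, length(A.data))]

c = CircularVector([1, 2, 3])
wrapped_high = c[4]
wrapped_low  = c[0]
always_valid = checkbounds(Bool, c, 100)
```

Because the throwing form checkbounds(A, I...) is built on the Bool form, it picks up the specialization without further changes.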
If you have novel index types, first consider specializing checkindex, which handles a single index for a particular dimension of an array. If you have a custom multidimensional index type (similar to CartesianIndex), you may need to consider specializing checkbounds_indices.
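A small sketch of the first case, assuming a hypothetical FromEnd index type that counts positions back from the last index. Only the checkindex specialization is shown; actually indexing with such a type would also require converting it to an integer position (for example through Base.to_indices), which is omitted here.

```julia
# Hypothetical index type: FromEnd(k) denotes the position k steps before
# the last index, so FromEnd(0) refers to the last element.
struct FromEnd
    k::Int
end

# Validate one FromEnd index against the permitted indices of one dimension.
Base.checkindex(::Type{Bool}, inds::AbstractUnitRange, i::FromEnd) =
    0 <= i.k < length(inds)

v = [10, 20, 30]
last_ok = checkindex(Bool, axes(v, 1), FromEnd(0))
too_far = checkindex(Bool, axes(v, 1), FromEnd(3))
```

The method dispatches on the index type in the last argument and leaves the array type out entirely, which matches the division of responsibilities this hierarchy aims for.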
Note that this hierarchy was designed to reduce the likelihood of method ambiguities. We try to make checkbounds the place to specialize on array type, and try to avoid specializations on index types. Conversely, checkindex is intended to be specialized only on index type (especially the last argument).