Engee documentation

Missing values

Julia provides support for representing missing values in the statistical sense. This is intended for situations where no value is available for a variable in an observation, but a valid value theoretically exists. Missing values are represented by the missing object, which is the singleton instance of the type Missing. missing is equivalent to https://en.wikipedia.org/wiki/NULL_(SQL)[NULL in SQL] and https://cran.r-project.org/doc/manuals/r-release/R-lang.html#NA-handling[NA in R], and behaves like them in most situations.

Propagation of missing values

Missing values propagate automatically when passed to standard mathematical operators and functions. For these functions, uncertainty about the value of one of the operands induces uncertainty about the result. In practice, this means that a mathematical operation involving a missing value generally returns missing:

julia> missing + 1
missing

julia> "a" * missing
missing

julia> abs(missing)
missing

Since missing is a normal Julia object, this propagation rule only applies to functions that have opted in to implement this behavior. This can be achieved in the following ways:

  • by adding a specific method defined for arguments of type Missing;

  • by passing arguments of this type to functions that do propagate them (for example, standard mathematical operators).

Packages should consider whether it makes sense to propagate missing values when defining new functions and, if so, define methods accordingly. Passing a missing value to a function that has no method accepting arguments of type Missing throws a MethodError, just as for any other type.
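The two approaches can be sketched as follows. The functions halve, addone and shout are hypothetical examples, not part of Base or any package: halve opts in through an explicit Missing method, addone propagates only because + does, and shout, which has neither, throws a MethodError.

```julia
# Hypothetical example functions used only for illustration.
halve(x::Number) = x / 2

# First approach: an explicit method for arguments of type Missing.
halve(::Missing) = missing

# Second approach: no dedicated method, but `+` already propagates missing.
addone(x) = x + 1

# Neither approach is used here, so `missing` is rejected with a MethodError.
shout(s::AbstractString) = uppercase(s) * "!"

println(halve(3))         # 1.5
println(halve(missing))   # missing
println(addone(missing))  # missing

try
    shout(missing)
catch err
    println(err isa MethodError)  # true
end
```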

Functions that do not propagate missing values can be made to do so by wrapping them in the passmissing function provided by the https://github.com/JuliaData/Missings.jl[Missings.jl] package. For example, f(x) becomes passmissing(f)(x).
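To show the idea without installing the package, here is a simplified sketch of such a wrapper. The name propagate_missing is hypothetical and this is not how Missings.jl implements passmissing (the real function also handles keyword arguments); it simply returns missing when any positional argument is missing and otherwise calls the wrapped function.

```julia
# Hypothetical, simplified stand-in for Missings.passmissing (positional arguments only).
propagate_missing(f) = (args...) -> any(ismissing, args) ? missing : f(args...)

# parse has no method for Missing, so parse(Int, missing) would throw a MethodError.
safe_parse = propagate_missing(parse)

println(safe_parse(Int, "12"))      # 12
println(safe_parse(Int, missing))   # missing
```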

Equality and comparison operators

The standard equality and comparison operators follow the propagation rule presented above: if any of the operands is missing, the result is missing. Here are a few examples:

julia> missing == 1
missing

julia> missing == missing
missing

julia> missing < 1
missing

julia> 2 >= missing
missing

In particular, note that missing == missing returns missing, so == cannot be used to test whether a value is missing. To test whether x is missing, use ismissing(x).
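The contrast between == and ismissing, and the use of ismissing as a predicate with standard higher-order functions, can be illustrated with a short sketch:

```julia
x = missing

println(x == missing)   # missing: == propagates, so it cannot answer the question
println(ismissing(x))   # true

# ismissing works as a predicate with count, findall, filter, and so on.
v = [1, missing, 3, missing]
println(count(ismissing, v))     # 2
println(findall(ismissing, v))   # [2, 4]
```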

The special comparison operators isequal and === are exceptions to the propagation rule. They always return a Bool value, even in the presence of missing values, treating missing as equal to missing and as different from any other value. They can therefore be used to test whether a value is missing:

julia> missing === 1
false

julia> isequal(missing, 1)
false

julia> missing === missing
true

julia> isequal(missing, missing)
true
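Because collections such as Dict and functions such as unique compare elements with isequal (together with hash), missing behaves like an ordinary value in them. The sketch below assumes nothing beyond Base:

```julia
# unique compares elements with isequal, so repeated missings collapse into one.
u = unique([1, missing, 2, missing])   # 1, missing, 2 in first-occurrence order

# Dict keys are compared with isequal too, so missing can be used as a key.
counts = Dict{Union{Missing, String}, Int}()
for val in ["a", missing, "a", missing, missing]
    counts[val] = get(counts, val, 0) + 1
end
println(counts[missing])   # 3
println(counts["a"])       # 2
```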

Another exception is the isless operator: missing is considered greater than any other value. This operator is used by sort!, which therefore places missing values after all other values:

julia> isless(1, missing)
true

julia> isless(missing, Inf)
false

julia> isless(missing, missing)
false
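The effect on sorting can be sketched with sort, which uses isless by default. With rev=true the ordering is reversed, so missing, being the greatest value, comes first:

```julia
v = [3, missing, 1, 2]

# Default ordering (isless): missing is greatest, so it goes last.
s = sort(v)
println(isequal(s, [1, 2, 3, missing]))   # true

# Reversed ordering: missing comes first.
r = sort(v; rev = true)
println(isequal(r, [missing, 3, 2, 1]))   # true
```

isequal is used for the checks because == would return missing here.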

Logical operators

The logical (or Boolean) operators |, & and xor are another special case, since they only propagate missing values when it is logically required. For these operators, whether or not the result is uncertain depends on the particular operation. This follows the well-established rules of https://en.wikipedia.org/wiki/Three-valued_logic[three-valued logic], which are implemented by, for example, NULL in SQL and NA in R. This abstract definition corresponds to relatively natural behavior, which is best explained by concrete examples.

Let’s illustrate this principle with the logical "or" operator |. Following the rules of Boolean logic, if one of the operands is true, the value of the other operand has no influence on the result, which will always be true:

julia> true | true
true

julia> true | false
true

julia> false | true
true

Based on this observation, we can conclude that if one of the operands is true and the other is missing, we know that the result is true despite the uncertainty about the actual value of one of the operands. If we had been able to observe the actual value of the second operand, it could only be true or false, and in both cases the result would be true. Therefore, in this particular case, missingness does not propagate:

julia> true | missing
true

julia> missing | true
true

Conversely, if one of the operands is false, the result could be either true or false depending on the value of the other operand. Therefore, if that operand is missing, the result has to be missing too:

julia> false | true
true

julia> true | false
true

julia> false | false
false

julia> false | missing
missing

julia> missing | false
missing

The behavior of the logical "and" operator & is similar to that of the | operator, with the difference that missingness does not propagate when one of the operands is false. For example, when that is the case for the first operand:

julia> false & false
false

julia> false & true
false

julia> false & missing
false

On the other hand, missingness propagates when one of the operands is true, for example the first one:

julia> true & true
true

julia> true & false
false

julia> true & missing
missing

Finally, the "exclusive or" logical operator xor always propagates missing values, since both operands always have an effect on the result. Also note that the negation operator ! returns missing when its operand is missing, just like other unary operators.
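The rules above can be summarized by printing the full three-valued truth tables of |, & and xor. This is only a small illustration built from Base operators:

```julia
vals = (true, false, missing)

# Header, then one row per combination of operands.
println(rpad("a", 9), rpad("b", 9), rpad("a | b", 9), rpad("a & b", 9), "xor(a, b)")
for a in vals, b in vals
    println(rpad(repr(a), 9), rpad(repr(b), 9),
            rpad(repr(a | b), 9), rpad(repr(a & b), 9), repr(xor(a, b)))
end

# Negation propagates like any other unary operator.
println(!missing)   # missing
```

In the output, the only rows where | or & yields a definite value despite a missing operand are those where the other operand alone fixes the result (true for |, false for &); the xor column is missing in every row that involves missing.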

Control flow and short-circuiting operators

Control flow operators, including if, while and the ternary operator x ? y : z, do not allow missing values. This is because of the uncertainty about whether the actual value would be true or false if we could observe it, which means we do not know how the program should behave. In this case, a TypeError is thrown as soon as a missing value is encountered in this context:

julia> if missing
           println("here")
       end
ERROR: TypeError: non-boolean (Missing) used in boolean context
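When a condition may be missing, one way to avoid the TypeError is to decide explicitly what a missing condition should mean. The sketch below uses two Base functions: coalesce, which returns its first non-missing argument, and isequal, which never returns missing. The choice of treating missing as false is an assumption made for the example, not a general rule.

```julia
flag = missing

# Treat a missing condition as false by supplying an explicit default.
if coalesce(flag, false)
    println("flag is true")
else
    println("flag is false or missing")
end

# Equivalent test: isequal always returns a Bool, so it is safe in a condition.
result = isequal(flag, true) ? "yes" : "no"
println(result)   # no
```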

For the same reason, contrary to the logical operators presented above, the short-circuiting Boolean operators && and || do not allow missing values in situations where the value of an operand determines whether the next operand is evaluated. For example:

julia> missing || false
ERROR: TypeError: non-boolean (Missing) used in boolean context

julia> missing && false
ERROR: TypeError: non-boolean (Missing) used in boolean context

julia> true && missing && false
ERROR: TypeError: non-boolean (Missing) used in boolean context

In contrast, no error is thrown when the result can be determined without the missing values. This is the case when the code short-circuits before evaluating the missing operand, and when the missing operand is the last one:

julia> true && missing
missing

julia> false && missing
false

Arrays with missing values

Arrays containing missing values can be created like other arrays:

julia> [1, missing]
2-element Vector{Union{Missing, Int64}}:
 1
  missing

As this example shows, the element type of such arrays is Union{Missing, T}, with T the type of the non-missing values. This reflects the fact that array entries can be either of type T (here, Int64) or of type Missing. This kind of array uses an efficient memory storage equivalent to an Array{T} holding the actual values combined with an Array{UInt8} indicating the type of each entry (that is, whether it is Missing or T).

Arrays allowing missing values can be constructed with the standard syntax. Use Array{Union{Missing, T}}(missing, dims) to create arrays filled with missing values:

julia> Array{Union{Missing, String}}(missing, 2, 3)
2×3 Matrix{Union{Missing, String}}:
 missing  missing  missing
 missing  missing  missing

Using undef or similar may currently produce an array filled with missing values, but this is not the correct way to obtain such an array. Use the missing constructor instead, as shown above.
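A typical pattern is to start from an all-missing array and fill in values as they become available. The sketch below also uses Base's nonmissingtype to recover T from Union{Missing, T}:

```julia
# Start from an all-missing vector and fill in observations as they arrive.
v = Vector{Union{Missing, Float64}}(missing, 3)
v[2] = 1.5

println(eltype(v))                   # Union{Missing, Float64}
println(nonmissingtype(eltype(v)))   # Float64
println(count(ismissing, v))         # 2
```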

An array with an element type allowing missing entries (for example, Vector{Union{Missing, T}}) that does not contain any missing entries can be converted to an array type that does not allow missing entries (for example, Vector{T}) using convert. If the array contains missing values, a MethodError is thrown during conversion:

julia> x = Union{Missing, String}["a", "b"]
2-element Vector{Union{Missing, String}}:
 "a"
 "b"

julia> convert(Array{String}, x)
2-element Vector{String}:
 "a"
 "b"

julia> y = Union{Missing, String}[missing, "b"]
2-element Vector{Union{Missing, String}}:
 missing
 "b"

julia> convert(Array{String}, y)
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type String
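If the conversion target should be derived from the array itself, nonmissingtype can compute it. The helper to_nonmissing below is hypothetical (Missings.jl offers its own tools for this); it checks for missing entries first so that the failure is reported as an ArgumentError with a clearer message than the MethodError above.

```julia
# Hypothetical helper: narrow the element type, or fail with a clearer message.
function to_nonmissing(v::AbstractVector)
    any(ismissing, v) && throw(ArgumentError("vector contains missing values"))
    T = nonmissingtype(eltype(v))
    return convert(Vector{T}, v)
end

x = Union{Missing, String}["a", "b"]
z = to_nonmissing(x)
println(eltype(z))   # String
```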

Skipping missing values

Since missing values propagate with standard mathematical operators, reduction functions return missing when called on arrays that contain missing values:

julia> sum([1, missing])
missing

In this situation, use the function skipmissing to skip missing values:

julia> sum(skipmissing([1, missing]))
1
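Skipping is not the only option: missing entries can also be replaced by a default with coalesce broadcast over the array. The sketch below compares the two using mean from the Statistics standard library; the default value 0 is an arbitrary choice for illustration.

```julia
using Statistics   # standard library providing mean

v = [1, missing, 3]

println(sum(skipmissing(v)))    # 4
println(mean(skipmissing(v)))   # 2.0: average of the two observed values

# Alternatively, replace missing entries with a default before reducing.
filled = coalesce.(v, 0)
println(filled)                 # [1, 0, 3]
println(mean(filled))           # 1.3333333333333333: the default counts as an observation
```

The two approaches agree for sum here but not for mean, so the choice depends on whether a missing entry should be ignored or treated as a known value.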

This convenience function returns an iterator that filters out missing values efficiently. It can therefore be used with any function that supports iterators:

julia> x = skipmissing([3, missing, 2, 1])
skipmissing(Union{Missing, Int64}[3, missing, 2, 1])

julia> maximum(x)
3

julia> sum(x)
6

julia> mapreduce(sqrt, +, x)
4.146264369941973

Objects created by calling skipmissing on an array can be indexed using indices from the parent array. Indices corresponding to missing values are not valid for these objects, and an error is thrown when trying to use them (they are also skipped by keys and eachindex):

julia> x[1]
3

julia> x[2]
ERROR: MissingException: the value at index (2,) is missing
[...]
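Because keys and eachindex skip the indices of missing entries, iterating over them never triggers a MissingException:

```julia
x = skipmissing([3, missing, 2, 1])

# Only the indices of non-missing entries in the parent array are produced.
println(collect(keys(x)))        # [1, 3, 4]
println(collect(eachindex(x)))   # [1, 3, 4]

# Safe indexed iteration.
for i in eachindex(x)
    println(i, " => ", x[i])
end
```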

This allows functions that operate on indices to work in combination with skipmissing. This is notably the case for search and find functions. These functions return indices that are valid for the object returned by skipmissing and are also the indices of the matching entries in the parent array:

julia> findall(==(1), x)
1-element Vector{Int64}:
 4

julia> findfirst(!iszero, x)
1

julia> argmax(x)
1

Use collect to extract non-missing values and store them in an array:

julia> collect(x)
3-element Vector{Int64}:
 3
 2
 1

Logical operations with arrays

The three-valued logic described above for logical operators is also used by logical functions applied to arrays. Thus, array equality tests using the == operator return missing whenever the result cannot be determined without knowing the actual value of the missing entry. In practice, this means that missing is returned if all non-missing values of the compared arrays are equal, but one or both arrays contain missing values (possibly at different positions):

julia> [1, missing] == [2, missing]
false

julia> [1, missing] == [1, missing]
missing

julia> [1, 2, missing] == [1, missing, 2]
missing

As for single values, use isequal to treat missing values as equal to other missing values, but different from non-missing values:

julia> isequal([1, missing], [1, missing])
true

julia> isequal([1, 2, missing], [1, missing, 2])
false
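The same distinction applies element by element when the operators are broadcast, which is useful to locate where two arrays differ:

```julia
a = [1, missing, 3]
b = [1, missing, 4]

# Broadcast ==: three-valued result per element (true, missing, false).
eq = a .== b

# Broadcast isequal: always Bool, missing equals missing (true, true, false).
same = isequal.(a, b)

# Whole-array comparison is definite here because 3 != 4.
println(a == b)   # false
```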

The functions any and all also follow the rules of three-valued logic, returning missing when the result cannot be determined:

julia> all([true, missing])
missing

julia> all([false, missing])
false

julia> any([true, missing])
true

julia> any([false, missing])
missing
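When a definite Bool is needed, the uncertain result can be resolved either by ignoring missing entries with skipmissing or by choosing a default with coalesce. Which default is appropriate depends on the application; false below is only an example:

```julia
v = [true, missing, true]

println(all(v))                    # missing: the unknown entry could be false
println(all(skipmissing(v)))       # true: only observed entries are checked
println(coalesce(all(v), false))   # false: "unknown" treated as failure

# A definite true still decides any, wherever the missing entries are.
println(any([missing, true]))      # true
```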