
Combining categorical arrays

This example shows how to combine arrays of categorical variables.

Creating categorical arrays

Let's create a categorical array A that stores the lunch break beverage preferences of the 25 students in group A.

In [ ]:
import Pkg
Pkg.add("CategoricalArrays")
In [ ]:
using Random, CategoricalArrays
Random.seed!(123)

A = rand(["milk", "juice", "water"], 25)
A = categorical(A, levels=["milk", "juice", "water"], ordered=true) # levels sets the category order; ordered=true makes the array ordered
Out[0]:
25-element CategoricalArray{String,1,UInt32}:
 "juice"
 "juice"
 "water"
 "milk"
 "juice"
 "juice"
 "milk"
 "water"
 "juice"
 "milk"
 "juice"
 "water"
 "milk"
 "juice"
 "juice"
 "milk"
 "water"
 "juice"
 "juice"
 "milk"
 "milk"
 "water"
 "water"
 "milk"
 "juice"
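
The effect of the levels and ordered arguments can be checked directly. The following is a minimal sketch, not part of the original notebook; the array name drinks is hypothetical. It assumes that levels and isordered behave as in current CategoricalArrays releases, and it shows that values of an ordered array compare by level position rather than alphabetically.

```julia
using CategoricalArrays

# Hypothetical small array, built the same way as A above
drinks = categorical(["juice", "water", "milk", "juice"],
                     levels=["milk", "juice", "water"], ordered=true)

lv = levels(drinks)           # levels in the order passed to categorical
ord = isordered(drinks)       # true, because ordered=true was passed
cmp = drinks[3] < drinks[1]   # "milk" < "juice" under this level order
println(lv, " ", ord, " ", cmp)
```

Alphabetically "milk" would sort after "juice", so the comparison only holds because the level order was set explicitly.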

Summary statistics for the categorical array:

In [ ]:
Pkg.add("FreqTables")
In [ ]:
using FreqTables
freqtable(A)
Out[0]:
3-element Named Vector{Int64}
Dim1   │ 
───────┼───
milk   │  8
juice  │ 11
water  │  6
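
The counts can also be cross-checked without FreqTables. This is a minimal sketch under the assumption that a categorical value can be compared with a plain string using ==; the names drinks and counts are hypothetical and do not appear in the notebook.

```julia
using CategoricalArrays

# Hypothetical small array with an explicit level order
drinks = categorical(["juice", "water", "milk", "juice", "milk", "juice"],
                     levels=["milk", "juice", "water"])

# One (level => count) pair per level, in level order
counts = [lvl => count(==(lvl), drinks) for lvl in levels(drinks)]
println(counts)

# The counts should add up to the number of elements
total = sum(last, counts)
println(total == length(drinks))
```

Checking that the counts sum to the array length is a quick way to confirm that no value was dropped or counted twice.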

Let's create another categorical array B with the preferences of the 28 students in group B.

In [ ]:
B = rand(["milk", "juice", "water"], 28)
B = categorical(B) # More concise syntax: the categories are inferred from the data
Out[0]:
28-element CategoricalArray{String,1,UInt32}:
 "milk"
 "milk"
 "milk"
 "juice"
 "water"
 "milk"
 "milk"
 "juice"
 "milk"
 "milk"
 "milk"
 "juice"
 "milk"
 ⋮
 "water"
 "juice"
 "water"
 "milk"
 "water"
 "water"
 "water"
 "milk"
 "juice"
 "water"
 "milk"
 "water"

Summary statistics:

In [ ]:
freqtable(B)
Out[0]:
3-element Named Vector{Int64}
Dim1   │ 
───────┼───
water  │  9
milk   │ 13
juice  │  6

Combining categorical arrays

Let's combine the data from groups A and B into one categorical array Group1.

In [ ]:
Group1 = vcat(A, B)
Out[0]:
53-element CategoricalArray{String,1,UInt32}:
 "juice"
 "juice"
 "water"
 "milk"
 "juice"
 "juice"
 "milk"
 "water"
 "juice"
 "milk"
 "juice"
 "water"
 "milk"
 ⋮
 "water"
 "juice"
 "water"
 "milk"
 "water"
 "water"
 "water"
 "milk"
 "juice"
 "water"
 "milk"
 "water"

Summary statistics:

In [ ]:
freqtable(Group1)
Out[0]:
3-element Named Vector{Int64}
Dim1   │ 
───────┼───
milk   │ 21
juice  │ 17
water  │ 15

Creating a categorical array with other categories

Let's create a categorical array Group2 containing the preferences of 50 students, with an additional drink option: soda.

In [ ]:
Group2 = rand(["juice", "milk", "soda", "water"], 50)
Group2 = categorical(Group2)
Out[0]:
50-element CategoricalArray{String,1,UInt32}:
 "milk"
 "soda"
 "water"
 "soda"
 "soda"
 "water"
 "milk"
 "milk"
 "juice"
 "soda"
 "soda"
 "milk"
 "water"
 ⋮
 "water"
 "soda"
 "juice"
 "juice"
 "juice"
 "soda"
 "water"
 "juice"
 "water"
 "soda"
 "juice"
 "soda"

Summary statistics:

In [ ]:
freqtable(Group2)
Out[0]:
4-element Named Vector{Int64}
Dim1      │ 
──────────┼───
water     │ 13
soda      │ 18
milk      │  7
juice     │ 12

Combining arrays with different categories

Combine the data from Group1 and Group2.

In [ ]:
students = [Group1; Group2]
Out[0]:
103-element CategoricalArray{String,1,UInt32}:
 "juice"
 "juice"
 "water"
 "milk"
 "juice"
 "juice"
 "milk"
 "water"
 "juice"
 "milk"
 "juice"
 "water"
 "milk"
 ⋮
 "water"
 "soda"
 "juice"
 "juice"
 "juice"
 "soda"
 "water"
 "juice"
 "water"
 "soda"
 "juice"
 "soda"

Summary statistics. When the arrays are combined, the categories unique to the second array (soda) are added to the end of the list of categories of the first array, giving milk, juice, water, soda.

In [ ]:
freqtable(students)
Out[0]:
4-element Named Vector{Int64}
Dim1      │ 
──────────┼───
milk      │ 28
juice     │ 29
water     │ 28
soda      │ 18
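
The merging of category lists can be tried on a smaller case. The sketch below uses hypothetical arrays g1 and g2 and only checks which categories the combined array has, not the position at which the new category lands, since that depends on how the library merges the two level lists.

```julia
using CategoricalArrays

# Hypothetical small arrays with partly different categories
g1 = categorical(["milk", "water", "milk"])
g2 = categorical(["soda", "milk"])

combined = [g1; g2]          # same as vcat(g1, g2)
println(length(combined))    # all elements of both arrays are kept
println(levels(combined))    # categories from both arrays
```

If the position of a category matters for later tables or plots, it is safer to set the order explicitly than to rely on the merged order.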

To change the order of categories in the categorical array, use the function levels!.

In [ ]:
levels!(students, ["juice", "milk", "water", "soda"])
levels(students)
Out[0]:
4-element Vector{String}:
 "juice"
 "milk"
 "water"
 "soda"
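
Reordering the categories does not change the stored values, only their positions in the category list. A minimal sketch, assuming the levelcode function from CategoricalArrays is available to show the integer position of each value; the array x is hypothetical.

```julia
using CategoricalArrays

x = categorical(["milk", "soda", "juice", "milk"])
codes_before = levelcode.(x)   # positions under the default (sorted) levels

levels!(x, ["soda", "milk", "juice"])
codes_after = levelcode.(x)    # positions under the new order

println(x == ["milk", "soda", "juice", "milk"])   # values are unchanged
println(codes_before, " -> ", codes_after)
```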

Union of categorical arrays

To find the unique values present in Group1 and Group2, use the function union.

In [ ]:
C = union(Group1, Group2)
Out[0]:
4-element Vector{CategoricalValue{String, UInt32}}:
 "juice"
 "water"
 "milk"
 "soda"
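
The same call can be tried on a smaller pair of arrays. In this sketch g1 and g2 are hypothetical; it relies only on union accepting two categorical arrays, as in the cell above. Note that union reports the values that actually occur in the data, whereas levels can also list categories with no occurrences.

```julia
using CategoricalArrays

g1 = categorical(["juice", "water", "juice"])
g2 = categorical(["milk", "juice", "soda"])

u = union(g1, g2)      # each distinct value appears once
println(u)
println(length(u))     # number of distinct drinks across both arrays
```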

Conclusion

In this example, only A was created with ordered=true; the other categorical arrays are unordered. For the result of combining ordered categorical arrays to remain ordered, the arrays should have the same sets of categories, including their order.
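
Whether a combined array is ordered can be checked with isordered. The sketch below assumes the behavior of current CategoricalArrays releases, where vcat yields an ordered result only when all inputs are ordered and their levels are compatible; the arrays a, b, and c are hypothetical.

```julia
using CategoricalArrays

lv = ["milk", "juice", "water"]
a = categorical(["juice", "milk"], levels=lv, ordered=true)
b = categorical(["water", "juice"], levels=lv, ordered=true)
c = categorical(["water", "milk"])   # unordered (the default)

ab = vcat(a, b)   # both inputs ordered, identical levels
ac = vcat(a, c)   # one input is unordered

println(isordered(ab))
println(isordered(ac))
```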