
Combining categorical arrays

This example shows how to combine arrays of categorical variables.

Creating categorical arrays

Let's create a categorical array that stores the lunch-break beverage preferences of the 25 students in group A.

In [ ]:
import Pkg
Pkg.add("CategoricalArrays")
In [ ]:
using Random, CategoricalArrays
Random.seed!(123)

A = rand(["молоко", "сок", "вода"], 25)
A = categorical(A, levels=["молоко", "сок", "вода"], ordered=true) # Pass the vector of labels to set their order
Out[0]:
25-element CategoricalArray{String,1,UInt32}:
 "сок"
 "сок"
 "вода"
 "молоко"
 "сок"
 "сок"
 "молоко"
 "вода"
 "сок"
 "молоко"
 "сок"
 "вода"
 "молоко"
 "сок"
 "сок"
 "молоко"
 "вода"
 "сок"
 "сок"
 "молоко"
 "молоко"
 "вода"
 "вода"
 "молоко"
 "сок"
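
The effect of the levels and ordered keywords can be checked directly. The following is a minimal sketch on a small hand-written sample (the variable drinks is hypothetical and is not part of the example data); it assumes only the CategoricalArrays functions levels and isordered.

```julia
using CategoricalArrays

# Hypothetical four-element sample, not the random data used above
drinks = categorical(["сок", "вода", "молоко", "сок"],
                     levels=["молоко", "сок", "вода"], ordered=true)

levels(drinks)         # levels in the order they were passed
isordered(drinks)      # true, because ordered=true was passed
drinks[3] < drinks[1]  # "молоко" precedes "сок" in the level order
```

Comparisons with < between elements are meaningful only for ordered arrays, which is why ordered=true matters here.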

Summary statistics for the categorical array:

In [ ]:
Pkg.add("FreqTables")
In [ ]:
using FreqTables
freqtable(A)
Out[0]:
3-element Named Vector{Int64}
Dim1   │ 
───────┼───
молоко │  8
сок    │ 11
вода   │  6
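
As a rough illustration of what the frequency table reports, the counts per category can also be computed with Base functions only. This is a sketch on a hypothetical small sample (sample and counts are illustrative names), not a replacement for freqtable.

```julia
using CategoricalArrays

# Hypothetical sample with an explicit level order
sample = categorical(["сок", "вода", "сок", "молоко"],
                     levels=["молоко", "сок", "вода"])

# One "level => count" pair per level, in level order
counts = [string(lvl) => count(==(lvl), sample) for lvl in levels(sample)]
```

The pairs follow the order of levels(sample), just as the rows of the table above follow the level order of A.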

Let's create another categorical array with the preferences of the 28 students in group B.

In [ ]:
B = rand(["молоко", "сок", "вода"], 28)
B = categorical(B) # More concise syntax: the levels are inferred from the data and sorted
Out[0]:
28-element CategoricalArray{String,1,UInt32}:
 "молоко"
 "молоко"
 "молоко"
 "сок"
 "вода"
 "молоко"
 "молоко"
 "сок"
 "молоко"
 "молоко"
 "молоко"
 "сок"
 "молоко"
 ⋮
 "вода"
 "сок"
 "вода"
 "молоко"
 "вода"
 "вода"
 "вода"
 "молоко"
 "сок"
 "вода"
 "молоко"
 "вода"

Summary statistics:

In [ ]:
freqtable(B)
Out[0]:
3-element Named Vector{Int64}
Dim1   │ 
───────┼───
вода   │  9
молоко │ 13
сок    │  6

Combining categorical arrays

Let's combine the data from groups A and B into one categorical array, Group1.

In [ ]:
Group1 = vcat(A, B)
Out[0]:
53-element CategoricalArray{String,1,UInt32}:
 "сок"
 "сок"
 "вода"
 "молоко"
 "сок"
 "сок"
 "молоко"
 "вода"
 "сок"
 "молоко"
 "сок"
 "вода"
 "молоко"
 ⋮
 "вода"
 "сок"
 "вода"
 "молоко"
 "вода"
 "вода"
 "вода"
 "молоко"
 "сок"
 "вода"
 "молоко"
 "вода"

Summary statistics:

In [ ]:
freqtable(Group1)
Out[0]:
3-element Named Vector{Int64}
Dim1   │ 
───────┼───
молоко │ 21
сок    │ 17
вода   │ 15
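
The same behaviour can be reproduced on a small scale. The sketch below uses two hypothetical arrays (a_small and b_small) to show that vcat keeps the result categorical and that the counts of the parts add up.

```julia
using CategoricalArrays

# Hypothetical miniature versions of groups A and B
a_small = categorical(["сок", "вода"], levels=["молоко", "сок", "вода"])
b_small = categorical(["молоко", "сок", "сок"])

ab = vcat(a_small, b_small)

length(ab)            # total number of elements from both arrays
count(==("сок"), ab)  # "сок" occurrences from both arrays combined
levels(ab)            # merged levels of the two arrays
```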

Creating a categorical array with other categories

Create a categorical array Group2 containing the preferences of 50 students, with an additional drink option: soda ("газировка").

In [ ]:
Group2 = rand(["сок", "молоко", "газировка", "вода"], 50)
Group2 = categorical(Group2)
Out[0]:
50-element CategoricalArray{String,1,UInt32}:
 "молоко"
 "газировка"
 "вода"
 "газировка"
 "газировка"
 "вода"
 "молоко"
 "молоко"
 "сок"
 "газировка"
 "газировка"
 "молоко"
 "вода"
 ⋮
 "вода"
 "газировка"
 "сок"
 "сок"
 "сок"
 "газировка"
 "вода"
 "сок"
 "вода"
 "газировка"
 "сок"
 "газировка"

Summary statistics:

In [ ]:
freqtable(Group2)
Out[0]:
4-element Named Vector{Int64}
Dim1      │ 
──────────┼───
вода      │ 13
газировка │ 18
молоко    │  7
сок       │ 12

Combining arrays with different categories

Combine the data from Group1 and Group2.

In [ ]:
students = [Group1; Group2]
Out[0]:
103-element CategoricalArray{String,1,UInt32}:
 "сок"
 "сок"
 "вода"
 "молоко"
 "сок"
 "сок"
 "молоко"
 "вода"
 "сок"
 "молоко"
 "сок"
 "вода"
 "молоко"
 ⋮
 "вода"
 "газировка"
 "сок"
 "сок"
 "сок"
 "газировка"
 "вода"
 "сок"
 "вода"
 "газировка"
 "сок"
 "газировка"

Summary statistics. When the arrays are combined, the categories unique to the second array (soda, "газировка") are appended to the end of the first array's list of categories, giving the order milk, juice, water, soda ("молоко", "сок", "вода", "газировка").

In [ ]:
freqtable(students)
Out[0]:
4-element Named Vector{Int64}
Dim1      │ 
──────────┼───
молоко    │ 28
сок       │ 29
вода      │ 28
газировка │ 18
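
A minimal sketch of the same merge on hypothetical small arrays (g1_small and g2_small) follows. It checks only that the combined array contains every category from both inputs; the exact position of the new level may depend on the package version, so the sketch does not assume it.

```julia
using CategoricalArrays

# Hypothetical miniature versions of Group1 and Group2
g1_small = categorical(["молоко", "сок", "вода"], levels=["молоко", "сок", "вода"])
g2_small = categorical(["газировка", "сок"])

both = [g1_small; g2_small]    # same operation as vcat(g1_small, g2_small)

levels(both)                   # three original levels plus "газировка"
"газировка" in levels(both)    # the new category is now a level of the result
```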

To change the order of categories in the categorical array, use the function levels!.

In [ ]:
levels!(students, ["сок", "молоко", "вода", "газировка"])
levels(students)
Out[0]:
4-element Vector{String}:
 "сок"
 "молоко"
 "вода"
 "газировка"
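
Reordering levels changes only the category order, not the stored values. The sketch below shows this on a hypothetical small array (prefs); it assumes nothing beyond levels and levels! from CategoricalArrays.

```julia
using CategoricalArrays

# Hypothetical sample; without the levels keyword the levels are sorted
prefs = categorical(["вода", "сок", "молоко", "сок"])
before = copy(levels(prefs))   # sorted order: "вода", "молоко", "сок"

levels!(prefs, ["сок", "молоко", "вода"])
after = levels(prefs)          # the order passed to levels!
```

After the call, prefs still holds the same four values; only the order reported by levels (and used by tools such as freqtable) has changed.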

Union of categorical arrays

To find the unique values of the categories present in Group1 and Group2, you can use the function union.

In [ ]:
C = union(Group1, Group2)
Out[0]:
4-element Vector{CategoricalValue{String, UInt32}}:
 "сок"
 "вода"
 "молоко"
 "газировка"
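
Note that the result is a plain vector of categorical values, not a categorical array. The following sketch, on hypothetical small arrays (u1 and u2), shows the union and one possible way to turn it back into a categorical array; wrapping the result with categorical is an illustrative choice, not something the example above requires.

```julia
using CategoricalArrays

# Hypothetical small inputs
u1 = categorical(["сок", "вода", "сок"])
u2 = categorical(["газировка", "вода"])

u = union(u1, u2)                 # unique values across both arrays
u_cat = categorical(string.(u))   # optional: rebuild a categorical array from them
```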

Conclusion

Of the arrays created in this example, only A was created as ordered (ordered=true); B and Group2 were created as unordered. To combine ordered categorical arrays, they must have the same sets of categories, including their order.
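
A minimal sketch of the ordered case, assuming two hypothetical arrays (o1 and o2) that share exactly the same levels in the same order:

```julia
using CategoricalArrays

lv = ["молоко", "сок", "вода"]   # shared level order (illustrative)

o1 = categorical(["сок", "вода"], levels=lv, ordered=true)
o2 = categorical(["молоко", "сок"], levels=lv, ordered=true)

o12 = vcat(o1, o2)

isordered(o12)   # the result stays ordered because both inputs agree on the levels
levels(o12)      # same levels, in the same order as lv
```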