Engee documentation

Merging categorical arrays

This example shows how to merge arrays of categorical variables.

Creating categorical arrays

Let's create a categorical array that stores the lunchtime drink requests of 25 students in group A. The labels are in Russian: "молоко" (milk), "сок" (juice) and "вода" (water).

In [ ]:
using Random, CategoricalArrays
Random.seed!(123)

A = rand(1:3, 25)
A = categorical(A, levels=[1,2,3], labels=["молоко", "сок", "вода"]) # Assign labels to the values manually
Out[0]:
25-element CategoricalArray{String,1,UInt32}:
 "сок"
 "сок"
 "вода"
 "молоко"
 "сок"
 "сок"
 "молоко"
 "вода"
 "сок"
 "молоко"
 "сок"
 "вода"
 "молоко"
 "сок"
 "сок"
 "молоко"
 "вода"
 "сок"
 "сок"
 "молоко"
 "молоко"
 "вода"
 "вода"
 "молоко"
 "сок"
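Internally, a categorical array keeps a list of categories (levels) and stores each element as a small integer code that points into that list; the UInt32 in the type shown above is the code type. A minimal sketch on a hypothetical four-element sample (the variable drinks is illustrative, not part of the example data), using the levels and levelcode functions from CategoricalArrays:

```julia
using CategoricalArrays

# Hypothetical small sample of drink requests (not the data above)
drinks = categorical(["сок", "вода", "сок", "молоко"])

# Levels inferred from strings are sorted: "вода", "молоко", "сок"
println(levels(drinks))

# Integer code of each element, i.e. its position in levels(drinks)
println(levelcode.(drinks))
```

Storing codes instead of repeated strings is what makes categorical arrays compact when the same few values occur many times.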

Summary statistics on the categorical array:

In [ ]:
import Pkg; Pkg.add("FreqTables")
In [ ]:
using FreqTables
freqtable(A)
Out[0]:
3-element Named Vector{Int64}
Dim1   │ 
───────┼───
вода   │  6
молоко │  8
сок    │ 11
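freqtable counts how many elements fall into each category. As a cross-check that needs only CategoricalArrays, the same counts can be computed in base Julia; a minimal sketch on a hypothetical sample (drinks and counts are illustrative names):

```julia
using CategoricalArrays

# Hypothetical sample of drink requests
drinks = categorical(["сок", "вода", "сок", "молоко", "сок"])

# For each level, count the elements equal to it
counts = Dict(l => count(==(l), drinks) for l in levels(drinks))
println(counts)
```

The counts always add up to the length of the array, just as 6 + 8 + 11 = 25 in the table above.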

Let's create another categorical array with the drink requests of 28 students from group B.

In [ ]:
B = rand(["молоко", "сок", "вода"], 28) # More concise syntax
B = categorical(B)
Out[0]:
28-element CategoricalArray{String,1,UInt32}:
 "сок"
 "вода"
 "вода"
 "вода"
 "вода"
 "вода"
 "сок"
 "сок"
 "молоко"
 "сок"
 "сок"
 "сок"
 "вода"
 ⋮
 "вода"
 "молоко"
 "вода"
 "вода"
 "вода"
 "вода"
 "сок"
 "сок"
 "молоко"
 "сок"
 "вода"
 "вода"
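Here categorical(B) infers the categories from the data and sorts them. When a specific order is wanted, it can be passed with the levels keyword instead. A short sketch on hypothetical data (x, inferred and explicit are illustrative names):

```julia
using CategoricalArrays

x = ["сок", "молоко", "вода", "сок"]

# Levels inferred from the data are sorted
inferred = categorical(x)

# Levels given explicitly keep the order we pass
explicit = categorical(x; levels=["молоко", "сок", "вода"])

println(levels(inferred))
println(levels(explicit))
```

Both arrays hold the same values; only the order of their categories differs.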

Summary statistics:

In [ ]:
freqtable(B)
Out[0]:
3-element Named Vector{Int64}
Dim1   │ 
───────┼───
вода   │ 14
молоко │  3
сок    │ 11

Merging categorical arrays

Let's merge the data from groups A and B into a single categorical array, Group1.

In [ ]:
Group1 = [A; B]
Out[0]:
53-element CategoricalArray{String,1,UInt32}:
 "сок"
 "сок"
 "вода"
 "молоко"
 "сок"
 "сок"
 "молоко"
 "вода"
 "сок"
 "молоко"
 "сок"
 "вода"
 "молоко"
 ⋮
 "вода"
 "молоко"
 "вода"
 "вода"
 "вода"
 "вода"
 "сок"
 "сок"
 "молоко"
 "сок"
 "вода"
 "вода"
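The [A; B] syntax is vertical concatenation, equivalent to vcat(A, B), and the result is again a categorical array whose levels combine those of both inputs. A minimal check on hypothetical data (a, b and ab are illustrative names):

```julia
using CategoricalArrays

a = categorical(["сок", "вода"])
b = categorical(["молоко", "сок"])

ab = [a; b]   # same as vcat(a, b)

println(typeof(ab))
println(levels(ab))
```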

Summary statistics:

In [ ]:
freqtable(Group1)
Out[0]:
3-element Named Vector{Int64}
Dim1   │ 
───────┼───
вода   │ 20
молоко │ 11
сок    │ 22
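Because concatenation only appends elements, each count for Group1 is the sum of the counts for A and B (for example, 6 + 14 = 20 for "вода"). A small sketch that checks this property on hypothetical data (a, b and ab are illustrative names):

```julia
using CategoricalArrays

a = categorical(["сок", "вода", "сок"])
b = categorical(["вода", "вода", "молоко"])
ab = [a; b]

# The count of each level in the merged array equals the sum of its counts in the parts
for l in levels(ab)
    println(l, ": ", count(==(l), a), " + ", count(==(l), b), " = ", count(==(l), ab))
end
```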

Creating a categorical array with other categories

Let's create a categorical array Group2 containing the requests of 50 students, with an additional drink option: "газировка" (soda).

In [ ]:
Group2 = rand(["сок", "молоко", "газировка", "вода"], 50)
Group2 = categorical( Group2 )
Out[0]:
50-element CategoricalArray{String,1,UInt32}:
 "сок"
 "сок"
 "сок"
 "сок"
 "газировка"
 "газировка"
 "молоко"
 "газировка"
 "молоко"
 "газировка"
 "вода"
 "газировка"
 "сок"
 ⋮
 "сок"
 "вода"
 "газировка"
 "газировка"
 "газировка"
 "газировка"
 "сок"
 "газировка"
 "вода"
 "сок"
 "сок"
 "газировка"

Summary statistics:

In [ ]:
freqtable(Group2)
Out[0]:
4-element Named Vector{Int64}
Dim1      │ 
──────────┼───
вода      │  8
газировка │ 19
молоко    │  6
сок       │ 17
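Group2 contains a category that Group1 lacks. Before merging, the categories that differ can be listed with setdiff on the two level vectors; a sketch on hypothetical data (g1, g2 and extra are illustrative names):

```julia
using CategoricalArrays

g1 = categorical(["сок", "молоко", "вода"])
g2 = categorical(["сок", "газировка", "вода"])

# Categories that appear in g2 but not in g1
extra = setdiff(levels(g2), levels(g1))
println(extra)
```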

Combining arrays with different categories

Let's combine data from Group1 and Group2.

In [ ]:
students = [Group1; Group2]
Out[0]:
103-element CategoricalArray{String,1,UInt32}:
 "сок"
 "сок"
 "вода"
 "молоко"
 "сок"
 "сок"
 "молоко"
 "вода"
 "сок"
 "молоко"
 "сок"
 "вода"
 "молоко"
 ⋮
 "сок"
 "вода"
 "газировка"
 "газировка"
 "газировка"
 "газировка"
 "сок"
 "газировка"
 "вода"
 "сок"
 "сок"
 "газировка"

Summary statistics. When the arrays are merged, the category that occurs only in the second array ("газировка", soda) is added to the categories of the first array (milk, juice, water), so the merged array has four categories.

In [ ]:
freqtable(students)
Out[0]:
4-element Named Vector{Int64}
Dim1      │ 
──────────┼───
вода      │ 28
газировка │ 19
молоко    │ 17
сок       │ 39
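After arrays with different categories are merged, the result carries every category of both inputs. A sketch on hypothetical data (g1, g2 and merged are illustrative names); it checks only which levels are present, since the exact order of the merged levels is decided by CategoricalArrays:

```julia
using CategoricalArrays

g1 = categorical(["сок", "молоко", "вода"])
g2 = categorical(["сок", "газировка"])

merged = [g1; g2]

# The set of levels in the result is the union of the input level sets
println(levels(merged))
println(Set(levels(merged)) == union(Set(levels(g1)), Set(levels(g2))))
```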

To change the order of categories in a categorical array, we use the function levels!.

In [ ]:
levels!(students, ["сок", "молоко", "вода", "газировка"])
levels(students)
Out[0]:
4-element Vector{String}:
 "сок"
 "молоко"
 "вода"
 "газировка"
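levels! changes only the order of the categories (and the integer codes that point into them); the values stored in the array stay the same. A minimal sketch on a hypothetical array x:

```julia
using CategoricalArrays

x = categorical(["сок", "вода", "сок"])   # inferred levels: ["вода", "сок"]

levels!(x, ["сок", "вода"])               # put "сок" first

println(levels(x))        # new order
println(levelcode.(x))    # codes follow the new order
```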

Union of categorical arrays

To find the unique values that occur in Group1, Group2 or both, use the union function.

In [ ]:
C = union(Group1, Group2)
Out[0]:
4-element Vector{CategoricalValue{String, UInt32}}:
 "сок"
 "вода"
 "молоко"
 "газировка"
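union returns CategoricalValue elements in order of first appearance. If only the category names are needed as plain strings, one option is to take the union of the level vectors instead; a sketch on hypothetical data (g1, g2 and drink_names are illustrative names):

```julia
using CategoricalArrays

g1 = categorical(["сок", "вода"])
g2 = categorical(["газировка", "сок"])

# Union of the category names as a Vector{String}
drink_names = union(levels(g1), levels(g2))
println(drink_names)
```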

Conclusion

All the categorical arrays in this example were unordered. To merge ordered categorical arrays, they must have the same sets of categories, including their order.
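An ordered categorical array is created with ordered=true. A minimal sketch, assuming two arrays that share the same categories in the same order and using a hypothetical order of preference lv; the merged array stays ordered, so its values can be compared with <:

```julia
using CategoricalArrays

# Hypothetical order of preference shared by both arrays
lv = ["вода", "сок", "молоко"]
a = categorical(["сок", "молоко"]; levels=lv, ordered=true)
b = categorical(["вода", "сок"]; levels=lv, ordered=true)

ab = [a; b]

println(isordered(ab))   # both inputs are ordered with identical levels
println(ab[3] < ab[1])   # "вода" comes before "сок" in the declared order
```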