Engee documentation

Working with big data via WorkspaceArray

Purpose of WorkspaceArray

WorkspaceArray is a structure for working with large time series (e.g. simulation results). Its methods allow iterating over the data without loading all of it into memory. This is achieved by dividing the data into parts (chunks) that can be processed independently of one another.

Big data are data arrays so large that conventional methods often handle them inefficiently. The size of such data can significantly exceed the memory available to Engee. The WorkspaceArray structure avoids filling up RAM by splitting large data arrays into parts.

WorkspaceArray divides data into parts no larger than 200 MB and works with only one of them at a time, which protects Engee from memory overflow. A WorkspaceArray is created after a model simulation with the To Workspace block, or manually by the user.
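The chunking idea can be tried in plain Julia. The following is only a sketch under stated assumptions: a lazy range stands in for a long time series, the chunk length is picked arbitrarily, and Iterators.partition from Base does the splitting. It does not show how WorkspaceArray is implemented internally.

```julia
# Hypothetical stand-in: a lazy range plays the role of a long time series.
samples = range(0.0, 10.0; length = 1_000_001)

# Chunk length chosen for illustration only; WorkspaceArray sizes its own chunks.
chunk_len = 100_000

# Maximum computed chunk by chunk; only one chunk is materialised at a time.
function chunked_maximum(series, chunk_len)
    best = -Inf
    for chunk in Iterators.partition(series, chunk_len)
        block = collect(chunk)            # one chunk in memory
        best = max(best, maximum(block))  # fold the chunk result into the running result
    end
    return best
end

peak = chunked_maximum(samples, chunk_len)
```

Any reduction that can be folded chunk by chunk (sum, extrema, counts) fits the same pattern.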

The fact that you can create a WorkspaceArray when working with data does not mean that the data is big. One of the purposes of a WorkspaceArray is to provide a convenient representation of the data, which is also useful when working with regular-sized data arrays.

Therefore, even if your data arrays are not large, you can work with the WorkspaceArray structure in the following scenarios:

  1. Export the WorkspaceArray to CSV format.

  2. Use functions to manage the WorkspaceArray depending on your tasks.

  3. Use WorkspaceArray in modelling with the To Workspace and From Workspace blocks.

Creating a WorkspaceArray

A WorkspaceArray can be created directly, either via a DataFrame or from a CSV file. Engee provides two constructors for this purpose:

  • DataFrame constructor - creates a WorkspaceArray from a DataFrame structure.

  • CSV constructor - creates a WorkspaceArray from a CSV file.

Constructors in Julia are special functions that are used to create new objects of certain types. Constructors define how objects should be initialised when created.

Let’s talk about these constructors in more detail, but first let’s create a DataFrame.

Creating a DataFrame

DataFrame is a data structure provided by the DataFrames.jl package and is a table with row and column labels. A DataFrame is used to organise and process data into a table where each column represents a variable or attribute and each row represents a separate observation or record (such as a value at a point in time). A DataFrame provides convenient access to data and can be processed using various methods in Julia.

Create a DataFrame by setting the time (time values) and value (values) columns:

using DataFrames # load the DataFrames module
times = [2 ^ i for i in LinRange(1, 3, 1000)]
values = [sin(i ^ 2 + 1) * 2 + cos(i) for i in times]
data_frame = DataFrame(time = times, value = values) # create a DataFrame with two columns, time and value

Let’s break down the code for creating a DataFrame:

  1. An array times is created which contains 1000 elements. LinRange(1, 3, 1000) creates an evenly spaced range from 1 to 3 with 1000 elements, and for each value i from this range the expression 2 ^ i is calculated.

  2. An array values is created which contains 1000 elements, calculated from the values of the times array. For each value i from the times array, 1 is added to the square of that value, the sine of the result is multiplied by 2, and the cosine of i is added.

  3. Arrays are used to create a DataFrame with two columns, time and value, where each column is the values of the times and values arrays respectively.
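Before building the table, the generated arrays can be inspected with Base functions alone. This is a small sketch: it repeats the two generators without the DataFrames dependency, and the name signal is an assumption used here in place of values, to avoid shadowing the Base function of the same name.

```julia
# Same generators as in the listing above, using only Base Julia.
times = [2 ^ i for i in LinRange(1, 3, 1000)]
signal = [sin(t ^ 2 + 1) * 2 + cos(t) for t in times]

println(length(times))                    # number of samples
println(first(times), " ", last(times))   # endpoints of the time axis: 2^1 and 2^3
println(issorted(times))                  # time values grow monotonically
println(extrema(signal))                  # smallest and largest value, bounded by -3 and 3
```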

The WorkspaceArray structure contains several custom fields:

  • :time and :value - label the time and value data of the WorkspaceArray. The value field can hold not only scalars but also multidimensional arrays.

  • :name - an identifier that helps to identify which run the variable belongs to in case the WorkspaceArray is derived from the To Workspace block.

Having obtained a DataFrame, we use its constructor to create a WorkspaceArray. Let’s take a closer look at it.

Creating a WorkspaceArray from the DataFrame constructor

Before we start working with WorkspaceArray, we need to create a DataFrame data structure represented as a table.

This constructor cannot be used without creating a DataFrame.

The DataFrame constructor is designed to write the WorkspaceArray through the DataFrame. The two columns time and value must satisfy the conditions:

  • time - must not contain values of type Any.

  • value - must not be of type Any.

  • value - must be a subtype of one of the supported data types: Matrix{<:Number}, Vector{<:Number}, or Number.
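These conditions can be checked before calling the constructor. The helper below is hypothetical (columns_look_valid is not an Engee function) and only mirrors the three rules listed above on two plain vectors.

```julia
# Hypothetical helper, not part of Engee: mirrors the three conditions listed above.
function columns_look_valid(time_col::AbstractVector, value_col::AbstractVector)
    eltype(time_col) === Any && return false    # time column must not be typed as Any
    eltype(value_col) === Any && return false   # value column must not be typed as Any
    # every value must be a number, a vector of numbers or a matrix of numbers
    return all(v -> v isa Union{Number, Vector{<:Number}, Matrix{<:Number}}, value_col)
end

columns_look_valid([0.0, 0.1], [1.0, 2.0])          # true: scalar values
columns_look_valid([0.0, 0.1], [[1, 2], [3, 4]])    # true: vector values
columns_look_valid([0.0, 0.1], Any[1.0, "text"])    # false: column typed as Any
```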

Let’s apply the DataFrame constructor to create a WorkspaceArray and assign it to the my_workspacearray variable:

my_workspacearray = WorkspaceArray("my_data_frame", data_frame)
Output
WorkspaceArray("my_data_frame")

Consider the code of the constructor:

  • A my_workspacearray variable of type WorkspaceArray is created.

  • The variable contains data from data_frame (the DataFrame we created).

  • The array of data from data_frame is now available in Engee workspace under the name "my_data_frame".

As a result of the code execution, the data from data_frame will be available in the Engee workspace under the name my_data_frame, allowing this data to be accessed using the my_workspacearray variable.

Creating a WorkspaceArray from the CSV constructor

The CSV constructor is designed to create a WorkspaceArray from a CSV format file. Let’s consider this constructor:

workspacecsv = WorkspaceArray("workspacearray_csv", "/user/workspacearray_csv.csv") # "/user/workspacearray_csv.csv" is the path to the CSV file, "workspacearray_csv" is the name of the WorkspaceArray
Output
WorkspaceArray("workspacearray_csv")

The code creates a new WorkspaceArray named "workspacearray_csv", loads the data from the CSV file workspacearray_csv.csv into it, and assigns it to the workspacecsv variable.

The CSV file from which the WorkspaceArray is created must use the delimiter described in the section Exporting WorkspaceArray to CSV.

Usage of WorkspaceArray

Exporting WorkspaceArray to CSV

To export a WorkspaceArray to a CSV file, we use a DataFrame structure (our data_frame) as follows:

using CSV # load the CSV module
CSV.write("/user/workspacearray_csv.csv", data_frame; delim="\t")

The delim="\t" parameter (a tab delimiter) is required when creating a CSV file for a WorkspaceArray. Omitting this parameter results in the error Error when creating WorkspaceArray from csv: BoundsError and an empty CSV file.

Consider the code:

  • The CSV.write() function writes the data from data_frame (the DataFrame we created) to a CSV file named "workspacearray_csv.csv".

  • The parameter delim="\t" specifies that a tab character is used as the column separator.
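Because the delimiter matters, a file can be checked before it is passed to the CSV constructor. The function below is a hypothetical pre-check written with Base Julia only (has_tab_header is not an Engee or CSV.jl function), and the temporary file is written by hand as a stand-in for real CSV.write output.

```julia
# Hypothetical pre-check, not an Engee function: is the header line tab-separated?
function has_tab_header(path::AbstractString)
    header = open(readline, path)     # read only the first line of the file
    return occursin('\t', header)
end

# Demonstration on a temporary file written by hand.
path = tempname() * ".csv"
write(path, "time\tvalue\n2.0\t0.5\n")
has_tab_header(path)                  # true for a tab-delimited header
```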

Exporting data from WorkspaceArray to a CSV file can be useful for several reasons:

  • Data sharing - exporting data to CSV allows you to share information from the WorkspaceArray with other users or programmes that can handle this format.

  • Backup - exporting data to CSV format can be used to create a backup copy of data in case of programme failure.

Loading data into memory

You can load all data from a WorkspaceArray into Engee RAM using the collect function.

Using collect is not recommended when working with large data, as it may cause memory overflow in Engee.

Depending on the parameters specified, collect returns an object of type DataFrame with the columns time and value. You can also request only one of the columns by accessing the time or value field. To do this, first assign the collected WorkspaceArray data to a variable. For example:

workspace_collected = collect(my_workspacearray)

After initialising a variable with WorkspaceArray data, the following commands become available:

collect(workspace_collected) # outputs the WorkspaceArray data with two columns, time and value

collect(workspace_collected.time) # outputs the time column only

collect(workspace_collected.value) # outputs the value column only

collect(workspace_collected.time[1]) # outputs the first value of the time column

collect(workspace_collected.value[1]) # outputs the first value of the value column

collect(pairs(workspace_collected)) # returns a DataFrame with two columns, index and value
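When the data is too large for collect, a statistic can often be accumulated by plain iteration instead. The sketch below uses a generator as a stand-in for a large series and a hypothetical streaming_mean function. It assumes only that the collection can be iterated element by element; what a WorkspaceArray yields per iteration step is not specified here.

```julia
# Stand-in for a large series: a generator that never materialises all values.
series = (sin(0.001 * k) for k in 1:1_000_000)

# Running mean via plain iteration; memory use does not grow with the length.
function streaming_mean(itr)
    total = 0.0
    count = 0
    for x in itr
        total += x
        count += 1
    end
    return count == 0 ? NaN : total / count   # NaN for an empty input
end

streaming_mean(series)
```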

WorkspaceArray functions

The WorkspaceArray structure implements the functions of the AbstractVector interface. Let’s consider the main functions for interacting with a WorkspaceArray:

length - returns the number of elements in the collection.

The length function in Julia takes as an argument the collection for which you want to determine the length.

If you want to know the length of a string variable, array or any other collection, you pass this collection as an argument to the length function.

Use lastindex to get the last valid index of a collection.

Example:

length(my_workspacearray) # my_workspacearray is the variable holding the WorkspaceArray

Collections in Julia are data structures that group a set of items into a single entity. They come in different types and are designed to store, organise and manage data. Depending on its type, a collection can be either mutable or immutable.

size - returns the dimensions of arrays and other collections.

Returns a tuple containing the dimensions of my_workspacearray. Optionally, you can specify a dimension to get the length of only that dimension.

Note that size may not be defined for arrays with non-standard indices.

For index size, the indexing style IndexStyle = IndexLinear() is used.

Example:

size(my_workspacearray) # my_workspacearray is the variable holding the WorkspaceArray

axes - returns the index ranges.

Returns a tuple in which each element represents an index range for the corresponding array dimension.

Example:

axes(my_workspacearray) # my_workspacearray is the variable holding the WorkspaceArray

getindex - returns a subset.

Used to access elements of containers (arrays, tuples, dictionaries, etc.) by their index or key.
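How length, size, axes, lastindex and getindex relate to one another can be seen on an ordinary Vector. This is a sketch with a plain Julia array standing in for a WorkspaceArray; the results shown are those of Base Julia for Vector, not Engee output.

```julia
v = [10.0, 20.0, 30.0]      # plain Vector standing in for a WorkspaceArray

length(v)                   # 3
size(v)                     # (3,)
size(v, 1)                  # 3, the length of the first dimension only
axes(v)                     # (Base.OneTo(3),)
lastindex(v)                # 3
getindex(v, 2)              # 20.0, the same as v[2]
```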

Example:

getindex(my_workspacearray.time, 1) # returns the first time value in the WorkspaceArray, the same as my_workspacearray.time[1]

IndexStyle - specifies the indexing style.

When defining a new AbstractArray type, you can implement either linear indexing (using IndexLinear) or Cartesian indexing (using IndexCartesian).

If you implement only linear indexing, you must declare this for the array type by defining IndexStyle to return IndexLinear():

Example:

Base.IndexStyle(::Type{<:WorkspaceArray}) = IndexLinear() # declares linear indexing for the WorkspaceArray type

The indexing style is always IndexLinear (the structure used to represent the linear indices of array elements).

The internal indexing mechanism in Julia will automatically recalculate all indexing operations to the preferred style. This will allow you to access array elements using either indexing style, even if no explicit methods have been provided.
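The automatic conversion between indexing styles can be observed on a small custom type. The sketch below defines a hypothetical read-only vector, Squares, that implements linear indexing only; it is unrelated to the WorkspaceArray implementation.

```julia
# Hypothetical type for illustration: a read-only vector of squares.
struct Squares <: AbstractVector{Int}
    n::Int
end

Base.size(s::Squares) = (s.n,)
Base.IndexStyle(::Type{<:Squares}) = IndexLinear()   # the trait is declared on the type
Base.getindex(s::Squares, i::Int) = i * i            # only linear indexing is implemented

s = Squares(5)
s[3]                       # linear index, gives 9
s[CartesianIndex(3)]       # Cartesian index, converted automatically, gives 9
IndexStyle(s)              # IndexLinear()
```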

Example:

IndexStyle(my_workspacearray) # outputs IndexLinear() (linear indices are used to access the elements)

IndexStyle(typeof(my_workspacearray)) # outputs IndexCartesian() (Cartesian indices are used to access the elements)

sizeof - returns the size of an object in memory.

It is used to get the size (in bytes) of an object in memory. Object size can be useful information when working with large data or optimising memory usage.
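For plain numeric arrays the result is simply the element count multiplied by the element size. The sketch below illustrates this and, assuming 1 MB is taken as 1024^2 bytes, estimates how many Float64 values fit into a 200 MB part.

```julia
samples = zeros(Float64, 1_000)

sizeof(samples)                 # 8000 bytes: 1000 elements of 8 bytes each
sizeof(Float64)                 # 8

# Number of Float64 values in a 200 MB part (assuming 1 MB = 1024^2 bytes).
max_values = (200 * 1024^2) ÷ sizeof(Float64)
```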

Example:

sizeof(collect(my_workspacearray))

iterate - advances the iterator.

Advances the iterator to get the next element. If there are no elements left, nothing is returned. Otherwise, a 2-tuple of the next element and the new iteration state is returned. For more information on defining a custom iterable type, see the Julia manual section on the iteration interface.
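The protocol can be followed by hand on any iterable. In this sketch a plain Vector stands in for a WorkspaceArray, and collect_by_hand is a hypothetical helper that spells out what a for loop does internally.

```julia
v = [10, 20, 30]                     # plain Vector standing in for a WorkspaceArray

# Manual expansion of `for item in itr ... end`.
function collect_by_hand(itr)
    seen = eltype(itr)[]
    next = iterate(itr)              # first element and initial state, or nothing
    while next !== nothing
        (item, state) = next
        push!(seen, item)
        next = iterate(itr, state)   # advance using the returned state
    end
    return seen
end

collect_by_hand(v)                   # [10, 20, 30]
iterate(v, 4)                        # nothing: the state is past the last element
```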

Example:

iterate(my_workspacearray, 5)

empty! - removes all items from a collection.

Deletes all items from the collection:

Example:

empty!(my_workspacearray) # creates an empty ordered dictionary with integer (Int64) keys and array (Array) values
Output
OrderedCollections.OrderedDict{Int64, Array}()

copy - creates a shallow copy.

Creates a shallow copy: the outer structure is copied, but not the inner values. For example, copying an array creates a new array with the same elements as the original one.
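The difference between the outer structure and the inner values is easiest to see on nested arrays. A minimal sketch in Base Julia, with ordinary arrays rather than a WorkspaceArray:

```julia
original = [[1, 2], [3, 4]]    # outer array holding two inner arrays
shallow = copy(original)

push!(shallow, [5, 6])         # changes only the outer structure of the copy
shallow[1][1] = 99             # inner arrays are shared, so the original sees this

length(original)               # 2
original[1]                    # [99, 2]
```

Use deepcopy when the inner values must be independent as well.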

Example:

copy(my_workspacearray)

similar - creates a new array.

Used to create a new array with the same size, type and, if necessary, the same memory allocation properties as an existing array or other similar object.
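On ordinary arrays the behaviour looks as follows. This sketch uses a plain Vector; what similar returns for a WorkspaceArray may differ.

```julia
source = [1.5, 2.5, 3.5]

buffer = similar(source)              # same element type and size, contents uninitialised
typeof(buffer) == typeof(source)      # true
size(buffer) == size(source)          # true

ints = similar(source, Int, 5)        # element type and size can be overridden
fill!(buffer, 0.0)                    # initialise before reading the contents
```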

Example:

my_workspacearray1 = similar(my_workspacearray) # creates the variable my_workspacearray1, similar to my_workspacearray

WorkspaceArray for modelling

WorkspaceArray is also used to transfer data between the model and the workspace. You can use blocks in the model for this purpose:

  • The To Workspace block writes input data to the Engee workspace. The result of writing the data is a variable of the WorkspaceArray type. The variable name is set by the Variable name parameter in the block settings, and the block itself supports only scalar and multidimensional numeric data.

  • The From Workspace block reads data from the workspace and represents it as a signal. It reads only data of the WorkspaceArray type. Depending on the data to be loaded, the output signal can be scalar, vector or multidimensional. Like the To Workspace block, the From Workspace block supports loading scalar and multidimensional data. From Workspace can also be used to pass data from the workspace to any model or subsystem that has access to it.

Separately, let’s note the simout variable, which stores the simulation results of the model and has the SimulationResult data type. This data type stores key-value pairs, where the key is a path of type "model/system/block/port number" and the value is a WorkspaceArray. Such pairs are created for each recorded signal and form a DataFrame data structure automatically populated with simulation results data. Read more about the simout variable in Software processing of simulation results in Engee.

The To Workspace and From Workspace blocks work only with variables of WorkspaceArray type (read data only from this particular type).

For an example of usage of WorkspaceArray with To Workspace and From Workspace blocks see here.