Reading EDF files

This example demonstrates loading, analyzing, and visualizing data recorded in EDF (European Data Format), a standard format for storing biomedical signals.

About the EDF format

EDF (European Data Format) is an open standard for storing and exchanging multichannel biosignals, widely used in medicine and scientific research. It is used to record EEG, ECG, EMG, respiratory signals, eye movements, blood oxygen saturation and other physiological data.

EDF is designed to ensure compatibility between equipment from different manufacturers and different software. Due to its fixed structure, this format is easily processed by many data analysis tools.
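To illustrate what "fixed structure" means in practice, here is a minimal sketch of reading the first 256 bytes of an EDF file, which the EDF specification defines as a sequence of fixed-width, space-padded ASCII fields. The names `EDF_FIXED_FIELDS` and `read_fixed_header` are hypothetical and are not part of edfread.jl; the demo header is synthesized in memory rather than read from a real file.

```julia
# Field widths (in bytes) of the fixed 256-byte EDF header, per the EDF specification.
const EDF_FIXED_FIELDS = [
    (:version, 8), (:patient_id, 80), (:recording_id, 80),
    (:startdate, 8), (:starttime, 8), (:header_bytes, 8),
    (:reserved, 44), (:num_records, 8), (:record_duration, 8),
    (:num_signals, 4),
]

# Read each fixed-width ASCII field from `io` and strip the space padding.
function read_fixed_header(io::IO)
    fields = Dict{Symbol,String}()
    for (name, width) in EDF_FIXED_FIELDS
        fields[name] = String(strip(String(read(io, width))))
    end
    return fields
end

# Synthetic header for demonstration only (values mirror test_generator.edf).
raw = rpad("0", 8) * rpad("test file", 80) * rpad("EDF generator", 80) *
      "02.10.08" * "14.27.00" * rpad("4352", 8) * rpad("", 44) *
      rpad("900", 8) * rpad("1", 8) * rpad("16", 4)

hdr_demo = read_fixed_header(IOBuffer(raw))
println(hdr_demo[:patient_id], ", channels: ", hdr_demo[:num_signals])
```

Each channel adds another 256 bytes of header, so a 16-channel file has a header of 256 + 16 * 256 = 4352 bytes.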

Using the EDF format

The engee.clear() function clears the workspace:

In [ ]:
engee.clear()

Using the include function, we load the file "edfread.jl", which provides the function for reading EDF files:

In [ ]:
include("$(@__DIR__)/edfread.jl")
Out[0]:
edfread (generic function with 1 method)

The edfread function reads data in the EDF format. The hdr structure returned by this function contains the full metadata of the recording:

General recording parameters:

  • ver — EDF format version

  • patientID — patient ID

  • recordID — record ID

  • startdate and starttime — date and time of the start of recording

  • bytes — header size in bytes

  • records — the number of data blocks in the file

  • duration — the duration of one block in seconds

  • ns — the number of channels in the recording

Parameters of each channel:

  • labels — channel names

  • transducers — type of sensors

  • physicalDims — physical units of measurement

  • physicalMins and physicalMaxs — minimum and maximum physical values

  • digitalMins and digitalMaxs — minimum and maximum digital values

  • prefilters — filters applied during recording

  • samples — the number of samples in one block for each channel
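The per-channel parameters above are what turn raw integer samples into physical values and sampling rates. The sketch below shows the linear scaling defined by the EDF specification and the sampling-rate calculation used later in this example. The function names `to_physical` and `sampling_rate` are hypothetical, the raw sample value -8192 is an assumed input, and whether edfread.jl implements the scaling in exactly this form is also an assumption.

```julia
# Linear mapping from a raw digital sample to a physical value:
# the digital range [dmin, dmax] is mapped onto the physical range [pmin, pmax].
function to_physical(d, dmin, dmax, pmin, pmax)
    gain = (pmax - pmin) / (dmax - dmin)
    return (d - dmin) * gain + pmin
end

# Sampling rate of a channel: samples per block divided by block duration (s).
sampling_rate(samples_per_block, block_duration) = samples_per_block / block_duration

# Ranges of channel F4 in test_generator.edf; the raw value -8192 is assumed.
x  = to_physical(-8192, -32768.0, 32767.0, -3200.0, 3200.0)
fs = sampling_rate(200, 1.0)

println("physical value: ", x, " uV, sampling rate: ", fs, " Hz")
```

With these inputs the result is approximately -799.96 uV, which agrees with the first F4 sample in the output of edfread shown in this example.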

Checking on test data

Standardized test data from the [EDF/BDF Test Files](https://teuniz.net/edf_bdf_testfiles/) resource is used to verify the correctness of the algorithms for reading and processing EDF files.

The file test_generator.edf is used to demonstrate how to work with the EDF format. This file contains multi-channel data for testing and verifying reading algorithms.

To work with data in the EDF format, use the edfread function, which reads the file, extracts the metadata, and loads the signals. It returns two objects:

  • header structure hdr with the recording parameters;
  • array record containing the data from all channels.

In [ ]:
hdr, record = edfread("$(@__DIR__)/test_generator.edf")
Out[0]:
(EDFHeader(0.0, "test file", "EDF generator", "02.10.08", "14.27.00", 4352, 900, 1.0, 16, ["F4", "F3", "X10", "FP2", "P4", "C4", "P3", "C3", "X9", "FP1", "F8", "F7", "DC01", "DC04", "DC03", "DC02"], ["AgAgCl", "AgAgCl", "AgAgCl", "AgAgCl", "AgAgCl", "AgAgCl", "AgAgCl", "AgAgCl", "AgAgCl", "AgAgCl", "AgAgCl", "AgAgCl", "Respiration", "SaO2", "BPM", ""], ["uV", "uV", "mV", "uV", "uV", "uV", "uV", "uV", "uV", "uV", "uV", "mV", "V", "%", "BPM", ""], [-3200.0, -3200.0, -1.6, -3200.0, -3200.0, -3200.0, -3200.0, -3200.0, -3200.0, -3200.0, -3200.0, -16.0, 0.0, -1200.0, -1200.0, -32768.0], [3200.0, 3200.0, 1.6, 3200.0, 3200.0, 3200.0, 3200.0, 3200.0, 3200.0, 3200.0, 3200.0, 16.0, 12.0, 1200.0, 1200.0, 32767.0], [-32768.0, -32768.0, -32768.0, -32768.0, -32768.0, -32768.0, -32768.0, -32768.0, -32768.0, -32768.0, -32768.0, -32768.0, 0.0, -32768.0, -32768.0, -32768.0], [32767.0, 32767.0, 32767.0, 32767.0, 32767.0, 32767.0, 32767.0, 32767.0, 32767.0, 32767.0, 32767.0, 32767.0, 32767.0, 32767.0, 32767.0, 32767.0], ["HP:0.015Hz", "HP:0.015Hz", "HP:0.015Hz", "HP:0.015Hz", "HP:0.015Hz", "HP:0.015Hz", "HP:0.015Hz", "HP:0.015Hz", "HP:0.015Hz", "HP:0.015Hz", "HP:0.015Hz", "HP:0.015Hz", "HP:0.015Hz", "HP:0.015Hz", "HP:0.015Hz", "HP:0.015Hz"], [200, 100, 200, 200, 50, 100, 200, 200, 200, 200, 200, 200, 200, 25, 25, 25]), [-799.9633783474479 -799.9633783474479 … 800.0610360875868 800.0610360875868; -800.7446402685588 -750.646219577325 … NaN NaN; … ; 60.004577706569066 60.004577706569066 … NaN NaN; 16384.0 16384.0 … NaN NaN])

For ease of review and verification of the loaded data, the key recording parameters are extracted from the header structure hdr and displayed.

In [ ]:
println("Format version:     ", hdr.ver)
println("Patient ID:         ", strip(hdr.patientID))
println("Record description: ", strip(hdr.recordID))
println("Start date/time:    ", hdr.startdate, " ", hdr.starttime)
println("Number of channels: ", hdr.ns)
println("Number of records:  ", hdr.records)
println("Total duration:     ", hdr.records * hdr.duration, " s")
Format version:     0.0
Patient ID:         test file
Record description: EDF generator
Start date/time:    02.10.08 14.27.00
Number of channels: 16
Number of records:  900
Total duration:     900.0 s

Let's create a table with the characteristics of each channel: its name, physical range of values, units of measurement, and sampling rate.

In [ ]:
println("No | Channel  | Range (min/max)     | Unit   | Frequency, Hz")
println("-"^60)

for ch in 1:hdr.ns
    label = hdr.labels[ch]

    # take the channel's row of the matrix and drop NaN values (padding)
    row = record[ch, :]
    row = row[.!isnan.(row)]

    # observed range of the loaded data
    dataMin = round(minimum(row); digits = 2)
    dataMax = round(maximum(row); digits = 2)

    # show "-" for channels without a unit
    units = hdr.physicalDims[ch]
    units = units == "" ? "-" : units

    # sampling rate: samples per block divided by block duration
    fs = round(hdr.samples[ch] / hdr.duration; digits = 2)

    println(
        lpad(ch, 2), " | ",
        rpad(label, 8), " | ",
        lpad(string(dataMin), 8), " / ",
        rpad(string(dataMax), 8), " | ",
        rpad(units, 6), " | ",
        fs
    )
end
No | Channel  | Range (min/max)     | Unit   | Frequency, Hz
------------------------------------------------------------
 1 | F4       |  -799.96 / 800.06   | uV     | 200.0
 2 | F3       |  -800.74 / 800.84   | uV     | 100.0
 3 | X10      |     -0.8 / 0.8      | mV     | 200.0
 4 | FP2      |  -3200.0 / 3200.0   | uV     | 200.0
 5 | P4       |   -798.3 / 798.4    | uV     | 50.0
 6 | C4       |   -798.3 / 798.4    | uV     | 100.0
 7 | P3       |  -798.99 / 799.08   | uV     | 200.0
 8 | C3       |   -798.3 / 798.4    | uV     | 200.0
 9 | X9       |   -798.3 / 798.4    | uV     | 200.0
10 | FP1      |  -799.96 / 800.06   | uV     | 200.0
11 | F8       |  -692.74 / 692.83   | uV     | 200.0
12 | F7       |     -4.0 / 4.0      | mV     | 200.0
13 | DC01     |      0.0 / 6.0      | V      | 200.0
14 | DC04     |    100.0 / 100.0    | %      | 25.0
15 | DC03     |     60.0 / 60.0     | BPM    | 25.0
16 | DC02     |  16384.0 / 16384.0  | -      | 25.0
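The table code above removes NaN values before computing the range because channels with lower sampling rates have fewer samples than the widest row of the record matrix. The following sketch shows, under that assumption, how such padding arises and how the valid samples are recovered. The helpers `pad_channels` and `valid_samples` are hypothetical and only mimic the layout seen in the output of edfread.

```julia
# Hypothetical helper: stack channels of different lengths into one matrix,
# filling the unused tail of shorter rows with NaN.
function pad_channels(channels::Vector{Vector{Float64}})
    n = maximum(length, channels)
    m = fill(NaN, length(channels), n)
    for (i, ch) in enumerate(channels)
        m[i, 1:length(ch)] = ch
    end
    return m
end

# Recover the valid samples of row `i` by dropping the NaN padding.
valid_samples(m, i) = filter(!isnan, m[i, :])

# Two toy channels: the second has half as many samples as the first.
m = pad_channels([[1.0, 2.0, 3.0, 4.0], [10.0, 20.0]])

println(valid_samples(m, 2))
```

Note that this approach cannot distinguish padding from NaN values that belong to the signal itself; the header field samples gives the true length of each channel.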

To verify that the data is read and interpreted correctly, we compare the loaded metadata with the reference information provided on the [EDF/BDF Test Files](https://teuniz.net/edf_bdf_testfiles/) page.

 signal label  waveform       physical range         f         sf
 --------------------------------------------------------------------
    1    F4     block          +800uV/-800uV          1Hz       200Hz
    2    F3     triangle       +800uV/-800uV          3Hz       100Hz
    3    X10    impulse        +0.8mV/-0.8mV          5Hz       200Hz
    4    FP2    noise          +3200uV/-3200uV        -Hz       200Hz
    5    P4     sine           +800uV/-800uV          1Hz        50Hz
    6    C4     sine           +800uV/-800uV          2Hz       100Hz
    7    P3     sine           +800uV/-800uV          3Hz       200Hz
    8    C3     sine           +800uV/-800uV          4Hz       200Hz
    9    X9     sine           +800uV/-800uV          8Hz       200Hz
   10    FP1    sine           +800uV/-800uV         16Hz       200Hz
   11    F8     sine           +800uV/-800uV         32Hz       200Hz
   12    F7     triangle       +4mV/-4mV              5Hz       200Hz
   13    DC01   sine square    +6V/-0V                5Hz       200Hz
   14    DC04   DC             +100%                  -Hz        25Hz
   15    DC03   DC             +60BPM                 -Hz        25Hz
   16    DC02   DC             +16384                 -Hz        25Hz

Thus, the loaded data corresponds to the description of the test file, which confirms that the edfread function works correctly.
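The comparison with the reference table can also be done programmatically. As a minimal sketch, the code below checks only the sampling rates: the samples and duration values are copied from the header output above, and the expected rates are copied from the sf column of the reference table. The variable names are illustrative.

```julia
# Samples per block and block duration (s), copied from the hdr output above
samples  = [200, 100, 200, 200, 50, 100, 200, 200, 200, 200, 200, 200, 200, 25, 25, 25]
duration = 1.0

# Sampling rates (Hz) from the "sf" column of the reference table
expected_fs = [200, 100, 200, 200, 50, 100, 200, 200, 200, 200, 200, 200, 200, 25, 25, 25]

# Compare the computed rates with the reference, channel by channel
computed_fs = samples ./ duration
mismatches  = findall(computed_fs .!= expected_fs)

println(isempty(mismatches) ? "All sampling rates match" :
                              "Mismatch in channels: $mismatches")
```

In a live session the two literal values would be replaced by hdr.samples and hdr.duration.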

Let's plot the first 5 seconds of the multi-channel recording to compare the resulting graphs with the test image.

In [ ]:
using Plots     # plotting package; harmless if it is already loaded in the session

t_max = 5.0     # time limit, s
nchan = size(record, 1) # number of channels

plt = plot(
    layout = (nchan, 1),
    size   = (1000, 200*nchan),
    margin = 20*Plots.px
)

for ch in 1:nchan
    # Channel sampling rate
    fs = hdr.samples[ch] / hdr.duration

    # Number of samples to plot, capped at the channel's total length
    n_time = min(round(Int, t_max * fs), hdr.samples[ch] * hdr.records)

    # Time and signal
    t = (0:n_time-1) ./ fs
    y = record[ch, 1:n_time]

    # Units of measurement
    units = hdr.physicalDims[ch]
    units = units == "" ? "-" : units

    plot!(
        plt[ch],
        t, y,
        label  = "$(hdr.labels[ch])",
        xlabel = "Time, s",
        ylabel = units,
        legend = :topright
    )
end

display(plt)
[Figure: waveforms of the first 5 seconds of the 16 channels of test_generator.edf]
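Because the channels have different sampling rates, the same 5-second window corresponds to a different number of samples in each row. The sketch below isolates that calculation from the plotting loop; the function name `time_window` is hypothetical, and the example values are those of channel P4 (50 Hz, 900 one-second blocks).

```julia
# Number of samples to plot and the matching time axis for one channel.
# fs: sampling rate (Hz), t_max: window length (s), n_total: samples available.
function time_window(fs, t_max, n_total)
    n = min(round(Int, t_max * fs), n_total)
    t = (0:n-1) ./ fs
    return n, t
end

# Channel P4: 50 samples per 1-second block, 900 blocks
n, t = time_window(50.0, 5.0, 50 * 900)

println("samples: ", n, ", last time point: ", t[end], " s")
```

The time axis starts at 0 and ends one sample period before t_max, so its last point is (n - 1) / fs.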

To test the edfread function on the extended EDF+ format, we load the file test_generator_2.edf.

In [ ]:
hdr, record = edfread("$(@__DIR__)/test_generator_2.edf")
Out[0]:
(EDFHeader(0.0, "X X X X", "Startdate 10-DEC-2009 X X test_generator", "10.12.09", "12.44.02", 3328, 600, 1.0, 12, ["squarewave", "ramp", "pulse", "ECG", "noise", "sine1Hz", "sine8Hz", "sine85Hz", "sine15Hz", "sine17Hz", "sine50Hz", "EDFAnnotations"], ["", "", "", "", "", "", "", "", "", "", "", ""], ["uV", "uV", "uV", "uV", "uV", "uV", "uV", "uV", "uV", "uV", "uV", ""], [-1000.0, -1000.0, -1000.0, -1000.0, -1000.0, -1000.0, -1000.0, -1000.0, -1000.0, -1000.0, -1000.0, -1.0], [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1.0], [-32768.0, -32768.0, -32768.0, -32768.0, -32768.0, -32768.0, -32768.0, -32768.0, -32768.0, -32768.0, -32768.0, -32768.0], [32767.0, 32767.0, 32767.0, 32767.0, 32767.0, 32767.0, 32767.0, 32767.0, 32767.0, 32767.0, 32767.0, 32767.0], ["", "", "", "", "", "", "", "", "", "", "", ""], [200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 51]), [99.99237048905174 99.99237048905174 … -99.96185244525817 -99.96185244525817; -99.96185244525817 -98.95475700007621 … 98.0086976424812 98.98527504386978; … ; 99.99237048905174 0.015259021896781633 … -99.96185244525817 0.015259021896781633; 0.37633325703822385 0.1568780041199359 … NaN NaN])

Similarly to the previous file, we extract and analyze the key parameters of the record.

In [ ]:
println("Format version:     ", hdr.ver)
println("Patient ID:         ", strip(hdr.patientID))
println("Record description: ", strip(hdr.recordID))
println("Start date/time:    ", hdr.startdate, " ", hdr.starttime)
println("Number of channels: ", hdr.ns)
println("Number of records:  ", hdr.records)
println("Total duration:     ", hdr.records * hdr.duration, " s")
Format version:     0.0
Patient ID:         X X X X
Record description: Startdate 10-DEC-2009 X X test_generator
Start date/time:    10.12.09 12.44.02
Number of channels: 12
Number of records:  600
Total duration:     600.0 s

Let's create a table with the characteristics of each channel: its name, physical range of values, units of measurement, and sampling rate. The last channel, EDFAnnotations, is excluded from the loop.

In [ ]:
println("No | Channel    | Range (min/max)     | Unit   | Frequency, Hz")
println("-"^62)

# the last channel (EDFAnnotations) is skipped
for ch in 1:hdr.ns-1
    label = hdr.labels[ch]

    # take the channel's row of the matrix and drop NaN values (padding)
    row = record[ch, :]
    row = row[.!isnan.(row)]

    # observed range of the loaded data
    dataMin = round(minimum(row); digits = 2)
    dataMax = round(maximum(row); digits = 2)

    # show "-" for channels without a unit
    units = hdr.physicalDims[ch]
    units = units == "" ? "-" : units

    # sampling rate: samples per block divided by block duration
    fs = round(hdr.samples[ch] / hdr.duration; digits = 2)

    println(
        lpad(ch, 2), " | ",
        rpad(label, 10), " | ",
        lpad(string(dataMin), 8), " / ",
        rpad(string(dataMax), 8), " | ",
        rpad(units, 6), " | ",
        fs
    )
end
No | Channel    | Range (min/max)     | Unit   | Frequency, Hz
--------------------------------------------------------------
 1 | squarewave |   -99.96 / 99.99    | uV     | 200.0
 2 | ramp       |   -99.96 / 98.99    | uV     | 200.0
 3 | pulse      |     0.02 / 99.99    | uV     | 200.0
 4 | ECG        |   -17.32 / 61.48    | uV     | 200.0
 5 | noise      |     0.02 / 98.99    | uV     | 200.0
 6 | sine1Hz    |   -99.96 / 99.99    | uV     | 200.0
 7 | sine8Hz    |   -99.78 / 99.81    | uV     | 200.0
 8 | sine85Hz   |   -99.96 / 99.99    | uV     | 200.0
 9 | sine15Hz   |   -99.96 / 99.99    | uV     | 200.0
10 | sine17Hz   |   -99.96 / 99.99    | uV     | 200.0
11 | sine50Hz   |   -99.96 / 99.99    | uV     | 200.0
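The loop above skips the annotation channel by position (1:hdr.ns-1), which works for this file because EDFAnnotations is the last channel. A slightly more robust sketch selects the signal channels by label instead. The labels are copied from the header output above; the helper `is_annotation` is hypothetical, and it removes spaces before comparing because the EDF+ specification writes the label as "EDF Annotations".

```julia
# Channel labels copied from the hdr output above
labels = ["squarewave", "ramp", "pulse", "ECG", "noise", "sine1Hz", "sine8Hz",
          "sine85Hz", "sine15Hz", "sine17Hz", "sine50Hz", "EDFAnnotations"]

# A channel is treated as an annotation channel if its label,
# with spaces removed, equals "EDFAnnotations".
is_annotation(label) = replace(label, " " => "") == "EDFAnnotations"

# Indices of the ordinary signal channels
signal_idx = findall(!is_annotation, labels)

println("Signal channels: ", signal_idx)
```

In a live session, `for ch in signal_idx` with `labels` replaced by hdr.labels could be used in place of the positional range.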

To verify that the data is read and interpreted correctly, we compare the loaded metadata with the reference information provided on the [EDF/BDF Test Files](https://teuniz.net/edf_bdf_testfiles/) page.

 signal label/waveform  amplitude    f       sf
---------------------------------------------------
   1    squarewave        100 uV    0.1Hz   200 Hz
   2    ramp              100 uV    1 Hz    200 Hz
   3    pulse             100 uV    1 Hz    200 Hz
   4    ECG               100 uV    1 Hz    200 Hz
   5    noise             100 uV    - Hz    200 Hz
   6    sine 1 Hz         100 uV    1 Hz    200 Hz
   7    sine 8 Hz         100 uV    8 Hz    200 Hz
   8    sine 8.5 Hz       100 uV    8.5Hz   200 Hz
   9    sine 15 Hz        100 uV   15 Hz    200 Hz
  10    sine 17 Hz        100 uV   17 Hz    200 Hz
  11    sine 50 Hz        100 uV   50 Hz    200 Hz

Thus, the loaded data corresponds to the description of the test file, which confirms that the edfread function works correctly.

To check that EDF+ data is loaded and interpreted correctly, we plot the first 10 seconds of the recording.

In [ ]:
t_max = 10.0     # time limit, s
nchan = size(record, 1)-1 # number of signal channels (the last one, EDFAnnotations, is skipped)

plt = plot(
    layout = (nchan, 1),
    size   = (1000, 200*nchan),
    margin = 30*Plots.px
)

for ch in 1:(nchan)
    # Channel sampling rate
    fs = hdr.samples[ch] / hdr.duration

    # Number of samples to plot, capped at the channel's total length
    n_time = min(round(Int, t_max * fs), hdr.samples[ch] * hdr.records)

    # Time and signal
    t = (0:n_time-1) ./ fs
    y = record[ch, 1:n_time]

    # Units of measurement
    units = hdr.physicalDims[ch]
    units = units == "" ? "-" : units

    plot!(
        plt[ch],
        t, y,
        label  = "$(hdr.labels[ch])",
        xlabel = "Time, s",
        ylabel = units,
        legend = :topright
    )
end

display(plt)
[Figure: waveforms of the first 10 seconds of the 11 signal channels of test_generator_2.edf]

Conclusion

This example covered working with data in the EDF and EDF+ formats, using the test files test_generator.edf and test_generator_2.edf taken from [EDF/BDF Test Files](https://teuniz.net/edf_bdf_testfiles/).