Engee documentation
Notebook

Visualisation and analysis of the electrocardiogram signal

Introduction

This step-by-step exercise is an introduction to reading, visualising and analysing an electrocardiogram signal. An electrocardiogram, also known as an ECG, is an electrical signal that is produced when the heart contracts. ECG is widely used because it can quickly detect the health of the heart as well as various abnormalities such as arrhythmias, heart attack or tachycardia.

The image below shows a typical ECG.

ecg_001.jpg

Internet connection is required to run this demonstration

ECG Analysis

Loading and displaying ECGs from a file on your computer

The first step is to import data from the electrocardiogram.dat file.

Then we import the data from the ecgdata.dat file into the ecg variable as a matrix (1x50000).

In [ ]:
Pkg.add(["CSV", "Findpeaks"])
In [ ]:
using CSV, DataFrames
ecg = Matrix( CSV.read("$(@__DIR__)/electrocardiogram.dat", DataFrame, header = 0, delim=';') );

The typeof() command allows us to get the characteristics of the variable.

In [ ]:
typeof(ecg)
Out[0]:
Matrix{Float64} (alias for Array{Float64, 2})

This file contains 250 seconds of ECG recording, and since the sampling rate is 200 Hz, the file contains 50,000 samples. We can now plot the ECG using the plot command:

In [ ]:
using Plots
gr(size=(1700, 600), legend=false)
plot(ecg,label=false)
title!("Электрокардиограмма")
Out[0]:

This signal was read from http://people.ucalgary.ca/~ranga/. Under Lecture notes, lab exercises, and signal data files you will find a large number of signals from various data acquisition methods (ECG, EEG, EDS, etc.). You can save the files from your browser to your hard drive and then open them using the download command as above.

Independent work

Try reading and displaying other files from the website or from the folder containing the materials in this lab.

Approximation

As you can see from the graph, the number of samples can make it difficult to observe the details of each cardiac cycle. We can set the graph approximation parameters by adding ylimits and xlimits to the plot command.

In [ ]:
plot(ecg, label=false, ylimits=(1500,2800), xlimits=(3000, 3800))
Out[0]:

Independent work

Examine the ECG. You can change the ylimits and xlimits to examine different regions of the signal. Is the ECG the same everywhere? For what reasons might the ECG look different?

Detecting ECG peaks using thresholding

One interesting and simple exercise with this signal is to detect the peaks of each cycle - areas of the signal that exceed a certain value. For example, we can arbitrarily set the threshold at a signal level of 2400 and generate a new signal at to compare the ECG to the threshold:

In [ ]:
peaks_1 = ecg .> 2400;

At the same time, we can generate a time-matched signal, which will be useful later:

In [ ]:
time_1 = (1:length(ecg))/200;

Now we can plot the original signal already in the time domain:

In [ ]:
plot(time_1, ecg)
title!("Электрокардиограмма")
Out[0]:

We can compare the two signals, source and threshold, by plotting them together. We can overlay one graph on the other using the exclamation mark symbol after the command.

Since the threshold signal will have values between 0 and 1, we need to scale it so that the two signals are in the same ranges. We can do this within the catter command by multiplying the logic vector by the peak value of interest.

In [ ]:
plot(time_1, ecg, label=false, ylimits=(1500,2800), xlimits=(15, 20))
scatter!(time_1, 2600*(peaks_1), mc=:red, ms=2, ma=0.5)
Out[0]:

Independent work

Change the value of the threshold (2400) to other values. What happens? Do you detect more or fewer peaks?

Peak size

Note that since we are now using time, the axis instruction has changed. If we look closely, we can spot the problem:

In [ ]:
plot(time_1, ecg, label=false, ylimits= (2000, 2800), xlimits = (16.4, 17.2))
scatter!(time_1, 2600*(peaks_1), mc=:red, ms=2, ma=0.5)
Out[0]:

Peak detection using the Findpeaks package

Threshold provides a value for every single place where the signal exceeds a given value, which is not quite the same as peak detection. Fortunately, Julia has a package (always check if Julia has packages for what you need. It is very likely that you can find a package that will make your job easier) for peak detection, called Findpeaks.

To get started, you will need to import the Findpeaks package.

In [ ]:
using Pkg
Pkg.add(url="https://github.com/tungli/Findpeaks.jl")
     Cloning git-repo `https://github.com/tungli/Findpeaks.jl`
    Updating git-repo `https://github.com/tungli/Findpeaks.jl`
    Updating registry at `~/.julia/registries/EngeeJuliaRegistry`
    Updating git-repo `https://gitlab.kpm-ritm.ru/engee/backend/kernels/simcg/engeejuliaregistry.jl.git`
    Updating registry at `~/.julia/registries/ProgrammaticModeling`
    Updating git-repo `https://gitlab.kpm-ritm.ru/engee/backend/kernels/ProgrammaticModelingRegistry.jl.git`
    Updating registry at `~/.julia/registries/General.toml`
   Resolving package versions...
   Installed EpollShim_jll ───────── v0.0.20230411+0
   Installed DifferentialEquations ─ v7.6.0
   Installed PyCall ──────────────── v1.94.1
   Installed StaticArrays ────────── v1.5.12
   Installed ModelingToolkit ─────── v8.43.0
    Updating `~/.julia/environments/v1.8/Project.toml`
  [d9b5fb6a] + Findpeaks v0.1.0 `https://github.com/tungli/Findpeaks.jl#master`
    Updating `~/.julia/environments/v1.8/Manifest.toml`
  [d9b5fb6a] + Findpeaks v0.1.0 `https://github.com/tungli/Findpeaks.jl#master`
  [2702e6a9] + EpollShim_jll v0.0.20230411+0
    Building PyCall → `~/.julia/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/53b8b07b721b77144a0fbbbc2675222ebf40a02d/build.log`
Precompiling project...
EpollShim_jll
Findpeaks
  2 dependencies successfully precompiled in 9 seconds. 211 already precompiled.

Once the import of the package is complete, we need to change the ecg variable type to AbstractArray and then the Findpeaks package can be used as follows:

In [ ]:
using Findpeaks
ecg = vec(ecg::AbstractArray)
peaks_3 = findpeaks(ecg)
plot(time_1, ecg, labels=false)
scatter!(time_1[peaks_3], ecg[peaks_3])
Out[0]:

There seems to be a lot of "peaks" where there shouldn't be any. Let's zoom in:

In [ ]:
plot(time_1, ecg, label=false, ylimits=(1500,2800), xlimits=(15, 17))
scatter!(time_1[peaks_3], ecg[peaks_3])
Out[0]:

This is a good example of the specificity of some processing methods. Peak detection is detecting ANY point that is higher than its neighbours, which is different from what we are trying to detect. This leads to the question, what "peaks" do we want?

Clarification of peak detection

We need relatively large peaks (not ones that are likely to be related to noise), and we need sharp and high peaks that should be closely related to heart rate. This can be summarised in these rules: peaks that:

(a) are above a certain level (again, 2400 might be a good threshold),

b) are not too close to each other.

The second rule may not be immediately obvious, so let's look at them one by one. Firstly, let's discard all peaks below the threshold and display the result:

In [ ]:
peaks_4 = findpeaks(ecg, min_height = 2400.)
plot(time_1, ecg, label=false, ylimits=(1500,3300), xlimits=(20, 23))
scatter!(time_1[peaks_4], ecg[peaks_4])
Out[0]:

Now we can see how the second rule will affect the result, since two positions are found in one "peak". Let's repeat and find only those peaks that are at least 10 positions away from each other:

In [ ]:
peaks_5 = findpeaks(ecg, min_height = 2400., min_dist = 10)
plot(time_1, ecg, label=false, ylimits=(1500,3300), xlimits=(20, 23))
scatter!(time_1[peaks_5], ecg[peaks_5])
Out[0]:

As a last step, we can display the ECG and below it we can display the time differences between the peaks, that is, we can see the heart rate inferred from these peaks. We can display two signals in one figure using the subplot command as follows:

In [ ]:
peaks_5 = sort(peaks_5);
s = 60*diff(peaks_5/200);
a = peaks_5[2:end];
In [ ]:
p1 = plot(time_1, ecg, label=false, title = "ЭКГ");
s1 = scatter!(time_1[peaks_5], ecg[peaks_5]);
p2 = plot(time_1[a], s, legend=false, title = "Сердечный ритм");
plot(p1, p2, layout=(2,1))

Finally, we have plotted the heart rate extracted from the ECG.

Analysis verification

Note that the heart rate is relatively stable around 40 beats per minute, which is lower than the usual 60 beats per minute. However, there is one sudden peak around 100, this is likely due to a processing error.

Independent work:

  1. Try to analyse the code to understand why there is a peak around 100 bpm.
  2. Try to improve the code to avoid such problems.
  3. In addition to the above peak, you may notice significant heart rate fluctuations. Do you think this is due to the patient's irregular heartbeat? Or with signal processing? Or with problems in the signal acquisition phase? Near the end there is an area where the value drops below 20, what is happening there?
  4. Try reading other signals and apply this simple step-by-step analysis.
  5. Can you think of other ways to visualise ECG and other biomedical signals?