Engee documentation

Reading GEDCOM files

This example shows how to read personal data from a GEDCOM file and print it in Engee.

Introduction

GEDCOM (Genealogical Data Communication) is a specification for exchanging genealogical data between genealogy programs. Most modern genealogy software can import and export data in the GEDCOM format.
In this example, we will create a simple program that reads a GEDCOM file and prints information about the individuals recorded in it.
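Every line of a GEDCOM file has the same layout: a level number, an optional cross-reference ID such as @I1@, a tag such as INDI, NAME or DATE, and an optional value. The level expresses nesting: a "2 DATE" line belongs to the nearest level-1 line above it. Below is a minimal sketch of splitting one line into these parts. The helper split_gedcom_line is hypothetical and not part of this example, and its regular expression is a simplification rather than a full implementation of the GEDCOM grammar.

```julia
# Hypothetical helper (not part of the original example): split one GEDCOM
# line into level, optional @XREF@ id, tag and optional value.
# Simplified regex; it does not cover every detail of the GEDCOM grammar.
function split_gedcom_line(line::AbstractString)
    m = match(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$", line)
    m === nothing && return nothing    # not a well-formed GEDCOM line
    level = parse(Int, m.captures[1])
    xref  = m.captures[2] === nothing ? "" : String(m.captures[2])
    tag   = String(m.captures[3])
    value = m.captures[4] === nothing ? "" : String(m.captures[4])
    return (level = level, xref = xref, tag = tag, value = value)
end

for line in ["0 @I1@ INDI", "1 NAME Luke /Skywalker/", "2 DATE 04 MAY 0019"]
    println(split_gedcom_line(line))
end
```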

Function for reading a .ged file

The parse_gedcom() function works as follows:

  • reads the GEDCOM file and extracts information about the individuals it describes;

  • uses a dictionary individuals, where the key is the person's ID and the value is another dictionary
    with information about the person (name and date of birth);

  • searches for lines starting with 0 @ and containing INDI to determine the beginning of a record about a person;

  • extracts the person's name and date of birth, if present.
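The steps above can be tried without a file on disk. Here is a minimal sketch, assuming the same record layout: parse_gedcom_io is a hypothetical IO-based variant (not part of the original example) that applies the same rules to any IO, here an IOBuffer holding a small hand-written GEDCOM fragment. It remembers which record and which event it is inside, so a DATE line is stored as the birth date only when it belongs to a BIRT event.

```julia
# Hypothetical IO-based variant of the steps listed above (not part of the
# original example), so they can be tried on an in-memory string.
function parse_gedcom_io(io::IO)
    individuals = Dict{String, Dict{String, String}}()
    current_id = ""     # "" while outside an INDI record
    in_birth = false    # true while inside a "1 BIRT" event

    for raw in eachline(io)
        line = strip(raw)
        if startswith(line, "0 ")
            # A level-0 line starts a new record; keep only INDI records
            m = match(r"^0 (@[^@]+@) INDI", line)
            current_id = m === nothing ? "" : String(m.captures[1])
            isempty(current_id) || (individuals[current_id] = Dict{String, String}())
            in_birth = false
        elseif !isempty(current_id)
            if startswith(line, "1 ")
                # Any level-1 line ends the previous event
                in_birth = startswith(line, "1 BIRT")
                startswith(line, "1 NAME ") &&
                    (individuals[current_id]["NAME"] = replace(line, r"^1 NAME " => ""))
            elseif in_birth && startswith(line, "2 DATE ")
                individuals[current_id]["BIRT"] = replace(line, r"^2 DATE " => "")
            end
        end
    end
    return individuals
end

sample = """
0 HEAD
0 @I1@ INDI
1 NAME Luke /Skywalker/
1 BIRT
2 PLAC Tatooine
2 DATE 04 MAY 0019
0 @I2@ INDI
1 NAME Anakin /Skywalker/
0 TRLR
"""
people = parse_gedcom_io(IOBuffer(sample))
println(people["@I1@"])
```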

In [ ]:
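function parse_gedcom(file_path::String)
    # Person ID => fields of that person ("NAME", "BIRT")
    individuals = Dict{String, Dict{String, String}}()
    current_id = ""    # ID of the INDI record being read, "" outside such a record
    in_birth = false   # true while reading the sub-lines of a "1 BIRT" event

    open(file_path, "r") do file
        for raw_line in eachline(file)
            line = strip(raw_line)

            if startswith(line, "0 ")
                # Every level-0 line starts a new record; only INDI records are kept
                in_birth = false
                m = match(r"^0 (@[^@]+@) INDI", line)
                if m === nothing
                    current_id = ""
                else
                    # Find the person's ID
                    current_id = String(m.captures[1])
                    individuals[current_id] = Dict{String, String}()
                end

            elseif isempty(current_id)
                # Skip lines that belong to HEAD, FAM, SUBM and other non-INDI records
                continue

            elseif startswith(line, "1 ")
                # A new level-1 line ends any previous BIRT event
                in_birth = startswith(line, "1 BIRT")
                if startswith(line, "1 NAME ")
                    # Extract the person's name
                    individuals[current_id]["NAME"] = replace(line, r"^1 NAME " => "")
                end

            elseif in_birth && startswith(line, "2 DATE ")
                # Extract the date of birth from the DATE line under BIRT
                individuals[current_id]["BIRT"] = replace(line, r"^2 DATE " => "")
            end
        end
    end

    return individuals
end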
Out[0]:
parse_gedcom (generic function with 1 method)
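parse_gedcom() keeps the date of birth as the raw text of the DATE line, for example 04 MAY 0019 (day, English three-letter month, year). If the date is needed as a value, for sorting or computing ages, it could be converted to a Dates.Date. Below is a minimal sketch with a hypothetical helper parse_gedcom_date (not part of the original example). It handles only exact dates of this form and returns nothing for approximate dates, partial dates and ranges, which GEDCOM also allows.

```julia
using Dates

# Hypothetical helper (not part of the original example): month abbreviations
# used in GEDCOM dates.
const GEDCOM_MONTHS = Dict(
    "JAN" => 1, "FEB" => 2, "MAR" => 3, "APR" => 4, "MAY" => 5, "JUN" => 6,
    "JUL" => 7, "AUG" => 8, "SEP" => 9, "OCT" => 10, "NOV" => 11, "DEC" => 12,
)

# Convert an exact GEDCOM date such as "04 MAY 0019" into a Dates.Date.
# Returns nothing for other forms (e.g. "ABT 1850", "MAY 1900") and for
# impossible dates such as "31 FEB 1900".
function parse_gedcom_date(s::AbstractString)
    m = match(r"^(\d{1,2}) ([A-Z]{3}) (\d{1,4})$", strip(s))
    m === nothing && return nothing
    month = get(GEDCOM_MONTHS, m.captures[2], 0)
    month == 0 && return nothing
    day, year = parse(Int, m.captures[1]), parse(Int, m.captures[3])
    # Check the day first, because Date() throws for out-of-range days
    return 1 <= day <= Dates.daysinmonth(year, month) ? Date(year, month, day) : nothing
end

println(parse_gedcom_date("04 MAY 0019"))
```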

Function for printing the parsed data

The print_individuals() function prints information about each individual in a human-readable format.

In [ ]:
function print_individuals(individuals::Dict{String, Dict{String, String}})
    for (id, data) in individuals
        println("ID: $id")
        # A record may lack a NAME line, so fall back to a placeholder
        println("Name: $(get(data, "NAME", "(unknown)"))")
        if haskey(data, "BIRT")
            println("Birth Date: $(data["BIRT"])")
        end
        println()
    end
end
Out[0]:
print_individuals (generic function with 1 method)
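In GEDCOM, the surname inside a NAME value is enclosed in slashes, so the names returned by parse_gedcom() contain them. If print_individuals() should print plain names instead, the value could be post-processed. Below is a minimal sketch with two hypothetical helpers that are not part of the original example: display_name drops the slashes, and split_name separates the given names from the surname.

```julia
# Hypothetical helper: turn a GEDCOM NAME value such as "Luke /Skywalker/"
# into plain text by removing the slashes that mark the surname.
function display_name(gedcom_name::AbstractString)
    plain = replace(gedcom_name, "/" => "")
    # Collapse any repeated spaces left behind by the removed slashes
    return join(split(plain), " ")
end

# Hypothetical helper: return (given names, surname) from a GEDCOM NAME value.
# If no /surname/ part is present, the surname is returned as "".
function split_name(gedcom_name::AbstractString)
    m = match(r"^(.*?)/([^/]*)/", gedcom_name)
    m === nothing && return (strip(gedcom_name), "")
    return (strip(m.captures[1]), strip(m.captures[2]))
end

println(display_name("Luke /Skywalker/"))
println(split_name("Luke /Skywalker/"))
```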

Reading an example .ged file

The example folder contains the GEDCOM file example.ged with sample personal data for one family. We pass the path to this file to the GEDCOM reading function and then print the resulting data.

In [ ]:
# Usage example
file_path = joinpath(@__DIR__, "example.ged")
individuals = parse_gedcom(file_path)
print_individuals(individuals)
ID: @I3@
Name: Лея /Органа/
Birth Date: 04 MAY 0019

ID: @I4@
Name: Падме /Амидала/
Birth Date: 04 MAY 0046

ID: @I2@
Name: Энакин /Скайуокер/
Birth Date: 04 MAY 0041

ID: @I1@
Name: Люк /Скайуокер/
Birth Date: 04 MAY 0019

As the output of the code cell shows, the name and date of birth of every family member have been printed.
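The records are printed in an arbitrary order (@I3@, @I4@, @I2@, @I1@) because iterating over a Julia Dict follows neither insertion order nor key order. A plain string sort would not fully fix this, since "@I10@" sorts before "@I2@". Below is a minimal sketch of a hypothetical print_individuals_sorted (not part of the original example) that orders records by the number inside the ID. It assumes IDs of the form @I<number>@, as in this example file. The demo data is made up for illustration.

```julia
# Hypothetical helper: numeric part of an ID such as "@I12@" (12).
# IDs without digits are placed last.
function id_number(id::AbstractString)
    m = match(r"\d+", id)
    return m === nothing ? typemax(Int) : parse(Int, m.match)
end

# IDs sorted by their number, then by the full string as a tie-breaker
sorted_ids(individuals::Dict) =
    sort(collect(keys(individuals)); by = k -> (id_number(k), k))

# Hypothetical variant of print_individuals with a deterministic order
function print_individuals_sorted(individuals::Dict{String, Dict{String, String}})
    for id in sorted_ids(individuals)
        data = individuals[id]
        println("ID: $id")
        println("Name: $(get(data, "NAME", "(unknown)"))")
        haskey(data, "BIRT") && println("Birth Date: $(data["BIRT"])")
        println()
    end
end

demo = Dict(
    "@I10@" => Dict("NAME" => "Example /Person/"),
    "@I3@" => Dict("NAME" => "Leia /Organa/"),
    "@I1@" => Dict("NAME" => "Luke /Skywalker/", "BIRT" => "04 MAY 0019"),
)
print_individuals_sorted(demo)
```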

Conclusion

This example showed how to read personal data from a GEDCOM file and print it in Engee.