Reading GEDCOM files

This example shows how to read personal data from a GEDCOM file in Engee and print it.

Introduction

GEDCOM (Genealogical Data Communications) is a specification for exchanging genealogical data between genealogy programs. Most modern genealogy programs can import and export data in GEDCOM format. In this example we write a simple program that reads a GEDCOM file and prints information about the persons recorded in it.
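
For orientation, every line of a GEDCOM file has the same layout: a level number, an optional cross-reference ID enclosed in @ signs, a tag, and an optional value. The following is a minimal sketch that splits a line into these parts. The helper name split_gedcom_line and the sample lines are made up for illustration and are not part of this example.

```julia
# Hypothetical helper (an assumption, not part of the original example):
# split one GEDCOM line into level, optional cross-reference ID, tag and value.
function split_gedcom_line(line::AbstractString)
    m = match(r"^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$", strip(line))
    m === nothing && return nothing          # the line does not follow the layout
    level = parse(Int, m.captures[1])
    xref  = m.captures[2] === nothing ? nothing : String(m.captures[2])
    tag   = String(m.captures[3])
    value = m.captures[4] === nothing ? "" : String(m.captures[4])
    return (level = level, xref = xref, tag = tag, value = value)
end

# Made-up sample lines
split_gedcom_line("0 @I1@ INDI")          # start of a person record with ID @I1@
split_gedcom_line("1 NAME John /Doe/")    # level-1 field of that record
split_gedcom_line("2 DATE 04 MAY 1950")   # level-2 detail nested under a level-1 field
```

The level number is what ties lines together: a level-2 line such as DATE belongs to the closest preceding level-1 line, which in turn belongs to the closest preceding level-0 record.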

Function for reading a .ged file

The function parse_gedcom() works as follows:

  • reads the GEDCOM file and extracts information about persons;

  • uses the dictionary individuals, where the key is the person ID and the value is another dictionary with information about the person (name, date of birth, etc.);

  • looks for lines that start with 0 @ and contain INDI to detect the beginning of a person record;

  • retrieves the person's name and date of birth, if present.

In [ ]:
function parse_gedcom(file_path::String)
    individuals = Dict{String, Dict{String, String}}()
    current_id = ""     # ID of the person record being read; "" outside a person record
    in_birth = false    # true while reading the lines nested under "1 BIRT"

    open(file_path, "r") do file
        for line in eachline(file)
            line = strip(line)
            if startswith(line, "0 ")
                # A level-0 line starts a new record, so leave the previous one
                current_id = ""
                in_birth = false
                if startswith(line, "0 @") && contains(line, "INDI")
                    # Find the person ID
                    m = match(r"@I\d+@", line)
                    if m !== nothing
                        current_id = String(m.match)
                        individuals[current_id] = Dict{String, String}()
                    end
                end

            elseif isempty(current_id)
                # Skip lines of non-person records (header, families, etc.)
                continue

            elseif startswith(line, "1 ")
                # A level-1 line starts a new field of the person record
                in_birth = startswith(line, "1 BIRT")
                if startswith(line, "1 NAME ")
                    # Extract the person's name
                    individuals[current_id]["NAME"] = replace(line, r"^1 NAME " => "")
                end

            elseif in_birth && startswith(line, "2 DATE ")
                # Extract the date of birth
                individuals[current_id]["BIRT"] = replace(line, r"^2 DATE " => "")
            end
        end
    end

    return individuals
end
Out[0]:
parse_gedcom (generic function with 1 method)

The print_individuals() function prints the information about each person in an easy-to-read format.

In [ ]:
function print_individuals(individuals::Dict{String, Dict{String, String}})
    for (id, data) in individuals
        println("ID: $id")
        # The name and the birth date are printed only if they were found in the file
        if haskey(data, "NAME")
            println("Name: $(data["NAME"])")
        end
        if haskey(data, "BIRT")
            println("Birth Date: $(data["BIRT"])")
        end
        println()
    end
end
Out[0]:
print_individuals (generic function with 1 method)

Example of reading a .ged file

The example folder contains a GEDCOM file, example.ged, with sample personal data for one family. Let's pass the path to this file to the GEDCOM reading function and then print the data it returns.

In [ ]:
# Usage example
file_path = "$(@__DIR__)/example.ged"
individuals = parse_gedcom(file_path)
print_individuals(individuals)
ID: @I3@
Name: Лилия /Органова/
Birth Date: 04 MAY 0019

ID: @I4@
Name: Рада /Амидалова/
Birth Date: 04 MAY 0046

ID: @I2@
Name: Аникей /Скайволков/
Birth Date: 04 MAY 0041

ID: @I1@
Name: Лука /Скайволков/
Birth Date: 04 MAY 0019

As the cell output shows, the required data were printed for all family members.
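
The persons appear in the order @I3@, @I4@, @I2@, @I1@ because a Julia Dict does not guarantee any particular iteration order. If a stable order is wanted, one option is to sort the IDs before printing. The following is a minimal sketch under that assumption; the function name print_individuals_sorted and the sample data are hypothetical and are not part of the example.

```julia
# Hypothetical variant of print_individuals (an assumption, not part of the original
# example): prints persons in ascending order of the number contained in their ID.
function print_individuals_sorted(io::IO, individuals::Dict{String, Dict{String, String}})
    # "@I12@" -> 12; IDs without digits are treated as 0 and come first
    id_number(id) = something(tryparse(Int, filter(isdigit, id)), 0)
    for id in sort(collect(keys(individuals)); by = id_number)
        data = individuals[id]
        println(io, "ID: $id")
        haskey(data, "NAME") && println(io, "Name: $(data["NAME"])")
        haskey(data, "BIRT") && println(io, "Birth Date: $(data["BIRT"])")
        println(io)
    end
end

# Made-up sample data in the same shape that parse_gedcom() returns
people = Dict(
    "@I10@" => Dict("NAME" => "Jane /Doe/", "BIRT" => "01 JAN 1900"),
    "@I2@"  => Dict("NAME" => "John /Doe/"),
)
print_individuals_sorted(stdout, people)
```

Sorting by the numeric part rather than by the raw string keeps @I2@ ahead of @I10@, which a plain string sort would not do.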

Conclusion

This example covered reading personal data from a GEDCOM file in Engee and printing it.