Engee documentation
Notebook

ABC-word search algorithm

This code implements an algorithm that finds words in a file containing the letters "a", "b" and "c" in that order (not necessarily in a row).

Introduction

In this example, we consider the task of searching for special words called ABC words. These are words in which the letters a, b, and c occur ** in the specified order**, but not necessarily one after the other. For example, the word "abacus" — ABC is a word because the letters a, b, and c are in the correct order. And the word "cab" — is not an ABC word, because the order is broken.

Such tasks can be used in linguistic research, natural language processing tasks, or simply in interesting alphabet and word analysis exercises.

The essence of the algorithm is to check for each word whether the letters 'a', 'b' and 'c' exist in it, and whether their positions in the word are arranged in a non—decreasing order.

The main part

Function isabcword It is used to determine whether a word is an ABC word.

Input: word w, the second parameter _ ignored

In [ ]:
function isabcword(w, _)
    # Создаем список позиций букв 'a', 'b', 'c' в слове, если они есть
    # Если буква не найдена, findfirst вернет nothing
    positions = [findfirst(c -> c == ch, w) for ch in "abc"]

    # Проверяем, все ли из трех букв найдены (никто не равен nothing)
    # И проверяем, что позиции этих букв расположены в порядке возрастания
    return all(!isnothing, positions) && issorted(positions) ? w : ""
end
Out[0]:
isabcword (generic function with 1 method)

Explanation of the function isabcword

Imagine the word "abacus". The letters are in these positions:

  • a — the first (position 1)
  • b — second (position 2)
  • c — third (position 3)

List [1, 2, 3] satisfies the condition issorted, since the numbers are ascending.

But if there was a word "cabbage":

  • a on the 3rd position,
  • b on the 4th,
  • c on the 1st,

then the list would be [1, 3, 4], but the positions are not sorted (1<3 <4 is correct). It is important that the letters are in the correct order.

The function returns either the word itself (if it is an ABC word) or an empty string, if not.

Using the function

Function foreachword opens the file and applies the passed function to each word.

She looks like map or foreach, but adapted to work with files with a list of words.

In [ ]:
function foreachword(filename, func)
    open(filename) do file
        for line in eachline(file)
            word = strip(line)  # убираем пробелы
            result = func(word, nothing)
            if !isempty(result)
                println(result)
            end
        end
    end
end
Out[0]:
foreachword (generic function with 1 method)

Calling our function for the file "unixdict.txt "

In [ ]:
foreachword("unixdict.txt", isabcword)
aback
abacus
abc
abdicate
abduct
abeyance
abject
abreact
abscess
abscissa
abscissae
absence
abstract
abstracter
abstractor
adiabatic
aerobacter
aerobic
albacore
alberich
albrecht
algebraic
alphabetic
ambiance
ambuscade
aminobenzoic
anaerobic
arabic
athabascan
auerbach
diabetic
diabolic
drawback
fabric
fabricate
flashback
halfback
iambic
lampblack
leatherback
metabolic
nabisco
paperback
parabolic
playback
prefabricate
quarterback
razorback
roadblock
sabbatical
snapback
strabismic
syllabic
tabernacle
tablecloth

What words will be printed? Only those for which isabcword returns a non-empty string, i.e. those that match our definition of an ABC word.

Conclusion

We have considered an example of a simple but useful program in the Julia language that solves the problem of searching ABC words. The algorithm uses the basic functions of working with strings and collections (findfirst, all, issorted) and allows you to effectively filter words based on a specific condition.

This example demonstrates Julia's flexibility and expressiveness for solving text data processing tasks without involving complex libraries. It can also be a convenient starting point for more complex language experiments.

The example was developed using materials from Rosetta Code