Anagram generation algorithm
This guide will help you understand how the algorithm for searching and composing anagram phrases from a given word or phrase works using the Julia programming language.
Introduction
An anagram generation algorithm is a way of rearranging the letters of an original word (or phrase) in such a way as to obtain new meaningful words or phrases. It is widely used in linguistics, cryptography, mindfulness games, and entertainment tasks.
For example, the word Listen can be converted to Silent because both of these combinations use the same letters, but in a different order. Our application will search for such anagrams among a set of English words from a file. unixdict.txt.
The main part of the program
Reading the dictionary
Reading the word list file (unixdict.txt ) and convert everything to lowercase. Then we divide the string into words using a regular expression. \s+ for spaces, to remove unnecessary characters and turn them into an internal collection - an array.
const unixwords = split(read("unixdict.txt", String) |> lowercase, r"\s+")
Function findphrases
Function findphrases accepts the string (anastring) — what needs to be anagrammed — as well as a dictionary array choices. You can also specify a minimum word length (sizelong, by default 4 characters) and how many short words can be added to the result (n_shortpermitted, zero by default). The returned result is a list of matching phrases.
function findphrases(anastring::AbstractString, choices, sizelong = 4, n_shortpermitted = 1)
# Counting the frequencies of each letter of the input string (a-z only)
anadict = Dict{Char, Int}()
for c in lowercase(anastring)
if 'a' <= c <= 'z'
anadict[c] = get(anadict, c, 0) + 1
end
end
# We will store the found anagrams here.
phrases = String[]
# Recursively substituting words
function addword(remaining, phrase, numshort)
# We go through every word in our dictionary.
for w in unixwords
len = length(w)
# If it is not possible to use more short words and this word is too small → skip
numshort < 1 && len < sizelong && continue
# We check whether there are enough letters from the remaining ones to use the word W.
any(c -> get(remaining, c, 0) < count(==(c), w), w) && @goto nextword
# Creating a new version of the remaining letters without those that have already been used
cdict = copy(remaining)
for c in w
cdict[c] -= 1
end
# Do all the letters match? So there are only zeros left?
if all(==(0), values(cdict))
# We have collected a complete anagram! Adding it to the result
return strip(phrase * " " * w)
elseif (newphrase = addword(cdict, phrase * " " * w, numshort - (len < sizelong))) != nothing
push!(phrases, newphrase)
end
@label nextword
end
# Didn't find anything at this step → return nothing
return nothing
end
# Starting a recursive process of collecting results
addword(anadict, "", n_shortpermitted)
# We are returning all the collected phrases
return phrases
end
Calling a function with test data
for s in ["Engee the best", "Simulink", "Exponenta"]
println("\nFrom '$s':")
# Removing duplicate results and displaying sorted results
foreach(println, findphrases(s, unixwords, 4, 0) |> unique |> sort!)
end
Conclusion
We have reviewed in detail the implementation of the algorithm for searching for anagrams in the Julia language. Using the file unixdict.txt We were able to find meaningful phrases that are anagrams of certain phrases.
This technique has applications in writing puzzles, analyzing texts, and even verifying the uniqueness of passwords. The main advantage of this code is the purity of the implementation and the use of recursion to accurately match letter combinations. Efficiency depends on the size of the dictionary and the strategy of selecting new words, so if necessary, it can always be improved by selective filtering or optimization of crawling.
The example was developed using materials from Rosetta Code