Text search and replacement
Processing text data often involves searching for and replacing substrings. Several functions find text and return different kinds of information: some confirm that the text exists, others count the occurrences of a text fragment, find their indices, or extract substrings.
Text search
To determine whether a text fragment is present, use the function occursin(needle, haystack). It returns a Bool: true if the fragment occurs in the string and false otherwise.
txt = "she sells seashells by the seashore"
TF = occursin("sea", txt)
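Two details of occursin are easy to miss. The fragment to search for comes first and the string second, and the comparison is case-sensitive. The following is a minimal sketch illustrating both points; the variable names are only illustrative. The i flag after a regular expression literal makes the match ignore case.

```julia
txt = "she sells seashells by the seashore"

# occursin(needle, haystack): the fragment to look for comes first
has_sea = occursin("sea", txt)       # true

# The comparison is case-sensitive, so "Sea" is not found
has_Sea = occursin("Sea", txt)       # false

# A regular expression with the i flag ignores case
has_Sea_i = occursin(r"Sea"i, txt)   # true

# A single character can also be used as the needle
has_z = occursin('z', txt)           # false
```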
You can calculate how many times this text occurs using the function count().
n = count("sea", txt)
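By default, count does not let counted matches share characters. The following is a small sketch, assuming Julia 1.3 or later, where count accepts a string pattern and an overlap keyword. The strings "aa" and "aaaa" are made-up inputs chosen only to show the difference.

```julia
txt = "she sells seashells by the seashore"

# Non-overlapping occurrences of "sea" (the default behavior)
n = count("sea", txt)                    # 2

# count agrees with the number of ranges returned by findall
same = n == length(findall("sea", txt))  # true

# Without overlap, "aa" is counted at 1:2 and 3:4 only
n_default = count("aa", "aaaa")                  # 2
# With overlap = true, the match at 2:3 is counted as well
n_overlap = count("aa", "aaaa"; overlap = true)  # 3
```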
To find where the text is located, use the function findall(). It returns a vector of index ranges, one for each occurrence of the fragment "sea".
idx = findall("sea", txt)
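Each element returned by findall is a UnitRange, so it can index the string directly and extract the matching substring. The sketch below shows this together with findfirst, which returns only the first range, or nothing when there is no match. The index values assume the ASCII string above, where Julia's byte indices coincide with character positions.

```julia
txt = "she sells seashells by the seashore"

# One UnitRange of indices per occurrence
idx = findall("sea", txt)            # [11:13, 28:30]

# Start position of each occurrence
starts = first.(idx)                 # [11, 28]

# Indexing the string with a range extracts the substring
pieces = [txt[r] for r in idx]       # ["sea", "sea"]

# findfirst returns only the first range...
first_range = findfirst("sea", txt)  # 11:13
# ...or nothing if the fragment does not occur
no_match = findfirst("sky", txt)     # nothing
```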
Searching for text in arrays of strings
Using the dot (broadcasting) syntax, the search and replace functions can also be applied element-wise to arrays of strings. For example, find color names in the titles of several songs.
songs = ["Penny Lane", "Yellow Submarine", "Blackbird"]
colors = ["Red", "Yellow", "Black"]
# Broadcasting pairs the arrays element by element: colors[i] is searched for in songs[i]
TF = occursin.(colors, songs)
To display only the songs that contain color names, use the logical array TF as an index into the original array songs. This technique is called logical indexing.
songs[TF]
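Broadcasting occursin over two arrays of the same length compares them position by position. The example works because each color happens to sit at the same position as the song that contains it. The sketch below contrasts that pairwise broadcast with a comprehension that uses any to check every color against every song, so the order of colors no longer matters. The variable names are illustrative, and reversing colors only demonstrates the ordering dependence.

```julia
songs  = ["Penny Lane", "Yellow Submarine", "Blackbird"]
colors = ["Red", "Yellow", "Black"]

# Pairwise: colors[i] is only compared with songs[i]
pairwise = occursin.(colors, songs)              # [false, true, true]

# Reordering colors changes the pairwise result
pairwise_rev = occursin.(reverse(colors), songs) # [false, true, false]

# Test every color against each song instead
has_color = [any(c -> occursin(c, s), colors) for s in songs]  # [false, true, true]

# Logical indexing with the order-independent mask
selected = songs[has_color]                      # ["Yellow Submarine", "Blackbird"]
```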
Matching patterns
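In addition to searching for literal text such as "sea" or "Yellow", you can search for text that matches a pattern. In Julia, patterns are regular expressions written with the r"..." prefix, and character classes such as \d cover common cases: r"\d+" matches a sequence of one or more digits. The function match() returns the first match as a RegexMatch object, or nothing if there is no match.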
address = " Sesame Street, New York, NY 10128"
nums = match(r"\d+", address)
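Since match stops at the first match, it helps to know how to read the RegexMatch fields, how to handle the nothing returned when there is no match, and how to collect every match with eachmatch. A minimal sketch follows. The room string is a hypothetical input, used because address contains only one group of digits.

```julia
address = " Sesame Street, New York, NY 10128"

# match returns a RegexMatch for the first run of digits
m = match(r"\d+", address)
zip_code = m.match           # "10128" (a SubString)
pos = m.offset               # 30: index where the match starts

# With no digits, match returns nothing, so check before using .match
no_digits = match(r"\d+", "New York")

# eachmatch iterates over all non-overlapping matches
room = "Room 12, Floor 3, Building 450"   # hypothetical example string
all_nums = [x.match for x in eachmatch(r"\d+", room)]  # ["12", "3", "450"]

# parse converts the matched text to integers
nums_int = parse.(Int, all_nums)                        # [12, 3, 450]
```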
You can combine patterns to make the search more precise. For example, find a word that starts with the letter "S": use the string "S" for the first character and lettersPattern for the letters that follow it. Concatenating a string and a Regex with * produces a new Regex in which the string is matched literally.
lettersPattern = r"[a-zA-Z]+"
pat = "S" * lettersPattern
# match returns the first match; its .match field holds the matched text
StartWithS = match(pat, address).match
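match returns only the first word, and the pattern "S" * lettersPattern also matches a capital S in the middle of a word. The following sketch assumes that the regex word-boundary anchor \b is an acceptable definition of "word start". It prepends \b to the pattern and uses eachmatch to collect every matching word. The string "a USB Stick" is a made-up input.

```julia
address = " Sesame Street, New York, NY 10128"
lettersPattern = r"[a-zA-Z]+"

# Without an anchor, "S" can match inside a word ("SB" in "USB")
pat = "S" * lettersPattern
inside = [m.match for m in eachmatch(pat, "a USB Stick")]        # ["SB", "Stick"]

# \b restricts matches to positions where a word starts
wordpat = r"\b" * "S" * lettersPattern
at_start = [m.match for m in eachmatch(wordpat, "a USB Stick")]  # ["Stick"]

# eachmatch collects every word starting with S, not only the first
words_with_S = [m.match for m in eachmatch(wordpat, address)]    # ["Sesame", "Street"]
```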
Other functions for working with text in Engee are described in the Text strings section (https://engee.com/helpcenter/stable/julia/base/strings.html).