Finding and replacing text
Processing text data often involves finding and replacing substrings. There are several functions that find text and return different information: some functions confirm that text exists, others count the number of times a text fragment is repeated, find indexes, or extract substrings.
Text search
To determine whether a piece of text is present, use the function occursin(). It returns a logical value: true if the text is found and false otherwise.
txt = "she sells seashells by the seashore"
TF = occursin("sea", txt)
To count how many times this text occurs, use the function count().
n = count("sea", txt)
To determine where the text is located, use the function findall(), which returns the index range of each occurrence of the text fragment "sea".
idx = findall("sea", txt)
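As a quick check of what these three calls return for the sentence above, the following sketch prints each result and uses the ranges from findall() to slice out the matched substrings. The names `found`, `ranges` and `pieces` are illustrative only and are not part of the original example. Julia string indices are byte indices; for ASCII text such as this sentence they coincide with character positions.

```julia
txt = "she sells seashells by the seashore"

# occursin returns a Bool
found = occursin("sea", txt)        # true

# count returns the number of non-overlapping matches
n = count("sea", txt)               # 2

# findall returns one index range per match
ranges = findall("sea", txt)        # [11:13, 28:30]

# each range can be used to slice the original string
pieces = [txt[r] for r in ranges]   # ["sea", "sea"]

println(found, " ", n, " ", ranges, " ", pieces)
```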
Searching for text in arrays of strings
The search and replace functions also allow you to find text in multi-element arrays. For example, find the names of colours in the titles of several songs.
songs = ["Penny Lane", "Yellow Submarine","Blackbird"]
colors = ["Red", "Yellow", "Black"]
TF = occursin.(colors,songs)
To display a list of songs containing colour names, use the TF logical array as indexes in the original array of songs. This method is called logical indexing.
songs[TF]
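Note that `occursin.(colors, songs)` compares the two arrays element by element, so each colour is checked only against the song at the same position. A minimal sketch of a more general check, which tests every title against every colour, might look like the following. The helper name `has_any_color` is hypothetical, and the colours are reordered on purpose to show that position no longer matters.

```julia
songs  = ["Penny Lane", "Yellow Submarine", "Blackbird"]
colors = ["Black", "Red", "Yellow"]   # reordered on purpose for this sketch

# hypothetical helper: true if the title contains at least one of the words
has_any_color(title, words) = any(w -> occursin(w, title), words)

# one Bool per song, usable for logical indexing
TF = [has_any_color(s, colors) for s in songs]

println(songs[TF])
```

With this approach the two arrays may also have different lengths, which the element-wise broadcast does not allow.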
Matching patterns
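In addition to searching for literal text such as "sea" or "yellow", you can search for text that matches a pattern. Patterns are written as regular expressions, which provide predefined character classes such as \d for digits; for example, r"\d+" matches a sequence of digits. The function match() returns the first match it finds.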
address = " Sesame Street, New York, NY 10128"
nums = match(r"\d+", address)
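match() returns only the first match, as a RegexMatch object, or `nothing` when the pattern is not found. The sketch below shows one way the matched text might be read safely, and how eachmatch() could collect every digit sequence. The string `sample` is made-up illustration data and the variable names are assumptions for this example.

```julia
sample = "Room 12, Floor 3, ZIP 10128"   # made-up data for illustration

m = match(r"\d+", sample)
# guard against `nothing` before reading the matched text
first_num = m === nothing ? "" : m.match

# eachmatch iterates over all non-overlapping matches
all_nums = [x.match for x in eachmatch(r"\d+", sample)]

# parse the matched substrings when integer values are needed
as_ints = parse.(Int, all_nums)

println(first_num, " ", all_nums, " ", as_ints)
```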
You can combine patterns to make your search more precise. For example, search for words that begin with the letter "S". Concatenate the string "S" with the regular expression lettersPattern, which matches the letters that follow that character, and collect every match with eachmatch().
lettersPattern = r"[a-zA-Z]+"
pat = "S" * lettersPattern
StartWithS = [m.match for m in eachmatch(pat, address)]
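The same literal strings and patterns can be passed to replace(), which returns a new string and leaves the original unchanged. The following is an illustrative sketch only; the replacement texts and the variable names `r1`, `r2` and `r3` are assumptions made for this example.

```julia
txt = "she sells seashells by the seashore"

# replace every occurrence of a literal substring
r1 = replace(txt, "sea" => "ocean")

# limit the number of replacements with the `count` keyword
r2 = replace(txt, "sea" => "ocean"; count = 1)

# use a regular expression as the pattern: mask each run of digits
r3 = replace("NY 10128", r"\d+" => "#####")

println(r1)
println(r2)
println(r3)
```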
Other functions for working with text in Engee can be found in Text Strings.