Advanced file management
Let's imagine a situation where we have several data sets spread across folders, and the folders may also contain junk (pictures of funny raccoons, text documents). I would like to be able to find the files I need so that I can work with them later. Julia itself provides only basic file operations: <https://engee.com/helpcenter/stable/ru-en/julia/base/file.html>.
In this post, I will make a smarter tool for working with files.
How will we work?
We won't go into the details of how the file system works; we don't need them. It is enough to note that the structure of directories and files strikingly resembles a tree. A tree is a special kind of graph in which every node except the "root" has exactly one parent.
Efficient algorithms for traversing and modifying such graphs are well known, and they are often already implemented. My idea is as follows: I will represent the contents of a folder as a tree in which each node is a file or a folder. For each node I will store the path, name, extension, creation and modification timestamps, and a flag that marks folders. To form the tree structure, each node will also store its "descendants" (its direct children):
using Dates

struct FileTreeNode
    path::String
    name::String
    ext::String
    isdir::Bool
    created::DateTime
    modified::DateTime
    children::Vector{FileTreeNode}
end
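To see how the nodes fit together, here is a minimal sketch that builds a tiny tree by hand and counts its nodes. The struct is repeated so that the snippet runs on its own; the `leaf` and `count_nodes` helpers, the placeholder timestamp, and the file names are my assumptions for illustration only.

```julia
using Dates

# Same definition as in the post, repeated so the snippet is self-contained.
struct FileTreeNode
    path::String
    name::String
    ext::String
    isdir::Bool
    created::DateTime
    modified::DateTime
    children::Vector{FileTreeNode}
end

# Placeholder timestamp; real values would come from the file system.
t = DateTime(2024, 1, 1)

# Hypothetical helper: a file node without children.
leaf(path) = FileTreeNode(path, splitext(basename(path))..., false, t, t, FileTreeNode[])

# A folder with two files, assembled manually.
root = FileTreeNode("data", "data", "", true, t, t, [
    leaf("data/a.csv"),
    leaf("data/raccoon.png"),
])

# Hypothetical helper: count every node in the subtree rooted at `node`.
count_nodes(node) = 1 + sum(count_nodes, node.children; init=0)

println(count_nodes(root))       # 3
println(root.children[1].ext)    # .csv
```

Because `children` is typed as `Vector{FileTreeNode}`, the struct refers to itself, and that is all it takes to describe a tree of arbitrary depth.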
Additionally, we will create helper functions that return the creation and modification timestamps of a file, as well as its name and extension:
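function get_metadata(path::String)
    st = stat(path)
    # Note: on Unix-like systems ctime is the time of the last metadata change,
    # not necessarily the creation time.
    created = unix2datetime(st.ctime)
    modified = unix2datetime(st.mtime)
    return created, modified
end

function split_name_ext(path::String)
    name = basename(path)
    # splitext splits at the last dot: "a.tar.gz" -> ("a.tar", ".gz")
    base, ext = splitext(name)
    return base, ext
end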
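A few edge cases are worth checking before relying on these helpers. The sketch below uses only `Base` functions and `Dates`; the file names and the temporary file are made up for the demonstration.

```julia
using Dates

# splitext splits at the last dot only; a leading dot does not start an extension.
println(splitext("archive.tar.gz"))   # ("archive.tar", ".gz")
println(splitext(".gitignore"))       # (".gitignore", "")
println(splitext("README"))           # ("README", "")

# basename drops the directory part of a path.
println(basename("data/run1/results.csv"))   # results.csv

# stat needs an existing path, so create a throwaway file first.
dir = mktempdir()
sample = joinpath(dir, "sample.csv")
write(sample, "x,y\n1,2\n")

st = stat(sample)
modified = unix2datetime(st.mtime)   # DateTime in UTC, not local time
println(st.size)                     # 8
```

Note that `unix2datetime` returns UTC, so the timestamps will differ from what the file manager shows unless they are converted to local time.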
Growing a tree
We now have everything we need to build a tree from the directory structure.
Using readdir, we get the list of files and folders inside the current folder, and then repeat the same operation for every folder we find. This is called recursion.
In programming terms, we perform a recursive traversal of the tree of files and folders.
Let's add one more parameter to our function: a limit on the search depth.
function build_tree(path::String; maxdepth=typemax(Int), depth=0)
    is_dir = isdir(path)
    name, ext = split_name_ext(path)
    created, modified = get_metadata(path)
    if is_dir && depth < maxdepth
        entries = readdir(path; join=true)
        children = [
            build_tree(e; maxdepth, depth=depth+1)
            for e in sort(entries)
        ]
    else
        children = FileTreeNode[]
    end
    return FileTreeNode(path, name, ext, is_dir, created, modified, children)
end
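The effect of the depth limit is easier to see in isolation. Here is a minimal sketch of the same recursion that only collects paths instead of building nodes; `list_paths` is a hypothetical helper, and the temporary folder layout is invented for the demonstration.

```julia
# Hypothetical helper: collect `path` and everything up to `maxdepth` levels below it.
function list_paths(path::AbstractString; maxdepth=typemax(Int), depth=0)
    paths = [path]
    # Descend only into folders, and only while the depth limit allows it.
    if isdir(path) && depth < maxdepth
        for entry in readdir(path; join=true)
            append!(paths, list_paths(entry; maxdepth, depth=depth + 1))
        end
    end
    return paths
end

# Throwaway layout: root/a.txt and root/sub/b.txt
root = mktempdir()
write(joinpath(root, "a.txt"), "a")
mkdir(joinpath(root, "sub"))
write(joinpath(root, "sub", "b.txt"), "b")

println(length(list_paths(root; maxdepth=0)))   # 1: only the root itself
println(length(list_paths(root; maxdepth=1)))   # 3: root, a.txt, sub
println(length(list_paths(root)))               # 4: everything
```

With `maxdepth=1` the folder `sub` is still listed, but its contents are not read. `build_tree` behaves the same way: a folder at the depth limit becomes a node with an empty `children` vector.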
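import AbstractTrees: children, printnode

# Tell AbstractTrees where the children of a node are stored.
children(node::FileTreeNode) = node.children

function printnode(io::IO, node::FileTreeNode)
    icon = node.isdir ? "📁 " : "📄 "
    # Folder names may contain dots too (e.g. "v1.2"), so always print name and ext together.
    print(io, icon, node.name, node.ext)
end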
tree = build_tree("."; maxdepth=2)
using AbstractTrees: print_tree
print_tree(tree)
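Once `children` is defined, the iterators from AbstractTrees can be used as well, for example to pick out the files we are interested in. Below is a minimal sketch, assuming the AbstractTrees package is installed; `DemoNode` is a hypothetical, reduced stand-in for `FileTreeNode`, and the folder contents are invented.

```julia
using AbstractTrees

# Hypothetical stand-in for FileTreeNode with only the fields needed here.
struct DemoNode
    name::String
    isdir::Bool
    children::Vector{DemoNode}
end

AbstractTrees.children(node::DemoNode) = node.children

demo = DemoNode("data", true, [
    DemoNode("raw", true, [
        DemoNode("a.csv", false, DemoNode[]),
        DemoNode("raccoon.png", false, DemoNode[]),
    ]),
    DemoNode("notes.txt", false, DemoNode[]),
])

# Pre-order depth-first traversal visits a folder before its contents.
visited = [n.name for n in PreOrderDFS(demo)]
println(visited)     # ["data", "raw", "a.csv", "raccoon.png", "notes.txt"]

# Keep only the files whose name ends in ".csv".
csv_files = [n.name for n in PreOrderDFS(demo) if !n.isdir && endswith(n.name, ".csv")]
println(csv_files)   # ["a.csv"]
```

The same filter should work on the real tree by testing `node.ext == ".csv"` instead of the name suffix.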