Engee documentation

Evaluating Julia code

One of the hardest parts of learning how the Julia language runs code is understanding how all of the pieces work together to execute a block of code.

Each chunk of code typically passes through many steps with potentially unfamiliar names, such as (in no particular order): flisp, AST, C++, LLVM, eval, typeinf, macroexpand, sysimg (or system image), bootstrapping, compile, parse, execute, JIT, interpret, box, unbox, intrinsic function, and primitive function, before turning into the desired result (hopefully).

Definitions
  • REPL

    REPL stands for Read-Eval-Print Loop. It is simply what we call the command line environment for short.
  • AST

    Abstract Syntax Tree. The AST is the digital representation of the code structure. In this form the code has been tokenized for meaning so that it is more suitable for manipulation and execution.
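
To make the AST definition concrete, here is a minimal sketch that parses a string of Julia source and inspects the resulting tree. It assumes a running Julia session; the variable names `ex` and `inner` are chosen only for illustration.

```julia
# Parse a string of source code into an AST without running it.
ex = Meta.parse("x + 2 * y")

# An AST node is an Expr with a `head` (the node kind) and `args` (its children).
@show ex.head
@show ex.args

# Nested expressions are themselves Expr nodes: here, the product 2 * y.
inner = ex.args[3]
@show inner.head
@show inner.args
```

The outer node is a call to `+` whose last argument is itself a call to `*`, which is how operator precedence is recorded in the tree.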

Julia code execution

The following is a general description of the process.

  1. The user starts `julia`.

  2. The C function main() from cli/loader_exe.c gets called. This function processes the command line arguments, filling in the jl_options struct and setting the variable ARGS. It then initializes Julia (by calling https://github.com/JuliaLang/julia/blob/master/src/init.c [julia_init in init.c], which may load a previously compiled system image, sysimg). Finally, it passes control to Julia by calling https://github.com/JuliaLang/julia/blob/master/base/client.jl [Base._start()].

  3. When _start() takes over control, the subsequent sequence of commands depends on the command line arguments given. For example, if a filename was supplied, it will proceed to execute that file. Otherwise, it will start an interactive REPL.

  4. Skipping the details of how the REPL interacts with the user, let's just say the program ends up with a block of code that it wants to run.

  5. If the block of code to run is in a file, https://github.com/JuliaLang/julia/blob/master/src/toplevel.c [jl_load(char *filename)] is invoked to load the file and parse it. Each fragment of code is then passed to eval for execution.

  6. Each fragment of code (or AST) is handed off to eval() to be turned into a result.

  7. eval() takes each code fragment and tries to run it in jl_toplevel_eval_flex().

  8. jl_toplevel_eval_flex() decides whether the code is a "toplevel" action (such as using or module), which would be invalid inside a function. If so, it passes the code to the toplevel interpreter.

  9. Then jl_toplevel_eval_flex() expands the code to eliminate any macros and lower the AST to simplify its execution.

  10. jl_toplevel_eval_flex() then uses some simple heuristics to decide whether to JIT compile the AST or to interpret it directly.

  11. The bulk of the work to interpret code is handled by https://github.com/JuliaLang/julia/blob/master/src/interpreter.c [eval in `interpreter.c`].

  12. If the code is compiled instead, the bulk of the work is handled by `codegen.cpp`. Whenever a Julia function is called for the first time with a given set of argument types, type inference is run on that function. This information is used by the code generation step (codegen) to generate faster code.

  13. Eventually, the user exits the REPL or the end of the program is reached and the _start() method returns.

  14. Just before exiting, main() calls https://github.com/JuliaLang/julia/blob/master/src/init.c [jl_atexit_hook(exit_code)]. This calls Base._atexit() (which calls any functions registered with atexit() inside Julia). Then it calls https://github.com/JuliaLang/julia/blob/master/src/gc.c [jl_gc_run_all_finalizers()]. Finally, it gracefully cleans up all libuv handles and waits for them to flush and close.
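
The parse-then-eval part of this sequence (steps 5 through 7) can be approximated from within Julia itself. The following is only a rough sketch under the assumption of a running Julia session: it does not go through jl_load(), and `include_string` is used merely as a stand-in for loading a file. The names `src`, `ast`, `demo_a`, and `demo_total` are invented for the example.

```julia
# Parse source text into an AST without executing it (compare step 5).
src = "1 + 2 * 3"
ast = Meta.parse(src)

# Hand the AST to eval to turn it into a result (compare steps 6 and 7).
result = Core.eval(Main, ast)
@show result

# include_string parses a string and evaluates each top-level expression
# in turn, returning the value of the last one, much like loading a file.
demo_total = include_string(Main, "demo_a = 10; demo_a + 5")
@show demo_total
```

Separating `Meta.parse` from `Core.eval` mirrors the split between parsing and evaluation described above: the AST exists as ordinary data before anything is run.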

Parsing

The Julia parser is a small lisp program written in femtolisp, the source code for which is distributed inside Julia in https://github.com/JuliaLang/julia/tree/master/src/flisp [src/flisp].

Its interface functions are defined mainly in https://github.com/JuliaLang/julia/blob/master/src/jlfrontend.scm [jlfrontend.scm]. The code in https://github.com/JuliaLang/julia/blob/master/src/ast.c [`ast.c`] handles this handoff on the Julia side.

The other relevant files at this stage are https://github.com/JuliaLang/julia/blob/master/src/julia-parser.scm [julia-parser.scm], which handles tokenizing Julia code and turning it into an AST, and https://github.com/JuliaLang/julia/blob/master/src/julia-syntax.scm [julia-syntax.scm], which handles transforming complex AST representations into simpler, "lowered" AST representations that are more suitable for analysis and execution.
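
The effect of lowering can be observed from Julia with `Meta.lower`. This is a small sketch, assuming a Julia 1.x session; the exact shape of the lowered result is an implementation detail and may differ between versions.

```julia
# A small loop, quoted so that it is an AST rather than executed code.
ex = :(for i in 1:3
           println(i)
       end)

# Meta.lower expands macros and lowers the AST in the context of a module.
lowered = Meta.lower(Main, ex)

# For code like this, the result is expected to wrap a Core.CodeInfo object
# holding the simplified, linear form of the loop.
@show lowered.head
@show typeof(lowered.args[1])
```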

If you want to test the parser without rebuilding Julia in its entirety, you can run the frontend on its own as follows:

$ cd src
$ flisp/flisp
> (load "jlfrontend.scm")
> (jl-parse-file "<filename>")

Macro Expansion

When eval() encounters a macro, it expands that AST node before attempting to evaluate the expression. Macro expansion involves a handoff from eval() (in Julia), to the parser function jl_macroexpand() (written in flisp), to the Julia macro itself (written in Julia) via fl_invoke_julia_macro(), and back.

Typically, macro expansion is invoked as the first step during a call to `Meta.lower()/jl_expand()`, although it can also be invoked directly by a call to `macroexpand()/jl_macroexpand()`.
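
As an illustration of invoking expansion directly, the sketch below defines a toy macro and expands a call to it with `macroexpand()`. The macro `@double` and the variable names are hypothetical, and the code is assumed to run at top level (macros cannot be defined inside a function).

```julia
# A hypothetical macro that doubles whatever expression it is given.
macro double(ex)
    return :(2 * $(esc(ex)))
end

# Expand the macro call without evaluating it; the result is a plain AST.
expanded = macroexpand(@__MODULE__, :(@double(a + 1)))
@show expanded

# Evaluate the expanded AST with `a` bound to 4.
value = Core.eval(@__MODULE__, :(let a = 4
    $expanded
end))
@show value
```

The expanded expression no longer contains a macro call, which is the form the later lowering and evaluation stages work with.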

Type inference

Type inference is implemented in Julia by https://github.com/JuliaLang/julia/blob/master/base/compiler/typeinfer.jl [typeinf() in compiler/typeinfer.jl]. Type inference is the process of examining a Julia function and determining bounds for the types of each of its variables, as well as bounds on the type of the function's return value. This enables many later optimizations, such as unboxing of known immutable values, and compile-time hoisting of various run-time operations such as computing field offsets and function pointers. Type inference may also include other steps such as constant propagation and inlining.
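
The results of inference can be inspected from Julia. The sketch below uses `Base.return_types`, an unexported reflection helper, so treat it as an illustration rather than stable API; the functions `stable` and `unstable` are made up for the example.

```julia
# The return type of `stable` depends only on the argument type.
stable(x) = x + 1

# `unstable` returns an Int or a Float64 depending on the value of x,
# so inference can only bound the return type by a Union.
unstable(x) = x > 0 ? 1 : 1.0

@show Base.return_types(stable, (Int,))
@show Base.return_types(stable, (Float64,))
@show Base.return_types(unstable, (Int,))
```

A tight bound such as a single concrete type is what makes optimizations like unboxing possible; a wide bound such as a Union leaves less room for them.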

Even more definitions
  • JIT

    Just-In-Time compilation. The process of generating native machine code into memory right when it is needed.
  • LLVM

    Low-Level Virtual Machine (a compiler). The Julia JIT compiler is a program/library called libLLVM. Codegen in Julia refers both to the process of taking a Julia AST and turning it into LLVM instructions, and to the process of LLVM optimizing those instructions and turning them into native assembly instructions.
  • C++

    The programming language in which the LLVM compiler is implemented, which means that code generation is also implemented in this language. The rest of the Julia library is implemented in C, partly because its smaller feature set makes it more convenient to use as an interface layer for multiple languages.
  • Box

    This term describes the process of taking a value and allocating a wrapper around the data that is tracked by the garbage collector (gc) and is tagged with the object's type.
  • Unbox

    The reverse of boxing a value. This operation enables more efficient manipulation of data when the type of that data is fully known at compile time (through type inference).
  • Generic function

    A Julia function composed of multiple "methods" that are selected for dynamic dispatch based on the argument type signature.
  • Anonymous function or method

    A Julia function without a name and without type-dispatch capabilities.
  • Primitive function

    A function implemented in C but exposed in Julia as a named function "method" (albeit without generic function dispatch capabilities, similar to an anonymous function).
  • Intrinsic function

    A low-level operation exposed as a function in Julia. These pseudo-functions implement operations on raw bits, such as add and sign extend, that cannot be expressed directly in any other way. Since they operate on bits directly, they must be compiled into a function and surrounded by a call to `Core.Intrinsics.box(T, ...)` to reassign type information to the value.
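
Several of these terms can be poked at from a Julia session. The following is a hedged sketch: `describe` is a made-up generic function, and mapping "primitive function" to `Core.Builtin` and "intrinsic function" to `Core.IntrinsicFunction` reflects how current Julia 1.x releases appear to expose them, which is an assumption rather than something stated above.

```julia
# A generic function: one name, several methods selected by argument type.
describe(x::Integer) = "integer"
describe(x::AbstractString) = "string"
describe(x) = "something else"

@show length(methods(describe))
@show describe(1)
@show describe("a")
@show describe(1.5)

# getfield is implemented in C and exposed as a builtin function.
@show typeof(getfield) <: Core.Builtin

# add_int is an intrinsic that adds two values of the same bits type.
@show Core.Intrinsics.add_int isa Core.IntrinsicFunction
@show Core.Intrinsics.add_int(1, 2)

# isbits values have a plain bit layout and can be handled unboxed;
# a String cannot.
@show isbits(1)
@show isbits("text")
```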

JIT code generation

Code generation is the process of converting Julia’s AST into native machine code.

The JIT environment is initialized by an early call to https://github.com/JuliaLang/julia/blob/master/src/codegen.cpp [`jl_init_codegen` in `codegen.cpp`].

On demand, a Julia method is converted into a native function by the function emit_function(jl_method_instance_t*). (Note that when using MCJIT (in LLVM v3.4 and later), each function must be JIT-compiled into a new module.) This function recursively calls emit_expr() until the entire function has been emitted.

Much of the remainder of this file is devoted to various manual optimizations of specific code patterns. For example, emit_known_call() knows how to inline many of the primitive functions (defined in https://github.com/JuliaLang/julia/blob/master/src/builtins.c [`builtins.c`]) for various combinations of argument types.

Other parts of codegen are handled by various helper files:

  • debuginfo.cpp

    Handles backtraces for JIT functions

  • ccall.cpp

    Handles the ccall and llvmcall FFI, along with various `abi_*.cpp` files

  • intrinsics.cpp

    Handles the emission of various low-level intrinsic functions
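
The output of code generation can be examined from Julia with the InteractiveUtils standard library. The sketch below is illustrative only: `square` is a made-up function, and the exact IR and assembly text depend on the Julia version and the target machine.

```julia
using InteractiveUtils  # provides code_llvm and code_native

square(x) = x * x

# Capture the LLVM IR generated for square(::Int) as a string.
llvm_ir = sprint(code_llvm, square, (Int,))
println(llvm_ir)

# The native assembly produced from that IR can be captured the same way.
native_asm = sprint(code_native, square, (Int,))
println(native_asm)
```

Comparing the two outputs shows both halves of codegen as described above: first the translation into LLVM instructions, then LLVM's translation into native instructions.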

Bootstrapping

The process of creating a system image is called bootstrapping.

The word comes from the phrase "pulling oneself up by the bootstraps", and refers to the idea of starting from a very limited set of available functions and definitions and ending with the creation of a full-featured environment.

System Image

The system image is a precompiled archive of a set of Julia files. The file `sys.ji` distributed with Julia is one such system image, generated by executing the file https://github.com/JuliaLang/julia/blob/master/base/sysimg.jl [`sysimg.jl`] and serializing the resulting environment (including types, functions, modules, and all other defined values) into a file. Therefore, it contains a frozen version of the Main, Core, and Base modules (and whatever else was in the environment at the end of bootstrapping). This serializer/deserializer is implemented by https://github.com/JuliaLang/julia/blob/master/src/staticdata.c [`jl_save_system_image`/`jl_restore_system_image` in staticdata.c].

If there is no sysimg file (`jl_options.image_file == NULL`), this also implies that --build was given on the command line, so the final result should be a new sysimg file. During Julia initialization, minimal Core and Main modules are created. Then a file named boot.jl is evaluated from the current directory. Julia then evaluates any file given as a command line argument until it reaches the end. Finally, it saves the resulting environment to a sysimg file for use as a starting point for a future Julia run.
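
The image_file option mentioned above can be read back from a running session. This is a minimal sketch that relies on `Base.JLOptions()`, an internal structure mirroring jl_options; its fields are not a stable API, so treat the field name as an assumption about current Julia 1.x releases.

```julia
# Read the options that main() filled in from the command line.
opts = Base.JLOptions()

# image_file is a C string pointer; guard against NULL before converting.
image_ptr = opts.image_file
image_path = image_ptr == C_NULL ? nothing : unsafe_string(image_ptr)

println(image_path)
```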