Fixed-point arithmetic (Fixed-point) in Julia
Page in progress.
In numerical computing, the vast majority of problems are solved with floating-point numbers (Float32, Float64). However, in real-world systems such as microcontrollers, DSPs, FPGAs or ASICs, floating-point types may be undesirable or unavailable. For example, the base models of the STM32 family of microcontrollers have no hardware floating-point support, and floating-point operations on them are much slower than integer operations. In such cases fixed-point arithmetic is used, where fractional values are encoded as integers with a predetermined length of the fractional part.
Fixed-point arithmetic is a way of representing fractional values using ordinary integers and a predetermined scale (the number of bits for the fractional part). Instead of resource-intensive floating-point arithmetic (Float32, Float64), it uses a simple integer format in which the binary point is shifted by a given number of bits.
For example, if the length of the fractional part is $f = 2$, then the number $6.25$ is stored as the integer value $25$, because the formula for converting the internal value (stored_integer) to the real value (real_value), $\mathrm{real\_value} = \mathrm{stored\_integer} \cdot 2^{-f}$, gives $25 \cdot 2^{-2} = 6.25$. The formula shows that the value stored internally is $25$, but during calculations it is interpreted as $6.25$.
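The conversion between the stored integer and the real value can be sketched in plain Julia without any fixed-point package. The helper names `to_real` and `to_stored` are hypothetical and are not part of EngeeFixedPoint.jl; they only illustrate the scaling by a power of two.

```julia
# Hypothetical helpers, not part of EngeeFixedPoint.jl.
# real_value = stored_integer * 2^(-f)
to_real(stored_integer::Integer, f::Integer) = stored_integer * 2.0^(-f)

# Inverse mapping, rounding to the nearest representable value.
to_stored(real_value::Real, f::Integer) = round(Int, real_value * 2.0^f)

stored = to_stored(6.25, 2)   # integer that encodes 6.25 with 2 fractional bits
value  = to_real(stored, 2)   # back to the real value
println(stored, " -> ", value)
```

With two fractional bits the resolution is 0.25, so only multiples of 0.25 survive this round trip unchanged.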
Such arithmetic has the following advantages:
- Lower resource consumption (relevant for microcontrollers, FPGAs and ASICs);
- Controllable accuracy and range of values;
- Support for code generation to Verilog (HDL) and C.
Fixed point operation in Engee
To work with fixed-point numbers, Engee uses its own EngeeFixedPoint.jl package, which replaces the standard Julia FixedPointNumbers.jl package. Unlike the classic package, EngeeFixedPoint.jl provides advanced features and precise control over the representation and behaviour of fixed-point numbers, which is especially important in resource-constrained systems, when moving computations to HDL, and in problems that demand strict precision.
The EngeeFixedPoint.jl package is a standard Engee package and is included in the user environment by default, so it does not need to be loaded explicitly (via import/using) in code.
In Engee, the type of a fixed-point number is as follows:
Fixed{S, W, f, T} <: FixedPoint
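Where:
- S is the signedness (1 for signed, 0 for unsigned);
- W is the word width in bits;
- f is the number of bits in the fractional part;
- T is the integer type used for the internal representation.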
This format specifies exactly how a number is stored, interpreted and used in calculations, which gives type safety and well-defined behaviour at all stages of data processing.
For convenience, EngeeFixedPoint.jl offers several ways to specify a fixed-point type, from full manual specification to automatic inference.
S, W, f, T = 1, 25, 10, Int32
dt1 = Fixed{S, W, f, T}
dt2 = fixdt(S, W, f)
dt3 = fixdt(Fixed{S, W, f})
dt4 = fixdt(dt2)
println(dt1 == dt2 == dt3 == dt4) # true
Where:
- dt1 = Fixed{S, W, f, T} is the full manual description;
- dt2 = fixdt(S, W, f) is the simplified form, where the storage type is selected automatically;
- dt3 = fixdt(Fixed{S, W, f}) obtains a type from an existing description;
- dt4 = fixdt(dt2) reuses an existing type, creating a copy of it.
All these options create the same type, Fixed{1, 25, 10, Int32}, and can be chosen depending on the task:
- The full description (dt1) is useful when control over all parameters is needed;
- The simplified form (dt2) suits typical cases and shortens the code;
- Getting a type from a type (dt3) is useful when generating code or typing data;
- Reuse (dt4) helps to work with parameterised structures without re-entering parameters.
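One way the automatic choice of the storage type could work is to take the narrowest standard integer type that holds W bits. The sketch below is an assumption that merely agrees with the types printed in this document (for example Int32 for a signed 25-bit word); `storage_type` is a hypothetical name, not a function of EngeeFixedPoint.jl.

```julia
# Hypothetical sketch: pick the narrowest standard integer type that holds W bits.
function storage_type(S::Integer, W::Integer)
    signed   = (Int8, Int16, Int32, Int64, Int128)
    unsigned = (UInt8, UInt16, UInt32, UInt64, UInt128)
    candidates = S == 1 ? signed : unsigned
    for T in candidates
        # sizeof returns bytes, so multiply by 8 to compare with the bit width
        8 * sizeof(T) >= W && return T
    end
    error("word length $W does not fit in a 128-bit integer")
end

println(storage_type(1, 25))
println(storage_type(0, 5))
```

Under this assumption the storage type is fully determined by S and W, which would explain why all four ways of building the type compare equal.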
Constructors of type Fixed
Next, let’s look at specific scenarios for working with fixed points in Engee.
For example, you can directly set the type and pass the value:
x = Fixed{1, 15, 2}(25)
Output:
fi(6.25, 1, 15, 2)
This means that $25$ is the integer representation (stored_integer) and the real value (real_value) equals $6.25$ according to the formula $25 \cdot 2^{-2} = 6.25$.
Fixed{S, W, f}(i::T)
The constructor Fixed{S, W, f}(i::T) creates a fixed-point number from its integer representation. It takes:
- Format parameters: S (signedness), W (width in bits), f (fractional part length);
- An integer value i of type T (the internal representation).
S, W, f = 1, 15, 2 # signed, 15 bits, 2 fractional bits
i = 25
x = Fixed{S, W, f}(i) # create from an integer
Output:
fi(6.25, 1, 15, 2) # equivalent representation
Fixed{S, W, f, T1}(i::T2)
This constructor is similar to the previous one but lets you name the storage type explicitly. The storage type is still selected automatically from the parameters S, W and f, regardless of the specified T1.
T = Int128
x = Fixed{S, W, f, T}(i) # with an explicit storage type
Output:
fi(6.25, 1, 15, 2) # the result is identical
Constructors from FixedPointNumbers.jl
Although Engee uses the new EngeeFixedPoint.jl package, it stays compatible with a number of constructors from the FixedPointNumbers.jl package. Only signed types are supported.
Supported:
- Fixed{T, f}(i::Integer, _) constructs from the integer representation. It accepts the type T and the parameter f;
- Fixed{T, f}(value) constructs from a real value (float).
Example:
T = Int32
x1 = Fixed{T, f}(i, nothing) # from the integer representation
x2 = Fixed{T, f}(i) # from the real value
Output:
6.25 # result of the first constructor
25.0 # result of the second constructor
Auxiliary methods fi
The most convenient way to create fixed-point numbers is the fi helper methods. Unlike the constructors, they can determine the representation parameters automatically.
x1 = fi(3.37, 0, 63, 4) # full format with explicit parameters
x2 = fi(3.37, fixdt(0, 63, 4)) # via a data type
x3 = fi(3.37, 0, 63) # fractional part length chosen automatically
x4 = fi(100, 1, 8, 5) # demonstrates overflow handling
Output:
3.375 # value after rounding
true # x1 and x2 are identical
3.37 # with automatic selection
3.96875 # result of saturation on overflow
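The rounded and saturated values in this output can be reproduced with ordinary integer arithmetic. The following is a minimal sketch, assuming round-to-nearest with ties up and saturation on overflow; `quantize` is a hypothetical helper and says nothing about how the package itself is implemented.

```julia
# Hypothetical sketch of quantisation with saturation (not the package's code).
function quantize(value::Real, S::Integer, W::Integer, f::Integer)
    # Representable raw range of a W-bit word, kept in Int128 to avoid overflow.
    raw_max = (Int128(1) << (W - S)) - 1
    raw_min = S == 1 ? -(Int128(1) << (W - 1)) : Int128(0)
    # Round to nearest with ties going up: floor(x + 0.5).
    raw = floor(Int128, value * 2.0^f + 0.5)
    raw = clamp(raw, raw_min, raw_max)   # saturate instead of wrapping around
    return Float64(raw) * 2.0^(-f)
end

println(quantize(3.37, 0, 63, 4))
println(quantize(100, 1, 8, 5))
```

With 4 fractional bits the value 3.37 lands on the grid point 54/16, and a signed 8-bit word with 5 fractional bits cannot go above 127/32, which is why 100 saturates.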
Complex numbers
Fixed-point complex numbers are fully supported and are created with the same fi methods:
s, w, f = 1, 62, 7;
v = 2.5 - 3.21im
x1 = fi(v, s, w, f)
x2 = fi(v, fixdt(s, w, f))
x3 = fi(v, s, w)
println(x1)
println(x1 == x2)
println(x3)
println()
Output:
fi(2.5, 1, 62, 7) - fi(3.2109375, 1, 62, 7)*im
true
fi(2.5, 1, 62, 59) - fi(3.21, 1, 62, 59)*im
Working with arrays and matrices
The library provides full support for vector and matrix operations with fixed point numbers. All operations preserve the element type and automatically apply the specified precision parameters to all array elements.
Vectors
Create and work with one-dimensional arrays. Fixed point parameters are applied to all elements:
s, w, f = 1, 62, 7 # signed type, 62 bits, 7 fractional bits
v = [1, 2, 3] # source vector
# Different ways to create:
x1 = fi(v, s, w, f) # with explicit parameters
x2 = fi(v, fixdt(s, w, f)) # via a data type
x3 = fi(v, s, w) # fractional part length chosen automatically
println(x1)
println(x1 == x2)
println(x3)
Output:
Fixed{1, 62, 7}[1.0, 2.0, 3.0]
true
Fixed{1, 62, 59}[1.0, 2.0, 3.0]
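The automatically chosen fractional length of 59 in the outputs above is consistent with a simple rule: reserve the sign bit and just enough integer bits for the largest magnitude, then give every remaining bit to the fraction. The sketch below is an inference from those outputs only; `auto_frac` is a hypothetical name and the package may use a different rule.

```julia
# Hypothetical rule, inferred only from the outputs shown in this document.
function auto_frac(values::AbstractVector{<:Real}, S::Integer, W::Integer)
    maxabs = maximum(abs, values)
    # Bits needed for the integer part of the largest magnitude.
    intbits = maxabs < 1 ? 0 : ndigits(floor(Int, maxabs); base = 2)
    return W - S - intbits
end

println(auto_frac([1, 2, 3], 1, 62))
```

For the vector [1, 2, 3] the largest value needs 2 integer bits, so a signed 62-bit word leaves 62 - 1 - 2 = 59 fractional bits.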
Complex matrices
Full support for complex numbers in multidimensional arrays:
s, w, f = 1, 62, 7
m = [im 2.5; -1.2im 25-im]
# Working ways to create:
x1 = fi(m, s, w, f) # with explicit parameters
x2 = fi(m, fixdt(s, w, f)) # via a data type
println(x1)
println(x1 == x2)
Output:
Complex{Fixed{1, 62, 7, Int64}}[fi(0.0, 1, 62, 7) + fi(1.0, 1, 62, 7)*im fi(2.5, 1, 62, 7) + fi(0.0, 1, 62, 7)*im; fi(0.0, 1, 62, 7) - fi(1.203125, 1, 62, 7)*im fi(25.0, 1, 62, 7) - fi(1.0, 1, 62, 7)*im]
true
Basic operations and methods
This section describes methods for working with fixed-point numbers that let you determine the allowable range of values and basic properties.
Boundary values
The typemax and typemin methods return the maximum and minimum representable values of a particular fixed-point type.
dt = fixdt(0, 25, -2) # unsigned type, 25 bits, fractional part length -2
x = fi(1.5, dt) # create a fixed-point number
println(typemax(x)) # 1.34217724e8 - maximum representable value
println(typemin(x)) # 0.0 - minimum value for an unsigned type
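The boundary values follow directly from S, W and f. As a rough illustration, the hypothetical helper `fixed_range` below (not a package function) derives them from the raw integer limits of the word; a negative f simply scales the raw limits up instead of down.

```julia
# Hypothetical helper: representable range of a Fixed{S, W, f} format.
function fixed_range(S::Integer, W::Integer, f::Integer)
    raw_max = (Int128(1) << (W - S)) - 1                    # largest raw integer
    raw_min = S == 1 ? -(Int128(1) << (W - 1)) : Int128(0)  # smallest raw integer
    scale = 2.0^(-f)                                        # weight of one raw step
    return (Float64(raw_min) * scale, Float64(raw_max) * scale)
end

lo, hi = fixed_range(0, 25, -2)
println(lo, " ", hi)
```

For the unsigned 25-bit format with f = -2 the largest raw value is 2^25 - 1 and each step is worth 4, which gives the maximum of 134217724 reported by typemax.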
Mathematical operations
The system automatically selects the optimal format for the result of operations, maintaining accuracy and preventing overflow. All basic arithmetic operations (addition, subtraction, multiplication, division) are supported:
x1 = fi(1.5, 0, 15, 3)
x2 = fi(1.5, 1, 25, 14)
y1 = x1+x2
y2 = x1-x2
y3 = x1*x2
y4 = x1/x2
println(y1)
println(y2)
println(y3)
println(y4)
println(typeof(y1))
println(typeof(y2))
println(typeof(y3))
println(typeof(y4))
println(x1 == x2)
println(x1 <= x2)
println(x1 > x2)
Output:
3.0
0.0
2.25
0.0
Fixed{1, 28, 14, Int32}
Fixed{1, 28, 14, Int32}
Fixed{1, 40, 17, Int64}
Fixed{1, 25, -11, Int32}
true
true
false
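The result types of the sum and the product printed above match the usual full-precision growth rules for fixed-point formats. The sketch below states those rules as an assumption inferred from this output; `sum_format` and `prod_format` are hypothetical names, formats are plain (S, W, f) tuples, and the division rule is not covered.

```julia
# Hypothetical format rules inferred from the result types printed above.
# A format is a tuple (S, W, f).
intbits(S, W, f) = W - f - S          # integer bits, excluding the sign bit

function sum_format(a, b)
    S = max(a[1], b[1])                         # signed if either operand is signed
    f = max(a[3], b[3])                         # keep the finer resolution
    I = max(intbits(a...), intbits(b...)) + 1   # one extra bit for the carry
    return (S, S + I + f, f)
end

# A full-precision product needs the sum of word lengths and of fraction lengths.
prod_format(a, b) = (max(a[1], b[1]), a[2] + b[2], a[3] + b[3])

x1, x2 = (0, 15, 3), (1, 25, 14)
println(sum_format(x1, x2))
println(prod_format(x1, x2))
```

Growing the word this way keeps every bit of the exact result, which is how overflow is avoided without the user choosing an output format.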
Rounding
Various rounding strategies let you control the accuracy of your calculations. By default, RoundNearestTiesUp rounding is used.
x = fi(1.5, 1, 14, 3) # signed, 14 bits, 3 fractional bits
println(round(x)) # 2.0 - round to the nearest integer (1.5 → 2)
println(trunc(x)) # 1.0 - discard the fractional part
println(ceil(x)) # 2.0 - round up to the next integer
println(floor(x)) # 1.0 - round down to the previous integer
Where:
- round is banker's rounding (ties at 0.5 go to the nearest even value);
- trunc discards the fractional part;
- ceil always rounds upwards;
- floor always rounds downwards.
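On the raw integer, each of these rounding modes reduces to a different flavour of integer division by $2^f$. The following sketch uses only Base Julia functions on the raw value 12, which encodes 1.5 with 3 fractional bits; it illustrates the idea and is not taken from the package.

```julia
# Sketch: rounding a fixed-point value to an integer using only its raw integer.
raw, f = 12, 3            # 12 * 2^-3 = 1.5
scale = 1 << f            # 2^f = 8

println(fld(raw, scale))                # floor
println(cld(raw, scale))                # ceil
println(div(raw, scale))                # trunc (rounds towards zero)
println(div(raw, scale, RoundNearest))  # nearest, ties to even
```

For negative values trunc and floor differ: with a raw value of -12 the truncating division gives -1 while the flooring division gives -2.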
Type conversion
Conversion to standard data types is useful when interacting with other libraries. When converting, rounding rules are taken into account.
x = fi(1.5, 1, 12, 4)
y1 = Int64(x)
y2 = UInt8(x)
y3 = Float64(x)
y4 = convert(fixdt(0, 5, 2), x)
println(y1)
println(y2)
println(y3)
println(y4)
println(typeof(y1))
println(typeof(y2))
println(typeof(y3))
println(typeof(y4))
Output:
1
1
1.5
1.5
Int64
UInt8
Float64
Fixed{0, 5, 2, UInt8}
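Converting between two fixed-point formats amounts to shifting the raw integer by the difference in fractional lengths. The sketch below is a simplified assumption: `rescale` is a hypothetical helper, a right shift rounds towards negative infinity rather than following the package's rounding rules, and no saturation to the target word width is applied.

```julia
# Hypothetical sketch: move a raw integer from f_from to f_to fractional bits.
function rescale(raw::Integer, f_from::Integer, f_to::Integer)
    shift = f_to - f_from
    # A left shift adds fractional bits exactly; a right shift drops them.
    return shift >= 0 ? raw << shift : raw >> (-shift)
end

new_raw = rescale(24, 4, 2)     # 1.5 with f = 4 is stored as 24
println(new_raw * 2.0^(-2))     # value seen through the new format
```

Here 1.5 is exactly representable in both formats, so the conversion loses nothing; values that need more than 2 fractional bits would be rounded.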
Conclusion
In summary, the EngeeFixedPoint.jl package provides the following benefits:
- Expanded type system:
  - Full support for both signed and unsigned numbers;
  - Arbitrary bit size (any bit size, not just 8/16/32/64/128);
  - Flexible fractional part setting (including negative values and cases when $f > W$, that is, the fractional part length is greater than the word length).
- Improved type inference rules;
- Platform-dependent code generation:
  - Different type inheritance rules for target platforms (C or Verilog);
  - Predictable behaviour on overflow of the 128-bit boundary (unlike analogues).
- Expanded functionality:
  - Optimised handling of arrays and matrices (see Working with arrays and matrices);
  - Full support for complex numbers (see Complex numbers);
  - Efficient rounding methods (round/trunc/ceil/floor, see Rounding);
  - Support for basic methods (zero/one/typemin/typemax, see Boundary values).
- Fractional part length (f): since fixed-point numbers cannot represent every fractional value exactly, operations round the result according to a given strategy (for example, to the nearest value or by truncation).
- Word width (W) and signedness (S): when a value does not fit the range they define, an overflow handling strategy is applied: saturation, where the value is limited to the maximum or minimum allowed, truncation, or raising an error.