Fixed-Point Arithmetic (Fixed-Point) in Engee
In numerical computing, the vast majority of problems are solved with floating-point numbers (Float32, Float64). However, in real-world systems such as microcontrollers, DSPs, FPGAs or ASICs, floating-point types may be undesirable or unavailable. For example, the base models of the STM32 microcontroller family have no hardware floating-point support, so floating-point operations are much slower than integer operations. In such cases, fixed-point arithmetic is used, where fractional values are encoded in an integer format with a predetermined length of the fractional part.
Fixed point is a way of representing fractional values using ordinary integers and a predetermined scale (the number of bits for the fractional part). Instead of resource-intensive floating-point arithmetic (Float32, Float64), it uses a simple integer format in which the binary point is shifted by a given number of bits.
For example, if the length of the fractional part is $f = 2$, then the number $6.25$ is stored as the integer value $25$, because the formula for converting the internal value (stored_integer) to the real value (real_value) is $\text{real\_value} = \text{stored\_integer} \times 2^{-f}$, which gives $25 \times 2^{-2} = 6.25$. In other words, the value stored internally is $25$, but during calculations it is interpreted as $6.25$.
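The conversion between the stored integer and the real value follows the usual fixed-point scaling rule, real_value = stored_integer × 2^(-f). A minimal sketch in plain Julia is shown below; `to_real` and `to_stored` are hypothetical helper names used only for illustration and are not part of EngeeFixedPoint.jl.

```julia
# Hypothetical helpers illustrating the scaling rule (not the package API).

# Interpret a stored integer as a real value with f fractional bits.
to_real(stored::Integer, f::Integer) = stored * 2.0^(-f)

# Encode a real value as a stored integer with f fractional bits
# (rounded to the nearest integer).
to_stored(value::Real, f::Integer) = round(Int, value * 2.0^f)

println(to_real(25, 2))      # 6.25
println(to_stored(6.25, 2))  # 25
```

The same rule works for a negative f, in which case the stored integer is scaled up rather than down.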
Such arithmetic has the following advantages:
- Lower resource consumption (relevant for microcontrollers, FPGAs and ASICs);
- Controllable accuracy and range of values;
- Support for code generation in Verilog (HDL) and C.
Read more about Fixed Point arithmetic in the article guide/hdl-fixed-point-arithmetic.adoc#fixed-point-arithmetic
Fixed Point Arithmetic in Engee
To work with fixed-point numbers, Engee uses its own package, EngeeFixedPoint.jl, which replaces the standard Julia package FixedPointNumbers.jl. Unlike the classic package, EngeeFixedPoint.jl provides advanced features and precise control over the representation and behaviour of fixed-point numbers, which is especially important in resource-constrained systems, when moving computations to HDL, and in problems that demand strict precision.
The EngeeFixedPoint.jl package is a standard Engee package and is included in the user environment by default, so it does not need to be loaded explicitly (via import or using) in code.
In Engee, the type of a fixed-point number is as follows:
Fixed{S, W, f, T} <: FixedPoint
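Where:
- S is the signedness (1 for a signed type, 0 for an unsigned type);
- W is the word width in bits;
- f is the length of the fractional part in bits;
- T is the integer type used for storage.
This format lets you specify exactly how a number is stored, interpreted and used in calculations, with type safety and well-defined behaviour at every stage of data processing.
For convenience, EngeeFixedPoint.jl offers several ways to specify a fixed-point type, from full manual specification to automatic inference.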
S, W, f, T = 1, 25, 10, Int32
dt1 = Fixed{S, W, f, T}
dt2 = fixdt(S, W, f)
dt3 = fixdt(Fixed{S, W, f})
dt4 = fixdt(dt2)
println(dt1 == dt2 == dt3 == dt4) # true
Where:
- dt1 = Fixed{S, W, f, T} is the full manual description;
- dt2 = fixdt(S, W, f) is the simplified form, where the storage type is selected automatically;
- dt3 = fixdt(Fixed{S, W, f}) obtains the type from an existing description;
- dt4 = fixdt(dt2) reuses an existing type, creating a copy of it.
All these options create the same type, Fixed{1, 25, 10, Int32}, and can be chosen depending on the task:
- The full description (dt1) is useful when control over all parameters is needed;
- The simplified form (dt2) suits typical cases and shortens the code;
- Getting a type from a type (dt3) is useful when generating code or typing data;
- Reuse (dt4) helps when working with parameterised structures without re-entering the parameters.
Constructors of type Fixed
Next, let’s look at specific scenarios for working with fixed points in Engee.
For example, you can directly set the type and pass the value:
x = Fixed{1, 15, 2}(25)
Output:
fi(6.25, 1, 15, 2)
This means that $25$ is the integer representation (stored_integer), and the real value (real_value) is $6.25$ according to the formula $\text{real\_value} = \text{stored\_integer} \times 2^{-f} = 25 \times 2^{-2} = 6.25$.
Fixed{S, W, f}(i::T)
The constructor Fixed{S, W, f}(i::T) creates a fixed-point number from its integer representation. It takes:
- The format parameters: S (signedness), W (width in bits), f (length of the fractional part);
- The integer value i of type T (the internal representation).
S, W, f = 1, 15, 2 # signed, 15 bits, 2 fractional bits
i = 25
x = Fixed{S, W, f}(i) # creation from an integer
Output:
fi(6.25, 1, 15, 2) # equivalent representation
Fixed{S, W, f, T1}(i::T2)
This constructor is similar to the previous one but lets you specify the storage type explicitly. The storage type is nevertheless selected automatically from the parameters S, W and f, regardless of the specified T1.
T = Int128
x = Fixed{S, W, f, T}(i) # indicating the type of storage
Output:
fi(6.25, 1, 15, 2) # the result is identical
Constructors from FixedPointNumbers.jl
Although Engee uses the new EngeeFixedPoint.jl package, it remains compatible with a number of constructors from FixedPointNumbers.jl. Only signed types are supported.
Supported:
- Fixed{T, f}(i::Integer, _) is the constructor from an integer representation. It accepts the type T and the parameter f;
- Fixed{T, f}(value) is the constructor from a real value (float).
Example:
T = Int32
x1 = Fixed{T, f}(i, nothing) # from an integer
x2 = Fixed{T, f}(i) # from a real number
Output:
6.25 # result of the first constructor
25.0 # result of the second constructor
Auxiliary methods fi
The most convenient way to create fixed-point numbers is through the fi auxiliary methods. Unlike constructors, they can determine the parameters of the representation automatically.
x1 = fi(3.37, 0, 63, 4) # Full format with explicit parameters
x2 = fi(3.37, fixdt(0, 63, 4)) # Via the data type
x3 = fi(3.37, 0, 63) # With automatic fractional part detection
x4 = fi(100, 1, 8, 5) # Demonstration of overflow handling
println(x1)
println(x1 == x2)
println(x3)
println(x4)
Output:
3.375 # value adjusted for rounding
true # x1 and x2 are identical
3.37 # with automatic selection
3.96875 # saturation result at overflow
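The first and last results above can be reproduced with a small plain-Julia sketch. It assumes that fi rounds to the nearest representable step with ties going up (the default named in the Rounding section) and saturates on overflow. The `quantize` function is a hypothetical helper written for illustration, not the package's implementation.

```julia
# Hypothetical helper: quantize `value` to the format (S, W, f) using
# round-to-nearest (ties up) and saturation. Assumed behaviour, not the
# actual EngeeFixedPoint.jl code.
function quantize(value::Real, S::Integer, W::Integer, f::Integer)
    # Smallest and largest stored integers for the given signedness and width
    lo = S == 1 ? -(Int128(2)^(W - 1)) : Int128(0)
    hi = S == 1 ? Int128(2)^(W - 1) - 1 : Int128(2)^W - 1
    # Scale, round ties up, then saturate to the representable range
    stored = clamp(floor(Int128, value * 2.0^f + 0.5), lo, hi)
    return Float64(stored) * 2.0^(-f)
end

println(quantize(3.37, 0, 63, 4))  # 3.375
println(quantize(100, 1, 8, 5))    # 3.96875
```

With 4 fractional bits the step is 1/16, so 3.37 lands on 54/16. With a signed 8-bit word and 5 fractional bits the largest stored integer is 127, so 100 saturates to 127/32.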
Complex numbers
Fixed-point complex numbers are fully supported and are created with the same fi methods:
s, w, f = 1, 62, 7;
v = 2.5 - 3.21im
x1 = fi(v, s, w, f)
x2 = fi(v, fixdt(s, w, f))
x3 = fi(v, s, w)
println(x1)
println(x1 == x2)
println(x3)
println()
Output:
fi(2.5, 1, 62, 7) - fi(3.2109375, 1, 62, 7)*im
true
fi(2.5, 1, 62, 59) - fi(3.21, 1, 62, 59)*im
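The imaginary part 3.2109375 in the first result is what one would expect if the real and imaginary parts are quantized independently to 7 fractional bits. The sketch below illustrates this in plain Julia under that assumption; `q` is a hypothetical helper, and overflow handling is left out for brevity.

```julia
# Hypothetical helper: round a real value to f fractional bits (ties up).
# Saturation is omitted here for brevity.
q(x::Real, f::Integer) = floor(x * 2.0^f + 0.5) * 2.0^(-f)

f = 7
z = 2.5 - 3.21im

# Quantize the real and imaginary parts separately
zq = complex(q(real(z), f), q(imag(z), f))
println(zq)  # 2.5 - 3.2109375im
```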
Working with arrays and matrices
The library provides full support for vector and matrix operations with fixed point numbers. All operations preserve the element type and automatically apply the specified precision parameters to all array elements.
Vectors
Create and work with one-dimensional arrays. Fixed point parameters are applied to all elements:
s, w, f = 1, 62, 7 # signed type, 62 bits, 7 bits of fractional part
v = [1, 2, 3] # the original vector
# Different ways to create:
x1 = fi(v, s, w, f) # with explicit parameters specified
x2 = fi(v, fixdt(s, w, f)) # via the data type
x3 = fi(v, s, w) # with automatic fractional part detection
println(x1)
println(x1 == x2)
println(x3)
Output:
Fixed{1, 62, 7}[1.0, 2.0, 3.0]
true
Fixed{1, 62, 59}[1.0, 2.0, 3.0]
Complex matrices
Complex numbers are fully supported in multidimensional arrays:
s, w, f = 1, 62, 7
m = [im 2.5; -1.2im 25-im]
# Working ways to create:
x1 = fi(m, s, w, f) # with explicit parameters specified
x2 = fi(m, fixdt(s, w, f)) # via the data type
println(x1)
println(x1 == x2)
Output:
Complex{Fixed{1, 62, 7, Int64}}[fi(0.0, 1, 62, 7) + fi(1.0, 1, 62, 7)*im fi(2.5, 1, 62, 7) + fi(0.0, 1, 62, 7)*im; fi(0.0, 1, 62, 7) - fi(1.203125, 1, 62, 7)*im fi(25.0, 1, 62, 7) - fi(1.0, 1, 62, 7)*im]
true
Basic operations and methods
This section describes methods for working with fixed-point numbers that let you determine the allowable range of values and basic properties.
Boundary values
The typemax and typemin methods return the maximum and minimum representable values for a particular fixed-point type.
dt = fixdt(0, 25, -2) # an unsigned type with 25 bits and a fractional part of -2
x = fi(1.5, dt) # creating a fixed point number
println(typemax(x)) # 1.34217724e8 is the maximum representable value
println(typemin(x)) # 0.0 is the minimum value for an unsigned type.
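The boundary values follow directly from the format parameters: the largest stored integer is scaled by 2^(-f). The sketch below computes them in plain Julia, assuming a two's-complement layout for signed types; `fixed_max` and `fixed_min` are hypothetical names, not package functions.

```julia
# Hypothetical helpers: range of a fixed-point format (S, W, f),
# assuming two's complement for signed types.

# Largest value: (2^(W - S) - 1) stored integers, each worth 2^(-f)
fixed_max(S, W, f) = (Int128(2)^(W - S) - 1) * 2.0^(-f)

# Smallest value: -2^(W - 1) for signed types, 0 for unsigned types
fixed_min(S, W, f) = S == 1 ? -(Int128(2)^(W - 1)) * 2.0^(-f) : 0.0

println(fixed_max(0, 25, -2))  # 1.34217724e8
println(fixed_min(0, 25, -2))  # 0.0
```

For the unsigned 25-bit type with f = -2, the largest stored integer is 2^25 - 1 = 33554431, and each step is worth 4, which gives 134217724.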
Mathematical operations
The system automatically selects the optimal format for the result of operations, maintaining accuracy and preventing overflow. All basic arithmetic operations (addition, subtraction, multiplication, division) are supported:
x1 = fi(1.5, 0, 15, 3)
x2 = fi(1.5, 1, 25, 14)
y1 = x1+x2
y2 = x1-x2
y3 = x1*x2
y4 = x1/x2
println(y1)
println(y2)
println(y3)
println(y4)
println(typeof(y1))
println(typeof(y2))
println(typeof(y3))
println(typeof(y4))
println(x1 == x2)
println(x1 <= x2)
println(x1 > x2)
Output:
3.0
0.0
2.25
0.0
Fixed{1, 28, 14, Int32}
Fixed{1, 28, 14, Int32}
Fixed{1, 40, 17, Int64}
Fixed{1, 25, -11, Int32}
true
true
false
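The type reported for the product, Fixed{1, 40, 17, Int64}, is consistent with the usual full-precision rule for fixed-point multiplication: word lengths add (15 + 25 = 40) and fractional lengths add (3 + 14 = 17). The sketch below checks the arithmetic on raw integers in plain Julia. It assumes that rule and does not use EngeeFixedPoint.jl itself.

```julia
# Fractional lengths of the two operands from the example above
f1, f2 = 3, 14

# Stored integers for 1.5 in each format
a = round(Int, 1.5 * 2.0^f1)   # 12
b = round(Int, 1.5 * 2.0^f2)   # 24576

# Multiply the stored integers; the fractional lengths add up
p = a * b                      # 294912
f_prod = f1 + f2               # 17

println(p * 2.0^(-f_prod))     # 2.25
```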
Rounding
Various rounding strategies allow you to control the accuracy of your calculations. By default, RoundNearestTiesUp rounding is used.
x = fi(1.5, 1, 14, 3) # signed, 14 bits, 3 fractional bits
println(round(x)) # 2.0 – rounding to the nearest integer (1.5 → 2)
println(trunc(x)) # 1.0 – dropping the fractional part
println(ceil(x)) # 2.0 – rounding up to a larger integer
println(floor(x)) # 1.0 – rounding down to a smaller integer
Where:
- round is banker's rounding (to the nearest even value at 0.5);
- trunc discards the fractional part;
- ceil always rounds up;
- floor always rounds down.
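The difference between the tie-breaking strategies only shows up on values that lie exactly halfway between two integers. The sketch below uses ordinary Float64 values and the rounding modes from Julia's Base, so it illustrates the strategies themselves and not the behaviour of the Fixed type.

```julia
# Rounding strategies from Julia's Base, shown on Float64 values
println(round(1.5))                      # 2.0
println(round(2.5))                      # 2.0 (tie goes to the even neighbour)
println(round(2.5, RoundNearestTiesUp))  # 3.0 (tie goes up)
println(trunc(-1.5))                     # -1.0 (fractional part dropped)
println(ceil(-1.5))                      # -1.0 (towards +Inf)
println(floor(-1.5))                     # -2.0 (towards -Inf)
```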
Type conversion
Conversion to standard data types is useful when interacting with other libraries. When converting, rounding rules are taken into account.
x = fi(1.5, 1, 12, 4)
y1 = Int64(x)
y2 = UInt8(x)
y3 = Float64(x)
y4 = convert(fixdt(0, 5, 2), x)
println(y1)
println(y2)
println(y3)
println(y4)
println(typeof(y1))
println(typeof(y2))
println(typeof(y3))
println(typeof(y4))
Output:
1
1
1.5
1.5
Int64
UInt8
Float64
Fixed{0, 5, 2, UInt8}
Conclusion
In summary, the EngeeFixedPoint.jl package provides the following benefits:
- Expanded type system:
  - Full support for both signed and unsigned numbers;
  - Arbitrary bit width (any width, not just 8/16/32/64/128);
  - Flexible fractional part setting (including negative values and cases where the fractional part length f is greater than the word length W).
- Improved type inference rules:
  - Platform-dependent code generation:
    - Different type inheritance rules for target platforms (C or Verilog);
    - Predictable behaviour on overflow of the 128-bit boundary (unlike analogues).
- Expanded functionality:
  - Optimised handling of arrays and matrices (see Working with arrays and matrices);
  - Full support for complex numbers (see Complex numbers);
  - Efficient rounding methods (round/trunc/ceil/floor, see Rounding);
  - Support for basic methods (zero/one/typemin/typemax, see Boundary values).
Rounding is needed when a value requires more precision than the length of the fractional part (f) provides. Since fixed-point numbers cannot accurately represent all possible fractional values, operations round the result according to a given strategy (e.g., to the nearest value or with truncation).
Overflow occurs when a value falls outside the range determined by the word length (W) and signedness (S). In such cases, an overflow handling strategy is applied: saturation, where the value is limited to the maximum or minimum allowed; truncation; or raising an error.