Arithmetic in HDL and Fixed-Point
When generating code in Engee, as well as when working with Fixed-Point in the computing environment and in simulation, certain rules apply to arithmetic operations.
Rules arithmetic in C
have a set of implicit conversions, and the type of result of a mathematical operation is derived based on the signs and argument ranks.
Rules of arithmetic operations in HDL
(hardware description languages, e.g. Verilog, Chisel) are concise, but complicated by the arbitrary width of arguments, as well as a fixed point.
Rules: briefly and in words
-
If all types of arguments are unsigned, the result type is unsigned. Otherwise, the result type is iconic.
-
The type width is not limited to 8/16/32/64 bits, but can be any natural number up to and including 128.
-
The result type is selected so that there is no overflow.
Operation |
Result type |
Example |
|---|---|---|
Addition |
The minimum required type that accommodates the entire range of possible result values |
|
Multiplication |
The width of the result type is the sum of the widths of the argument types, the same with the fractional part. |
|
Fixed-Point
Along with floating-point numbers, fixed-point numbers are used in software and hardware development. They are less flexible than floating-point numbers, but operations with fixed-point numbers are usually faster. The basic concepts of fixed-point numbers are presented below.
| Engee currently uses a binary-point-only (power-of-two) variant of fixed-point numbers, rather than slope-bias. |
| Read about working with Fixed-Point in the Engee computing environment in the article Fixed-Point arithmetic in Engee. |
Binary point
Integers are represented in binary as the sum of powers of two. For example, 9 in binary is :
Bit number from MSB (most significant) to LSB (least significant) |
3 |
2 |
1 |
0 |
Meaning |
1 |
0 |
0 |
1 |
Power of two * bit value |
|
|
|
|
Contribution to the response |
8 |
0 |
0 |
1 |
To represent non-integers in binary, you can mentally put a dot after the zero bit and start counting in negative powers of two. For example, 9.625 encoded as :
Bit number from MSB (most significant) to LSB (least significant) |
3 |
2 |
1 |
0 |
-1 |
-2 |
-3 |
Meaning |
1 |
0 |
0 |
1 |
1 |
0 |
1 |
Power of two * bit value |
|
|
|
|
|
|
|
Contribution to the response |
8 |
0 |
0 |
1 |
0.5 |
0 |
0.125 |
Definitions
-
Type width is the number of bits encoding the type values.
-
The number of "fractional" bits is the length of the fractional part (Fraction Length), it will also be designated as a Binary Point (BP).
-
The value in decimal is called the Real World Value (RWV).
-
The integer that would be obtained if the bits were numbered from LSB to MSB, starting from zero (even for non-integer real values), is denoted as stored integer, SI. For example, stored integer for — this is 9, and for — 77.
In this case, the following is performed .
Moreover, the length of the fractional part can be negative. In this case, the stored integer is multiplied by the modulus of the fractional part length. So, the number 112 can be represented using just three bits and scaling. :
Bit number from MSB (most significant) to LSB (least significant) |
6 |
5 |
4 |
Meaning |
1 |
1 |
1 |
Power of two * bit value |
|
|
|
Contribution to the response |
64 |
32 |
16 |
Also note that the length of the fractional part may exceed the width of the type. For example, it can be represented in the type of widths 3 and BP 11 as :
Bit number from MSB (most significant) to LSB (least significant) |
-9 |
-10 |
-11 |
Meaning |
0 |
1 |
0 |
Power of two * bit value |
|
|
|
Contribution to the response |
0 |
0.0009765625 |
0 |
Negative numbers are encoded as two’s complement (two’s complement).
Designations
The most general form is denoted by fixdt(S, W, f) according to the article Fixed-Point arithmetic in Engee. Here:
-
S- 0 means unsigned type, 1 means signed type; -
W— type width; -
f— the length of the fractional part (BP).
Examples of specific types of designations:
| Designation | Meaning |
|---|---|
|
An unsigned integer type with a width of 32 bits |
|
An unsigned integer type with a width of 123 |
|
Signed integer type of width 16 |
|
Fixed-point character type, width (total length) 32, fractional length 5 |
|
Unsigned fixed-point type, width 32, fractional part length 5 |
|
Unsigned fixed-point type, width 3, fractional part length -4 |
From the point of view of arithmetic, the following types are equivalent: uint42, ufix42, ufix42_En0, ufix42_E0.
|
Accuracy
The difference between numbers that differ only in the least significant bit is called precision. The accuracy depends on the length of the fractional part as follows: .
For example, ufix8_En3 It can represent numbers up to 0.125, and 1.0625 will round up to 1.125, and 1.0624 will round up to 1.0. A ufix8_E3 represents numbers up to 8.
Slope Bias
An alternative (and more flexible, but less performant) version of fixed-point numbers is the slope-bias representation, where numbers are encoded as:
Binary-point-only is a special case of slope-bias, where , .
Arithmetic
Before the operation, the binary points of the arguments are aligned, and then the arguments are added/multiplied. For example, for addition from ufix4_En3 and from ufix2_En1 you must first add two zeros after the imaginary point in the second number.:
1001 (ufix4_En3)
+ 01 (ufix2_En1)
= 1.001
+ 0.100
= 1.101
But with a fractional part length of 3, this is .
Rules: in detail
| Operation | Type of argument1 | Type of argument2 | Result type |
|---|---|---|---|
Addition |
|
|
The minimum required type that accommodates the entire range of possible result values, in simple cases |
Multiplication |
|
|
|
Addition
The width is selected to accommodate the possible range of values of the result of adding arguments.
The algorithm:
-
We find the minimum possible and maximum possible values that could be obtained by adding the operands.
-
To calculate the minimum possible value of the result, add the minimum values of the argument types (0 for unsigned numbers and for iconic characters).
-
To calculate the maximum possible value of the result, we add the maximum values of the argument types.
-
-
We find the minimum number of bits that are needed to preserve the range of the result.
Examples:
-
When adding two
sfix5_En2it will worksfix6_En2, since we need to expand the result type by one bit in order to save a possible transfer from the highest bit somewhere. When adding upufix8_En1andufix8_En3it will workufix11_En3:1111111.1 + 11111.111 = 1111111.100 + 0011111.111 = 11011111.001Since bit_width .
Other examples:
| Type of argument1 | Type of argument2 | Result type |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The sum of several numbers
For multiple arguments, the minimum possible width of the accumulator is found for all arguments at once, rather than sequentially first for the first two arguments, then for adding the previous result and the third argument, etc. Calculations taking into account the types of all arguments at the same time allow you to save the width needed to save the result.
The sum of the vector elements
For the block Sum (summation of the elements of the incoming vector, special mode of the Add block) optimization is applied to avoid calculations in a loop (as happens in Add), and the type of result is determined quickly. Namely, since the type of all elements is the same fixdt(S, W, f), it can be noted that:
Similarly, the minimum value of the result is determined, and then the required type width is selected.
It can be estimated that .
Multiplication
The rules for inferring the type of the multiplication result are fully described in the table in the Rules: in detail section, and apply by induction and associativity to any number of arguments. There are no exceptions or any additional difficulties.