Engee documentation

Arithmetic in HDL and Fixed-Point

When generating code in Engee, as well as when working with Fixed-Point in the computing environment and in simulation, certain rules apply to arithmetic operations.

Rules arithmetic in C codegen icon 1 have a set of implicit conversions, and the type of result of a mathematical operation is derived based on the signs and argument ranks.

Rules of arithmetic operations in HDL verilog icon (hardware description languages, e.g. Verilog, Chisel) are concise, but complicated by the arbitrary width of arguments, as well as a fixed point.

Rules: briefly and in words

  • If all types of arguments are unsigned, the result type is unsigned. Otherwise, the result type is iconic.

  • The type width is not limited to 8/16/32/64 bits, but can be any natural number up to and including 128.

  • The result type is selected so that there is no overflow.

Operation

Result type

Example

Addition

The minimum required type that accommodates the entire range of possible result values

uint8 + uint8 = uint9, because there may be a transfer from the senior category

Multiplication

The width of the result type is the sum of the widths of the argument types, the same with the fractional part.

uint8 * uint8 = uint16

Fixed-Point

Along with floating-point numbers, fixed-point numbers are used in software and hardware development. They are less flexible than floating-point numbers, but operations with fixed-point numbers are usually faster. The basic concepts of fixed-point numbers are presented below.

Engee currently uses a binary-point-only (power-of-two) variant of fixed-point numbers, rather than slope-bias.
Read about working with Fixed-Point in the Engee computing environment in the article Fixed-Point arithmetic in Engee.

Binary point

Integers are represented in binary as the sum of powers of two. For example, 9 in binary is :

Bit number from MSB (most significant) to LSB (least significant)

3

2

1

0

Meaning

1

0

0

1

Power of two * bit value

Contribution to the response

8

0

0

1

To represent non-integers in binary, you can mentally put a dot after the zero bit and start counting in negative powers of two. For example, 9.625 encoded as :

Bit number from MSB (most significant) to LSB (least significant)

3

2

1

0

-1

-2

-3

Meaning

1

0

0

1

1

0

1

Power of two * bit value

Contribution to the response

8

0

0

1

0.5

0

0.125

Definitions

  • Type width is the number of bits encoding the type values.

  • The number of "fractional" bits is the length of the fractional part (Fraction Length), it will also be designated as a Binary Point (BP).

  • The value in decimal is called the Real World Value (RWV).

  • The integer that would be obtained if the bits were numbered from LSB to MSB, starting from zero (even for non-integer real values), is denoted as stored integer, SI. For example, stored integer for — this is 9, and for — 77.

In this case, the following is performed .

Moreover, the length of the fractional part can be negative. In this case, the stored integer is multiplied by the modulus of the fractional part length. So, the number 112 can be represented using just three bits and scaling. :

Bit number from MSB (most significant) to LSB (least significant)

6

5

4

Meaning

1

1

1

Power of two * bit value

Contribution to the response

64

32

16

Also note that the length of the fractional part may exceed the width of the type. For example, it can be represented in the type of widths 3 and BP 11 as :

Bit number from MSB (most significant) to LSB (least significant)

-9

-10

-11

Meaning

0

1

0

Power of two * bit value

Contribution to the response

0

0.0009765625

0

Negative numbers are encoded as two’s complement (two’s complement).

Designations

The most general form is denoted by fixdt(S, W, f) according to the article Fixed-Point arithmetic in Engee. Here:

  • S - 0 means unsigned type, 1 means signed type;

  • W — type width;

  • f — the length of the fractional part (BP).

Examples of specific types of designations:

Designation Meaning

uint32

An unsigned integer type with a width of 32 bits

uint123

An unsigned integer type with a width of 123

sint16

Signed integer type of width 16

sfix32_En5

Fixed-point character type, width (total length) 32, fractional length 5

ufix32_En5

Unsigned fixed-point type, width 32, fractional part length 5

ufix3_E4

Unsigned fixed-point type, width 3, fractional part length -4

From the point of view of arithmetic, the following types are equivalent: uint42, ufix42, ufix42_En0, ufix42_E0.

Accuracy

The difference between numbers that differ only in the least significant bit is called precision. The accuracy depends on the length of the fractional part as follows: .

For example, ufix8_En3 It can represent numbers up to 0.125, and 1.0625 will round up to 1.125, and 1.0624 will round up to 1.0. A ufix8_E3 represents numbers up to 8.

Slope Bias

An alternative (and more flexible, but less performant) version of fixed-point numbers is the slope-bias representation, where numbers are encoded as:



Binary-point-only is a special case of slope-bias, where , .

Arithmetic

Before the operation, the binary points of the arguments are aligned, and then the arguments are added/multiplied. For example, for addition from ufix4_En3 and from ufix2_En1 you must first add two zeros after the imaginary point in the second number.:

  1001 (ufix4_En3)
+   01 (ufix2_En1)
= 1.001
+ 0.100
= 1.101

But with a fractional part length of 3, this is .

Rules: in detail

Operation Type of argument1 Type of argument2 Result type

Addition

fixdt(S1, W1, f1)

fixdt(S2, W2, f2)

The minimum required type that accommodates the entire range of possible result values, in simple cases
fixdt(S1 or S2, max{W1, W2} + 1, max{f1, f2})

Multiplication

fixdt(S1, W1, f1)

fixdt(S2, W2, f2)

fixdt(S1 or S2, W1 + W2, f1 + f2)

Addition

The width is selected to accommodate the possible range of values of the result of adding arguments.

The algorithm:

  1. We find the minimum possible and maximum possible values that could be obtained by adding the operands.

    1. To calculate the minimum possible value of the result, add the minimum values of the argument types (0 for unsigned numbers and х for iconic characters).

    2. To calculate the maximum possible value of the result, we add the maximum values of the argument types.

  2. We find the minimum number of bits that are needed to preserve the range of the result.


Examples:

  • When adding two sfix5_En2 it will work sfix6_En2, since we need to expand the result type by one bit in order to save a possible transfer from the highest bit somewhere. When adding up ufix8_En1 and ufix8_En3 it will work ufix11_En3:

      1111111.1
    + 11111.111
    =  1111111.100
    +  0011111.111
    = 11011111.001

    Since bit_width .

Other examples:

Type of argument1 Type of argument2 Result type

sfix5_En2

sfix5_En2

sfix6_En2

sfix18_En5

sfix16_En6

sfix20_En6

ufix5_En17

ufix32_E2

ufix51_En17

ufix4_En7

ufix5_E2

ufix14_En7

ufix4_En7

sfix5_E2

sfix15_En7

ufix4_En7

sfix14_E2

sfix24_En7

ufix4_En7

sfix14_En2

sfix20_En7

sfix4_En38

sfix6_E78

sfix123_En38

The sum of several numbers

For multiple arguments, the minimum possible width of the accumulator is found for all arguments at once, rather than sequentially first for the first two arguments, then for adding the previous result and the third argument, etc. Calculations taking into account the types of all arguments at the same time allow you to save the width needed to save the result.

The sum of the vector elements

For the block Sum (summation of the elements of the incoming vector, special mode of the Add block) optimization is applied to avoid calculations in a loop (as happens in Add), and the type of result is determined quickly. Namely, since the type of all elements is the same fixdt(S, W, f), it can be noted that:



Similarly, the minimum value of the result is determined, and then the required type width is selected.

It can be estimated that .

Multiplication

The rules for inferring the type of the multiplication result are fully described in the table in the Rules: in detail section, and apply by induction and associativity to any number of arguments. There are no exceptions or any additional difficulties.