Fixed-point numbers are treated as signed or unsigned integers.
In an unsigned fixed-point number, all bits are used to express the absolute value of the number. When two unsigned fixed-point numbers are added, the shorter number is considered to be extended with high-order zeros.
For signed fixed-point numbers, the leftmost bit represents the sign, which is followed by the integer field. Positive numbers are represented in true binary notation with the sign bit set to zero. Negative numbers are represented in two's-complement binary notation with a one in the sign-bit position.
Specifically, a negative number is represented by the two's complement of the positive number. The two's complement of a number is obtained by inverting each bit of the number and adding a one in the low-order bit position.
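The invert-and-add-one rule can be sketched in a few lines. This is an illustration only: the helper name twos_complement and the 8-bit width are assumptions made for the example, not part of the definition above.

```python
def twos_complement(pattern: int, width: int) -> int:
    """Return the two's complement of a width-bit pattern, as an unsigned int."""
    mask = (1 << width) - 1
    inverted = ~pattern & mask       # invert each bit inside the field
    return (inverted + 1) & mask     # add one in the low-order position, drop any carry out


# +5 in 8 bits is 0000 0101; its two's complement is the pattern for -5.
print(format(twos_complement(0b00000101, 8), "08b"))  # prints 11111011
```

Applying the function twice returns the original pattern, which is why the same rule converts in both directions.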
This type of number representation can be considered the low-order portion of an infinitely long representation of the number. When the number is positive, all bits to the left of the most significant bit of the number are zeros. When the number is negative, all these bits are ones. Therefore, when an operand must be extended with high-order bits, the expansion is achieved by setting the bits equal to the high-order bit of the operand.
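A minimal sketch of this extension rule follows. The function name sign_extend and the widths used are hypothetical, chosen only to illustrate copying the high-order bit into the added positions.

```python
def sign_extend(pattern: int, from_width: int, to_width: int) -> int:
    """Extend a from_width-bit signed pattern to to_width bits."""
    from_mask = (1 << from_width) - 1
    to_mask = (1 << to_width) - 1
    pattern &= from_mask
    sign = (pattern >> (from_width - 1)) & 1   # high-order bit of the operand
    if sign:
        # Negative: every added high-order bit is a one.
        return pattern | (to_mask ^ from_mask)
    # Positive: every added high-order bit is a zero.
    return pattern


# -5 as 8 bits (1111 1011) extended to 16 bits.
print(format(sign_extend(0b11111011, 8, 16), "016b"))  # prints 1111111111111011
```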
The notation for signed fixed-point numbers does not include a negative zero. It has a number range in which the set of negative numbers is one larger than the set of positive numbers. The maximum positive number consists of an all-one integer field with a sign bit of zero, whereas the maximum negative number (the negative number with the greatest absolute value) consists of an all-zero integer field with a sign bit of one.
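The asymmetry of the range can be checked numerically. The sketch below assumes a hypothetical helper, signed_range, that takes the total width including the sign bit.

```python
def signed_range(width: int) -> tuple[int, int]:
    """Return (maximum negative number, maximum positive number) for a signed field."""
    max_positive = (1 << (width - 1)) - 1   # sign bit 0, all-one integer field
    max_negative = -(1 << (width - 1))      # sign bit 1, all-zero integer field
    return max_negative, max_positive


for width in (8, 32, 64):
    lo, hi = signed_range(width)
    # There is one more negative number than there are positive numbers.
    print(width, lo, hi, abs(lo) == hi + 1)
```

For a width of 8 this gives -128 and 127, so the negative numbers outnumber the positive numbers by one.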
The complement of the maximum negative number cannot be represented in the same number of bits. When an operation, such as a subtraction of the maximum negative number from zero, attempts to produce the complement of the maximum negative number, a fixed-point overflow exception is recognized. An overflow does not result, however, when the maximum negative number is complemented and the final result is within the representable range. An example of this case is a subtraction of the maximum negative number from minus one. The product of two maximum negative numbers is representable as a double-length positive number.
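The three cases in this paragraph can be worked through with exact arithmetic. The sketch below is not a model of any machine instruction: the helpers fits and subtract are hypothetical, and overflow is detected simply by comparing the exact result with the representable range.

```python
def fits(value: int, width: int) -> bool:
    """True if value is representable as a width-bit signed integer."""
    return -(1 << (width - 1)) <= value <= (1 << (width - 1)) - 1


def subtract(a: int, b: int, width: int = 32) -> tuple[int, bool]:
    """Return the exact difference a - b and whether it overflows width bits."""
    exact = a - b
    return exact, not fits(exact, width)


MAX_NEG_32 = -(1 << 31)

print(subtract(0, MAX_NEG_32))    # exact result 2**31 does not fit: overflow
print(subtract(-1, MAX_NEG_32))   # exact result 2**31 - 1 fits: no overflow
# The product of two maximum negative numbers is 2**62, which fits in 64 bits.
print(fits(MAX_NEG_32 * MAX_NEG_32, 64))
```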
In discussions of signed fixed-point numbers in this publication, the expression "32-bit signed integer" denotes a 31-bit integer with a sign bit, and the expression "64-bit signed integer" denotes a 63-bit integer with a sign bit.
In some operations, the result is achieved by the use of the one's complement of the number. The one's complement of a number is obtained by inverting each bit of the number.
In an arithmetic operation, a carry out of the integer field changes the sign. However, in algebraic left-shifting the sign bit does not change even if significant high-order bits are shifted out.
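The contrast between the two behaviors can be sketched on 8-bit patterns. Both helper names are assumptions made for this illustration, and the shift helper deliberately ignores any overflow indication a real instruction might give when significant bits are lost.

```python
def add_wrapping(a: int, b: int, width: int) -> int:
    """Add two width-bit patterns, discarding any carry out of the sign position."""
    return (a + b) & ((1 << width) - 1)


def shift_left_algebraic(pattern: int, count: int, width: int) -> int:
    """Shift only the integer field left; the sign bit is left unchanged."""
    sign_mask = 1 << (width - 1)
    field_mask = sign_mask - 1
    sign = pattern & sign_mask
    field = ((pattern & field_mask) << count) & field_mask  # high-order bits are lost
    return sign | field


# A carry out of the integer field changes the sign: 0111 1111 + 1.
print(format(add_wrapping(0b01111111, 1, 8), "08b"))          # prints 10000000
# An algebraic left shift keeps the sign even though a one bit is shifted out.
print(format(shift_left_algebraic(0b01000001, 1, 8), "08b"))  # prints 00000010
```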
Programming Note
The integer part of a signed fixed-point number may be considered to represent a positive value, with the sign representing a value of either zero or the maximum negative number.
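This view of the sign bit gives a direct way to compute the value of a pattern. The sketch below uses a hypothetical helper, signed_value, and 8-bit patterns chosen only for illustration.

```python
def signed_value(pattern: int, width: int) -> int:
    """Interpret a width-bit pattern as a signed fixed-point integer."""
    sign = (pattern >> (width - 1)) & 1
    integer_part = pattern & ((1 << (width - 1)) - 1)   # always a positive value
    max_negative = -(1 << (width - 1))
    # The sign contributes either zero or the maximum negative number.
    return integer_part + sign * max_negative


# 1111 1011: the integer part is 123 and the sign contributes -128.
print(signed_value(0b11111011, 8))  # prints -5
```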