Volume 1 Application Programming (794095), page 37
The integer bit is implied, making a total of 24 bits in the significand.

Double-Precision Format: This format includes a 1-bit sign, an 11-bit biased exponent whose bias is 1023, and a 52-bit significand. The integer bit is implied, making a total of 53 bits in the significand.

Table 4-3 on page 127 shows the range of finite values representable by the two floating-point data types.

Table 4-3. Range of Values in Normalized Floating-Point Data Types

| Data Type | Range of Normalized [1] Values, Base 2 (exact) | Range of Normalized [1] Values, Base 10 (approximate) |
|---|---|---|
| Single Precision | $2^{-126}$ to $2^{127} \times (2 - 2^{-23})$ | $1.17 \times 10^{-38}$ to $+3.40 \times 10^{38}$ |
| Double Precision | $2^{-1022}$ to $2^{1023} \times (2 - 2^{-52})$ | $2.23 \times 10^{-308}$ to $+1.79 \times 10^{308}$ |

Note:
1. See “Floating-Point Number Representation” on page 127 for a definition of “normalized”.

For example, in the single-precision format, the largest normal number representable has an exponent of FEh and a significand of 7FFFFFh, with a numerical value of $2^{127} \times (2 - 2^{-23})$. Results that overflow above the maximum representable value return either the maximum representable normalized number (see “Normalized Numbers” on page 128) or infinity, with the sign of the true result, depending on the rounding mode specified in the rounding control (RC) field of the MXCSR register.
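The limits in Table 4-3 can be checked by rebuilding them from their bit patterns. The following Python sketch is an illustration only: the helper name `single_from_bits` is hypothetical, and the `struct` module is used simply as a convenient way to reinterpret 32 bits as a single-precision value.

```python
import struct
import sys

def single_from_bits(bits):
    """Reinterpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Largest normal single: sign 0, biased exponent FEh, fraction 7FFFFFh.
max_single = single_from_bits((0xFE << 23) | 0x7FFFFF)
# Smallest positive normal single: biased exponent 01h, fraction 0.
min_single = single_from_bits(0x01 << 23)

# Python floats are double precision, so those limits are available directly.
max_double = sys.float_info.max
min_double = sys.float_info.min

# Compare each limit with the closed-form base-2 expression from the table.
print(max_single == 2.0**127 * (2 - 2.0**-23))
print(min_single == 2.0**-126)
print(max_double == 2.0**1023 * (2 - 2.0**-52))
print(min_double == 2.0**-1022)
```

Each comparison is exact because every value involved is representable in double precision without rounding.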
Results that underflow below the minimum representable value return either the minimum representable normalized number or a denormalized number (see “Denormalized (Tiny) Numbers” on page 128), with the sign of the true result, or a result determined by the SIMD floating-point exception handler, depending on the rounding mode and the underflow-exception mask (UM) in the MXCSR register (see “Unmasked Responses” on page 187).

Compatibility with x87 Floating-Point Data Types. The results produced by 128-bit media floating-point instructions comply fully with the IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754), because these instructions represent data in the single-precision or double-precision data types throughout their operations. The x87 floating-point instructions, however, by default perform operations in the double-extended-precision format. Because of this, x87 instructions operating on the same source operands as 128-bit media floating-point instructions may return results that are slightly different in their least-significant bits.

4.4.7 Floating-Point Number Representation

A 128-bit media floating-point value can be one of five types, as follows:

• Normal
• Denormal (Tiny)
• Zero
• Infinity
• Not a Number (NaN)

128-Bit Media and Scientific Programming, AMD64 Technology, 24592, Rev. 3.13, July 2007

In common engineering and scientific usage, floating-point numbers (also called real numbers) are represented in base (radix) 10.
A non-zero number consists of a sign, a normalized significand, and a signed exponent, as in:

+2.71828 e0

Both large and small numbers are representable in this notation, subject to the limits of data-type precision. For example, a million in base-10 notation appears as +1.00000 e6 and -0.0000383 is represented as -3.83000 e-5. A non-zero number can always be written in normalized form, that is, with a leading non-zero digit immediately before the decimal point. Thus, a normalized significand in base-10 notation is a number in the range [1,10). The signed exponent specifies the number of positions that the decimal point is shifted.

Unlike the common engineering and scientific usage described above, 128-bit media floating-point numbers are represented in base (radix) 2. Like its base-10 counterpart, a normalized base-2 significand is written with its leading non-zero digit immediately to the left of the radix point. In base-2 arithmetic, a non-zero digit is always a one, so the range of a binary significand is [1,2):

+1.fraction ±exponent

The leading non-zero digit is called the integer bit. As shown in Figure 4-15 on page 126, the integer bit is omitted (and called the hidden integer bit) in the single-precision and the double-precision floating-point formats, because its implied value is always 1 in a normalized significand (0 in a denormalized significand), and the omission allows an extra bit of precision.

The following sections describe the number representations.

Normalized Numbers. Normalized floating-point numbers are the most frequent operands for 128-bit media instructions.
These are finite, non-zero, positive or negative numbers in which the integer bit is 1, the biased exponent is non-zero and non-maximum, and the fraction is any representable value. Thus, the significand is within the range of [1, 2). Whenever possible, the processor represents a floating-point result as a normalized number.

Denormalized (Tiny) Numbers. Denormalized numbers (also called tiny numbers) are smaller in magnitude than the smallest representable normalized numbers. They arise through an underflow condition, when the exponent of a result lies below the representable minimum exponent. These are finite, non-zero, positive or negative numbers in which the integer bit is 0, the biased exponent is 0, and the fraction is non-zero.

The processor generates a denormalized-operand exception (DE) when an instruction uses a denormalized source operand.
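The role of the hidden integer bit in normalized and denormalized encodings can be illustrated by splitting a single-precision value into its fields and rebuilding it. This is a minimal Python sketch, not processor behavior: `decode_single` is a hypothetical helper, and it assumes the standard single-precision layout (1-bit sign, 8-bit exponent with a bias of 127, 23-bit fraction).

```python
import struct

def decode_single(x):
    """Round x to single precision, split the encoding into fields, and
    rebuild the value from them. Covers normalized numbers, denormalized
    (tiny) numbers, and zero; infinities and NaNs are rejected."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    biased_exp = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if biased_exp == 0xFF:
        raise ValueError("infinity or NaN: not a finite value")
    if biased_exp == 0:
        # Denormalized number or zero: integer bit 0, exponent fixed at -126.
        integer_bit, exponent = 0, -126
    else:
        # Normalized number: integer bit 1, exponent = biased exponent - 127.
        integer_bit, exponent = 1, biased_exp - 127
    significand = integer_bit + fraction / 2**23
    value = (-1) ** sign * significand * 2.0**exponent
    return sign, biased_exp, fraction, integer_bit, value

print(decode_single(-6.5))
print(decode_single(2.0**-149))
```

For a normalized input the rebuilt significand falls in [1, 2); for a denormalized input it falls in (0, 1), because the integer bit contributes nothing.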
The processor may generate an underflow exception (UE) when an instruction produces a rounded, non-zero result that is too small to be represented as a normalized floating-point number in the destination format, and thus is represented as a denormalized number. If a result, after rounding, is too small to be represented as the minimum denormalized number, it is represented as zero. (See “Exceptions” on page 177 for specific details.)

Denormalization may correct the exponent by placing leading zeros in the significand. This may cause a loss of precision, because the number of significant bits in the fraction is reduced by the leading zeros.
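The leading zeros and the resulting loss of precision can be observed by rounding a very small value to single precision. The Python sketch below is an illustration under stated assumptions: the value with exponent -130 is an arbitrary example, the helper names are hypothetical, and the conversion is assumed to use round-to-nearest with denormals enabled, which is the usual default.

```python
import math
import struct

def to_single_bits(x):
    """Round x to single precision and return the 32-bit encoding."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def from_single_bits(bits):
    """Reinterpret a 32-bit encoding as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Example true result: 1.0011010b * 2^-130. Its exponent lies 4 below the
# single-precision minimum of -126, so it cannot be stored as a normal number.
true_result = math.ldexp(0b10011010, -7 - 130)
bits = to_single_bits(true_result)
biased_exp = (bits >> 23) & 0xFF
fraction = bits & 0x7FFFFF
print(biased_exp)                # a biased exponent of 0 marks a denormal
print(format(fraction, "023b"))  # fraction field after the 4-bit right shift

# A value that needs all 24 significand bits at exponent -130: the low bits
# do not fit after the shift and are rounded away.
precise = math.ldexp(2**23 + 1, -23 - 130)
stored = from_single_bits(to_single_bits(precise))
print(stored == precise)
print(stored == math.ldexp(1, -130))
```

The first value survives intact because its low-order bits are zero; the second does not, which is the precision loss the text describes.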
In the single-precision floating-point format, for example, normalized numbers have biased exponents ranging from 1 to 254 (the unbiased exponent range is from –126 to +127). A true result with an exponent of, say, –130, undergoes denormalization by right-shifting the significand by the difference between the normalized exponent and the minimum exponent (four bit positions in this example), as shown in Table 4-4 on page 129.

Table 4-4. Example of Denormalization

| Significand (base 2) | Exponent | Result Type |
|---|---|---|
| 1.0011010000000000 | –130 | True result |
| 0.0001001101000000 | –126 | Denormalized result |

Zero. The floating-point zero is a finite, positive or negative number in which the integer bit is 0, the biased exponent is 0, and the fraction is 0. The sign of a zero result depends on the operation being performed and the selected rounding mode. It may indicate the direction from which an underflow occurred, or it may reflect the result of a division by +∞ or –∞.

Infinity. Infinity is a positive or negative number, +∞ and –∞, in which the integer bit is 1, the biased exponent is maximum, and the fraction is 0. The infinities are the maximum numbers that can be represented in floating-point format.
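The zero and infinity encodings described above can be inspected directly. The following Python sketch assumes the single-precision layout; `single_bits` is a hypothetical helper, and Python itself is only a stand-in for observing IEEE 754 behavior.

```python
import math
import struct

def single_bits(x):
    """Return the 32-bit single-precision encoding of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

pos_zero = 0.0
neg_zero = 1.0 / float("-inf")   # division by negative infinity gives -0

# Zeros: exponent and fraction fields are all 0; only the sign bit differs.
print(hex(single_bits(pos_zero)), hex(single_bits(neg_zero)))
# The two zeros compare equal, yet the sign remains observable.
print(pos_zero == neg_zero, math.copysign(1.0, neg_zero))

# Infinities: maximum biased exponent (FFh) with a fraction of 0.
print(hex(single_bits(float("inf"))), hex(single_bits(float("-inf"))))
```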
Negative infinity is less than any finite number and positive infinity is greater than any finite number (i.e., the affine sense).

An infinite result is produced when a non-zero, non-infinite number is divided by 0 or multiplied by infinity, or when infinity is added to infinity or to 0.
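The affine ordering and several of the infinity-producing cases can be checked with ordinary floating-point arithmetic. This Python sketch uses double-precision values, on the assumption that the same rules hold for single precision. The divide-by-zero case is left out because Python raises `ZeroDivisionError` for float division by zero instead of returning infinity.

```python
import math
import sys

inf = math.inf
largest_finite = sys.float_info.max

# Affine ordering: -inf lies below every finite value, +inf above.
ordered = -inf < -largest_finite < 0.0 < largest_finite < inf

# Cases from the text that yield an infinite result.
product = 3.0 * inf      # non-zero finite number multiplied by infinity
sum_inf = inf + inf      # infinity added to infinity
sum_zero = inf + 0.0     # infinity added to 0

print(ordered, product, sum_inf, sum_zero)
```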
Arithmetic on infinities is exact. For example, adding any finite floating-point number to +∞ gives a result of +∞. Arithmetic comparisons work correctly on infinities. Exceptions occur only when the use of an infinity as a source operand constitutes an invalid operation.

Not a Number (NaN). NaNs are non-numbers, lying outside the range of representable floating-point values. The integer bit is 1, the biased exponent is maximum, and the fraction is non-zero.
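The five value types in this section differ only in their exponent and fraction fields, so an encoding can be classified without interpreting its value. A minimal Python sketch for the single-precision format follows; `classify_single` is a hypothetical helper name, not part of any instruction set or library.

```python
def classify_single(bits):
    """Classify a 32-bit single-precision encoding as one of five types."""
    biased_exp = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if biased_exp == 0:
        # Integer bit 0: zero when the fraction is 0, otherwise denormal.
        return "zero" if fraction == 0 else "denormal"
    if biased_exp == 0xFF:
        # Maximum exponent: infinity when the fraction is 0, otherwise NaN.
        return "infinity" if fraction == 0 else "nan"
    # Any other exponent: integer bit 1, any fraction.
    return "normal"

for encoding in (0x3F800000, 0x00000001, 0x80000000, 0xFF800000, 0x7FC00000):
    print(hex(encoding), classify_single(encoding))
```

The sign bit plays no part in the classification, which is why it is never examined.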
NaNs are of two types:

• Signaling NaN (SNaN)
• Quiet NaN (QNaN)

A QNaN is a NaN with the most-significant fraction bit set to 1, and an SNaN is a NaN with the most-significant fraction bit cleared to 0. When the processor encounters an SNaN as a source operand for an instruction, an invalid-operation exception (IE) occurs and a QNaN is produced as the result, if the exception is masked. In general, when the processor encounters a QNaN as a source operand for an instruction, the processor does not generate an exception but generates a QNaN as the result.

The processor never generates an SNaN as a result of a floating-point operation. When an invalid-operation exception (IE) occurs due to an SNaN operand, the invalid-operation exception mask (IM) bit determines the processor’s response, as described in “SIMD Floating-Point Exception Masking” on page 184.

When a floating-point operation or exception produces a QNaN result, its value is determined by the rules in Table 4-5 on page 130.

Table 4-5. NaN Results

| Source Operand (in either order) | Source Operand (in either order) | NaN Result [1] |
|---|---|---|
| QNaN | Any non-NaN floating-point value, or single-operand instructions | Value of QNaN |
| SNaN | Any non-NaN floating-point value, or single-operand instructions | Value of SNaN converted to a QNaN [2] |
| QNaN | QNaN | Value of operand 1 |
| QNaN | SNaN | Value of operand 1 |
| SNaN | QNaN | Value of operand 1 converted to a QNaN [2] |
| SNaN | SNaN | Value of operand 1 converted to a QNaN [2] |
| Invalid-Operation Exception (IE) occurs without QNaN or SNaN source operands | | Floating-point indefinite value [3] (a special form of QNaN) |

Note:
1.