The MULPD instruction performs an analogous operation for two double-precision floating-point values.

The MULSS instruction multiplies the single-precision floating-point value in the low-order doubleword of the first operand by the single-precision floating-point value in the low-order doubleword of the second operand and writes the result in the low-order doubleword of the destination. The three high-order doublewords of the destination are not modified.

The MULSD instruction multiplies the double-precision floating-point value in the low-order quadword of the first operand by the double-precision floating-point value in the low-order quadword of the second operand and writes the result in the low-order quadword of the destination. The high-order quadword of the destination is not modified.

128-Bit Media and Scientific Programming 169
AMD64 Technology 24592—Rev. 3.13—July 2007

Division
• DIVPS—Divide Packed Single-Precision Floating-Point
• DIVPD—Divide Packed Double-Precision Floating-Point
• DIVSS—Divide Scalar Single-Precision Floating-Point
• DIVSD—Divide Scalar Double-Precision Floating-Point

The DIVPS instruction divides each of the four single-precision floating-point values in the first operand by the corresponding single-precision floating-point value in the second operand and writes the result in the corresponding doubleword of the destination. The DIVPD instruction performs an analogous operation for two double-precision floating-point values.
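As a minimal sketch of the packed form, the C code below assumes an x86-64 compiler that provides the standard SSE/SSE2 intrinsic headers, where _mm_div_ps and _mm_div_pd normally compile to DIVPS and DIVPD. The helper names div4f and div2d are hypothetical; each call divides every lane of the first vector by the matching lane of the second, independently of the other lanes.

```c
#include <emmintrin.h> /* SSE2 intrinsics; also pulls in the SSE single-precision ones */

/* Hypothetical helper: one DIVPS computes q[i] = a[i] / b[i] for i = 0..3. */
void div4f(const float a[4], const float b[4], float q[4]) {
    __m128 va = _mm_loadu_ps(a);          /* operand 1: four floats */
    __m128 vb = _mm_loadu_ps(b);          /* operand 2: four divisors */
    _mm_storeu_ps(q, _mm_div_ps(va, vb)); /* each doubleword lane divided on its own */
}

/* Hypothetical helper: one DIVPD computes q[i] = a[i] / b[i] for i = 0..1. */
void div2d(const double a[2], const double b[2], double q[2]) {
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(q, _mm_div_pd(va, vb)); /* each quadword lane divided on its own */
}
```

For example, dividing {1, 9, -8, 7.5} by {2, 3, 4, 0.5} with div4f yields {0.5, 3, -2, 15}.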
For vectors of n elements, the operations are:

$$\text{operand1}[i] = \text{operand1}[i] \div \text{operand2}[i], \qquad i = 0, 1, \ldots, n-1$$

The DIVSS instruction divides the single-precision floating-point value in the low-order doubleword of the first operand by the single-precision floating-point value in the low-order doubleword of the second operand and writes the result in the low-order doubleword of the destination. The three high-order doublewords of the destination are not modified.

The DIVSD instruction divides the double-precision floating-point value in the low-order quadword of the first operand by the double-precision floating-point value in the low-order quadword of the second operand and writes the result in the low-order quadword of the destination.
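The scalar forms operate on the lowest lane only. In the two-operand SSE encoding the destination is also the first operand, so "not modified" means the upper lanes keep the first operand's values; the _mm_div_ss and _mm_div_sd intrinsics model this by copying the upper elements from their first argument. This is a sketch assuming a typical x86-64 C toolchain with SSE2 intrinsics; the helper names divss_lanes and divsd_lanes are hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics (includes the SSE header) */

/* Hypothetical helper: DIVSS divides lane 0 only; lanes 1-3 of out are copied from a. */
void divss_lanes(const float a[4], const float b[4], float out[4]) {
    __m128 r = _mm_div_ss(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(out, r);
}

/* Hypothetical helper: DIVSD divides the low quadword only; out[1] is copied from a[1]. */
void divsd_lanes(const double a[2], const double b[2], double out[2]) {
    __m128d r = _mm_div_sd(_mm_loadu_pd(a), _mm_loadu_pd(b));
    _mm_storeu_pd(out, r);
}
```

The upper lanes of the second operand (b) never reach the result, which is why a scalar routine can leave unrelated data in them.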
For DIVSD, the high-order quadword of the destination is not modified.

If accuracy requirements allow, convert floating-point division by a constant into multiplication by the reciprocal. Divisors that are powers of two, and their reciprocals, are exactly representable and therefore do not cause an accuracy issue, except in the rare cases where the reciprocal overflows or underflows.

Square Root
• SQRTPS—Square Root Packed Single-Precision Floating-Point
• SQRTPD—Square Root Packed Double-Precision Floating-Point
• SQRTSS—Square Root Scalar Single-Precision Floating-Point
• SQRTSD—Square Root Scalar Double-Precision Floating-Point

The SQRTPS instruction computes the square root of each of the four single-precision floating-point values in the second operand (an XMM register or 128-bit memory location) and writes the result in the corresponding doubleword of the destination.
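Unlike the divide instructions, SQRTPS reads only its second (source) operand. The sketch below assumes an x86-64 C compiler with SSE2 intrinsics, where _mm_sqrt_ps and _mm_sqrt_pd normally compile to SQRTPS and SQRTPD; the helper names sqrt4f and sqrt2d are hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics (includes the SSE header) */

/* Hypothetical helper: one SQRTPS takes the square root of four floats. */
void sqrt4f(const float in[4], float out[4]) {
    _mm_storeu_ps(out, _mm_sqrt_ps(_mm_loadu_ps(in)));
}

/* Hypothetical helper: one SQRTPD takes the square root of two doubles. */
void sqrt2d(const double in[2], double out[2]) {
    _mm_storeu_pd(out, _mm_sqrt_pd(_mm_loadu_pd(in)));
}
```

Because the square root is correctly rounded, perfect squares such as {1, 4, 9, 2.25} come back exactly as {1, 2, 3, 1.5}.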
The SQRTPD instruction performs an analogous operation for two double-precision floating-point values.

The SQRTSS instruction computes the square root of the low-order single-precision floating-point value in the second operand (an XMM register or 32-bit memory location) and writes the result in the low-order doubleword of the destination. The three high-order doublewords of the destination XMM register are not modified.

The SQRTSD instruction computes the square root of the low-order double-precision floating-point value in the second operand (an XMM register or 64-bit memory location) and writes the result in the low-order quadword of the destination. The high-order quadword of the destination XMM register is not modified.

Reciprocal Square Root
• RSQRTPS—Reciprocal Square Root Packed Single-Precision Floating-Point
• RSQRTSS—Reciprocal Square Root Scalar Single-Precision Floating-Point

The RSQRTPS instruction computes the approximate reciprocal of the square root of each of the four single-precision floating-point values in the second operand (an XMM register or 128-bit memory location) and writes the result in the corresponding doubleword of the destination.

The RSQRTSS instruction computes the approximate reciprocal of the square root of the low-order single-precision floating-point value in the second operand (an XMM register or 32-bit memory location) and writes the result in the low-order doubleword of the destination.
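RSQRTPS trades accuracy for speed. A common follow-up, shown here as an assumption rather than something this manual prescribes, is one Newton-Raphson step, y1 = y0 × (1.5 − 0.5 × x × y0²), which roughly doubles the number of correct bits. The sketch assumes an x86-64 C compiler with SSE intrinsics; rsqrt_estimate and rsqrt_refined are hypothetical names.

```c
#include <xmmintrin.h> /* SSE intrinsics: _mm_rsqrt_ps, _mm_mul_ps, _mm_sub_ps */

/* Hypothetical helper: raw RSQRTPS estimate of 1/sqrt(x) for four floats. */
__m128 rsqrt_estimate(__m128 x) {
    return _mm_rsqrt_ps(x);
}

/* Hypothetical helper: RSQRTPS estimate refined by one Newton-Raphson step.
   Assumption: this refinement is a common technique, not part of RSQRTPS itself. */
__m128 rsqrt_refined(__m128 x) {
    const __m128 half = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);
    __m128 y0 = _mm_rsqrt_ps(x);                                   /* initial estimate */
    __m128 t = _mm_mul_ps(_mm_mul_ps(half, x), _mm_mul_ps(y0, y0)); /* 0.5 * x * y0^2 */
    return _mm_mul_ps(y0, _mm_sub_ps(three_halves, t));            /* y0 * (1.5 - t) */
}
```

The raw estimate should stay within the relative-error limit the manual gives for RSQRTPS; the refined value is typically accurate to within a few single-precision ulps.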
For RSQRTSS, the three high-order doublewords of the destination XMM register are not modified.

For both RSQRTPS and RSQRTSS, the maximum relative error is less than or equal to $1.5 \times 2^{-12}$.

Reciprocal Estimation
• RCPPS—Reciprocal Packed Single-Precision Floating-Point
• RCPSS—Reciprocal Scalar Single-Precision Floating-Point

The RCPPS instruction computes the approximate reciprocal of each of the four single-precision floating-point values in the second operand (an XMM register or 128-bit memory location) and writes the result in the corresponding doubleword of the destination.

The RCPSS instruction computes the approximate reciprocal of the low-order single-precision floating-point value in the second operand (an XMM register or 32-bit memory location) and writes the result in the low-order doubleword of the destination. The three high-order doublewords of the destination are not modified.

For both RCPPS and RCPSS, the maximum relative error is less than or equal to $1.5 \times 2^{-12}$.

4.6.6 Compare
The floating-point vector-compare instructions compare two operands and then either write a mask, write the maximum or minimum value, or set flags.
Compare instructions can be used to avoid branches. Figure 4-10 on page 115 shows an example of using compare instructions.

Compare and Write Mask
• CMPPS—Compare Packed Single-Precision Floating-Point
• CMPPD—Compare Packed Double-Precision Floating-Point
• CMPSS—Compare Scalar Single-Precision Floating-Point
• CMPSD—Compare Scalar Double-Precision Floating-Point

The CMPPS instruction compares each of the four single-precision floating-point values in the first operand with the corresponding single-precision floating-point value in the second operand and writes the result in the corresponding 32 bits of the destination. The type of comparison is specified by the three low-order bits of the immediate-byte operand.
The result of each compare is a 32-bit value of all 1s (TRUE) or all 0s (FALSE). Some compare operations that are not directly supported by the immediate-byte encodings can be implemented by swapping the contents of the source and destination operands before executing the compare.

The CMPPD instruction performs an analogous operation for two double-precision floating-point values. The CMPSS instruction performs an analogous operation for the single-precision floating-point values in the low-order 32 bits of the source operands.
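The mask-writing form and the operand-swap trick can be sketched in C, assuming an x86-64 compiler with SSE intrinsics, where _mm_cmplt_ps normally compiles to CMPPS with the "less than" predicate. The SSE immediate encodings 0 through 7 are EQ, LT, LE, UNORD, NEQ, NLT, NLE and ORD, so "greater than" is expressed by swapping operands. The helper names lt_mask, gt_mask and select_ps are hypothetical; select_ps shows how an all-1s/all-0s mask replaces a per-lane branch.

```c
#include <xmmintrin.h> /* SSE intrinsics */

/* Hypothetical helper: CMPPS "less than"; each lane is all 1s if a[i] < b[i], else all 0s. */
__m128 lt_mask(__m128 a, __m128 b) {
    return _mm_cmplt_ps(a, b);
}

/* Hypothetical helper: a[i] > b[i] is the same test as b[i] < a[i] (operands swapped). */
__m128 gt_mask(__m128 a, __m128 b) {
    return _mm_cmplt_ps(b, a);
}

/* Hypothetical helper: branchless per-lane select, mask ? x : y, using AND, ANDN and OR. */
__m128 select_ps(__m128 mask, __m128 x, __m128 y) {
    return _mm_or_ps(_mm_and_ps(mask, x), _mm_andnot_ps(mask, y));
}
```

_mm_movemask_ps gathers the sign bit of each lane, which is a convenient way to inspect a compare mask: for a = {1, 5, 3, -2} and b = {2, 4, 3, 0}, lt_mask sets lanes 0 and 3.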
For CMPSS, the three high-order doublewords of the destination are not modified. The CMPSD instruction performs an analogous operation for the double-precision floating-point values in the low-order 64 bits of the source operands; the high-order 64 bits of the destination XMM register are not modified.

Figure 4-36 shows a CMPPD compare operation.

[Figure 4-36. CMPPD Compare Operation: each 64-bit element of operand 1 is compared with the corresponding element of operand 2 using the comparison selected by imm8, and each 64-bit element of the result is all 1s or all 0s.]

Compare and Write Minimum or Maximum
• MAXPS—Maximum Packed Single-Precision Floating-Point
• MAXPD—Maximum Packed Double-Precision Floating-Point
• MAXSS—Maximum Scalar Single-Precision Floating-Point
• MAXSD—Maximum Scalar Double-Precision Floating-Point
• MINPS—Minimum Packed Single-Precision Floating-Point
• MINPD—Minimum Packed Double-Precision Floating-Point
• MINSS—Minimum Scalar Single-Precision Floating-Point
• MINSD—Minimum Scalar Double-Precision Floating-Point

The MAXPS and MINPS instructions compare each of the four single-precision floating-point values in the first operand with the corresponding single-precision floating-point value in the second operand and write the maximum or minimum, respectively, of the two values in the corresponding doubleword of the destination.
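A typical use of MAXPS and MINPS is clamping four values at once without branches. This sketch assumes an x86-64 C compiler with SSE intrinsics, where _mm_max_ps and _mm_min_ps normally compile to MAXPS and MINPS; clamp_ps is a hypothetical name, and the behavior shown is for ordinary (non-NaN) inputs only.

```c
#include <xmmintrin.h> /* SSE intrinsics */

/* Hypothetical helper: clamp each of four floats to [lo, hi] with one MAXPS and one MINPS. */
__m128 clamp_ps(__m128 x, float lo, float hi) {
    __m128 raised = _mm_max_ps(x, _mm_set1_ps(lo)); /* lanes below lo become lo */
    return _mm_min_ps(raised, _mm_set1_ps(hi));     /* lanes above hi become hi */
}
```

Clamping {-3, 0.5, 2, 7} to [0, 1] gives {0, 0.5, 1, 1}.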
The MAXPD and MINPD instructions perform analogous operations on pairs of double-precision floating-point values.

The MAXSS and MINSS instructions compare the single-precision floating-point value in the low-order 32 bits of the first operand with the single-precision floating-point value in the low-order 32 bits of the second operand and write the maximum or minimum, respectively, of the two values in the low-order 32 bits of the destination. The three high-order doublewords of the destination XMM register are not modified.

The MAXSD and MINSD instructions compare the double-precision floating-point value in the low-order 64 bits of the first operand with the double-precision floating-point value in the low-order 64 bits of the second operand and write the maximum or minimum, respectively, of the two values in the low-order quadword of the destination.
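The scalar maximum and minimum forms touch only the low-order element; the _mm_max_ss, _mm_min_ss and _mm_max_sd intrinsics model the unmodified upper part of the destination by copying it from their first argument. This sketch assumes an x86-64 C compiler with SSE2 intrinsics; maxss_lanes, minss_lanes and maxsd_lanes are hypothetical names, and the MAXSD upper-quadword behavior shown relies on the intrinsic's documented semantics.

```c
#include <emmintrin.h> /* SSE2 intrinsics (includes the SSE header) */

/* Hypothetical helper: MAXSS; out[0] = max(a[0], b[0]), out[1..3] copied from a. */
void maxss_lanes(const float a[4], const float b[4], float out[4]) {
    _mm_storeu_ps(out, _mm_max_ss(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* Hypothetical helper: MINSS; out[0] = min(a[0], b[0]), out[1..3] copied from a. */
void minss_lanes(const float a[4], const float b[4], float out[4]) {
    _mm_storeu_ps(out, _mm_min_ss(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* Hypothetical helper: MAXSD; out[0] = max(a[0], b[0]), out[1] copied from a. */
void maxsd_lanes(const double a[2], const double b[2], double out[2]) {
    _mm_storeu_pd(out, _mm_max_sd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}
```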