Volume 1 Application Programming (794095), page 58
The instruction then writes the result of each subtraction into the corresponding doubleword of the destination.

The PFSUBR instruction performs a subtraction that is the reverse of the PFSUB instruction. It subtracts each value in the first operand from the corresponding value in the second operand. The provision of both the PFSUB and PFSUBR instructions allows software to choose which source operand to overwrite during a subtraction.

Multiplication
• PFMUL—Packed Floating-Point Multiply

The PFMUL instruction multiplies each of the two single-precision floating-point values in the first operand by the corresponding single-precision floating-point value in the second operand and writes the result of each multiplication into the corresponding doubleword of the destination.

64-Bit Media Programming 225
AMD64 Technology 24592—Rev. 3.13—July 2007

Division

For a description of floating-point division techniques, see “Reciprocal Estimation” on page 227. Division is equivalent to multiplication of the dividend by the reciprocal of the divisor.

Accumulation
• PFACC—Packed Floating-Point Accumulate
• PFNACC—Packed Floating-Point Negative Accumulate
• PFPNACC—Packed Floating-Point Positive-Negative Accumulate

The PFACC instruction adds the two single-precision floating-point values in the first operand and writes the result into the low-order doubleword of the destination, and it adds the two single-precision values in the second operand and writes the result into the high-order doubleword of the destination.
Figure 5-17 illustrates the operation.

[Figure 5-17. PFACC Accumulate Operation: the two elements of operand 1 are summed into the low half of the result, and the two elements of operand 2 are summed into the high half.]

The PFNACC instruction subtracts the first operand’s high-order single-precision floating-point value from its low-order single-precision floating-point value and writes the result into the low-order doubleword of the destination, and it subtracts the second operand’s high-order single-precision floating-point value from its low-order single-precision floating-point value and writes the result into the high-order doubleword of the destination.

The PFPNACC instruction subtracts the first operand’s high-order single-precision floating-point value from its low-order single-precision floating-point value and writes the result into the low-order doubleword of the destination, and it adds the two single-precision values in the second operand and writes the result into the high-order doubleword of the destination.

PFPNACC is useful in complex-number multiplication, in which mixed positive-negative accumulation must be performed.
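As a rough illustration of the three accumulate forms and of the complex-number use of PFPNACC, the following sketch models each 64-bit operand as a (low, high) pair of Python floats. The function names simply mirror the mnemonics and are a hypothetical scalar model, not an API; Python floats are double precision, so the sketch shows data movement only, not single-precision rounding. The real instructions also overwrite their first operand, whereas these helpers return a new pair.

```python
# Hypothetical scalar model: one 64-bit media operand = (low, high) pair of floats.
# Assumption: double-precision Python floats stand in for single-precision values.

def pfmul(a, b):
    """Multiply corresponding elements of the two operands."""
    return (a[0] * b[0], a[1] * b[1])

def pswapd(a):
    """Swap the low and high doublewords of one operand."""
    return (a[1], a[0])

def pfacc(a, b):
    """Low result = sum of a's elements; high result = sum of b's elements."""
    return (a[0] + a[1], b[0] + b[1])

def pfnacc(a, b):
    """Low result = a.low - a.high; high result = b.low - b.high."""
    return (a[0] - a[1], b[0] - b[1])

def pfpnacc(a, b):
    """Low result = a.low - a.high; high result = b.low + b.high."""
    return (a[0] - a[1], b[0] + b[1])

def complex_mul(x, y):
    """Multiply complex numbers stored as (real, imaginary) pairs."""
    direct = pfmul(x, y)             # (xr*yr, xi*yi)
    crossed = pfmul(x, pswapd(y))    # (xr*yi, xi*yr)
    # Subtract for the real part, add for the imaginary part.
    return pfpnacc(direct, crossed)  # (xr*yr - xi*yi, xr*yi + xi*yr)

# (1 + 2i) * (3 + 4i) = -5 + 10i
print(complex_mul((1.0, 2.0), (3.0, 4.0)))  # (-5.0, 10.0)
```

The swap is what lines up the real part of one factor with the imaginary part of the other, so a single element-wise multiply yields both cross terms.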
Assuming that complex numbers are represented as two-element vectors (one element is the real part, the other element is the imaginary part), there is a need to swap the elements of one source operand to perform the multiplication, and there is a need for mixed positive-negative accumulation to complete the parallel computation of real and imaginary results. The PSWAPD instruction can swap elements of one source operand and the PFPNACC instruction can perform the mixed positive-negative accumulation to complete the computation.

Reciprocal Estimation
• PFRCP—Packed Floating-Point Reciprocal Approximation
• PFRCPIT1—Packed Floating-Point Reciprocal, Iteration 1
• PFRCPIT2—Packed Floating-Point Reciprocal or Reciprocal Square Root, Iteration 2

The PFRCP instruction computes the approximate reciprocal of the single-precision floating-point value in the low-order 32 bits of the second operand and writes the result into both doublewords of the first operand.

The PFRCPIT1 instruction performs the first intermediate step in the Newton-Raphson iteration to refine the reciprocal approximation produced by the PFRCP instruction.
The first operand contains the input to a previous PFRCP instruction, and the second operand contains the result of the same PFRCP instruction.

The PFRCPIT2 instruction performs the second and final step in the Newton-Raphson iteration to refine the reciprocal approximation produced by the PFRCP instruction or the reciprocal square-root approximation produced by the PFRSQRT instruction.
The first operand contains the result of a previous PFRCPIT1 or PFRSQIT1 instruction, and the second operand contains the result of a PFRCP or PFRSQRT instruction.

The PFRCP instruction can be used together with the PFRCPIT1 and PFRCPIT2 instructions to increase the accuracy of a single-precision significand.

Reciprocal Square Root
• PFRSQRT—Packed Floating-Point Reciprocal Square Root Approximation
• PFRSQIT1—Packed Floating-Point Reciprocal Square Root, Iteration 1

The PFRSQRT instruction computes the approximate reciprocal square root of the single-precision floating-point value in the low-order 32 bits of the second operand and writes the result into each doubleword of the first operand. The second operand is a single-precision floating-point value with a 24-bit significand.
The result written to the first operand is accurate to 15 bits. Negative operands are treated as positive operands for purposes of reciprocal square-root computation, with the sign of the result the same as the sign of the source operand.

The PFRSQIT1 instruction performs the first step in the Newton-Raphson iteration to refine the reciprocal square-root approximation produced by the PFRSQRT instruction. The first operand contains the input to a previous PFRSQRT instruction, and the second operand contains the square of the result of the same PFRSQRT instruction.

The PFRSQRT instruction can be used together with the PFRSQIT1 instruction and the PFRCPIT2 instruction (described in “Reciprocal Estimation” on page 227) to increase the accuracy of a single-precision significand.

5.7.4 Compare

The floating-point vector-compare instructions compare two operands, and they either write a mask or they write the maximum or minimum value.

Compare and Write Mask
• PFCMPEQ—Packed Floating-Point Compare Equal
• PFCMPGT—Packed Floating-Point Compare Greater Than
• PFCMPGE—Packed Floating-Point Compare Greater or Equal

The PFCMPx instructions compare each of the two single-precision floating-point values in the first operand with the corresponding single-precision floating-point value in the second operand.
The instructions then write the result of each comparison into the corresponding doubleword of the destination. If the comparison test (equal, greater than, greater or equal) is true, the result is a mask of all 1s. If the comparison test is false, the result is a mask of all 0s.

Compare and Write Minimum or Maximum
• PFMAX—Packed Floating-Point Maximum
• PFMIN—Packed Floating-Point Minimum

The PFMAX and PFMIN instructions compare each of the two single-precision floating-point values in the first operand with the corresponding single-precision floating-point value in the second operand. The instructions then write the maximum (PFMAX) or minimum (PFMIN) of the two values for each comparison into the corresponding doubleword of the destination.

The PFMIN and PFMAX instructions are useful for clamping, such as color clamping in 3D geometry and rasterization.
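A minimal sketch of the clamping idiom and of a mask-based select follows, again modelling each operand as a (low, high) pair. The helper names (`clamp`, `select`, `f2bits`, `bits2f`) are hypothetical, and the model ignores NaN and signed-zero behaviour, which this section does not describe. The select step stands in for a bitwise AND / AND-NOT / OR sequence applied to a PFCMPGT mask.

```python
import struct

def f2bits(x):
    """Bit pattern of x as a single-precision float (32-bit unsigned int)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits2f(b):
    """Single-precision float whose bit pattern is b."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def pfmax(a, b):
    """Element-wise maximum of two (low, high) pairs."""
    return (max(a[0], b[0]), max(a[1], b[1]))

def pfmin(a, b):
    """Element-wise minimum of two (low, high) pairs."""
    return (min(a[0], b[0]), min(a[1], b[1]))

def pfcmpgt(a, b):
    """Per-element mask: all 1s where a > b, all 0s otherwise."""
    return tuple(0xFFFFFFFF if x > y else 0 for x, y in zip(a, b))

def clamp(v, lo, hi):
    """Clamp each element of v into [lo, hi] with no data-dependent branch."""
    return pfmin(pfmax(v, lo), hi)

def select(mask, a, b):
    """Take a where the mask is all 1s and b where it is all 0s."""
    return tuple(
        bits2f((f2bits(x) & m) | (f2bits(y) & ~m & 0xFFFFFFFF))
        for m, x, y in zip(mask, a, b)
    )

color = (1.5, -0.25)
print(clamp(color, (0.0, 0.0), (1.0, 1.0)))  # (1.0, 0.0)

a, b = (1.5, -0.25), (0.5, 2.0)
print(select(pfcmpgt(a, b), a, b))           # (1.5, 2.0)
```

For ordinary values, selecting through the greater-than mask gives the same result as `pfmax`, which is why either form can replace a compare-and-branch sequence.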
They can also be used to avoid branching.

5.8 Instruction Effects on Flags

The 64-bit media instructions do not read or write any flags in the rFLAGS register, nor do they write any exception-status flags in the x87 status-word register, nor is their execution dependent on any mask bits in the x87 control-word register. The only x87 state affected by the 64-bit media instructions is described in “Actions Taken on Executing 64-Bit Media Instructions” on page 232.

5.9 Instruction Prefixes

Instruction prefixes, in general, are described in “Instruction Prefixes” on page 71. The following restrictions apply to the use of instruction prefixes with 64-bit media instructions.

5.9.1 Supported Prefixes

The following prefixes can be used with 64-bit media instructions:
• Address-Size Override—The 67h prefix affects only operands in memory. The prefix is ignored by all other 64-bit media instructions.
• Operand-Size Override—The 66h prefix is used to form the opcodes of certain 64-bit media instructions. The prefix is ignored by all other 64-bit media instructions.
• Segment Overrides—The 2Eh (CS), 36h (SS), 3Eh (DS), 26h (ES), 64h (FS), and 65h (GS) prefixes affect only operands in memory. In 64-bit mode, the contents of the CS, DS, ES, and SS segment registers are ignored.
• REP—The F2h and F3h prefixes do not function as repeat prefixes for 64-bit media instructions. Instead, they are used to form the opcodes of certain 64-bit media instructions.