Volume 5: 64-Bit Media and x87 Floating-Point Instructions
The remaining arguments differ from the correct round-to-nearest value for the reciprocal by 1 unit-in-the-last-place (ulp). For details, see the data sheet or other software-optimization documentation relating to particular hardware implementations.

PFRCP(x) returns 0 for x >= 2^126. The numeric range for operands is shown in Table 1-15.

The PFRCP instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is indicated by CPUID feature bits. (See “CPUID” in Volume 3.)

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their more efficient 128-bit media counterparts.
For a complete list of recommended instruction substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on page 335.

Recommended Instruction Substitution
RCPSS

Mnemonic                  Opcode        Description
PFRCP mmx1, mmx2/mem64    0F 0F /r 96   Computes approximate reciprocal of single-precision floating-point value in an MMX register or 64-bit memory location and writes the result in both doublewords of the destination MMX register.

[Figure pfrcp.eps: operands mmx1 and mmx2/mem64, each shown as doublewords 63:32 and 31:0; operation: approximate reciprocal.]

26569—Rev. 3.08—July 2007  AMD64 Technology

Table 1-15. Numeric Range for the PFRCP Result

Source 2 Operand    Source 1 and Destination
0                   +/- Maximum Normal (note 1)
Normal              Normal, +/- 0 (note 2)
Unsupported (note 3)  Undefined

Notes:
1. The result has the same sign as the source operand.
2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the sign of the source operand. Otherwise, the result is a normal with the sign being the same sign as the source operand.
3. “Unsupported” means that the exponent is all ones (1s).

Examples
The general Newton-Raphson recurrence for the reciprocal 1/b is:

    Z_{i+1} ← Z_i • (2 - b • Z_i)

The following code sequence shows the computation of a/b:

    X0 = PFRCP(b)
    X1 = PFRCPIT1(b, X0)
    X2 = PFRCPIT2(X1, X0)
    q = PFMUL(a, X2)

The 24-bit final reciprocal value is X2. The quotient is formed in the last step by multiplying the reciprocal by the dividend a.

Related Instructions
PFRCPIT1, PFRCPIT2

rFLAGS Affected
None

Exceptions

Exception                                  Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                        X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                           X     X             X          The AMD 3DNow!™ instructions are not supported, as indicated by EDX bit 31 in CPUID function 8000_0001h.
Device not available, #NM                  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                               X          A null data segment was used to reference memory.
Page fault, #PF                                  X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF  X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                             X             X          An unaligned memory reference was performed while alignment checking was enabled.
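As an informal illustration of the Newton-Raphson sequence in the PFRCP Examples, the following Python sketch models the four steps in ordinary floating point. It is not bit-exact: `coarse_reciprocal` is a hypothetical stand-in for PFRCP that simply truncates 1/b to 12 significand bits (the 12-bit figure is an assumption borrowed from the PFRCPIT1 description), and the Compress and Expand steps and the hardware rounding are ignored.

```python
import math


def coarse_reciprocal(b: float, bits: int = 12) -> float:
    """Hypothetical stand-in for PFRCP: 1/b truncated to `bits` significand bits."""
    mantissa, exponent = math.frexp(1.0 / b)   # 1/b = mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    scale = 1 << bits
    truncated = math.trunc(mantissa * scale) / scale
    return math.ldexp(truncated, exponent)


def divide(a: float, b: float) -> float:
    """Floating-point model of the documented a/b sequence (assumed, not bit-exact)."""
    x0 = coarse_reciprocal(b)   # X0 = PFRCP(b)
    x1 = 2.0 - b * x0           # X1 = PFRCPIT1(b, X0): the factor (2 - b * Z_i)
    x2 = x1 * x0                # X2 = PFRCPIT2(X1, X0): Z_{i+1} = Z_i * (2 - b * Z_i)
    return a * x2               # q  = PFMUL(a, X2)


def relative_error(approx: float, exact: float) -> float:
    return abs(approx - exact) / abs(exact)
```

In this model the relative error of x0 is below 2^-11, and one iteration squares it, which is why a single PFRCPIT1/PFRCPIT2 pair is enough to reach roughly single-precision accuracy.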
PFRCPIT1
Packed Floating-Point Reciprocal Iteration 1

Performs the first step in the Newton-Raphson iteration to refine the reciprocal approximation produced by the PFRCP instruction. The first source/destination operand is an MMX register containing the results of two previous PFRCP instructions, and the second source operand is another MMX register or 64-bit memory location containing the source operands from the same PFRCP instructions.

This instruction is only defined for those combinations of operands such that the first source operand (mmx1) is the approximate reciprocal of the second source operand (mmx2/mem64), and thus the range of the product, mmx1 * mmx2/mem64, is (0.5, 2).
The initial approximation of an operand is accurate to about 12 bits, and the length of the operand itself is 24 bits, so the product of these two operands is greater than 24 bits. PFRCPIT1 applies the one's complement of the product and rounds the result to 32 bits.
It then compresses the result to fit into 24 bits by removing the 8 redundant most-significant bits after the hidden integer bit.

The estimate contains the correct round-to-nearest value for approximately 99% of all arguments. The remaining arguments differ from the correct round-to-nearest value for the reciprocal by 1 unit-in-the-last-place (ulp).
For details, see the data sheet or other software-optimization documentation relating to particular hardware implementations.

The PFRCPIT1 instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is indicated by CPUID feature bits. (See “CPUID” in Volume 3.)

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their more efficient 128-bit media counterparts. For a complete list of recommended instruction substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on page 335.

Recommended Instruction Substitution
PFRCP

Mnemonic                     Opcode        Description
PFRCPIT1 mmx1, mmx2/mem64    0F 0F /r A6   Refine approximate reciprocal of result from previous PFRCP instruction.

[Figure pfrcpit1.eps: mmx1 holds two PFRCP results and mmx2/mem64 holds the two corresponding PFRCP sources, in doublewords 63:32 and 31:0; each pair passes through Newton-Raphson reciprocal step 1.]

Operation

    mmx1[31:0] = Compress (2 - mmx1[31:0] * (mmx2/mem64[31:0]) - 2^-31);
    mmx1[63:32] = Compress (2 - mmx1[63:32] * (mmx2/mem64[63:32]) - 2^-31);

where:
“Compress” means discard the 8 redundant most-significant bits after the hidden integer bit.

Examples
The general Newton-Raphson recurrence for the reciprocal 1/b is:

    Z_{i+1} ← Z_i • (2 - b • Z_i)

The following code sequence computes a 24-bit approximation to a/b with one Newton-Raphson iteration:

    X0 = PFRCP(b)
    X1 = PFRCPIT1(b, X0)
    X2 = PFRCPIT2(X1, X0)
    q = PFMUL(a, X2)

a/b is formed in the last step by multiplying the reciprocal approximation by a.

Related Instructions
PFRCP, PFRCPIT2

rFLAGS Affected
None
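The description says PFRCPIT1 applies the one's complement of the product, while the Operation is written as 2 minus the product minus a small constant. The sketch below shows one way to reconcile the two, assuming the 32-bit product is read as unsigned fixed point with one integer bit and 31 fraction bits, and that the constant is 2^-31 (one unit in the last place of that format). The fixed-point reading and the helper names are assumptions made for illustration, not part of the instruction definition.

```python
WIDTH = 32                    # assumed width of the rounded product
MASK = (1 << WIDTH) - 1
ONE = 1 << (WIDTH - 1)        # 1.0 with one integer bit and 31 fraction bits
TWO = 1 << WIDTH              # 2.0 on the same scale (one bit too wide to store)


def to_fixed(x: float) -> int:
    """Convert 0 <= x < 2 to the assumed 1.31 fixed-point integer (truncating)."""
    return int(x * ONE)


def ones_complement(p: int) -> int:
    """Bitwise NOT of a 32-bit value."""
    return ~p & MASK


def two_minus_p_minus_ulp(p: int) -> int:
    """(2 - p - 2**-31) evaluated on the fixed-point scale."""
    return TWO - p - 1
```

Under these assumptions `ones_complement(p)` and `two_minus_p_minus_ulp(p)` agree for every 32-bit `p`, so inverting the bits of the product yields the Newton-Raphson factor 2 - b • Z_i to within one unit in the last place without a subtractor.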
Exceptions

Exception                                  Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                        X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                           X     X             X          The AMD 3DNow!™ instructions are not supported, as indicated by EDX bit 31 in CPUID function 8000_0001h.
Device not available, #NM                  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                               X          A null data segment was used to reference memory.
Page fault, #PF                                  X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF  X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                             X             X          An unaligned memory reference was performed while alignment checking was enabled.

PFRCPIT2
Packed Floating-Point Reciprocal or Reciprocal Square Root Iteration 2

Performs the second and final step in the Newton-Raphson iteration to refine the reciprocal approximation produced by the PFRCP instruction or the reciprocal square-root approximation produced by the PFRSQRT instruction.
PFRCPIT2 takes two paired elements in each source operand. These paired elements are the results of a PFRCP and PFRCPIT1 instruction sequence or of a PFRSQRT and PFRSQIT1 instruction sequence. The first source/destination operand is an MMX register that contains the PFRCPIT1 or PFRSQIT1 results and the second source operand is another MMX register or 64-bit memory location that contains the PFRCP or PFRSQRT results.

The PFRCPIT2 instruction expands the compressed PFRCPIT1 or PFRSQIT1 results from 24 to 32 bits and multiplies them by their respective source operands. An optimal correction factor is added to the product, which is then rounded to 24 bits.

The estimate contains the correct round-to-nearest value for approximately 99% of all arguments.
The remaining arguments differ from the correct round-to-nearest value for the reciprocal by 1 unit-in-the-last-place (ulp). For details, see the data sheet or other software-optimization documentation relating to particular hardware implementations.

The PFRCPIT2 instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is indicated by CPUID feature bits.
(See “CPUID” in Volume 3.)

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their more efficient 128-bit media counterparts. For a complete list of recommended instruction substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on page 335.

Recommended Instruction Substitution
PFRCP

Mnemonic                     Opcode        Description
PFRCPIT2 mmx1, mmx2/mem64    0F 0F /r B6   Refines approximate reciprocal result from previous PFRCP and PFRCPIT1 instructions or from previous PFRSQRT and PFRSQIT1 instructions.
[Figure pfrcpit2.eps: mmx1 holds two iteration-1 results and mmx2/mem64 holds the two corresponding reciprocal results, in doublewords 63:32 and 31:0; each pair passes through Newton-Raphson reciprocal step 2.]

Operation

    mmx1[31:0] = Expand(mmx1[31:0]) * mmx2/mem64[31:0];
    mmx1[63:32] = Expand(mmx1[63:32]) * mmx2/mem64[63:32];

where:
“Expand” means convert a 24-bit significand to a 32-bit significand according to the following rule:

    temp[31:0] = {1’b1, 8{mmx1[22]}, mmx1[22:0]};

Examples
The general Newton-Raphson recurrence for the reciprocal 1/b is:

    Z_{i+1} ← Z_i • (2 - b • Z_i)

The following code sequence computes a 24-bit approximation to a/b with one Newton-Raphson iteration:

    X0 = PFRCP(b)
    X1 = PFRCPIT1(b, X0)
    X2 = PFRCPIT2(X1, X0)
    q = PFMUL(a, X2)

a/b is formed in the last step by multiplying the reciprocal approximation by a.

Related Instructions
PFRCP, PFRCPIT1, PFRSQRT, PFRSQIT1

rFLAGS Affected
None
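The Expand rule in the Operation above is a plain bit concatenation, and a short Python sketch may make the layout easier to see. `expand` follows the stated rule on the 23-bit fraction field of a doubleword; `compress` is written here as its inverse, on the assumption that Compress in PFRCPIT1 drops exactly the eight bits that Expand re-creates. Sign and exponent handling lie outside the stated rule and are not modeled.

```python
FRACTION_MASK = (1 << 23) - 1   # bits 22:0 of the doubleword


def expand(word: int) -> int:
    """Build temp[31:0] = {1'b1, 8{word[22]}, word[22:0]}."""
    fraction = word & FRACTION_MASK
    fill = 0xFF if (fraction >> 22) & 1 else 0x00   # eight copies of bit 22
    return (1 << 31) | (fill << 23) | fraction


def compress(temp: int) -> int:
    """Assumed inverse of expand: keep bits 22:0, drop bits 30:23 and the hidden bit."""
    return temp & FRACTION_MASK
```

For example, a fraction field with only bit 22 set expands to a word whose top ten bits are all ones, while a fraction field with bit 22 clear expands to the hidden bit followed by nine zero bits and the remaining fraction.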
Exceptions

Exception                                  Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                        X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                           X     X             X          The AMD 3DNow!™ instructions are not supported, as indicated by EDX bit 31 in CPUID function 8000_0001h.
Device not available, #NM                  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                               X          A null data segment was used to reference memory.
Page fault, #PF                                  X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF  X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                             X             X          An unaligned memory reference was performed while alignment checking was enabled.

PFRSQIT1
Packed Floating-Point Reciprocal Square Root Iteration 1

Performs the first step in the Newton-Raphson iteration to refine the reciprocal square-root approximation produced by the PFRSQRT instruction.