Volume 5: 64-Bit Media and x87 Floating-Point Instructions
The first source/destination operand is an MMX register containing the result from a previous PFRSQRT instruction, and the second source operand is another MMX register or 64-bit memory location containing the source operand from the same PFRSQRT instruction.

This instruction is only defined for those combinations of operands such that the first source operand (mmx1) is the approximate reciprocal of the second source operand (mmx2/mem64), and thus the range of the product, mmx1 * mmx2/mem64, is (0.5, 2). The length of both operands is 24 bits, so the product of these two operands is greater than 24 bits.
The product is normalized and then rounded to 32 bits. The one's complement of the result is applied, a 1 is added as the most-significant bit, and the result is re-normalized. The result is then compressed to fit into 24 bits by removing 8 redundant most-significant bits after the hidden integer bit, and the exponent is reduced by 1 to account for the division by 2.

The PFRSQIT1 instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is indicated by CPUID feature bits. (See “CPUID” in Volume 3.)

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their more efficient 128-bit media counterparts. For a complete list of recommended instruction substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on page 335.

Recommended Instruction Substitution

PFRSQRT

Mnemonic:    PFRSQIT1 mmx1, mmx2/mem64
Opcode:      0F 0F /r A7
Description: Refines reciprocal square root approximation of previous PFRSQRT instruction.

AMD64 Technology 26569—Rev. 3.08—July 2007

[Figure pfrsqit1.eps: for each doubleword (bits 63:32 and 31:0), the PFRSQRT result in mmx1 and the PFRSQRT source in mmx2/mem64 feed a Newton-Raphson reciprocal square root step 1, and the result is written to the same doubleword of mmx1.]

Operation

    mmx1[31:0]  = Compress ((3 - mmx1[31:0] * (mmx2/mem64[31:0]) - 2^31)/2);
    mmx1[63:32] = Compress ((3 - mmx1[63:32] * (mmx2/mem64[63:32]) - 2^31)/2);

where “Compress” means discard the 8 redundant most-significant bits after the hidden integer bit.

Examples

The following code sequence shows how the PFRSQRT and PFMUL instructions can be used to compute a = 1/sqrt(b):

    X0 = PFRSQRT(b)
    X1 = PFMUL(X0,X0)
    X2 = PFRSQIT1(b,X1)
    a = PFRCPIT2(X2,X0)

Related Instructions

PFRCPIT2, PFRSQRT

rFLAGS Affected

None
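The example sequence in this entry amounts to one Newton-Raphson step for 1/sqrt(b): given an estimate X0, the refined value is X0 * (3 - b * X0^2) / 2. The Python sketch below models that arithmetic only, not the bit-level behavior of the instructions. The names `to_f32`, `rsqrt_estimate`, and `refine_rsqrt` are hypothetical. The crude estimate is simulated by truncating the significand to about 15 bits (the accuracy this manual gives for PFRSQRT), which is not how the hardware produces it, and the way the work is split between the PFRSQIT1 and PFRCPIT2 steps is an assumption based on the Operation formula.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def rsqrt_estimate(b: float, bits: int = 15) -> float:
    """Stand-in for PFRSQRT: 1/sqrt(b) with only about `bits` significand bits kept."""
    exact = 1.0 / math.sqrt(b)
    mantissa, exponent = math.frexp(exact)  # exact == mantissa * 2**exponent
    scale = 1 << bits
    truncated = math.floor(mantissa * scale) / scale
    return to_f32(math.ldexp(truncated, exponent))


def refine_rsqrt(b: float, x0: float) -> float:
    """One Newton-Raphson step, mirroring the X0..X2 sequence in the text."""
    x1 = to_f32(x0 * x0)               # X1 = PFMUL(X0, X0)
    x2 = to_f32((3.0 - b * x1) / 2.0)  # X2 = PFRSQIT1(b, X1), per the Operation formula
    return to_f32(x2 * x0)             # a = PFRCPIT2(X2, X0), assumed to scale X0 by X2


b = 2.0
x0 = rsqrt_estimate(b)
a = refine_rsqrt(b, x0)
exact = 1.0 / math.sqrt(b)
print(abs(x0 - exact) / exact)  # relative error of the crude estimate
print(abs(a - exact) / exact)   # relative error after one refinement step
```

Because Newton-Raphson roughly squares the relative error at each step, a single step takes an estimate that is good to about 15 bits close to the limit of single precision.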
Exceptions

Exception                                    Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD                          X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                             X     X             X          The AMD 3DNow!™ instructions are not supported, as indicated by EDX bit 31 in CPUID function 8000_0001h.
Device not available, #NM                    X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                   X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                      X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                 X          A null data segment was used to reference memory.
Page fault, #PF                                    X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF    X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                               X             X          An unaligned memory reference was performed while alignment checking was enabled.

PFRSQRT
Packed Floating-Point Reciprocal Square Root Approximation

Computes the approximate reciprocal square root of the single-precision floating-point value in the low-order 32 bits of an MMX register or 64-bit memory location and writes the result in each doubleword of another MMX register. The source operand is single-precision with a 24-bit significand, and the result is accurate to 15 bits. Negative operands are treated as positive operands for purposes of reciprocal square-root computation, with the sign of the result the same as the sign of the source operand.

This instruction can be used together with the PFRSQIT1 and PFRCPIT2 instructions to increase accuracy.
The first stage of this refinement in accuracy (PFRSQIT1) requires that the input and output of the previously executed PFRSQRT instruction be used as input to the PFRSQIT1 instruction. The estimate contains the correct round-to-nearest value for approximately 99% of all arguments.
The remaining arguments differ from the correct round-to-nearest value for the reciprocal by 1 unit-in-the-last-place (ulp). For details, see the data sheet or other software-optimization documentation relating to particular hardware implementations.

The PFRSQRT instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is indicated by CPUID feature bits. (See “CPUID” in Volume 3.)

The numeric range for operands is shown in Table 1-16 on page 127.

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their more efficient 128-bit media counterparts. For a complete list of recommended instruction substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on page 335.

Recommended Instruction Substitution

RSQRTSS

Mnemonic:    PFRSQRT mmx1, mmx2/mem64
Opcode:      0F 0F /r 97
Description: Computes approximate reciprocal square root of a packed single-precision floating-point value.
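The operand handling described for PFRSQRT (only the low-order doubleword is read, the sign is carried through, and the result is written to both doublewords) can be modeled in a few lines. The Python sketch below is an illustration under stated assumptions: `pfrsqrt_model` and `to_f32` are hypothetical names, the 64-bit operand is represented as a (low, high) pair of floats, and the result is computed at full single precision rather than at the 15-bit accuracy of the hardware estimate. Zero and unsupported inputs are not modeled.

```python
import math
import struct
from typing import Tuple


def to_f32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pfrsqrt_model(src: Tuple[float, float]) -> Tuple[float, float]:
    """Model PFRSQRT on a (low, high) pair; only the low element is read."""
    low = src[0]
    magnitude = 1.0 / math.sqrt(abs(low))           # negative input treated as positive
    result = to_f32(math.copysign(magnitude, low))  # sign of the result follows the source
    return (result, result)                         # same value in each doubleword


print(pfrsqrt_model((4.0, 123.0)))   # (0.5, 0.5)
print(pfrsqrt_model((-16.0, 0.0)))   # (-0.25, -0.25)
```

The high element of the source is ignored in both calls, which matches the statement that only the low-order 32 bits of the source are used.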
[Figure pfrsqrt.eps: the reciprocal square root of the low-order doubleword (bits 31:0) of mmx2/mem64 is written to both doublewords (bits 63:32 and 31:0) of mmx1.]

Table 1-16. Numeric Range for the PFRCP Result

Operand
Source 2           Source 1 and Destination
0                  +/– Maximum Normal (1)
Normal             Normal (1)
Unsupported (2)    Undefined (1)

Note:
1. The result has the same sign as the source operand.
2. “Unsupported” means that the exponent is all ones (1s).

Examples

The following code sequence shows how the PFRSQRT and PFMUL instructions can be used to compute a = 1/sqrt(b):

    X0 = PFRSQRT(b)
    X1 = PFMUL(X0,X0)
    X2 = PFRSQIT1(b,X1)
    a = PFRCPIT2(X2,X0)

Related Instructions

PFRCPIT2, PFRSQIT1

rFLAGS Affected

None

Exceptions

Exception                                    Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD                          X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                             X     X             X          The AMD 3DNow!™ instructions are not supported, as indicated by EDX bit 31 in CPUID function 8000_0001h.
Device not available, #NM                    X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                   X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                      X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                 X          A null data segment was used to reference memory.
Page fault, #PF                                    X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF    X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                               X             X          An unaligned memory reference was performed while alignment checking was enabled.
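The exceptions list for this instruction ties 3DNow! support to EDX bit 31 of CPUID function 8000_0001h. As a small illustration, the Python sketch below tests that bit in an EDX value that has already been obtained. Reading CPUID itself requires native code or an operating-system facility and is not shown. The function name `supports_3dnow` and the sample register values are made up for the example.

```python
THREE_DNOW_BIT = 31  # EDX bit position cited in the exceptions list


def supports_3dnow(edx_8000_0001: int) -> bool:
    """Return True if bit 31 is set in the EDX value from CPUID function 8000_0001h."""
    return bool((edx_8000_0001 >> THREE_DNOW_BIT) & 1)


# Hypothetical EDX values, not taken from any real processor.
print(supports_3dnow(0x8000_0000))  # True: bit 31 set
print(supports_3dnow(0x7FFF_FFFF))  # False: bit 31 clear
```

Software that still uses these instructions would typically make a check of this kind once at startup and fall back to another code path when the bit is clear, since executing a 3DNow! instruction without support raises #UD.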
PFSUB
Packed Floating-Point Subtract

Subtracts each packed single-precision floating-point value in the second source operand from the corresponding packed single-precision floating-point value in the first source operand and writes the result of each subtraction in the corresponding doubleword of the destination (first source). The first source/destination operand is an MMX register. The second source operand is another MMX register or 64-bit memory location. The numeric range for operands is shown in Table 1-17 on page 130.

The PFSUB instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is indicated by CPUID feature bits.
(See “CPUID” in Volume 3.)

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their more efficient 128-bit media counterparts. For a complete list of recommended instruction substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on page 335.

Recommended Instruction Substitution

SUBPS

Mnemonic:    PFSUB mmx1, mmx2/mem64
Opcode:      0F 0F /r 9A
Description: Subtracts packed single-precision floating-point values in an MMX register or 64-bit memory location from packed single-precision floating-point values in another MMX register and writes the result in the destination MMX register.

[Figure pfsub.eps: each doubleword (bits 63:32 and 31:0) of mmx2/mem64 is subtracted from the corresponding doubleword of mmx1, and each result is written to that doubleword of mmx1.]
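The element-wise behavior of PFSUB can be sketched in Python as follows. This is an illustrative model only: `pfsub_model` and `to_f32` are hypothetical names, each 64-bit operand is represented as a (low, high) pair of floats, and results are rounded to single precision with round-to-nearest, which is an assumption about rounding that the description does not state.

```python
import struct
from typing import Tuple

Packed = Tuple[float, float]  # (low doubleword, high doubleword)


def to_f32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pfsub_model(dst: Packed, src: Packed) -> Packed:
    """Subtract src from dst element by element; the result replaces dst."""
    return (to_f32(dst[0] - src[0]), to_f32(dst[1] - src[1]))


mmx1 = (5.0, 1.5)
mmx2 = (2.0, 4.0)
mmx1 = pfsub_model(mmx1, mmx2)  # first source doubles as the destination
print(mmx1)  # (3.0, -2.5)
```

Note the operand order: the second source is subtracted from the first, so swapping the arguments negates each result.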