Volume 4: 128-Bit Media Instructions
AMD64 Technology, 26568, Rev. 3.09, July 2007

RSQRTSS    Reciprocal Square Root Scalar Single-Precision Floating-Point

Computes the approximate reciprocal of the square root of the low-order single-precision floating-point value in an XMM register or in a 32-bit memory location and writes the result in the low-order doubleword of another XMM register. The three high-order doublewords in the destination XMM register are not modified. The rounding control bits (RC) in the MXCSR register have no effect on the result.

The maximum error is less than or equal to 1.5 * 2^-12 times the true reciprocal square root. A source value that is ±zero or denormal returns an infinity of the source value's sign. Negative source values other than -zero and -denormal return a QNaN floating-point indefinite value ("Indefinite Values" in Volume 1). For both SNaN and QNaN source operands, a QNaN is returned.

The RSQRTSS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See "CPUID" in Volume 3.)

Mnemonic                    Opcode         Description
RSQRTSS xmm1, xmm2/mem32    F3 0F 52 /r    Computes reciprocal of square root of single-precision floating-point value in an XMM register or 32-bit memory location and writes the result in the destination XMM register.

[Figure rsqrtss.eps: the reciprocal square root of bits 31-0 of xmm2/mem32 is written to bits 31-0 of xmm1.]

Related Instructions
RSQRTPS, SQRTPD, SQRTPS, SQRTSD, SQRTSS

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD         X     X             X          The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID function 0000_0001h.
                            X     X             X          The emulate bit (EM) of CR0 was set to 1.
                            X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                X          A null data segment was used to reference memory.
Page fault, #PF                   X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC              X             X          An unaligned memory reference was performed while alignment checking was enabled.
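The special-case rules and the error bound stated for RSQRTSS can be summarized in a small software model. This is a minimal sketch rather than the hardware algorithm: the names rsqrtss_model and within_error_bound are hypothetical, a Python NaN stands in for the QNaN indefinite value, and the +infinity case (not covered by the text above) is left to plain arithmetic.

```python
import math
import struct

# Relative error bound quoted in the RSQRTSS description: 1.5 * 2^-12.
MAX_RELATIVE_ERROR = 1.5 * 2.0 ** -12


def f32_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as an int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def rsqrtss_model(x):
    """Hypothetical reference model of the low-order RSQRTSS result.

    x is assumed to lie within single-precision range.
    """
    if math.isnan(x):
        return math.nan                  # SNaN or QNaN source: a QNaN is returned
    bits = f32_bits(x)
    negative = bool(bits >> 31)
    exponent = (bits >> 23) & 0xFF
    if exponent == 0:                    # +/-zero or denormal source
        return -math.inf if negative else math.inf
    if negative:                         # any other negative source
        return math.nan                  # stands in for the QNaN indefinite value
    # Ordinary positive source. +infinity is not covered by the text;
    # plain arithmetic yields 0.0 here (an assumption of this sketch).
    return 1.0 / math.sqrt(x)


def within_error_bound(approx, x):
    """Check an approximation against the quoted bound for a normal positive x."""
    exact = rsqrtss_model(x)
    return abs(approx - exact) <= MAX_RELATIVE_ERROR * exact
```

For example, rsqrtss_model(4.0) returns 0.5, and within_error_bound(0.5001, 4.0) holds because the deviation of 0.0001 is below 1.5 * 2^-12 * 0.5.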
SHUFPD    Shuffle Packed Double-Precision Floating-Point

Moves either of the two packed double-precision floating-point values in the first source operand to the low-order quadword of the destination (first source) and moves either of the two packed double-precision floating-point values in the second source operand to the high-order quadword of the destination. In each case, the value of the destination quadword is determined by the least-significant two bits in the immediate-byte operand, as shown in Table 1-7. The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The SHUFPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See "CPUID" in Volume 3.)

Mnemonic                           Opcode            Description
SHUFPD xmm1, xmm2/mem128, imm8     66 0F C6 /r ib    Shuffles packed double-precision floating-point values in an XMM register and another XMM register or 128-bit memory location and puts the result in the destination XMM register.

[Figure shufpd.eps: two multiplexers controlled by imm8 select one quadword of xmm1 for destination bits 63-0 and one quadword of xmm2/mem128 for destination bits 127-64.]

Table 1-7. Immediate-Byte Operand Encoding for SHUFPD

Destination Bits Filled  Immediate-Byte Bit Field  Value of Bit Field  Source 1 Bits Moved  Source 2 Bits Moved
63-0                     0                         0                   63-0                 none
                                                   1                   127-64               none
127-64                   1                         0                   none                 63-0
                                                   1                   none                 127-64

Related Instructions
SHUFPS

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD         X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID function 0000_0001h.
                            X     X             X          The emulate bit (EM) of CR0 was set to 1.
                            X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                X          A null data segment was used to reference memory.
                            X     X             X          The memory operand was not aligned on a 16-byte boundary while MXCSR.MM was cleared to 0.
Page fault, #PF                   X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC              X             X          An unaligned memory reference was performed while alignment checking was enabled with MXCSR.MM set to 1.
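The selection rule of Table 1-7 can be expressed as a short software model. This is a sketch under stated assumptions: the function name shufpd and the (low, high) tuple representation of an XMM register are illustrative choices, not part of the manual.

```python
def shufpd(src1, src2, imm8):
    """Model SHUFPD on registers given as (low_quadword, high_quadword) tuples.

    src1 plays the role of the first source/destination operand,
    src2 the second source operand (register or 128-bit memory).
    """
    # Immediate bit 0 picks which quadword of source 1 fills destination bits 63-0.
    low = src1[imm8 & 1]
    # Immediate bit 1 picks which quadword of source 2 fills destination bits 127-64.
    high = src2[(imm8 >> 1) & 1]
    # Only the two least-significant immediate bits take part in the selection.
    return (low, high)
```

With src1 = (1.0, 2.0) and src2 = (3.0, 4.0), an immediate of 0 yields (1.0, 3.0) and an immediate of 3 yields (2.0, 4.0).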
SHUFPS    Shuffle Packed Single-Precision Floating-Point

Moves two of the four packed single-precision floating-point values in the first source operand to the low-order quadword of the destination (first source) and moves two of the four packed single-precision floating-point values in the second source operand to the high-order quadword of the destination. In each case, the value of the destination doubleword is determined by a two-bit field in the immediate-byte operand, as shown in Table 1-8. The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The SHUFPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See "CPUID" in Volume 3.)

Mnemonic                           Opcode         Description
SHUFPS xmm1, xmm2/mem128, imm8     0F C6 /r ib    Shuffles packed single-precision floating-point values in an XMM register and another XMM register or 128-bit memory location and puts the result in the destination XMM register.

[Figure shufps.eps: multiplexers controlled by imm8 select two doublewords of xmm1 for destination bits 63-0 and two doublewords of xmm2/mem128 for destination bits 127-64.]

Table 1-8. Immediate-Byte Operand Encoding for SHUFPS

Destination Bits Filled  Immediate-Byte Bit Field  Value of Bit Field  Source 1 Bits Moved  Source 2 Bits Moved
31-0                     1-0                       0                   31-0                 none
                                                   1                   63-32                none
                                                   2                   95-64                none
                                                   3                   127-96               none
63-32                    3-2                       0                   31-0                 none
                                                   1                   63-32                none
                                                   2                   95-64                none
                                                   3                   127-96               none
95-64                    5-4                       0                   none                 31-0
                                                   1                   none                 63-32
                                                   2                   none                 95-64
                                                   3                   none                 127-96
127-96                   7-6                       0                   none                 31-0
                                                   1                   none                 63-32
                                                   2                   none                 95-64
                                                   3                   none                 127-96

Related Instructions
SHUFPD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD         X     X             X          The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID function 0000_0001h.
                            X     X             X          The emulate bit (EM) of CR0 was set to 1.
                            X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                X          A null data segment was used to reference memory.
                            X     X             X          The memory operand was not aligned on a 16-byte boundary while MXCSR.MM was cleared to 0.
Page fault, #PF                   X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC              X             X          An unaligned memory reference was performed while alignment checking was enabled with MXCSR.MM set to 1.

SQRTPD    Square Root Packed Double-Precision Floating-Point

Computes the square root of each of the two packed double-precision floating-point values in an XMM register or 128-bit memory location and writes the result in the corresponding quadword of another XMM register.
Taking the square root of +infinity returns +infinity.

The SQRTPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See "CPUID" in Volume 3.)

Mnemonic                     Opcode         Description
SQRTPD xmm1, xmm2/mem128     66 0F 51 /r    Computes square roots of packed double-precision floating-point values in an XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure sqrtpd.eps: the square root of each quadword of xmm2/mem128 (bits 127-64 and 63-0) is written to the corresponding quadword of xmm1.]

Related Instructions
RSQRTPS, RSQRTSS, SQRTPS, SQRTSD, SQRTSS

rFLAGS Affected
None

MXCSR Flags Affected

Flag      MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Bit       17  15  14-13  12  11  10  9   8   7   6    5   4   3   2   1   0
Modified                                              M               M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception                              Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD                    X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID function 0000_0001h.
                                       X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                       X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                       X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT was cleared to 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM              X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                             X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                           X          A null data segment was used to reference memory.
                                       X     X             X          The memory operand was not aligned on a 16-byte boundary while MXCSR.MM was cleared to 0.
Page fault, #PF                              X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                         X             X          An unaligned memory reference was performed while alignment checking was enabled with MXCSR.MM set to 1.
SIMD Floating-Point Exception, #XF     X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT was set to 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Exception                              Real  Virtual-8086  Protected  Cause of Exception
Invalid-operation exception (IE)       X     X             X          A source operand was an SNaN value.
                                       X     X             X          A source operand was negative (not including -0).
Denormalized-operand exception (DE)    X     X             X          A source operand was a denormal value.
Precision exception (PE)               X     X             X          A result could not be represented exactly in the destination format.
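The SIMD floating-point exception causes listed for SQRTPD can be checked per element in software. The sketch below is an illustration only: sqrt_conditions and double_bits are hypothetical names, elements are passed as 64-bit patterns so that SNaN encodings survive intact, and the function reports which listed causes hold for an element without modeling exception masking, priority, or the DAZ setting.

```python
from fractions import Fraction
import math
import struct

SIGN_BIT = 0x8000000000000000
EXP_MASK = 0x7FF0000000000000
FRAC_MASK = 0x000FFFFFFFFFFFFF
QUIET_BIT = 0x0008000000000000


def double_bits(x):
    """Return the IEEE 754 double-precision bit pattern of x as an int."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def sqrt_conditions(bits):
    """Return the listed causes ("IE", "DE", "PE") that hold for one source element."""
    exponent = bits & EXP_MASK
    fraction = bits & FRAC_MASK
    negative = bool(bits & SIGN_BIT)
    conditions = set()
    if exponent == EXP_MASK and fraction != 0:        # NaN source
        if not fraction & QUIET_BIT:
            conditions.add("IE")                      # SNaN source
        return conditions                             # QNaN: no listed cause
    if negative and (exponent or fraction):
        conditions.add("IE")                          # negative, not including -0
    if exponent == 0 and fraction != 0:
        conditions.add("DE")                          # denormal source
    if "IE" not in conditions:
        value = struct.unpack("<d", struct.pack("<Q", bits))[0]
        if not math.isinf(value):
            root = math.sqrt(value)
            # Exact rational check: the root is inexact if squaring it
            # does not reproduce the source value.
            if Fraction(root) ** 2 != Fraction(value):
                conditions.add("PE")
    return conditions
```

For instance, sqrt_conditions(double_bits(2.0)) reports {"PE"} because the square root of 2 has no exact double-precision representation, while sqrt_conditions(double_bits(4.0)) reports an empty set.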
SQRTPS    Square Root Packed Single-Precision Floating-Point

Computes the square root of each of the four packed single-precision floating-point values in an XMM register or 128-bit memory location and writes the result in the corresponding doubleword of another XMM register. Taking the square root of +infinity returns +infinity.

The SQRTPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See "CPUID" in Volume 3.)

Mnemonic                     Opcode      Description
SQRTPS xmm1, xmm2/mem128     0F 51 /r    Computes square roots of packed single-precision floating-point values in an XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure sqrtps.eps: the square root of each doubleword of xmm2/mem128 (bits 127-96, 95-64, 63-32, and 31-0) is written to the corresponding doubleword of xmm1.]

Related Instructions
RSQRTPS, RSQRTSS, SQRTPD, SQRTSD, SQRTSS

rFLAGS Affected
None

MXCSR Flags Affected

Flag      MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Bit       17  15  14-13  12  11  10  9   8   7   6    5   4   3   2   1   0
Modified                                              M               M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception                              Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD                    X     X             X          The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID function 0000_0001h.
                                       X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                       X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                       X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT was cleared to 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM              X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                             X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                           X          A null data segment was used to reference memory.
                                       X     X             X          The memory operand was not aligned on a 16-byte boundary while MXCSR.MM was cleared to 0.
Page fault, #PF                              X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                         X             X          An unaligned memory reference was performed while alignment checking was enabled with MXCSR.MM set to 1.
SIMD Floating-Point Exception, #XF     X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT was set to 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Exception                              Real  Virtual-8086  Protected  Cause of Exception
Invalid-operation exception (IE)       X     X             X          A source operand was an SNaN value.
                                       X     X             X          A source operand was negative (not including -0).
Denormalized-operand exception (DE)    X     X             X          A source operand was a denormal value.
Precision exception (PE)               X     X             X          A result could not be represented exactly in the destination format.
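The lane-by-lane behavior of SQRTPS can be illustrated with a small software model. This is a rough sketch with several assumptions: sqrtps and round_to_f32 are hypothetical names, a register is represented as a 4-tuple ordered from the low doubleword to the high doubleword, rounding is round-to-nearest (the MXCSR rounding control is not modeled), and a Python NaN stands in for the QNaN produced by a negative source.

```python
import math
import struct


def round_to_f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sqrt_lane(value):
    """Square root of one single-precision element."""
    value = round_to_f32(value)
    if value < 0.0:                  # negative source (not including -0)
        return math.nan              # stands in for the QNaN result
    # -0.0, +infinity, and NaN pass through math.sqrt unchanged.
    return round_to_f32(math.sqrt(value))


def sqrtps(src):
    """Model SQRTPS on a 4-tuple of single-precision values, low doubleword first."""
    if len(src) != 4:
        raise ValueError("an XMM register holds four single-precision values")
    return tuple(sqrt_lane(v) for v in src)
```

For example, sqrtps((4.0, 9.0, 0.25, math.inf)) returns (2.0, 3.0, 0.5, math.inf), matching the statement that the square root of +infinity is +infinity.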