Volume 2B Instruction Set Reference N-Z (794102), page 39
INSTRUCTION SET REFERENCE, N-Z

RSQRTPS - Compute Reciprocals of Square Roots of Packed Single-Precision Floating-Point Values

#GP(0)             If the memory address is in a non-canonical form.
                   If memory operand is not aligned on a 16-byte boundary, regardless of segment.
#PF(fault-code)    For a page fault.
#NM                If CR0.TS[bit 3] = 1.
#UD                If CR0.EM[bit 2] = 1.
                   If CR4.OSFXSR[bit 9] = 0.
                   If CPUID.01H:EDX.SSE[bit 25] = 0.
                   If the LOCK prefix is used.

RSQRTSS - Compute Reciprocal of Square Root of Scalar Single-Precision Floating-Point Value

Opcode        Instruction              64-Bit Mode  Compat/Leg Mode  Description
F3 0F 52 /r   RSQRTSS xmm1, xmm2/m32   Valid        Valid            Computes the approximate reciprocal of the square root of the low single-precision floating-point value in xmm2/m32 and stores the results in xmm1.

Description
Computes an approximate reciprocal of the square root of the low single-precision floating-point value in the source operand (second operand) and stores the single-precision floating-point result in the destination operand.
The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. The three high-order doublewords of the destination operand remain unchanged. See Figure 10-6 in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, for an illustration of a scalar single-precision floating-point operation.

The relative error for this approximation is:

    |Relative Error| ≤ 1.5 * 2^(-12)

The RSQRTSS instruction is not affected by the rounding control bits in the MXCSR register. When a source value is a 0.0, an ∞ of the sign of the source value is returned. A denormal source value is treated as a 0.0 (of the same sign).
When a source value is a negative value (other than −0.0), a floating-point indefinite is returned. When a source value is an SNaN or QNaN, the SNaN is converted to a QNaN or the source QNaN is returned.

In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).

Operation
DEST[31:0] ← APPROXIMATE(1.0/SQRT(SRC[31:0]));
(* DEST[127:32] unchanged *)

Intel C/C++ Compiler Intrinsic Equivalent
RSQRTSS    __m128 _mm_rsqrt_ss(__m128 a)

SIMD Floating-Point Exceptions
None.
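The special-case rules and the error bound described above can be sketched as a small software model. This is only an illustration under stated assumptions, not how the hardware computes the result: the names `rsqrt_scalar`, `rsqrtss_model`, and `within_documented_error` are hypothetical, the model returns the exact reciprocal square root where the instruction returns an approximation, and Python's `math.nan` stands in for the QNaN floating-point indefinite.

```python
import math

# Smallest normal single-precision magnitude; anything smaller (and nonzero)
# is a denormal, which the description says is treated as 0.0 of the same sign.
FLT_MIN_NORMAL = 2.0 ** -126

# Documented bound on the relative error of the hardware approximation.
MAX_REL_ERROR = 1.5 * 2.0 ** -12


def rsqrt_scalar(x):
    """Model the documented result for one source value (exact, not approximate)."""
    if math.isnan(x):
        return math.nan                      # SNaN/QNaN in, QNaN out
    if abs(x) < FLT_MIN_NORMAL:              # zero or denormal
        return math.copysign(math.inf, x)    # infinity with the sign of the source
    if x < 0.0:
        return math.nan                      # stand-in for the floating-point indefinite
    return 1.0 / math.sqrt(x)


def rsqrtss_model(dest, src):
    """dest and src are 4-element tuples; only element 0 of dest is replaced."""
    return (rsqrt_scalar(src[0]),) + tuple(dest[1:])


def within_documented_error(approx, x):
    """Check a candidate approximation against the documented relative error."""
    exact = 1.0 / math.sqrt(x)
    return abs(approx - exact) / exact <= MAX_REL_ERROR
```

For example, `rsqrtss_model((9.0, 2.0, 3.0, 4.0), (4.0, 0.0, 0.0, 0.0))` yields `(0.5, 2.0, 3.0, 4.0)`: the low element is replaced and the three high elements of the destination are carried through unchanged.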
Protected Mode Exceptions
#GP(0)             For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.
#SS(0)             For an illegal address in the SS segment.
#PF(fault-code)    For a page fault.
#NM                If CR0.TS[bit 3] = 1.
#UD                If CR0.EM[bit 2] = 1.
                   If CR4.OSFXSR[bit 9] = 0.
                   If CPUID.01H:EDX.SSE[bit 25] = 0.
                   If the LOCK prefix is used.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real-Address Mode Exceptions
#GP(0)             If any part of the operand lies outside the effective address space from 0 to FFFFH.
#NM                If CR0.TS[bit 3] = 1.
#UD                If CR0.EM[bit 2] = 1.
                   If CR4.OSFXSR[bit 9] = 0.
                   If CPUID.01H:EDX.SSE[bit 25] = 0.
                   If the LOCK prefix is used.

Virtual-8086 Mode Exceptions
Same exceptions as in real address mode.
#PF(fault-code)    For a page fault.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made.

Compatibility Mode Exceptions
Same exceptions as in protected mode.

64-Bit Mode Exceptions
#SS(0)             If a memory address referencing the SS segment is in a non-canonical form.
#GP(0)             If the memory address is in a non-canonical form.
#PF(fault-code)    For a page fault.
#NM                If CR0.TS[bit 3] = 1.
#UD                If CR0.EM[bit 2] = 1.
                   If CR4.OSFXSR[bit 9] = 0.
                   If CPUID.01H:EDX.SSE[bit 25] = 0.
                   If the LOCK prefix is used.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

SAHF - Store AH into Flags

Opcode  Instruction  64-Bit Mode  Compat/Leg Mode  Description
9E      SAHF         Invalid*     Valid            Loads SF, ZF, AF, PF, and CF from AH into EFLAGS register.

NOTES:
* Valid in specific steppings.
See Description section.

Description
Loads the SF, ZF, AF, PF, and CF flags of the EFLAGS register with values from the corresponding bits in the AH register (bits 7, 6, 4, 2, and 0, respectively). Bits 1, 3, and 5 of register AH are ignored; the corresponding reserved bits (1, 3, and 5) in the EFLAGS register remain as shown in the "Operation" section below.

This instruction executes as described above in compatibility mode and legacy mode. It is valid in 64-bit mode only if CPUID.80000001H:ECX.LAHF-SAHF[bit 0] = 1.

Operation
IF 64-Bit Mode
    THEN
        IF CPUID.80000001.ECX[0] = 1;
            THEN RFLAGS(SF:ZF:0:AF:0:PF:1:CF) ← AH;
            ELSE #UD;
        FI
    ELSE
        EFLAGS(SF:ZF:0:AF:0:PF:1:CF) ← AH;
FI;

Flags Affected
The SF, ZF, AF, PF, and CF flags are loaded with values from the AH register.
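As a rough illustration of the operation above, the following sketch models the flag load on plain integers. It is a simplified model under assumptions, not a definitive implementation: `sahf_model`, `InvalidOpcode`, and the `mode64` and `lahf_sahf_supported` parameters are hypothetical names introduced here, with `lahf_sahf_supported` standing in for the CPUID feature bit.

```python
# Bits of AH that SAHF copies: SF (7), ZF (6), AF (4), PF (2), CF (0).
SAHF_LOADED_BITS = 0b1101_0101
# Reserved bit 1 of the flags register stays 1; bits 3 and 5 stay 0.
RESERVED_BIT1 = 0b0000_0010


class InvalidOpcode(Exception):
    """Hypothetical stand-in for the #UD exception."""


def sahf_model(flags, ah, mode64=False, lahf_sahf_supported=True):
    """Return the flags value after SAHF, given the old flags and AH."""
    if mode64 and not lahf_sahf_supported:
        raise InvalidOpcode("#UD")
    # Keep only the five flag bits from AH, then force the reserved pattern.
    low_byte = (ah & SAHF_LOADED_BITS) | RESERVED_BIT1
    # Bits 8 and above of the flags register are left untouched.
    return (flags & ~0xFF) | low_byte
```

With an old flags value of `0x202`, loading `AH = 0xFF` gives `0x2D7` (all five flags set, bits 3 and 5 still clear), and loading `AH = 0x00` gives `0x202` again.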
Bits 1, 3, and 5 of the EFLAGS register are unaffected, with the values remaining 1, 0, and 0, respectively.

Protected Mode Exceptions
None.

Real-Address Mode Exceptions
None.

Virtual-8086 Mode Exceptions
None.

Compatibility Mode Exceptions
None.

64-Bit Mode Exceptions
#UD                If CPUID.80000001.ECX[0] = 0.
                   If the LOCK prefix is used.

SAL/SAR/SHL/SHR - Shift

Opcode***         Instruction       64-Bit Mode  Compat/Leg Mode  Description
D0 /4             SAL r/m8, 1       Valid        Valid            Multiply r/m8 by 2, once.
REX + D0 /4       SAL r/m8**, 1     Valid        N.E.             Multiply r/m8 by 2, once.
D2 /4             SAL r/m8, CL      Valid        Valid            Multiply r/m8 by 2, CL times.
REX + D2 /4       SAL r/m8**, CL    Valid        N.E.             Multiply r/m8 by 2, CL times.
C0 /4 ib          SAL r/m8, imm8    Valid        Valid            Multiply r/m8 by 2, imm8 times.
REX + C0 /4 ib    SAL r/m8**, imm8  Valid        N.E.             Multiply r/m8 by 2, imm8 times.
D1 /4             SAL r/m16, 1      Valid        Valid            Multiply r/m16 by 2, once.
D3 /4             SAL r/m16, CL     Valid        Valid            Multiply r/m16 by 2, CL times.
C1 /4 ib          SAL r/m16, imm8   Valid        Valid            Multiply r/m16 by 2, imm8 times.
D1 /4             SAL r/m32, 1      Valid        Valid            Multiply r/m32 by 2, once.
REX.W + D1 /4     SAL r/m64, 1      Valid        N.E.             Multiply r/m64 by 2, once.
D3 /4             SAL r/m32, CL     Valid        Valid            Multiply r/m32 by 2, CL times.
REX.W + D3 /4     SAL r/m64, CL     Valid        N.E.             Multiply r/m64 by 2, CL times.
C1 /4 ib          SAL r/m32, imm8   Valid        Valid            Multiply r/m32 by 2, imm8 times.
REX.W + C1 /4 ib  SAL r/m64, imm8   Valid        N.E.             Multiply r/m64 by 2, imm8 times.
D0 /7             SAR r/m8, 1       Valid        Valid            Signed divide* r/m8 by 2, once.
REX + D0 /7       SAR r/m8**, 1     Valid        N.E.             Signed divide* r/m8 by 2, once.
D2 /7             SAR r/m8, CL      Valid        Valid            Signed divide* r/m8 by 2, CL times.
REX + D2 /7       SAR r/m8**, CL    Valid        N.E.             Signed divide* r/m8 by 2, CL times.
C0 /7 ib          SAR r/m8, imm8    Valid        Valid            Signed divide* r/m8 by 2, imm8 times.
REX + C0 /7 ib    SAR r/m8**, imm8  Valid        N.E.             Signed divide* r/m8 by 2, imm8 times.
D1 /7             SAR r/m16, 1      Valid        Valid            Signed divide* r/m16 by 2, once.
D3 /7             SAR r/m16, CL     Valid        Valid            Signed divide* r/m16 by 2, CL times.
C1 /7 ib          SAR r/m16, imm8   Valid        Valid            Signed divide* r/m16 by 2, imm8 times.
D1 /7             SAR r/m32, 1      Valid        Valid            Signed divide* r/m32 by 2, once.
REX.W + D1 /7     SAR r/m64, 1      Valid        N.E.             Signed divide* r/m64 by 2, once.
D3 /7             SAR r/m32, CL     Valid        Valid            Signed divide* r/m32 by 2, CL times.
REX.W + D3 /7     SAR r/m64, CL     Valid        N.E.             Signed divide* r/m64 by 2, CL times.
C1 /7 ib          SAR r/m32, imm8   Valid        Valid            Signed divide* r/m32 by 2, imm8 times.
REX.W + C1 /7 ib  SAR r/m64, imm8   Valid        N.E.             Signed divide* r/m64 by 2, imm8 times.
D0 /4             SHL r/m8, 1       Valid        Valid            Multiply r/m8 by 2, once.
REX + D0 /4       SHL r/m8**, 1     Valid        N.E.             Multiply r/m8 by 2, once.
D2 /4             SHL r/m8, CL      Valid        Valid            Multiply r/m8 by 2, CL times.
REX + D2 /4       SHL r/m8**, CL    Valid        N.E.             Multiply r/m8 by 2, CL times.
C0 /4 ib          SHL r/m8, imm8    Valid        Valid            Multiply r/m8 by 2, imm8 times.
REX + C0 /4 ib    SHL r/m8**, imm8  Valid        N.E.             Multiply r/m8 by 2, imm8 times.
D1 /4             SHL r/m16, 1      Valid        Valid            Multiply r/m16 by 2, once.
D3 /4             SHL r/m16, CL     Valid        Valid            Multiply r/m16 by 2, CL times.
C1 /4 ib          SHL r/m16, imm8   Valid        Valid            Multiply r/m16 by 2, imm8 times.
D1 /4             SHL r/m32, 1      Valid        Valid            Multiply r/m32 by 2, once.
REX.W + D1 /4     SHL r/m64, 1      Valid        N.E.             Multiply r/m64 by 2, once.
D3 /4             SHL r/m32, CL     Valid        Valid            Multiply r/m32 by 2, CL times.
REX.W + D3 /4     SHL r/m64, CL     Valid        N.E.             Multiply r/m64 by 2, CL times.
C1 /4 ib          SHL r/m32, imm8   Valid        Valid            Multiply r/m32 by 2, imm8 times.
REX.W + C1 /4 ib  SHL r/m64, imm8   Valid        N.E.             Multiply r/m64 by 2, imm8 times.
D0 /5             SHR r/m8, 1       Valid        Valid            Unsigned divide r/m8 by 2, once.
REX + D0 /5       SHR r/m8**, 1     Valid        N.E.             Unsigned divide r/m8 by 2, once.
D2 /5             SHR r/m8, CL      Valid        Valid            Unsigned divide r/m8 by 2, CL times.
REX + D2 /5       SHR r/m8**, CL    Valid        N.E.             Unsigned divide r/m8 by 2, CL times.
C0 /5 ib          SHR r/m8, imm8    Valid        Valid            Unsigned divide r/m8 by 2, imm8 times.
REX + C0 /5 ib    SHR r/m8**, imm8  Valid        N.E.             Unsigned divide r/m8 by 2, imm8 times.
D1 /5             SHR r/m16, 1      Valid        Valid            Unsigned divide r/m16 by 2, once.
D3 /5             SHR r/m16, CL     Valid        Valid            Unsigned divide r/m16 by 2, CL times.
C1 /5 ib          SHR r/m16, imm8   Valid        Valid            Unsigned divide r/m16 by 2, imm8 times.
D1 /5             SHR r/m32, 1      Valid        Valid            Unsigned divide r/m32 by 2, once.
REX.W + D1 /5     SHR r/m64, 1      Valid        N.E.             Unsigned divide r/m64 by 2, once.
D3 /5             SHR r/m32, CL     Valid        Valid            Unsigned divide r/m32 by 2, CL times.
REX.W + D3 /5     SHR r/m64, CL     Valid        N.E.             Unsigned divide r/m64 by 2, CL times.
C1 /5 ib          SHR r/m32, imm8   Valid        Valid            Unsigned divide r/m32 by 2, imm8 times.
REX.W + C1 /5 ib  SHR r/m64, imm8   Valid        N.E.             Unsigned divide r/m64 by 2, imm8 times.

NOTES:
* Not the same form of division as IDIV; rounding is toward negative infinity.
** In 64-bit mode, r/m8 can not be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.
*** See IA-32 Architecture Compatibility section below.
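The first note above (SAR rounds toward negative infinity, unlike IDIV) is easy to see with a negative dividend. The sketch below is illustrative only: `sar_quotient` and `idiv_quotient` are hypothetical helper names, Python's `>>` on integers is used as a stand-in for an arithmetic right shift, and IDIV is modeled as division that truncates toward zero.

```python
def sar_quotient(value, count):
    """Arithmetic right shift: floors, i.e. rounds toward negative infinity."""
    return value >> count


def idiv_quotient(value, divisor):
    """IDIV-style quotient: truncates toward zero."""
    magnitude = abs(value) // abs(divisor)
    same_sign = (value < 0) == (divisor < 0)
    return magnitude if same_sign else -magnitude
```

Dividing -9 by 4 shows the difference: `sar_quotient(-9, 2)` gives -3, while `idiv_quotient(-9, 4)` gives -2. For non-negative dividends the two agree.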
Description
Shifts the bits in the first operand (destination operand) to the left or right by the number of bits specified in the second operand (count operand). Bits shifted beyond the destination operand boundary are first shifted into the CF flag, then discarded.
At the end of the shift operation, the CF flag contains the last bit shifted out of the destination operand.

The destination operand can be a register or a memory location. The count operand can be an immediate value or the CL register. The count is masked to 5 bits (or 6 bits if in 64-bit mode and REX.W is used). The count range is limited to 0 to 31 (or 63 if 64-bit mode and REX.W is used).
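The count masking and the CF behavior described above can be sketched one bit at a time. This is a minimal model under assumptions: `shift_model` and its parameters are hypothetical names, a 64-bit operand width is taken to mean that REX.W is in use, `None` is returned for CF when the masked count is 0 (no bit was shifted out), and flag behavior not covered by the text above is not modeled.

```python
def shift_model(op, value, count, width=32):
    """Return (result, cf) for SAL/SHL/SHR/SAR on a width-bit operand."""
    if width not in (8, 16, 32, 64):
        raise ValueError("unsupported operand width")
    # Count is masked to 6 bits for 64-bit operands, otherwise to 5 bits.
    count &= 0x3F if width == 64 else 0x1F
    full_mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    value &= full_mask
    cf = None  # stays None when no bit is shifted out
    for _ in range(count):
        if op in ("SAL", "SHL"):
            cf = (value >> (width - 1)) & 1   # top bit moves into CF
            value = (value << 1) & full_mask
        elif op == "SHR":
            cf = value & 1                    # low bit moves into CF
            value >>= 1                       # zero fills from the left
        elif op == "SAR":
            cf = value & 1
            value = (value >> 1) | (value & sign_bit)  # sign bit is replicated
        else:
            raise ValueError("unknown shift operation")
    return value, cf
```

For instance, `shift_model("SHL", 1, 33, width=32)` returns `(2, 0)` because the count 33 is masked to 1, and `shift_model("SAR", 0xF7, 2, width=8)` returns `(0xFD, 1)`, that is, -9 shifted to -3 in 8-bit two's complement.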