Volume 2B Instruction Set Reference N-Z
Execution of 128-bit instructions on a non-SSE2-capable processor (one that is MMX technology capable) will result in the instruction operating on the mm registers, not #UD.
If the LOCK prefix is used.
#NM
If CR0.TS[bit 3] = 1.
#MF
(64-bit operations only) If there is a pending x87 FPU exception.

Virtual-8086 Mode Exceptions
Same exceptions as in real address mode.
#PF(fault-code)
For a page fault.
#AC(0)
(64-bit operations only) If alignment checking is enabled and an unaligned memory reference is made.

Compatibility Mode Exceptions
Same as for protected mode exceptions.

64-Bit Mode Exceptions
#SS(0)
If a memory address referencing the SS segment is in a non-canonical form.
#GP(0)
If the memory address is in a non-canonical form.
(128-bit operations only) If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
#UD
If CR0.EM[bit 2] = 1.
(128-bit operations only) If CR4.OSFXSR[bit 9] = 0.
(128-bit operations only) If CPUID.01H:EDX.SSE2[bit 26] = 0.
If the LOCK prefix is used.
#NM
If CR0.TS[bit 3] = 1.
#MF
(64-bit operations only) If there is a pending x87 FPU exception.
#PF(fault-code)
If a page fault occurs.
#AC(0)
(64-bit operations only) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
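Several of the #UD conditions above depend on CPUID.01H:EDX.SSE2[bit 26]. CR0 and CR4 are managed by the operating system, but application code can test this CPUID bit before choosing the 128-bit (XMM) forms of these instructions. The following is a minimal sketch assuming GCC or Clang, whose <cpuid.h> header provides __get_cpuid; the function name sse2_supported is hypothetical, and the sketch does not check whether the OS has set CR4.OSFXSR.

```c
#include <stdbool.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>  /* GCC/Clang helper header; not available with MSVC */
#endif

/* Hypothetical helper: returns true if CPUID.01H:EDX.SSE2[bit 26] = 1. */
bool sse2_supported(void) {
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if the requested leaf (01H) is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (edx >> 26) & 1u;  /* EDX bit 26 = SSE2 */
#else
    return false;  /* not an IA-32 or Intel 64 processor */
#endif
}
```

On Intel 64 processors SSE2 is always present, so the check matters mainly for 32-bit code that may run on older IA-32 processors.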
PAUSE—Spin Loop Hint

Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description
------ | ----------- | ----------- | --------------- | -----------
F3 90 | PAUSE | Valid | Valid | Gives a hint to the processor that improves the performance of spin-wait loops.

Description
Improves the performance of spin-wait loops. When executing a “spin-wait loop,” a Pentium 4 or Intel Xeon processor suffers a severe performance penalty when exiting the loop because it detects a possible memory order violation.
The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops.

An additional function of the PAUSE instruction is to reduce the power consumed by a Pentium 4 processor while executing a spin loop. The Pentium 4 processor can execute a spin-wait loop extremely quickly, causing the processor to consume a lot of power while it waits for the resource it is spinning on to become available. Inserting a PAUSE instruction in a spin-wait loop greatly reduces the processor’s power consumption.

This instruction was introduced with the Pentium 4 processor but is backward compatible with all IA-32 processors: its encoding, F3 90, is the NOP opcode (90H) with an F3H (REP) prefix. In earlier IA-32 processors, the PAUSE instruction operates like a NOP instruction.
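The recommendation to place PAUSE in every spin-wait loop can be sketched in C with C11 atomics and the _mm_pause() intrinsic, which compilers emit as PAUSE (F3 90). This is a minimal test-and-test-and-set lock for illustration, not a production lock. The type and function names (demo_spinlock, demo_lock, demo_trylock, demo_unlock, cpu_relax) are hypothetical, and on non-x86 targets the sketch substitutes an empty cpu_relax() so that it still compiles.

```c
#include <stdatomic.h>
#include <stdbool.h>
#if defined(__i386__) || defined(__x86_64__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   /* emits PAUSE (F3 90) */
#else
#define cpu_relax() ((void)0)     /* hypothetical placeholder on non-x86 targets */
#endif

/* Hypothetical lock type: false = free, true = held. */
typedef struct {
    atomic_bool locked;
} demo_spinlock;

void demo_lock(demo_spinlock *l) {
    /* atomic_exchange returns the previous value; false means we took the lock. */
    while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire)) {
        /* Spin-wait with plain loads until the lock looks free,
           issuing PAUSE on every iteration of the wait loop. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            cpu_relax();
    }
}

bool demo_trylock(demo_spinlock *l) {
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

void demo_unlock(demo_spinlock *l) {
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

The PAUSE sits in the inner read-only loop, which is where the processor spends its time spinning on the lock word and which it exits when the lock is released.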
The Pentium 4 and Intel Xeon processors implement the PAUSE instruction as a pre-defined delay. The delay is finite and can be zero for some processors. This instruction does not change the architectural state of the processor (that is, it performs essentially a delaying no-op).

This instruction’s operation is the same in non-64-bit modes and 64-bit mode.

Operation
Execute_Next_Instruction(DELAY);

Numeric Exceptions
None.

Exceptions (All Operating Modes)
#UD
If the LOCK prefix is used.

PAVGB/PAVGW—Average Packed Integers

Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description
------ | ----------- | ----------- | --------------- | -----------
0F E0 /r | PAVGB mm1, mm2/m64 | Valid | Valid | Average packed unsigned byte integers from mm2/m64 and mm1 with rounding.
66 0F E0 /r | PAVGB xmm1, xmm2/m128 | Valid | Valid | Average packed unsigned byte integers from xmm2/m128 and xmm1 with rounding.
0F E3 /r | PAVGW mm1, mm2/m64 | Valid | Valid | Average packed unsigned word integers from mm2/m64 and mm1 with rounding.
66 0F E3 /r | PAVGW xmm1, xmm2/m128 | Valid | Valid | Average packed unsigned word integers from xmm2/m128 and xmm1 with rounding.

Description
Performs a SIMD average of the packed unsigned integers from the source operand (second operand) and the destination operand (first operand), and stores the results in the destination operand.
For each corresponding pair of data elements in the first and second operands, the elements are added together, a 1 is added to the temporary sum, and that result is shifted right one bit position. Adding 1 before the shift rounds an exact half upward: for bytes 1 and 2, (1 + 2 + 1) >> 1 = 2. The source operand can be an MMX technology register or a 64-bit memory location, or it can be an XMM register or a 128-bit memory location. The destination operand can be an MMX technology register or an XMM register.

The PAVGB instruction operates on packed unsigned bytes and the PAVGW instruction operates on packed unsigned words.

In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).

Operation
PAVGB instruction with 64-bit operands:
DEST[7:0] ← (SRC[7:0] + DEST[7:0] + 1) >> 1; (* Temp sum before shifting is 9 bits *)
(* Repeat operation performed for bytes 2 through 7 *)
DEST[63:56] ← (SRC[63:56] + DEST[63:56] + 1) >> 1;

PAVGW instruction with 64-bit operands:
DEST[15:0] ← (SRC[15:0] + DEST[15:0] + 1) >> 1; (* Temp sum before shifting is 17 bits *)
(* Repeat operation performed for words 2 and 3 *)
DEST[63:48] ← (SRC[63:48] + DEST[63:48] + 1) >> 1;
PAVGB instruction with 128-bit operands:
DEST[7:0] ← (SRC[7:0] + DEST[7:0] + 1) >> 1; (* Temp sum before shifting is 9 bits *)
(* Repeat operation performed for bytes 2 through 15 *)
DEST[127:120] ← (SRC[127:120] + DEST[127:120] + 1) >> 1;

PAVGW instruction with 128-bit operands:
DEST[15:0] ← (SRC[15:0] + DEST[15:0] + 1) >> 1; (* Temp sum before shifting is 17 bits *)
(* Repeat operation performed for words 2 through 7 *)
DEST[127:112] ← (SRC[127:112] + DEST[127:112] + 1) >> 1;

Intel C/C++ Compiler Intrinsic Equivalent
PAVGB: __m64 _mm_avg_pu8 (__m64 a, __m64 b)
PAVGW: __m64 _mm_avg_pu16 (__m64 a, __m64 b)
PAVGB: __m128i _mm_avg_epu8 (__m128i a, __m128i b)
PAVGW: __m128i _mm_avg_epu16 (__m128i a, __m128i b)

Flags Affected
None.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)
If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
(128-bit operations only) If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
#SS(0)
If a memory operand effective address is outside the SS segment limit.
#UD
If CR0.EM[bit 2] = 1.
(128-bit operations only) If CR4.OSFXSR[bit 9] = 0.
(128-bit operations only) If CPUID.01H:EDX.SSE2[bit 26] = 0.
If the LOCK prefix is used.
#NM
If CR0.TS[bit 3] = 1.
#MF
(64-bit operations only) If there is a pending x87 FPU exception.
#PF(fault-code)
If a page fault occurs.
#AC(0)
(64-bit operations only) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
Real-Address Mode Exceptions
#GP(0)
(128-bit operations only) If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
If any part of the operand lies outside of the effective address space from 0 to FFFFH.
#UD
If CR0.EM[bit 2] = 1.
(128-bit operations only) If CR4.OSFXSR[bit 9] = 0.
If the LOCK prefix is used.
(128-bit operations only) If CPUID.01H:EDX.SSE2[bit 26] = 0.
#NM
If CR0.TS[bit 3] = 1.
#MF
(64-bit operations only) If there is a pending x87 FPU exception.

Virtual-8086 Mode Exceptions
Same exceptions as in real address mode.
#PF(fault-code)
For a page fault.
#AC(0)
(64-bit operations only) If alignment checking is enabled and an unaligned memory reference is made.

Compatibility Mode Exceptions
Same as for protected mode exceptions.

64-Bit Mode Exceptions
#SS(0)
If a memory address referencing the SS segment is in a non-canonical form.
#GP(0)
If the memory address is in a non-canonical form.
(128-bit operations only) If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
#UD
If CR0.EM[bit 2] = 1.
(128-bit operations only) If CR4.OSFXSR[bit 9] = 0.
(128-bit operations only) If CPUID.01H:EDX.SSE2[bit 26] = 0.
If the LOCK prefix is used.
#NM
If CR0.TS[bit 3] = 1.
#MF
(64-bit operations only) If there is a pending x87 FPU exception.
#PF(fault-code)
If a page fault occurs.
#AC(0)
(64-bit operations only) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
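The PAVGB/PAVGW averaging rule can be checked against a scalar reference model. The sketch below (all function names hypothetical) computes DEST[i] ← (SRC[i] + DEST[i] + 1) >> 1 in a wider integer type, mirroring the 9-bit and 17-bit temporary sums, and, when the compiler targets SSE2, performs the same 128-bit operation with the _mm_avg_epu8 and _mm_avg_epu16 intrinsics listed above. It uses unaligned loads and stores (_mm_loadu_si128/_mm_storeu_si128), so the arrays need not be 16-byte aligned.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2: _mm_avg_epu8, _mm_avg_epu16 */
#endif

/* Scalar reference for PAVGB over n byte lanes:
   DEST[i] <- (SRC[i] + DEST[i] + 1) >> 1.
   The sum is formed in unsigned int, so the 9-bit temporary never wraps. */
void pavgb_ref(uint8_t *dest, const uint8_t *src, int n) {
    for (int i = 0; i < n; i++)
        dest[i] = (uint8_t)(((unsigned)src[i] + dest[i] + 1u) >> 1);
}

/* Scalar reference for PAVGW over n word lanes (17-bit temporary sum). */
void pavgw_ref(uint16_t *dest, const uint16_t *src, int n) {
    for (int i = 0; i < n; i++)
        dest[i] = (uint16_t)(((uint32_t)src[i] + dest[i] + 1u) >> 1);
}

#if defined(__SSE2__)
/* 128-bit PAVGB via the intrinsic (sixteen 8-bit lanes). */
void pavgb_sse2(uint8_t dest[16], const uint8_t src[16]) {
    __m128i d = _mm_loadu_si128((const __m128i *)dest);
    __m128i s = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dest, _mm_avg_epu8(d, s));
}

/* 128-bit PAVGW via the intrinsic (eight 16-bit lanes). */
void pavgw_sse2(uint16_t dest[8], const uint16_t src[8]) {
    __m128i d = _mm_loadu_si128((const __m128i *)dest);
    __m128i s = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dest, _mm_avg_epu16(d, s));
}
#endif
```

For example, averaging byte lanes holding 255 and 255 gives (255 + 255 + 1) >> 1 = 511 >> 1 = 255; the wider temporary is what keeps the result from wrapping.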