Volume 2B Instruction Set Reference N-Z (794102), page 22
INSTRUCTION SET REFERENCE, N-Z

PSADBW—Compute Sum of Absolute Differences

64-Bit Mode Exceptions
#SS(0)            If a memory address referencing the SS segment is in a non-canonical form.
#GP(0)            If the memory address is in a non-canonical form.
                  (128-bit operations only) If memory operand is not aligned on a 16-byte boundary, regardless of segment.
#UD               If CR0.EM[bit 2] = 1.
                  (128-bit operations only) If CR4.OSFXSR[bit 9] = 0.
                  (128-bit operations only) If CPUID.01H:EDX.SSE2[bit 26] = 0.
                  If the LOCK prefix is used.
#NM               If CR0.TS[bit 3] = 1.
#MF               (64-bit operations only) If there is a pending x87 FPU exception.
#PF(fault-code)   If a page fault occurs.
#AC(0)            (64-bit operations only) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

PSHUFB — Packed Shuffle Bytes

Opcode            Instruction               64-Bit Mode   Compat/Leg Mode   Description
0F 38 00 /r       PSHUFB mm1, mm2/m64       Valid         Valid             Shuffle bytes in mm1 according to contents of mm2/m64.
66 0F 38 00 /r    PSHUFB xmm1, xmm2/m128    Valid         Valid             Shuffle bytes in xmm1 according to contents of xmm2/m128.

Description
PSHUFB performs in-place shuffles of bytes in the destination operand (the first operand) according to the shuffle control mask in the source operand (the second operand). The instruction permutes the data in the destination operand, leaving the shuffle mask unaffected. If the most significant bit (bit[7]) of each byte of the shuffle control mask is set, then constant zero is written in the result byte. Each byte in the shuffle control mask forms an index to permute the corresponding byte in the destination operand.
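The masking and indexing rule can be modelled in plain C. The following is a minimal sketch of the 128-bit form only, in which the low 4 bits of each control byte form the index. The function name pshufb128_model and the temporary copy of the destination are assumptions made for this illustration; they are not part of the instruction definition.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative software model of the 128-bit PSHUFB form.
 * dest: 16 data bytes, shuffled in place.
 * mask: 16 shuffle-control bytes, left unmodified.
 * The name and the snapshot buffer are assumptions of this sketch. */
void pshufb128_model(uint8_t dest[16], const uint8_t mask[16])
{
    uint8_t original[16];

    /* Snapshot the destination so every result byte is selected from the
     * value DEST held before the shuffle started. */
    memcpy(original, dest, sizeof original);

    for (int i = 0; i < 16; i++) {
        if (mask[i] & 0x80) {
            dest[i] = 0;                          /* bit 7 set: write constant zero */
        } else {
            dest[i] = original[mask[i] & 0x0F];   /* low 4 bits are the byte index */
        }
    }
}
```

With control bytes 15, 14, ..., 0 the model reverses the 16 data bytes. On hardware that reports SSSE3 support, the same result would normally be obtained through the _mm_shuffle_epi8 intrinsic rather than a scalar loop.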
The value of each index is the least significant 4 bits (128-bit operation) or 3 bits (64-bit operation) of the shuffle control byte. Both operands can be MMX registers or XMM registers. When the source operand is a 128-bit memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.

In 64-bit mode, use the REX prefix to access additional registers.

Operation
PSHUFB with 64 bit operands:
    for i = 0 to 7 {
        if (SRC[(i*8)+7] == 1) then
            DEST[(i*8)+7..(i*8)+0] ← 0;
        else
            index[2..0] ← SRC[(i*8)+2..(i*8)+0];
            DEST[(i*8)+7..(i*8)+0] ← DEST[(index*8+7)..(index*8+0)];
        endif;
    }

PSHUFB with 128 bit operands:
    for i = 0 to 15 {
        if (SRC[(i*8)+7] == 1) then
            DEST[(i*8)+7..(i*8)+0] ← 0;
        else
            index[3..0] ← SRC[(i*8)+3..(i*8)+0];
            DEST[(i*8)+7..(i*8)+0] ← DEST[(index*8+7)..(index*8+0)];
        endif
    }

[Figure 4-6. PSHUFB with 64-Bit Operands]

Intel C/C++ Compiler Intrinsic Equivalent
PSHUFB    __m64 _mm_shuffle_pi8 (__m64 a, __m64 b)
PSHUFB    __m128i _mm_shuffle_epi8 (__m128i a, __m128i b)

Protected Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS or GS segments.
                  (128-bit operations only) If not aligned on 16-byte boundary, regardless of segment.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)   If a page fault occurs.
#UD               If CR0.EM = 1.
                  (128-bit operations only) If CR4.OSFXSR(bit 9) = 0.
                  If CPUID.SSSE3(ECX bit 9) = 0.
                  If the LOCK prefix is used.
#NM               If TS bit in CR0 is set.
#MF               (64-bit operations only) If there is a pending x87 FPU exception.
#AC(0)            (64-bit operations only) If alignment checking is enabled and unaligned memory reference is made while the current privilege level is 3.

Real Mode Exceptions
#GP(0)            If any part of the operand lies outside of the effective address space from 0 to 0FFFFH.
                  (128-bit operations only) If not aligned on 16-byte boundary, regardless of segment.
#UD               If CR0.EM = 1.
                  (128-bit operations only) If CR4.OSFXSR(bit 9) = 0.
                  If CPUID.SSSE3(ECX bit 9) = 0.
                  If the LOCK prefix is used.
#NM               If TS bit in CR0 is set.
#MF               (64-bit operations only) If there is a pending x87 FPU exception.

Virtual 8086 Mode Exceptions
Same exceptions as in real address mode.
#PF(fault-code)   If a page fault occurs.
#AC(0)            (64-bit operations only) If alignment checking is enabled and unaligned memory reference is made.

Compatibility Mode Exceptions
Same as for protected mode exceptions.

64-Bit Mode Exceptions
#SS(0)            If a memory address referencing the SS segment is in a non-canonical form.
#GP(0)            If the memory address is in a non-canonical form.
                  (128-bit operations only) If memory operand is not aligned on a 16-byte boundary, regardless of segment.
#UD               If CR0.EM[bit 2] = 1.
                  (128-bit operations only) If CR4.OSFXSR[bit 9] = 0.
                  If CPUID.01H:ECX.SSSE3[bit 9] = 0.
                  If the LOCK prefix is used.
#NM               If CR0.TS[bit 3] = 1.
#MF               (64-bit operations only) If there is a pending x87 FPU exception.
#PF(fault-code)   If a page fault occurs.
#AC(0)            (64-bit operations only) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

PSHUFD—Shuffle Packed Doublewords

Opcode            Instruction                       64-Bit Mode   Compat/Leg Mode   Description
66 0F 70 /r ib    PSHUFD xmm1, xmm2/m128, imm8      Valid         Valid             Shuffle the doublewords in xmm2/m128 based on the encoding in imm8 and store the result in xmm1.

Description
Copies doublewords from source operand (second operand) and inserts them in the destination operand (first operand) at the locations selected with the order operand (third operand). Figure 4-7 shows the operation of the PSHUFD instruction and the encoding of the order operand. Each 2-bit field in the order operand selects the contents of one doubleword location in the destination operand.
For example, bits 0 and 1 of the order operand select the contents of doubleword 0 of the destination operand. The encoding of bits 0 and 1 of the order operand (see the field encoding in Figure 4-7) determines which doubleword from the source operand will be copied to doubleword 0 of the destination operand.

[Figure 4-7. PSHUFD Instruction Operation]
    SRC:    X3  X2  X1  X0
    DEST:   Y3  Y2  Y1  Y0
    ORDER:  bits 7 6 | 5 4 | 3 2 | 1 0
    Encoding of fields in ORDER operand: 00B - X0, 01B - X1, 10B - X2, 11B - X3

The source operand can be an XMM register or a 128-bit memory location.
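The field encoding can be made concrete with a small C model. This is a sketch under stated assumptions: make_order and pshufd_model are hypothetical helper names introduced here for illustration, and the operands are represented as arrays of four 32-bit values with element 0 holding doubleword 0.

```c
#include <stdint.h>

/* Hypothetical helper: pack four 2-bit selectors into an order byte.
 * sel0 occupies bits 1:0 and picks the source doubleword for DEST
 * doubleword 0; sel3 occupies bits 7:6 and picks it for doubleword 3. */
uint8_t make_order(unsigned sel3, unsigned sel2, unsigned sel1, unsigned sel0)
{
    return (uint8_t)(((sel3 & 3u) << 6) | ((sel2 & 3u) << 4) |
                     ((sel1 & 3u) << 2) | (sel0 & 3u));
}

/* Illustrative model of the doubleword selection: destination doubleword i
 * receives the source doubleword named by bits (2*i+1):(2*i) of order. */
void pshufd_model(uint32_t dest[4], const uint32_t src[4], uint8_t order)
{
    uint32_t result[4];

    /* Build the result separately so the model also works when dest and
     * src refer to the same array. */
    for (int i = 0; i < 4; i++) {
        result[i] = src[(order >> (2 * i)) & 3];
    }
    for (int i = 0; i < 4; i++) {
        dest[i] = result[i];
    }
}
```

For instance, make_order(0, 1, 2, 3) produces 1BH, which reverses the four doublewords, while an order operand of 00H copies doubleword 0 of the source into every doubleword of the destination.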
The destination operand is an XMM register. The order operand is an 8-bit immediate. Note that this instruction permits a doubleword in the source operand to be copied to more than one doubleword location in the destination operand.

In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).

Operation
    DEST[31:0] ← (SRC >> (ORDER[1:0] * 32))[31:0];
    DEST[63:32] ← (SRC >> (ORDER[3:2] * 32))[31:0];
    DEST[95:64] ← (SRC >> (ORDER[5:4] * 32))[31:0];
    DEST[127:96] ← (SRC >> (ORDER[7:6] * 32))[31:0];

Intel C/C++ Compiler Intrinsic Equivalent
PSHUFD    __m128i _mm_shuffle_epi32(__m128i a, int n)

Flags Affected
None.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#UD               If CR0.EM[bit 2] = 1.
                  If CR4.OSFXSR[bit 9] = 0.
                  If CPUID.01H:EDX.SSE2[bit 26] = 0.
                  If the LOCK prefix is used.
#NM               If CR0.TS[bit 3] = 1.
#PF(fault-code)   If a page fault occurs.

Real-Address Mode Exceptions
#GP(0)            If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
                  If any part of the operand lies outside of the effective address space from 0 to FFFFH.
#UD               If CR0.EM[bit 2] = 1.
                  If CR4.OSFXSR[bit 9] = 0.
                  If CPUID.01H:EDX.SSE2[bit 26] = 0.
                  If the LOCK prefix is used.
#NM               If CR0.TS[bit 3] = 1.

Virtual-8086 Mode Exceptions
Same exceptions as in real address mode.
#PF(fault-code)   For a page fault.

Compatibility Mode Exceptions
Same as for protected mode exceptions.

64-Bit Mode Exceptions
#SS(0)            If a memory address referencing the SS segment is in a non-canonical form.
#GP(0)            If the memory address is in a non-canonical form.
                  If memory operand is not aligned on a 16-byte boundary, regardless of segment.
#UD               If CR0.EM[bit 2] = 1.
                  If CR4.OSFXSR[bit 9] = 0.
                  If CPUID.01H:EDX.SSE2[bit 26] = 0.
                  If the LOCK prefix is used.
#NM               If CR0.TS[bit 3] = 1.
#PF(fault-code)   If a page fault occurs.

PSHUFHW—Shuffle Packed High Words

Opcode            Instruction                        64-Bit Mode   Compat/Leg Mode   Description
F3 0F 70 /r ib    PSHUFHW xmm1, xmm2/m128, imm8      Valid         Valid             Shuffle the high words in xmm2/m128 based on the encoding in imm8 and store the result in xmm1.

Description
Copies words from the high quadword of the source operand (second operand) and inserts them in the high quadword of the destination operand (first operand) at word locations selected with the order operand (third operand). This operation is similar to the operation used by the PSHUFD instruction, which is illustrated in Figure 4-7. For the PSHUFHW instruction, each 2-bit field in the order operand selects the contents of one word location in the high quadword of the destination operand.
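The word selection described above can be sketched in C as follows. The function name pshufhw_model is hypothetical, and the operands are modelled as arrays of eight 16-bit words with elements 4 to 7 forming the high quadword. This excerpt ends before the handling of the low quadword is stated; the sketch assumes it is copied unchanged from source to destination, as the full instruction description defines.

```c
#include <stdint.h>

/* Illustrative model of PSHUFHW. Words 4..7 are the high quadword.
 * High word i of the destination receives the high word of the source
 * named by bits (2*i+1):(2*i) of the order operand. */
void pshufhw_model(uint16_t dest[8], const uint16_t src[8], uint8_t order)
{
    uint16_t result[8];

    for (int i = 0; i < 4; i++) {
        /* Assumption of this sketch: the low quadword passes through. */
        result[i] = src[i];
        /* Each 2-bit field selects one of the four high source words. */
        result[4 + i] = src[4 + ((order >> (2 * i)) & 3)];
    }
    /* Copy out last so the model tolerates dest and src being the same array. */
    for (int i = 0; i < 8; i++) {
        dest[i] = result[i];
    }
}
```

With an order operand of 1BH the four high words are reversed and the four low words are unchanged; the selectors never reach into the low quadword because the selected index is always offset by 4.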