Volume 2B Instruction Set Reference N-Z (794102), page 17
PMULHUW and PMULHW Instruction Operation Using 64-bit Operands

Operation

PMULHUW instruction with 64-bit operands:
    TEMP0[31:0] ← DEST[15:0] ∗ SRC[15:0]; (* Unsigned multiplication *)
    TEMP1[31:0] ← DEST[31:16] ∗ SRC[31:16];
    TEMP2[31:0] ← DEST[47:32] ∗ SRC[47:32];
    TEMP3[31:0] ← DEST[63:48] ∗ SRC[63:48];
    DEST[15:0] ← TEMP0[31:16];
    DEST[31:16] ← TEMP1[31:16];
    DEST[47:32] ← TEMP2[31:16];
    DEST[63:48] ← TEMP3[31:16];

PMULHUW instruction with 128-bit operands:
    TEMP0[31:0] ← DEST[15:0] ∗ SRC[15:0]; (* Unsigned multiplication *)
    TEMP1[31:0] ← DEST[31:16] ∗ SRC[31:16];
    TEMP2[31:0] ← DEST[47:32] ∗ SRC[47:32];
    TEMP3[31:0] ← DEST[63:48] ∗ SRC[63:48];
    TEMP4[31:0] ← DEST[79:64] ∗ SRC[79:64];
    TEMP5[31:0] ← DEST[95:80] ∗ SRC[95:80];
    TEMP6[31:0] ← DEST[111:96] ∗ SRC[111:96];
    TEMP7[31:0] ← DEST[127:112] ∗ SRC[127:112];
    DEST[15:0] ← TEMP0[31:16];
    DEST[31:16] ← TEMP1[31:16];
    DEST[47:32] ← TEMP2[31:16];
    DEST[63:48] ← TEMP3[31:16];
    DEST[79:64] ← TEMP4[31:16];
    DEST[95:80] ← TEMP5[31:16];
    DEST[111:96] ← TEMP6[31:16];
    DEST[127:112] ← TEMP7[31:16];

Intel C/C++ Compiler Intrinsic Equivalent
PMULHUW    __m64 _mm_mulhi_pu16(__m64 a, __m64 b)
PMULHUW    __m128i _mm_mulhi_epu16(__m128i a, __m128i b)

Flags Affected
None.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  (128-bit operations only) If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#UD               If CR0.EM[bit 2] = 1.
                  (128-bit operations only) If CR4.OSFXSR[bit 9] = 0.
                  (128-bit operations only) If CPUID.01H:EDX.SSE2[bit 26] = 0.
                  If the LOCK prefix is used.
#NM               If CR0.TS[bit 3] = 1.
#MF               (64-bit operations only) If there is a pending x87 FPU exception.
#PF(fault-code)   If a page fault occurs.
#AC(0)            (64-bit operations only) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real-Address Mode Exceptions
#GP(0)            (128-bit operations only) If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
                  If any part of the operand lies outside of the effective address space from 0 to FFFFH.
#UD               If CR0.EM[bit 2] = 1.
                  (128-bit operations only) If CR4.OSFXSR[bit 9] = 0.
                  (128-bit operations only) If CPUID.01H:EDX.SSE2[bit 26] = 0.
                  If the LOCK prefix is used.
#NM               If CR0.TS[bit 3] = 1.
#MF               (64-bit operations only) If there is a pending x87 FPU exception.

Virtual-8086 Mode Exceptions
Same exceptions as in real address mode.
#PF(fault-code)   For a page fault.
#AC(0)            (64-bit operations only) If alignment checking is enabled and an unaligned memory reference is made.

Compatibility Mode Exceptions
Same as for protected mode exceptions.

64-Bit Mode Exceptions
#SS(0)            If a memory address referencing the SS segment is in a non-canonical form.
#GP(0)            If the memory address is in a non-canonical form.
                  (128-bit operations only) If memory operand is not aligned on a 16-byte boundary, regardless of segment.
#UD               If CR0.EM[bit 2] = 1.
                  (128-bit operations only) If CR4.OSFXSR[bit 9] = 0.
                  (128-bit operations only) If CPUID.01H:EDX.SSE2[bit 26] = 0.
                  If the LOCK prefix is used.
#NM               If CR0.TS[bit 3] = 1.
#MF               (64-bit operations only) If there is a pending x87 FPU exception.
#PF(fault-code)   If a page fault occurs.
#AC(0)            (64-bit operations only) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
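The PMULHUW operation listed above can be sketched as a portable scalar model in C. This is only an illustration of the pseudocode, not the intrinsic itself: the function name pmulhuw_model and the lane-array representation (4 lanes for 64-bit operands, 8 lanes for 128-bit operands, lane 0 holding bits 15:0) are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of PMULHUW (not an Intel API).
 * dest and src hold one unsigned 16-bit word per lane, lane 0 = bits 15:0.
 * lanes is 4 for the 64-bit form and 8 for the 128-bit form. */
static void pmulhuw_model(uint16_t *dest, const uint16_t *src, int lanes)
{
    assert(lanes == 4 || lanes == 8);
    for (int i = 0; i < lanes; i++) {
        /* TEMPi[31:0] <- DEST lane * SRC lane (unsigned multiplication) */
        uint32_t temp = (uint32_t)dest[i] * (uint32_t)src[i];
        /* DEST lane <- TEMPi[31:16] */
        dest[i] = (uint16_t)(temp >> 16);
    }
}
```

For instance, a lane holding FFFFH multiplied by FFFFH gives the 32-bit product FFFE0001H, so the model stores FFFEH in that lane.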
PMULHW—Multiply Packed Signed Integers and Store High Result

Opcode        Instruction               64-Bit Mode   Compat/Leg Mode   Description
0F E5 /r      PMULHW mm, mm/m64         Valid         Valid             Multiply the packed signed word integers in mm1 register and mm2/m64, and store the high 16 bits of the results in mm1.
66 0F E5 /r   PMULHW xmm1, xmm2/m128    Valid         Valid             Multiply the packed signed word integers in xmm1 and xmm2/m128, and store the high 16 bits of the results in xmm1.

Description

Performs a SIMD signed multiply of the packed signed word integers in the destination operand (first operand) and the source operand (second operand), and stores the high 16 bits of each intermediate 32-bit result in the destination operand. (Figure 4-3 shows this operation when using 64-bit operands.) The source operand can be an MMX technology register or a 64-bit memory location, or it can be an XMM register or a 128-bit memory location. The destination operand can be an MMX technology register or an XMM register.

In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).

Operation

PMULHW instruction with 64-bit operands:
    TEMP0[31:0] ← DEST[15:0] ∗ SRC[15:0]; (* Signed multiplication *)
    TEMP1[31:0] ← DEST[31:16] ∗ SRC[31:16];
    TEMP2[31:0] ← DEST[47:32] ∗ SRC[47:32];
    TEMP3[31:0] ← DEST[63:48] ∗ SRC[63:48];
    DEST[15:0] ← TEMP0[31:16];
    DEST[31:16] ← TEMP1[31:16];
    DEST[47:32] ← TEMP2[31:16];
    DEST[63:48] ← TEMP3[31:16];

PMULHW instruction with 128-bit operands:
    TEMP0[31:0] ← DEST[15:0] ∗ SRC[15:0]; (* Signed multiplication *)
    TEMP1[31:0] ← DEST[31:16] ∗ SRC[31:16];
    TEMP2[31:0] ← DEST[47:32] ∗ SRC[47:32];
    TEMP3[31:0] ← DEST[63:48] ∗ SRC[63:48];
    TEMP4[31:0] ← DEST[79:64] ∗ SRC[79:64];
    TEMP5[31:0] ← DEST[95:80] ∗ SRC[95:80];
    TEMP6[31:0] ← DEST[111:96] ∗ SRC[111:96];
    TEMP7[31:0] ← DEST[127:112] ∗ SRC[127:112];
    DEST[15:0] ← TEMP0[31:16];
    DEST[31:16] ← TEMP1[31:16];
    DEST[47:32] ← TEMP2[31:16];
    DEST[63:48] ← TEMP3[31:16];
    DEST[79:64] ← TEMP4[31:16];
    DEST[95:80] ← TEMP5[31:16];
    DEST[111:96] ← TEMP6[31:16];
    DEST[127:112] ← TEMP7[31:16];

Intel C/C++ Compiler Intrinsic Equivalent
PMULHW    __m64 _mm_mulhi_pi16(__m64 m1, __m64 m2)
PMULHW    __m128i _mm_mulhi_epi16(__m128i a, __m128i b)

Flags Affected
None.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  (128-bit operations only) If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#UD               If CR0.EM[bit 2] = 1.
                  128-bit operations will generate #UD only if CR4.OSFXSR[bit 9] = 0. Execution of 128-bit instructions on a non-SSE2 capable processor (one that is MMX technology capable) will result in the instruction operating on the mm registers, not #UD.
                  If the LOCK prefix is used.
#NM               If CR0.TS[bit 3] = 1.
#MF               (64-bit operations only) If there is a pending x87 FPU exception.
#PF(fault-code)   If a page fault occurs.
#AC(0)            (64-bit operations only) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real-Address Mode Exceptions
#GP(0)            (128-bit operations only) If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
                  If any part of the operand lies outside of the effective address space from 0 to FFFFH.
#UD               If CR0.EM[bit 2] = 1.
                  128-bit operations will generate #UD only if CR4.OSFXSR[bit 9] = 0. Execution of 128-bit instructions on a non-SSE2 capable processor (one that is MMX technology capable) will result in the instruction operating on the mm registers, not #UD.
                  If the LOCK prefix is used.
#NM               If CR0.TS[bit 3] = 1.
#MF               (64-bit operations only) If there is a pending x87 FPU exception.

Virtual-8086 Mode Exceptions
Same exceptions as in real address mode.
#PF(fault-code)   For a page fault.
#AC(0)            (64-bit operations only) If alignment checking is enabled and an unaligned memory reference is made.

Compatibility Mode Exceptions
Same as for protected mode exceptions.

64-Bit Mode Exceptions
#SS(0)            If a memory address referencing the SS segment is in a non-canonical form.
#GP(0)            If the memory address is in a non-canonical form.
                  (128-bit operations only) If memory operand is not aligned on a 16-byte boundary, regardless of segment.
#UD               If CR0.EM[bit 2] = 1.
                  (128-bit operations only) If CR4.OSFXSR[bit 9] = 0.
                  (128-bit operations only) If CPUID.01H:EDX.SSE2[bit 26] = 0.
                  If the LOCK prefix is used.
#NM               If CR0.TS[bit 3] = 1.
#MF               (64-bit operations only) If there is a pending x87 FPU exception.
#PF(fault-code)   If a page fault occurs.
#AC(0)            (64-bit operations only) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
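The signed multiply in the PMULHW operation can be sketched with a portable scalar model in C. The names pmulhw_model and sign_extend16 are hypothetical helpers introduced for this illustration, and the lane layout (raw 16-bit patterns, lane 0 holding bits 15:0, 4 or 8 lanes) is an assumption of the example rather than anything defined by the manual.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: interpret a raw 16-bit pattern as a signed word. */
static int32_t sign_extend16(uint16_t x)
{
    return (int32_t)(x ^ 0x8000u) - 0x8000;
}

/* Hypothetical scalar model of PMULHW (not an Intel API).
 * dest and src hold raw 16-bit lane patterns, lane 0 = bits 15:0.
 * lanes is 4 for the 64-bit form and 8 for the 128-bit form. */
static void pmulhw_model(uint16_t *dest, const uint16_t *src, int lanes)
{
    assert(lanes == 4 || lanes == 8);
    for (int i = 0; i < lanes; i++) {
        /* TEMPi[31:0] <- DEST lane * SRC lane (signed multiplication) */
        int32_t temp = sign_extend16(dest[i]) * sign_extend16(src[i]);
        /* DEST lane <- TEMPi[31:16]; go through uint32_t so the shift
         * does not depend on how signed right shifts are implemented */
        dest[i] = (uint16_t)((uint32_t)temp >> 16);
    }
}
```

With signed interpretation, FFFFH is -1, so a lane computing (-1) ∗ (-1) = 1 stores 0000H, while (-1) ∗ 1 = -1 (FFFFFFFFH as a 32-bit result) stores FFFFH.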
PMULLW—Multiply Packed Signed Integers and Store Low Result

Opcode        Instruction               64-Bit Mode   Compat/Leg Mode   Description
0F D5 /r      PMULLW mm, mm/m64         Valid         Valid             Multiply the packed signed word integers in mm1 register and mm2/m64, and store the low 16 bits of the results in mm1.
66 0F D5 /r   PMULLW xmm1, xmm2/m128    Valid         Valid             Multiply the packed signed word integers in xmm1 and xmm2/m128, and store the low 16 bits of the results in xmm1.

Description

Performs a SIMD signed multiply of the packed signed word integers in the destination operand (first operand) and the source operand (second operand), and stores the low 16 bits of each intermediate 32-bit result in the destination operand. (Figure 4-4 shows this operation when using 64-bit operands.) The source operand can be an MMX technology register or a 64-bit memory location, or it can be an XMM register or a 128-bit memory location. The destination operand can be an MMX technology register or an XMM register.

In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).

    SRC     X3              X2              X1              X0
    DEST    Y3              Y2              Y1              Y0
    TEMP    Z3 = X3 ∗ Y3    Z2 = X2 ∗ Y2    Z1 = X1 ∗ Y1    Z0 = X0 ∗ Y0
    DEST    Z3[15:0]        Z2[15:0]        Z1[15:0]        Z0[15:0]

Figure 4-4. PMULLW Instruction Operation Using 64-bit Operands

Operation

PMULLW instruction with 64-bit operands:
    TEMP0[31:0] ← DEST[15:0] ∗ SRC[15:0]; (* Signed multiplication *)
    TEMP1[31:0] ← DEST[31:16] ∗ SRC[31:16];
    TEMP2[31:0] ← DEST[47:32] ∗ SRC[47:32];
    TEMP3[31:0] ← DEST[63:48] ∗ SRC[63:48];
    DEST[15:0] ← TEMP0[15:0];
    DEST[31:16] ← TEMP1[15:0];
    DEST[47:32] ← TEMP2[15:0];
    DEST[63:48] ← TEMP3[15:0];

PMULLW instruction with 128-bit operands:
    TEMP0[31:0] ← DEST[15:0] ∗ SRC[15:0]; (* Signed multiplication *)
    TEMP1[31:0] ← DEST[31:16] ∗ SRC[31:16];
    TEMP2[31:0] ← DEST[47:32] ∗ SRC[47:32];
    TEMP3[31:0] ← DEST[63:48] ∗ SRC[63:48];
    TEMP4[31:0] ← DEST[79:64] ∗ SRC[79:64];
    TEMP5[31:0] ← DEST[95:80] ∗ SRC[95:80];
    TEMP6[31:0] ← DEST[111:96] ∗ SRC[111:96];
    TEMP7[31:0] ← DEST[127:112] ∗ SRC[127:112];
    DEST[15:0] ← TEMP0[15:0];
    DEST[31:16] ← TEMP1[15:0];
    DEST[47:32] ← TEMP2[15:0];
    DEST[63:48] ← TEMP3[15:0];
    DEST[79:64] ← TEMP4[15:0];
    DEST[95:80] ← TEMP5[15:0];
    DEST[111:96] ← TEMP6[15:0];
    DEST[127:112] ← TEMP7[15:0];

Intel C/C++ Compiler Intrinsic Equivalent
PMULLW    __m64 _mm_mullo_pi16(__m64 m1, __m64 m2)
PMULLW    __m128i _mm_mullo_epi16(__m128i a, __m128i b)

Flags Affected
None.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  (128-bit operations only) If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
#SS(0)