Volume 2A Instruction Set Reference A-M (794101), page 61
DPPD — Dot Product of Packed Double Precision Floating-Point Values

NaNs on the input sources or computationally generated NaNs will have at least one NaN propagated to the destination.

Operation

IF (imm8[4] = 1)
    THEN Temp1[63:0] ← DEST[63:0] * SRC[63:0];
    ELSE Temp1[63:0] ← +0.0; FI;
IF (imm8[5] = 1)
    THEN Temp1[127:64] ← DEST[127:64] * SRC[127:64];
    ELSE Temp1[127:64] ← +0.0; FI;
Temp2[63:0] ← Temp1[63:0] + Temp1[127:64];
IF (imm8[0] = 1)
    THEN DEST[63:0] ← Temp2[63:0];
    ELSE DEST[63:0] ← +0.0; FI;
IF (imm8[1] = 1)
    THEN DEST[127:64] ← Temp2[63:0];
    ELSE DEST[127:64] ← +0.0; FI;

Flags Affected

None

Intel C/C++ Compiler Intrinsic Equivalent

DPPD    __m128d _mm_dp_pd ( __m128d a, __m128d b, const int mask);

SIMD Floating-Point Exceptions

Overflow, Underflow, Invalid, Precision, Denormal
Exceptions are determined separately for each add and multiply operation. Unmasked exceptions will leave the destination untouched.

Protected Mode Exceptions

#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
                  If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#NM               If CR0.TS[bit 3] = 1.
#UD               If an unmasked SIMD floating-point exception and OSXMMEXCPT in CR4 is 0.
                  If CR0.EM[bit 2] = 1.
                  If CR4.OSFXSR[bit 9] = 0.
                  If CPUID.01H:ECX.SSE4_1[bit 19] = 0.
                  If LOCK prefix is used.
                  Either the prefix REP (F3H) or REPN (F2H) is used.
#XM               If an unmasked SIMD floating-point exception and CR4.OSXMMEXCPT[bit 10] = 1.
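
The DPPD Operation pseudocode earlier in this entry can be read as a two-stage process: a masked multiply controlled by imm8[5:4], followed by a masked store of the sum controlled by imm8[1:0]. The following is a minimal sketch of that reading in portable C, not Intel's implementation. The names dppd_model and dppd_model_demo are hypothetical helpers introduced here for illustration, and the sketch assumes that array element 0 stands for bits 63:0 and element 1 for bits 127:64. It does not model NaN propagation order, SIMD floating-point exceptions, or the alignment fault.

```c
#include <assert.h>

/* Hypothetical helper, not an Intel API: scalar model of the DPPD
   Operation pseudocode. dest[0] is bits 63:0, dest[1] is bits 127:64. */
void dppd_model(double dest[2], const double src[2], unsigned imm8)
{
    /* imm8[4] and imm8[5] select which products enter the sum;
       an unselected product is replaced by +0.0. */
    double t_lo = (imm8 & 0x10u) ? dest[0] * src[0] : 0.0;
    double t_hi = (imm8 & 0x20u) ? dest[1] * src[1] : 0.0;
    double sum = t_lo + t_hi;

    /* imm8[0] and imm8[1] select which elements receive the sum;
       an unselected element is set to +0.0. */
    dest[0] = (imm8 & 0x01u) ? sum : 0.0;
    dest[1] = (imm8 & 0x02u) ? sum : 0.0;
}

/* Usage sketch with exactly representable inputs. */
void dppd_model_demo(void)
{
    double a[2] = {1.5, 2.0};
    const double b[2] = {4.0, 0.5};
    dppd_model(a, b, 0x31);   /* both products, store to low element only */
    assert(a[0] == 7.0 && a[1] == 0.0);
}
```

With imm8 = 31H both products (1.5 * 4.0 and 2.0 * 0.5) are summed and only the low element receives the result. On hardware with SSE4.1, the intrinsic named in this entry, _mm_dp_pd, would be the usual way to issue the instruction.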

Real Mode Exceptions

#GP(0)            If any part of the operand lies outside of the effective address space from 0 to 0FFFFH.
                  If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
#NM               If CR0.TS[bit 3] = 1.
#UD               If an unmasked SIMD floating-point exception and OSXMMEXCPT in CR4 is 0.
                  If CR0.EM[bit 2] = 1.
                  If CR4.OSFXSR[bit 9] = 0.
                  If CPUID.01H:ECX.SSE4_1[bit 19] = 0.
                  If LOCK prefix is used.
                  Either the prefix REP (F3H) or REPN (F2H) is used.
#XM               If an unmasked SIMD floating-point exception and CR4.OSXMMEXCPT[bit 10] = 1.

Virtual 8086 Mode Exceptions

Same exceptions as in Real Address Mode.
#PF(fault-code)   For a page fault.

Compatibility Mode Exceptions

Same exceptions as in Protected Mode.

64-Bit Mode Exceptions

#GP(0)            If the memory address is in a non-canonical form.
                  If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
#SS(0)            If a memory address referencing the SS segment is in a non-canonical form.
#PF(fault-code)   For a page fault.
#NM               If TS in CR0 is set.
#UD               If an unmasked SIMD floating-point exception and OSXMMEXCPT in CR4 is 0.
                  If EM in CR0 is set.
                  If OSFXSR in CR4 is 0.
                  If CPUID feature flag ECX.SSE4_1 is 0.
                  If LOCK prefix is used.
                  Either the prefix REP (F3H) or REPN (F2H) is used.
#XM               If an unmasked SIMD floating-point exception and CR4.OSXMMEXCPT[bit 10] = 1.

DPPS — Dot Product of Packed Single Precision Floating-Point Values

Opcode            66 0F 3A 40 /r ib
Instruction       DPPS xmm1, xmm2/m128, imm8
Op/En             A
64-Bit Mode       Valid
Compat/Leg Mode   Valid
Description       Selectively multiply packed SP floating-point values from xmm1 with packed SP floating-point values from xmm2, add and selectively store the packed SP floating-point values or zero values to xmm1.

Instruction Operand Encoding

Op/En   Operand 1          Operand 2       Operand 3   Operand 4
A       ModRM:reg (r, w)   ModRM:r/m (r)   imm8        NA

Description

Conditionally multiplies the packed single precision floating-point values in the destination operand (first operand) with the packed single-precision floats in the source (second operand) depending on a mask extracted from the high 4 bits of the immediate byte (third operand).
If a condition mask bit in Imm8[7:4] is zero, the corresponding multiplication is replaced by a value of 0.0. The four resulting single-precision values are summed into an intermediate result.

The intermediate result is conditionally broadcasted to the destination using a broadcast mask specified by bits [3:0] of the immediate byte. If a broadcast mask bit is "1", the intermediate result is copied to the corresponding dword element in the destination operand. If a broadcast mask bit is zero, the corresponding element in the destination is set to zero.

DPPS follows the NaN forwarding rules stated in the Software Developer’s Manual, vol. 1, table 4.7. These rules do not cover horizontal prioritization of NaNs.
Horizontal propagation of NaNs to the destination and the positioning of those NaNs in the destination is implementation dependent. NaNs on the input sources or computationally generated NaNs will have at least one NaN propagated to the destination.

Operation

IF (imm8[4] == 1)
    THEN Temp1[31:0] ← DEST[31:0] * SRC[31:0];
    ELSE Temp1[31:0] ← +0.0; FI;
IF (imm8[5] == 1)
    THEN Temp1[63:32] ← DEST[63:32] * SRC[63:32];
    ELSE Temp1[63:32] ← +0.0; FI;
IF (imm8[6] == 1)
    THEN Temp1[95:64] ← DEST[95:64] * SRC[95:64];
    ELSE Temp1[95:64] ← +0.0; FI;
IF (imm8[7] == 1)
    THEN Temp1[127:96] ← DEST[127:96] * SRC[127:96];
    ELSE Temp1[127:96] ← +0.0; FI;
Temp2[31:0] ← Temp1[31:0] + Temp1[63:32];
Temp3[31:0] ← Temp1[95:64] + Temp1[127:96];
Temp4[31:0] ← Temp2[31:0] + Temp3[31:0];
IF (imm8[0] == 1)
    THEN DEST[31:0] ← Temp4[31:0];
    ELSE DEST[31:0] ← +0.0; FI;
IF (imm8[1] == 1)
    THEN DEST[63:32] ← Temp4[31:0];
    ELSE DEST[63:32] ← +0.0; FI;
IF (imm8[2] == 1)
    THEN DEST[95:64] ← Temp4[31:0];
    ELSE DEST[95:64] ← +0.0; FI;
IF (imm8[3] == 1)
    THEN DEST[127:96] ← Temp4[31:0];
    ELSE DEST[127:96] ← +0.0; FI;

Intel C/C++ Compiler Intrinsic Equivalent

DPPS    __m128 _mm_dp_ps ( __m128 a, __m128 b, const int mask);

SIMD Floating-Point Exceptions

Overflow, Underflow, Invalid, Precision, Denormal
Exceptions are determined separately for each add and multiply operation, in the order of their execution. Unmasked exceptions will leave the destination operands unchanged.

Protected Mode Exceptions

#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
                  If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#NM               If CR0.TS[bit 3] = 1.
#UD               If an unmasked SIMD floating-point exception and OSXMMEXCPT in CR4 is 0.
                  If CR0.EM[bit 2] = 1.
                  If CR4.OSFXSR[bit 9] = 0.
                  If CPUID.01H:ECX.SSE4_1[bit 19] = 0.
                  If LOCK prefix is used.
                  Either the prefix REP (F3H) or REPN (F2H) is used.
#XM               If an unmasked SIMD floating-point exception and CR4.OSXMMEXCPT[bit 10] = 1.

Real Mode Exceptions

#GP(0)            If any part of the operand lies outside of the effective address space from 0 to 0FFFFH.
                  If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
#NM               If CR0.TS[bit 3] = 1.
#UD               If an unmasked SIMD floating-point exception and OSXMMEXCPT in CR4 is 0.
                  If CR0.EM[bit 2] = 1.
                  If CR4.OSFXSR[bit 9] = 0.
                  If CPUID.01H:ECX.SSE4_1[bit 19] = 0.
                  If LOCK prefix is used.
                  Either the prefix REP (F3H) or REPN (F2H) is used.
#XM               If an unmasked SIMD floating-point exception and CR4.OSXMMEXCPT[bit 10] = 1.

Virtual 8086 Mode Exceptions

Same exceptions as in Real Address Mode.
#PF(fault-code)   For a page fault.

Compatibility Mode Exceptions

Same exceptions as in Protected Mode.

64-Bit Mode Exceptions

#GP(0)            If the memory address is in a non-canonical form.
                  If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
#SS(0)            If a memory address referencing the SS segment is in a non-canonical form.
#PF(fault-code)   For a page fault.
#NM               If TS in CR0 is set.
#UD               If an unmasked SIMD floating-point exception and OSXMMEXCPT in CR4 is 0.
                  If EM in CR0 is set.
                  If OSFXSR in CR4 is 0.
                  If CPUID feature flag ECX.SSE4_1 is 0.
                  If LOCK prefix is used.
                  Either the prefix REP (F3H) or REPN (F2H) is used.
#XM               If an unmasked SIMD floating-point exception and CR4.OSXMMEXCPT[bit 10] = 1.
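
The DPPS Operation pseudocode above pairs the additions in a fixed order: elements 0 and 1, then elements 2 and 3, then the two partial sums. The following is a minimal sketch of that pseudocode in portable C, under stated assumptions rather than a definitive implementation. The names dpps_model and dpps_model_demo are hypothetical helpers introduced here, and array element i is assumed to stand for bits [32*i+31 : 32*i]. NaN positioning (which the text calls implementation dependent), SIMD floating-point exceptions, and the 16-byte alignment requirement are not modelled.

```c
#include <assert.h>

/* Hypothetical helper, not an Intel API: scalar model of the DPPS
   Operation pseudocode. Element i stands for bits [32*i+31 : 32*i]. */
void dpps_model(float dest[4], const float src[4], unsigned imm8)
{
    float t[4];
    int i;

    /* Condition mask imm8[7:4]: an unselected product becomes +0.0. */
    for (i = 0; i < 4; i++)
        t[i] = ((imm8 >> (4 + i)) & 1u) ? dest[i] * src[i] : 0.0f;

    /* Same pairing as Temp2, Temp3 and Temp4 in the pseudocode. */
    float lo = t[0] + t[1];
    float hi = t[2] + t[3];
    float sum = lo + hi;

    /* Broadcast mask imm8[3:0]: an unselected element is set to +0.0. */
    for (i = 0; i < 4; i++)
        dest[i] = ((imm8 >> i) & 1u) ? sum : 0.0f;
}

/* Usage sketch: full four-element dot product broadcast to all elements. */
void dpps_model_demo(void)
{
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float b[4] = {5.0f, 6.0f, 7.0f, 8.0f};
    dpps_model(a, b, 0xFF);   /* 5 + 12 + 21 + 32 = 70 */
    assert(a[0] == 70.0f && a[1] == 70.0f && a[2] == 70.0f && a[3] == 70.0f);
}
```

A common choice of immediate is 71H, which forms a three-element dot product (elements 0 to 2) and stores it in element 0 only. The pairing of the additions matters for floating-point rounding, which is why the sketch keeps the same order as the pseudocode instead of accumulating left to right.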

EMMS—Empty MMX Technology State

Opcode            0F 77
Instruction       EMMS
Op/En             A
64-Bit Mode       Valid
Compat/Leg Mode   Valid
Description       Set the x87 FPU tag word to empty.

Instruction Operand Encoding

Op/En   Operand 1   Operand 2   Operand 3   Operand 4
A       NA          NA          NA          NA

Description

Sets the values of all the tags in the x87 FPU tag word to empty (all 1s). This operation marks the x87 FPU data registers (which are aliased to the MMX technology registers) as available for use by x87 FPU floating-point instructions. (See Figure 8-7 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for the format of the x87 FPU tag word.) All other MMX instructions (other than the EMMS instruction) set all the tags in x87 FPU tag word to valid (all 0s).

The EMMS instruction must be used to clear the MMX technology state at the end of all MMX technology procedures or subroutines and before calling other procedures or subroutines that may execute x87 floating-point instructions. If a floating-point instruction loads one of the registers in the x87 FPU data register stack before the x87 FPU tag word has been reset by the EMMS instruction, an x87 floating-point register stack overflow can occur that will result in an x87 floating-point exception or incorrect result.

EMMS operation is the same in non-64-bit modes and 64-bit mode.

Operation

x87FPUTagWord ← FFFFH;

Intel C/C++ Compiler Intrinsic Equivalent

void _mm_empty()

Flags Affected

None.

Protected Mode Exceptions

#UD               If CR0.EM[bit 2] = 1.
#NM               If CR0.TS[bit 3] = 1.
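
The EMMS Operation line (x87FPUTagWord ← FFFFH) and the "all 1s" and "all 0s" wording in the Description can be illustrated with a small software model of the 16-bit tag word. This is a sketch under stated assumptions, not a way to read the real register. The names emms_model and x87_tag are hypothetical, and the sketch assumes the layout of eight 2-bit tags with the tag for physical register 0 in bits 1:0. Only the two tag values named in the text (valid and empty) are given symbolic names.

```c
#include <assert.h>
#include <stdint.h>

/* Tag values named in the text: valid is 00B, empty is 11B. */
enum { X87_TAG_VALID = 0, X87_TAG_EMPTY = 3 };

/* Hypothetical model of the EMMS operation on a software copy of the
   tag word: every 2-bit tag becomes 11B (empty), so the result is
   FFFFH whatever the previous contents were. */
uint16_t emms_model(uint16_t tag_word)
{
    (void)tag_word;   /* previous contents are discarded */
    return 0xFFFF;
}

/* Hypothetical helper: extract the 2-bit tag of physical register
   reg (0 to 7), assuming register 0 occupies bits 1:0. */
unsigned x87_tag(uint16_t tag_word, unsigned reg)
{
    return (tag_word >> (2u * reg)) & 3u;
}
```

In this model, a tag word of 0000H (the state left by other MMX instructions, per the Description) reports every register as valid, and emms_model turns it into FFFFH so that every register reports empty. In real code the intrinsic named above, _mm_empty, is the usual way to issue the instruction.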