Volume 4 128-Bit Media Instructions (794098), page 22
3.09—July 2007

INSERTQ    Insert Field

Inserts bits from the lower 64 bits of the source operand into the lower 64 bits of the destination operand. No other bits in the lower 64 bits of the destination are modified. The upper 64 bits of the destination are undefined.

The least-significant l bits of the source operand are inserted into the destination, with the least-significant bit of the source operand inserted at bit position n, where l and n are defined as the field length and bit index, respectively.

Bits (field length – 1):0 of the source operand are inserted into bits (bit index + field length – 1):(bit index) of the destination. If the sum of the bit index and the field length is greater than 64, the results are undefined.

For example, if the bit index is 32 (20h) and the field length is 16 (10h), then the result in the destination register will be source operand[15:0] in bits 47:32. Bits 63:48 and bits 31:0 are not modified.

A value of zero in the field length is defined as a length of 64. If the field length is 0 and the bit index is 0, bits 63:0 of the source operand are inserted. For any other value of the bit index, the results are undefined.

The bits to insert are located in the XMM2 source operand. The bit index and field length can be specified as immediate values or can be specified in the XMM source operand. In the immediate form, the bit index and the field length are specified by the fourth operand (second immediate byte) and the third operand (first immediate byte), respectively. In the register form, the bit index and field length are specified in bits [77:72] and bits [69:64] of the source XMM register, respectively. The bit index and field length are each six bits in length; other bits in the field are ignored.

Support for the INSERTQ instruction is indicated by ECX bit 6 (SSE4A) as returned by CPUID function 8000_0001h. Software must check the CPUID bit once per program or library initialization before using the INSERTQ instruction, or inconsistent behavior may result.

Mnemonic                        Opcode             Description
INSERTQ xmm1, xmm2, imm8, imm8  F2 0F 78 /r ib ib  Insert field starting at bit 0 of xmm2 with the length specified by [5:0] of the first immediate byte. This field is inserted into xmm1 starting at the bit position specified by [5:0] of the second immediate byte.
INSERTQ xmm1, xmm2              F2 0F 79 /r        Insert field starting at bit 0 of xmm2 with the length specified by xmm2[69:64]. This field is inserted into xmm1 starting at the bit position specified by xmm2[77:72].

[Figure: INSERTQ operand diagrams. Immediate form: bits 5:0 of the first imm8 select the number of bits to insert, and bits 5:0 of the second imm8 select the bit position for the insert. Register form: xmm2[69:64] selects the number of bits to insert, and xmm2[77:72] selects the bit position for the insert.]

Related Instructions
EXTRQ, PINSRW, PEXTRW

rFLAGS Affected
None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE4A instructions are not supported, as indicated by ECX bit 6 (SSE4A) of CPUID function 8000_0001h.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
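The field-insert rule for INSERTQ can be sketched in portable C. This is only a software model of the low 64 bits under the rules stated in this entry, not an implementation of the instruction. The names insertq_model and insertq_decode_control are hypothetical helpers introduced for illustration, and returning false for the cases the text calls undefined is a choice made by this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: models INSERTQ on the low 64 bits only.
 * Returns false (and leaves *dst unchanged) for the combinations
 * that the manual describes as undefined. */
static bool insertq_model(uint64_t *dst, uint64_t src,
                          unsigned length, unsigned index)
{
    assert(dst != NULL);

    /* Only six bits of each control field are used. */
    length &= 0x3F;
    index  &= 0x3F;

    if (length == 0) {
        /* A field length of zero is defined as a length of 64. */
        if (index != 0)
            return false;          /* undefined per the manual */
        *dst = src;                /* bits 63:0 are inserted */
        return true;
    }

    if (index + length > 64)
        return false;              /* undefined per the manual */

    /* length is 1..63 here, so the shift below is well defined. */
    uint64_t mask = ((UINT64_C(1) << length) - 1) << index;
    *dst = (*dst & ~mask) | ((src << index) & mask);
    return true;
}

/* Hypothetical helper for the register form: xmm2[69:64] and
 * xmm2[77:72] are bits 5:0 and 13:8 of the upper quadword. */
static void insertq_decode_control(uint64_t xmm2_high,
                                   unsigned *length, unsigned *index)
{
    assert(length != NULL && index != NULL);
    *length = (unsigned)(xmm2_high & 0x3F);
    *index  = (unsigned)((xmm2_high >> 8) & 0x3F);
}
```

Calling insertq_model with a length of 16 and an index of 32 reproduces the example in the text: source bits 15:0 land in destination bits 47:32, and the other bits of the low quadword keep their previous values.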
LDDQU    Load Unaligned Double Quadword

Moves an unaligned 128-bit (double quadword) value from a 128-bit memory location to a destination XMM register.

Like the MOVUPD instruction, the LDDQU instruction loads a 128-bit operand from an unaligned memory location. However, to improve performance when the memory operand is actually misaligned, LDDQU may read an aligned 16 bytes to get the first part of the operand, and an aligned 16 bytes to get the second part of the operand. This behavior is implementation-specific, and LDDQU may only read the exact 16 bytes needed for the memory operand. If the memory operand is in a memory range where reading extra bytes can cause performance or functional issues, use the MOVUPD instruction instead of LDDQU.

Memory operands that are not aligned on a 16-byte boundary do not cause a general-protection exception.

The LDDQU instruction is an SSE3 instruction. The presence of this instruction set is indicated by a CPUID feature bit (ECX bit 0 of CPUID function 0000_0001h). (See “CPUID” in Volume 3.)

Mnemonic             Opcode       Description
LDDQU xmm1, mem128   F2 0F F0 /r  Moves a 128-bit value from an unaligned 128-bit memory location to the destination XMM register.

[Figure: LDDQU operand diagram. Bits 127:0 of mem128 are copied to bits 127:0 of xmm1.]

Related Instructions
MOVDQU

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE3 instructions are not supported, as indicated by ECX bit 0 of CPUID function 0000_0001h.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC             X             X          An unaligned memory reference was performed while alignment checking was enabled.
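The note that LDDQU may read two aligned 16-byte blocks for a misaligned operand comes down to simple address arithmetic. The sketch below only computes which aligned blocks cover a 16-byte operand; it does not model what any particular processor actually reads, which the text says is implementation-specific. The function name lddqu_covering_blocks is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: finds the aligned 16-byte blocks that cover the
 * operand [addr, addr + 16). Stores the block base addresses in *first
 * and *second and returns how many distinct blocks are involved:
 * 1 when addr is 16-byte aligned, 2 otherwise. */
static int lddqu_covering_blocks(uintptr_t addr,
                                 uintptr_t *first, uintptr_t *second)
{
    assert(first != NULL && second != NULL);

    const uintptr_t align_mask = ~(uintptr_t)15;

    *first  = addr & align_mask;         /* block holding the first byte */
    *second = (addr + 15) & align_mask;  /* block holding the last byte  */

    return (*first == *second) ? 1 : 2;
}
```

When the function returns 2, an implementation that reads both aligned blocks touches 32 bytes for a 16-byte operand. That is the situation in which the text recommends MOVUPD if the extra bytes fall in a range where reading them can cause performance or functional issues.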
LDMXCSR    Load MXCSR Control/Status Register

Loads the MXCSR register with a 32-bit value from memory.

A general protection exception occurs if the LDMXCSR instruction attempts to load non-zero values into reserved MXCSR bits. Software can use MXCSR_MASK to determine which bits of MXCSR are reserved. For details on the MXCSR_MASK, see “128-Bit, 64-Bit, and x87 Programming” in Volume 2.

The MXCSR register is described in “Registers” in Volume 1.

The LDMXCSR instruction is an SSE instruction; check the status of EDX bit 25 returned by CPUID function 0000_0001h to verify that the processor supports this function. (See “CPUID” in Volume 3.)

Mnemonic        Opcode    Description
LDMXCSR mem32   0F AE /2  Loads MXCSR register with 32-bit value in memory.

Related Instructions
STMXCSR

rFLAGS Affected
None

MXCSR Flags Affected

Flag   MM  FZ  RC      PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State  M   M   M   M   M   M   M   M   M   M   M    M   M   M   M   M   M
Bit    17  15  14  13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID function 0000_0001h.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          Ones were written to the reserved bits in MXCSR.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC             X             X          An unaligned memory reference was performed while alignment checking was enabled.
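The reserved-bit rule for LDMXCSR can be checked in software before the load is attempted. The sketch below assumes the caller has already obtained MXCSR_MASK (it is reported in the FXSAVE image, which this entry does not describe), and it assumes the documented fallback of 0000_FFBFh when that field reads as zero; both points come from outside this section. The name ldmxcsr_would_fault is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption from outside this entry: when the MXCSR_MASK field reads
 * as zero, software is expected to use 0000_FFBFh instead. */
#define MXCSR_MASK_DEFAULT UINT32_C(0x0000FFBF)

/* Hypothetical helper: returns true if loading `value` into MXCSR would
 * set a bit that `mxcsr_mask` marks as reserved, which is the condition
 * the text associates with a general-protection exception. */
static bool ldmxcsr_would_fault(uint32_t value, uint32_t mxcsr_mask)
{
    if (mxcsr_mask == 0)
        mxcsr_mask = MXCSR_MASK_DEFAULT;

    /* A zero bit in the mask marks the matching MXCSR bit as reserved. */
    return (value & ~mxcsr_mask) != 0;
}
```

A caller would typically run this check on any MXCSR image that comes from untrusted or serialized state, and clear or reject the offending bits rather than execute LDMXCSR with them set.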
MASKMOVDQU    Masked Move Double Quadword Unaligned

Stores bytes from the first source operand as selected by the sign bits in the second source operand (sign-bit is 0 = no write and sign-bit is 1 = write) to a memory location specified in the DS:rDI registers. The first source operand is an XMM register, and the second source operand is another XMM register. The store address may be unaligned.

A mask value of all 0s results in the following behavior:
• No data is written to memory.
• Code and data breakpoints are not guaranteed to be signaled in all implementations.
• Exceptions associated with memory addressing and page faults are not guaranteed to be signaled in all implementations.

MASKMOVDQU implicitly uses weakly-ordered, write-combining buffering for the data, as described in “Buffering and Combining Memory Writes” in Volume 2.
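The byte-select rule described above (a byte is written only when the sign bit of the matching mask byte is 1) can be sketched as a plain loop. This models only which bytes are stored; it says nothing about the write-combining behavior, breakpoints, or exception signaling. The name maskmovdqu_model is a hypothetical helper, and dest stands in for the location addressed by DS:rDI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: models the byte selection of MASKMOVDQU.
 * src  - the 16 bytes of the first source operand
 * mask - the 16 bytes of the second source operand
 * dest - stand-in for the memory addressed by DS:rDI */
static void maskmovdqu_model(uint8_t *dest,
                             const uint8_t src[16],
                             const uint8_t mask[16])
{
    assert(dest != NULL && src != NULL && mask != NULL);

    for (size_t i = 0; i < 16; i++) {
        /* Bit 7 is the sign bit of the mask byte: 1 = write, 0 = skip. */
        if (mask[i] & 0x80)
            dest[i] = src[i];
    }
}
```

With an all-zero mask the loop writes nothing, which matches the first bullet in the list above.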