Volume 1 Application Programming (794095), page 39
AMD64 Technology, 24592 Rev. 3.13, July 2007

• S: Signed, or Saturation, or Shift
• SD: Scalar double-precision floating-point
• SI: Signed integer
• SS: Scalar single-precision floating-point, or Signed saturation
• U: Unsigned, or Unordered, or Unaligned
• US: Unsigned saturation
• W: Word
• x: One or more variable characters in the mnemonic

For example, the mnemonic for the instruction that packs 16 signed words (eight from each 128-bit operand) into 16 unsigned bytes is PACKUSWB. In this mnemonic, the US designates an unsigned result with saturation, and the WB designates the source as words and the result as bytes.

4.5.2 Data Transfer

The data-transfer instructions copy operands between a memory location, an XMM register, an MMX register, or a GPR.
The MOV mnemonic, which stands for move, is a misnomer. A copy function is actually performed instead of a move. A new copy of the source value is created at the destination address, and the original copy remains unchanged at its source location.

Move.

• MOVD: Move Doubleword or Quadword
• MOVQ: Move Quadword
• MOVDQA: Move Aligned Double Quadword
• MOVDQU: Move Unaligned Double Quadword
• MOVDQ2Q: Move Quadword to Quadword
• MOVQ2DQ: Move Quadword to Quadword
• LDDQU: Load Double Quadword Unaligned

The MOVD instruction copies a 32-bit or 64-bit value from a GPR or memory location to the low-order 32 or 64 bits of an XMM register, or from the low-order 32 or 64 bits of an XMM register to a 32-bit or 64-bit GPR or memory location.
If the source operand is a GPR or memory location, the source is zero-extended to 128 bits in the XMM register. If the source is an XMM register, only the low-order 32 or 64 bits of the source are copied to the destination.

The MOVQ instruction copies a 64-bit value from memory to the low quadword of an XMM register, or from the low quadword of an XMM register to memory, or between the low quadwords of two XMM registers. If the source is in memory and the destination is an XMM register, the source is zero-extended to 128 bits in the XMM register.

The MOVDQA instruction copies a 128-bit value from memory to an XMM register, or from an XMM register to memory, or between two XMM registers.
If either the source or destination is a memory location, the memory address must be aligned on a 16-byte boundary. The MOVDQU instruction does the same, except for unaligned operands. The LDDQU instruction is virtually identical in operation to the MOVDQU instruction. The LDDQU instruction moves a double quadword of data from a 128-bit memory operand into a destination XMM register.

The MOVDQ2Q instruction copies the low-order 64-bit value in an XMM register to an MMX register. The MOVQ2DQ instruction copies a 64-bit value from an MMX register to the low-order 64 bits of an XMM register, with zero-extension to 128 bits.

Figure 4-17 on page 137 shows the capabilities of the various integer move instructions. These instructions move large amounts of data.
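The move forms described above can be tried from C through the SSE2 intrinsics in `<emmintrin.h>`. The following is a minimal sketch, assuming an x86-64 compiler such as GCC or Clang; the helper names are hypothetical, and the instruction named in each comment is the one the intrinsic typically maps to, although a compiler is free to pick an equivalent encoding.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* MOVD, GPR to XMM: the 32-bit source is zero-extended to 128 bits. */
static __m128i load_dword_zero_extended(int32_t value) {
    return _mm_cvtsi32_si128(value);
}

/* MOVD, XMM to GPR: only the low-order 32 bits are copied out. */
static int32_t low_dword(__m128i v) {
    return _mm_cvtsi128_si32(v);
}

/* MOVDQU load followed by MOVDQU store: no alignment requirement. */
static void copy16_unaligned(void *dst, const void *src) {
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, v);
}

/* MOVDQA load and store: both addresses must be 16-byte aligned. */
static void copy16_aligned(void *dst, const void *src) {
    assert(((uintptr_t)dst & 15) == 0 && ((uintptr_t)src & 15) == 0);
    __m128i v = _mm_load_si128((const __m128i *)src);
    _mm_store_si128((__m128i *)dst, v);
}
```

In each helper the source is left unchanged, which matches the copy (rather than move) semantics of the MOVx mnemonics. The alignment assertion in `copy16_aligned` is an illustrative guard and is not part of the instruction itself.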
When copying between XMM registers, or between an XMM register and memory, a move instruction can copy up to 16 bytes of data. When copying between an XMM register and an MMX register or GPR, a move instruction can copy up to 8 bytes of data. The MOVx instructions, along with the PUNPCKx instructions, are often among the most frequently used instructions in 128-bit media integer and floating-point procedures.

The move instructions are in many respects similar to the assignment operator in high-level languages. The simplest example of their use is for initializing variables. To initialize a register to 0, however, rather than using a MOVx instruction it may be more efficient to use the PXOR instruction with identical destination and source operands.

Move Non-Temporal.
The move non-temporal instructions are streaming-store instructions. They minimize pollution of the cache.

• MOVNTDQ: Move Non-Temporal Double Quadword
• MASKMOVDQU: Masked Move Double Quadword Unaligned

The MOVNTDQ instruction stores its second operand (a 128-bit XMM register value) into its first operand (a 128-bit memory location). MOVNTDQ indicates to the processor that its data is non-temporal, which assumes that the referenced data will be used only once and is therefore not subject to cache-related overhead (as opposed to temporal data, which assumes that the data will be accessed again soon and should be cached).
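A streaming store of the kind just described might be used for a block fill, as in the sketch below. It assumes an x86-64 compiler with `<emmintrin.h>`; `stream_fill` is a hypothetical helper name. The `_mm_stream_si128` intrinsic corresponds to MOVNTDQ and requires a 16-byte-aligned destination, and the trailing `_mm_sfence` is included because non-temporal stores are weakly ordered.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics, also provides _mm_sfence */
#include <stddef.h>
#include <stdint.h>

/* Fill `count` 16-byte blocks starting at `dst` with `byte`,
   using non-temporal (MOVNTDQ) stores. */
static void stream_fill(void *dst, uint8_t byte, size_t count) {
    assert(((uintptr_t)dst & 15) == 0);       /* MOVNTDQ needs 16-byte alignment */
    __m128i pattern = _mm_set1_epi8((char)byte);
    __m128i *out = (__m128i *)dst;
    for (size_t i = 0; i < count; i++) {
        _mm_stream_si128(&out[i], pattern);   /* non-temporal store */
    }
    _mm_sfence();  /* order the streaming stores before any later stores */
}
```

Whether this is faster than ordinary stores depends on the buffer size and on the processor, so the choice is best confirmed by measurement.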
The non-temporal instructions use weakly-ordered, write-combining buffering of write data, and they minimize cache pollution. The exact method by which cache pollution is minimized depends on the hardware implementation of the instruction. For further information, see “Memory Optimization” on page 92.

Figure 4-17. Integer Move Operations (source and destination operand combinations for MOVDQA, MOVDQU, MOVQ, MOVD, MOVDQ2Q, and MOVQ2DQ)

MASKMOVDQU is also a non-temporal instruction. It stores bytes from the first operand, as selected by the most-significant bit of each byte of the mask value in the second operand (0 = no write and 1 = write), to a memory location specified in the rDI and DS registers. The first and second operands are both XMM registers. The address may be unaligned. Figure 4-18 shows the MASKMOVDQU operation. It is useful for the handling of end cases in block copies and block fills based on streaming stores.

Figure 4-18. MASKMOVDQU Move Mask Operation

Move Mask.

• PMOVMSKB: Packed Move Mask Byte

The PMOVMSKB instruction moves the most-significant bit of each byte in an XMM register to the low-order word of a 32-bit or 64-bit general-purpose register, with zero-extension. The instruction is useful for extracting bits from mask patterns, or zero values from quantized data, or sign bits, resulting in a 16-bit mask that can be used for data-dependent branching. Figure 4-19 on page 139 shows the PMOVMSKB operation.
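The following sketch illustrates the PMOVMSKB behavior described above, assuming an x86-64 compiler with `<emmintrin.h>`. The function names are hypothetical. `match_mask` combines a packed byte compare with `_mm_movemask_epi8` (the intrinsic for PMOVMSKB), and `movemask_reference` is a plain scalar model of the same bit gathering, included only for comparison.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Return a 16-bit mask with bit i set when block[i] == needle. */
static unsigned match_mask(const uint8_t block[16], uint8_t needle) {
    __m128i data = _mm_loadu_si128((const __m128i *)block);
    /* PCMPEQB yields 0xFF in every byte position that matches. */
    __m128i eq = _mm_cmpeq_epi8(data, _mm_set1_epi8((char)needle));
    /* PMOVMSKB collects the most-significant bit of each byte. */
    return (unsigned)_mm_movemask_epi8(eq);
}

/* Scalar model of PMOVMSKB: bit i of the result is the MSB of bytes[i]. */
static unsigned movemask_reference(const uint8_t bytes[16]) {
    unsigned mask = 0;
    for (int i = 0; i < 16; i++) {
        mask |= (unsigned)(bytes[i] >> 7) << i;
    }
    return mask;
}
```

A caller can branch on the returned mask directly, for example testing it against zero to find out whether any byte matched.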
Figure 4-19. PMOVMSKB Move Mask Operation (the 16 most-significant bits of the XMM register bytes are concatenated into the low-order word of the GPR, and the remaining GPR bits are 0)

4.5.3 Data Conversion

The integer data-conversion instructions convert integer operands to floating-point operands. These instructions take 128-bit integer source operands. For data-conversion instructions that take 128-bit floating-point source operands, see “Data Conversion” on page 162. For data-conversion instructions that take 64-bit source operands, see “Data Conversion” on page 211 and “Data Conversion” on page 224.

Convert Integer to Floating-Point. These instructions convert integer data types in XMM registers or memory into floating-point data types in XMM registers.

• CVTDQ2PS: Convert Packed Doubleword Integers to Packed Single-Precision Floating-Point
• CVTDQ2PD: Convert Packed Doubleword Integers to Packed Double-Precision Floating-Point

The CVTDQ2PS instruction converts four 32-bit signed integer values in the second operand to four single-precision floating-point values and writes the converted values in another XMM register.
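The two packed conversions listed above can be exercised through SSE2 intrinsics, as in this minimal sketch. It assumes an x86-64 compiler with `<emmintrin.h>`, and the helper names are hypothetical. `_mm_cvtepi32_ps` corresponds to CVTDQ2PS and `_mm_cvtepi32_pd` to CVTDQ2PD; the latter reads only the two low-order doublewords of its source.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* CVTDQ2PS: four signed 32-bit integers to four single-precision values. */
static void int32x4_to_float(const int32_t in[4], float out[4]) {
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_ps(out, _mm_cvtepi32_ps(v));
}

/* CVTDQ2PD: two signed 32-bit integers (the low-order 64 bits of the
   source) to two double-precision values. */
static void int32x2_to_double(const int32_t in[2], double out[2]) {
    __m128i v = _mm_loadl_epi64((const __m128i *)in);  /* 64-bit load, zero-extended */
    _mm_storeu_pd(out, _mm_cvtepi32_pd(v));
}
```

Single precision carries a 24-bit significand, so a doubleword such as 16777217 (2^24 + 1) has no exact representation; with the default round-to-nearest mode in MXCSR it converts to 16777216.0f. Every 32-bit integer is exactly representable in double precision, so CVTDQ2PD never rounds.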
If the result of the conversion is an inexact value, the value is rounded. The CVTDQ2PD instruction is analogous to CVTDQ2PS except that it converts two 32-bit signed integer values, taken from the low-order 64 bits of the source, to two double-precision floating-point values.

Convert MMX Integer to Floating-Point. These instructions convert integer data types in MMX registers or memory into floating-point data types in XMM registers.

• CVTPI2PS: Convert Packed Doubleword Integers to Packed Single-Precision Floating-Point
• CVTPI2PD: Convert Packed Doubleword Integers to Packed Double-Precision Floating-Point

The CVTPI2PS instruction converts two 32-bit signed integer values in an MMX register or a 64-bit memory location to two single-precision floating-point values and writes the converted values in the low-order 64 bits of an XMM register.
The high-order 64 bits of the XMM register are not modified. The CVTPI2PD instruction is analogous to CVTPI2PS except that it converts two 32-bit signed integer values to two double-precision floating-point values and writes the converted values in the full 128 bits of an XMM register.

Before executing a CVTPI2x instruction, software should ensure that the MMX registers are properly initialized so as to prevent conflict with their aliased use by x87 floating-point instructions.
This may require clearing the MMX state, as described in “Accessing Operands in MMX™ Registers” on page 188.

For a description of 128-bit media instructions that convert in the opposite direction (floating-point to integer in MMX registers), see “Convert Floating-Point to MMX™ Integer” on page 163. For a summary of instructions that operate on MMX registers, see Chapter 5, “64-Bit Media Programming.”

Convert GPR Integer to Floating-Point. These instructions convert integer data types in GPR registers or memory into floating-point data types in XMM registers.

• CVTSI2SS: Convert Signed Doubleword or Quadword Integer to Scalar Single-Precision Floating-Point
• CVTSI2SD: Convert Signed Doubleword or Quadword Integer to Scalar Double-Precision Floating-Point

The CVTSI2SS instruction converts a 32-bit or 64-bit signed integer value in a general-purpose register or memory location to a single-precision floating-point value and writes the converted value in the low-order 32 bits of an XMM register.
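The scalar GPR conversions can be sketched with the intrinsics `_mm_cvtsi32_ss` (CVTSI2SS) and `_mm_cvtsi32_sd` (CVTSI2SD). This is an illustration only, assuming an x86-64 compiler with `<emmintrin.h>`; the helper names are hypothetical. The intrinsic takes the destination register as an explicit argument because the instruction replaces only the low-order element and passes the remaining bits through, a property of the instruction definition that the example makes visible.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE and SSE2 intrinsics */
#include <stdint.h>

/* CVTSI2SS: convert `value` and write it to element 0 of `dst`.
   Elements 1..3 of the result are copied from `dst`. */
static __m128 set_low_from_int(__m128 dst, int32_t value) {
    return _mm_cvtsi32_ss(dst, value);
}

/* CVTSI2SD: convert a 32-bit signed integer to double precision
   and return the low-order element as a C double. */
static double int_to_double(int32_t value) {
    return _mm_cvtsd_f64(_mm_cvtsi32_sd(_mm_setzero_pd(), value));
}
```

For example, calling `set_low_from_int` on a register holding the four elements 1.0, 2.0, 3.0, 4.0 (low to high) with the integer -7 yields -7.0, 2.0, 3.0, 4.0.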