Volume 1 Application Programming (794095), page 54
AMD64 Technology, 64-Bit Media Programming (24592, Rev. 3.13, July 2007)

In most instructions that take two operands, the first (left-most) operand is both a source operand and the destination operand. The second (right-most) operand serves only as a source. Some instructions can have one or more prefixes that modify default properties, as described in “Instruction Prefixes” on page 228.

Mnemonics. The following characters are used as prefixes in the mnemonics of integer instructions:

• CVT: Convert
• CVTT: Convert with truncation
• P: Packed (vector)
• PACK: Pack elements of 2x data size to 1x data size
• PUNPCK: Unpack and interleave elements

In addition to the above prefix characters, the following characters are used elsewhere in the mnemonics of integer instructions:

• B: Byte
• D: Doubleword
• DQ: Double quadword
• ID: Integer doubleword
• IW: Integer word
• PD: Packed double-precision floating-point
• PI: Packed integer
• PS: Packed single-precision floating-point
• Q: Quadword
• S: Signed
• SS: Signed saturation
• U: Unsigned
• US: Unsigned saturation
• W: Word
• x: One or more variable characters in the mnemonic

For example, the mnemonic for the instruction that packs words into unsigned bytes (four words from each of the two source operands, eight bytes in the result) is PACKUSWB. In this mnemonic, PACK designates 2x-to-1x conversion of vector elements, US designates unsigned results with saturation, and WB designates the vector elements of the source as words and those of the result as bytes.

5.6.2 Exit Media State

The exit media state instructions are used to isolate the use of processor resources between 64-bit media instructions and x87 floating-point instructions.

• EMMS: Exit Media State
• FEMMS: Fast Exit Media State

These instructions initialize the contents of the x87 floating-point stack registers, an operation called clearing the MMX state. Software should execute one of these instructions before leaving a 64-bit media procedure.

The EMMS and FEMMS instructions both clear the MMX state, as described in “Mixing Media Code with x87 Code” on page 233.
The instructions differ in one respect: FEMMS leaves the data in the x87 stack registers undefined. By contrast, EMMS leaves the data in each such register as it was defined by the last x87 or 64-bit media instruction that wrote to the register. The FEMMS instruction is supported for backward compatibility. Software that must be compatible with both AMD and non-AMD processors should use the EMMS instruction.

5.6.3 Data Transfer

The data-transfer instructions copy operands between a 32-bit or 64-bit memory location, an MMX register, an XMM register, or a GPR. The MOV mnemonic, which stands for move, is a misnomer: the instructions copy the source operand rather than move it.

Move

• MOVD: Move Doubleword
• MOVQ: Move Quadword
• MOVDQ2Q: Move Double Quadword to Quadword
• MOVQ2DQ: Move Quadword to Double Quadword

The MOVD instruction copies a 32-bit or 64-bit value from a general-purpose register (GPR) or memory location to an MMX register, or from an MMX register to a GPR or memory location.
If the source operand is 32 bits and the destination operand is 64 bits, the source is zero-extended to 64 bits in the destination. If the source is 64 bits and the destination is 32 bits, only the low-order 32 bits of the source are copied to the destination.

The MOVQ instruction copies a 64-bit value from an MMX register or 64-bit memory location to another MMX register, or from an MMX register to another MMX register or 64-bit memory location.

The MOVDQ2Q instruction copies the low-order 64-bit value in an XMM register to an MMX register.

The MOVQ2DQ instruction copies a 64-bit value from an MMX register to the low-order 64 bits of an XMM register, with zero-extension to 128 bits.

The MOVD and MOVQ instructions, along with the PUNPCKx instructions, are often among the most frequently used instructions in 64-bit media procedures (both integer and floating-point).
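The zero-extension and truncation rules above can be sketched as a small scalar model. This is only an illustration in Python, not how the hardware is programmed: registers are represented as plain unsigned integers, and the function names are hypothetical labels chosen to mirror the mnemonics.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def movd_to_mmx(src32):
    """Model: 32-bit source, 64-bit MMX destination (zero-extended)."""
    return src32 & MASK32  # upper 32 bits of the result are zero

def movd_from_mmx(mmx):
    """Model: 64-bit MMX source, 32-bit destination (low-order 32 bits only)."""
    return mmx & MASK32

def movdq2q(xmm):
    """Model: copy the low-order 64 bits of a 128-bit XMM value to MMX."""
    return xmm & MASK64

def movq2dq(mmx):
    """Model: copy a 64-bit MMX value to XMM, zero-extended to 128 bits."""
    return mmx & MASK64  # bits 127:64 of the result are zero
```

For instance, `movd_to_mmx(0xFFFFFFFF)` yields `0x00000000FFFFFFFF` in this model: the source is zero-extended, not sign-extended.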
The move instructions are similar to the assignment operator in high-level languages.

Move Non-Temporal. The move non-temporal instructions are called streaming-store instructions. They minimize pollution of the cache. The assumption is that the data they reference will be used only once, and is therefore not subject to cache-related overhead such as write-allocation. For further information, see “Memory Optimization” on page 92.

• MOVNTQ: Move Non-Temporal Quadword
• MASKMOVQ: Mask Move Quadword

The MOVNTQ instruction stores a 64-bit MMX register value into a 64-bit memory location. The MASKMOVQ instruction stores bytes from the first operand, as selected by the mask value (most significant bit of each byte) in the second operand, to a memory location specified in the rDI and DS registers. The first operand is an MMX register, and the second operand is another MMX register.
The size of the store is determined by the effective address size. Figure 5-11 on page 211 shows the MASKMOVQ operation.

[Figure 5-11. MASKMOVQ Move Mask Operation. Labels: operand 1, operand 2 (bits 63 to 0), select, store address (rDI), memory.]

The MOVNTQ and MASKMOVQ instructions use weakly-ordered, write-combining buffering of write data, and they minimize cache pollution. The exact method by which cache pollution is minimized depends on the hardware implementation of the instruction. For further information, see “Memory Optimization” on page 92.

A typical case benefiting from streaming stores occurs when data written by the processor is never read by the processor, such as data written to a graphics frame buffer.
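The byte selection performed by MASKMOVQ can be sketched as a scalar model. This is a minimal illustration in Python under stated assumptions: memory is a `bytearray`, the two operands are 8-byte sequences with index 0 as the least-significant byte, `address` stands in for the location given by rDI, and the function name is hypothetical. The non-temporal, weakly-ordered behavior of the real store is not modeled.

```python
def maskmovq(memory, address, src, mask):
    """Store each byte of src whose mask byte has its most-significant bit set.

    memory  -- bytearray standing in for memory (assumption of this model)
    address -- offset standing in for the rDI-based store address
    src     -- 8 bytes of operand 1 (index 0 = least-significant byte)
    mask    -- 8 bytes of operand 2; only bit 7 of each byte is examined
    """
    for i in range(8):
        if mask[i] & 0x80:
            memory[address + i] = src[i]
        # Unselected bytes leave memory untouched.

# Example: only bytes 0, 2, 4, and 7 are selected by the mask.
buffer = bytearray(b"........")
maskmovq(buffer, 0, b"ABCDEFGH",
         bytes([0x80, 0x00, 0xFF, 0x7F, 0x80, 0x00, 0x00, 0x80]))
```

Note that a mask byte of 0x7F selects nothing, because only the most-significant bit of each mask byte matters.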
MASKMOVQ is useful for handling end cases in block copies and block fills based on streaming stores.

Move Mask

• PMOVMSKB: Packed Move Mask Byte

The PMOVMSKB instruction moves the most-significant bit of each byte in an MMX register to the low-order byte of a 32-bit or 64-bit general-purpose register, with zero-extension. It is useful for extracting bits from a mask, or extracting zero-point values from quantized data such as signal samples, resulting in a byte that can be used for data-dependent branching.

5.6.4 Data Conversion

The integer data-conversion instructions convert operands from integer formats to floating-point formats. They take 64-bit integer source operands.
For data-conversion instructions that take 32-bit and 64-bit floating-point source operands, see “Data Conversion” on page 224. For data-conversion instructions that take 128-bit source operands, see “Data Conversion” on page 139 and “Data Conversion” on page 162.

Convert Integer to Floating-Point. These instructions convert integer data types into floating-point data types.

• CVTPI2PS: Convert Packed Doubleword Integers to Packed Single-Precision Floating-Point
• CVTPI2PD: Convert Packed Doubleword Integers to Packed Double-Precision Floating-Point
• PI2FW: Packed Integer to Floating-Point Word Conversion
• PI2FD: Packed Integer to Floating-Point Doubleword Conversion

The CVTPI2Px instructions convert two 32-bit signed integer values in the second operand (an MMX register or 64-bit memory location) to two single-precision (CVTPI2PS) or double-precision (CVTPI2PD) floating-point values.
The instructions then write the converted values into the low-order 64 bits of an XMM register (CVTPI2PS) or the full 128 bits of an XMM register (CVTPI2PD). The CVTPI2PS instruction does not modify the high-order 64 bits of the XMM register.

The PI2Fx instructions are 3DNow! instructions. They convert two 16-bit (PI2FW) or 32-bit (PI2FD) signed integer values in the second operand to two single-precision floating-point values.
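The difference in how CVTPI2PS and CVTPI2PD write their XMM destination, described above, can be sketched as a scalar model. This is a hedged illustration in Python: registers are plain unsigned integers, the function names are hypothetical, and inexact conversions to single precision follow Python's default round-to-nearest behavior, which is an assumption of this sketch rather than a statement about any particular processor configuration.

```python
import struct

MASK64 = (1 << 64) - 1

def _signed_dwords(mmx):
    """Split a 64-bit value into its two signed 32-bit elements (low, high)."""
    return struct.unpack("<2i", struct.pack("<Q", mmx & MASK64))

def cvtpi2ps(xmm, mmx):
    """Model: two int32 to two float32, written to bits 63:0 of xmm.

    Bits 127:64 of the destination are preserved.
    """
    lo, hi = _signed_dwords(mmx)
    low64 = int.from_bytes(struct.pack("<2f", float(lo), float(hi)), "little")
    return (xmm & (MASK64 << 64)) | low64

def cvtpi2pd(xmm, mmx):
    """Model: two int32 to two float64, filling all 128 bits of the result.

    The old xmm value is ignored because every bit is overwritten.
    """
    lo, hi = _signed_dwords(mmx)
    return int.from_bytes(struct.pack("<2d", float(lo), float(hi)), "little")
```

In this model the results are raw bit patterns, so the element value 1 appears as `0x3F800000` in single precision and as `0x3FF0000000000000` in double precision.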
The PI2Fx instructions then write the converted values into the destination. If a PI2FD conversion produces an inexact value, the value is truncated (rounded toward zero).

5.6.5 Data Reordering

The integer data-reordering instructions pack, unpack, interleave, extract, insert, shuffle, and swap the elements of vector operands.

Pack with Saturation. These instructions pack 2x-sized data types into 1x-sized data types, thus halving the precision of each element in a vector operand.

• PACKSSDW: Pack with Saturation Signed Doubleword to Word
• PACKSSWB: Pack with Saturation Signed Word to Byte
• PACKUSWB: Pack with Saturation Signed Word to Unsigned Byte

The PACKSSDW instruction converts each 32-bit signed integer in its two source operands (an MMX register or 64-bit memory location and another MMX register) into a 16-bit signed integer and packs the converted values into the destination MMX register.
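The PACKSSDW conversion can be sketched as a scalar model. This is a minimal illustration in Python, assuming each 64-bit operand is given as a list of two signed 32-bit integers with element 0 as the low-order element, and assuming the elements converted from the first operand occupy the low-order half of the result. The function names are hypothetical.

```python
def saturate_signed(value, bits):
    """Clamp value to the range of a signed integer with the given width."""
    highest = (1 << (bits - 1)) - 1
    lowest = -(1 << (bits - 1))
    return max(lowest, min(highest, value))

def packssdw(operand1, operand2):
    """Model: pack four signed doublewords into four signed words.

    operand1, operand2 -- lists of two signed 32-bit integers each
    Returns a list of four signed 16-bit integers, low-order element first.
    """
    return [saturate_signed(v, 16) for v in operand1 + operand2]

# Values outside the 16-bit range are clamped; in-range values pass through.
packed = packssdw([70000, -5], [-70000, 1234])
```

The same helper with `bits=8` gives the signed-byte clamping used when words are packed into signed bytes.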
The PACKSSWB instruction does the analogous operation between word elements in the source vectors and byte elements in the destination vector. The PACKUSWB instruction does the same as PACKSSWB except that it converts word integers into unsigned (rather than signed) bytes.

Figure 5-12 on page 213 shows an example of a PACKSSDW instruction. The operation merges vector elements of 2x size (doubleword-size) into vector elements of 1x size (word-size), thus reducing the precision of the vector-element data types. Any results that would otherwise overflow or underflow are saturated (clamped) at the maximum or minimum representable value, respectively, as described in “Saturation” on page 204.

[Figure 5-12. PACKSSDW Pack Operation. Labels: operand 1, operand 2, result (bits 63 to 0).]

Conversion from higher to lower precision may be needed, for example, after an arithmetic operation that requires the higher-precision format to prevent possible overflow, when a subsequent operation requires the lower-precision format.

Unpack and Interleave.