These instructions interleave vector elements from the high or low half of two source operands. They can be used to double the precision of operands.

• PUNPCKHBW: Unpack and Interleave High Bytes
• PUNPCKHWD: Unpack and Interleave High Words
• PUNPCKHDQ: Unpack and Interleave High Doublewords
• PUNPCKLBW: Unpack and Interleave Low Bytes
• PUNPCKLWD: Unpack and Interleave Low Words
• PUNPCKLDQ: Unpack and Interleave Low Doublewords

The PUNPCKHBW instruction unpacks the four high-order bytes from its two source operands and interleaves them into the bytes in the destination operand. The bytes in the low-order half of each source operand are ignored. The PUNPCKHWD and PUNPCKHDQ instructions perform analogous operations for words and doublewords in the source operands, packing them into interleaved words and interleaved doublewords in the destination operand.

The PUNPCKLBW, PUNPCKLWD, and PUNPCKLDQ instructions are analogous to their high-element counterparts except that they take elements from the low doubleword of each source vector and ignore elements in the high doubleword.
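The byte-interleave behavior described above can be sketched in portable C. This is only an illustrative model, not an actual MMX code sequence: a 64-bit MMX register is assumed to be represented by a uint64_t with byte 0 in bits 7:0, and the helper names punpcklbw and punpckhbw are hypothetical stand-ins for the instructions of the same name.

```c
#include <stdint.h>

/* Illustrative model only: a 64-bit MMX register is a uint64_t,
   with byte 0 in bits 7:0. Function names are hypothetical. */

/* PUNPCKLBW model: interleave the four low-order bytes of dst and src.
   Bytes from dst land in even byte positions, bytes from src in odd ones. */
static uint64_t punpcklbw(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t d = (dst >> (8 * i)) & 0xFF;
        uint64_t s = (src >> (8 * i)) & 0xFF;
        result |= d << (16 * i);        /* even byte position */
        result |= s << (16 * i + 8);    /* odd byte position  */
    }
    return result;
}

/* PUNPCKHBW model: the same interleave applied to the four high-order bytes. */
static uint64_t punpckhbw(uint64_t dst, uint64_t src)
{
    return punpcklbw(dst >> 32, src >> 32);
}
```

With dst = 0706050403020100h and src = 1716151413121110h, this model yields 1303120211011000h for the low form and 1707160615051404h for the high form. Passing zero as src widens each low byte of dst to a zero-extended word.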
If the source operand for PUNPCKLx instructions is in memory, only the low 32 bits of the operand are loaded.

Figure 5-13 on page 214 shows an example of the PUNPCKLWD instruction. The elements are taken from the low half of the source operands. In this register image, elements from operand2 are placed to the left of elements from operand1.

AMD64 Technology, 24592, Rev. 3.13, July 2007 (64-Bit Media Programming)

[Figure 5-13. PUNPCKLWD Unpack and Interleave Operation. Diagram of operand 1, operand 2, and the result, each shown as bits 63 to 0.]

If one of the two source operands is a vector consisting of all zero-valued elements, the unpack instructions perform the function of expanding vector elements of 1x size into vector elements of 2x size (for example, word-size to doubleword-size). If both source operands are of identical value, the unpack instructions can perform the function of duplicating adjacent elements in a vector.

The PUNPCKx instructions, along with MOVD and MOVQ, are among the most frequently used instructions in 64-bit media procedures (both integer and floating-point).

Extract and Insert. These instructions copy a word element from a vector, in a manner specified by an immediate operand.

• PEXTRW: Packed Extract Word
• PINSRW: Packed Insert Word

The PEXTRW instruction extracts a 16-bit value from an MMX register, as selected by the immediate-byte operand, and writes it to the low-order word of a 32-bit or 64-bit general-purpose register, with zero-extension to 32 or 64 bits.
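As a rough illustration, the word extraction just described, together with the complementary PINSRW operation covered in the text that follows, might be modeled in C as shown here. The sketch assumes a uint64_t stands in for the MMX register, a uint32_t for the general-purpose register, and that only the low two bits of the immediate select the word. The function names are hypothetical.

```c
#include <stdint.h>

/* Illustrative model only: uint64_t stands in for an MMX register,
   uint32_t for a 32-bit general-purpose register. */

/* PEXTRW model: return the selected word, zero-extended to 32 bits. */
static uint32_t pextrw(uint64_t mmx, unsigned imm8)
{
    unsigned lane = imm8 & 3;   /* assumed: low two bits pick one of four words */
    return (uint32_t)((mmx >> (16 * lane)) & 0xFFFF);
}

/* PINSRW model: replace the selected word with the low-order word of gpr,
   leaving the other three words unchanged. */
static uint64_t pinsrw(uint64_t mmx, uint32_t gpr, unsigned imm8)
{
    unsigned lane = imm8 & 3;
    uint64_t mask = (uint64_t)0xFFFF << (16 * lane);
    return (mmx & ~mask) | ((uint64_t)(gpr & 0xFFFF) << (16 * lane));
}
```

For example, extracting word 2 of 4444333322221111h gives 00003333h, and inserting the low word of ABCD9999h at position 1 gives 4444333399991111h.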
PEXTRW is useful for loading computed values, such as table-lookup indices, into general-purpose registers where the values can be used for addressing tables in memory.

The PINSRW instruction inserts a 16-bit value from the low-order word of a 32-bit or 64-bit general-purpose register or a 16-bit memory location into an MMX register. The location in the destination register is selected by the immediate-byte operand. The other words in the destination register operand are not modified.

Shuffle and Swap. These instructions reorder the elements of a vector.

• PSHUFW: Packed Shuffle Words
• PSWAPD: Packed Swap Doubleword

The PSHUFW instruction moves any one of the four words in its second operand (an MMX register or 64-bit memory location) to specified word locations in its first operand (another MMX register). The ordering of the shuffle can occur in any of 256 possible ways, as specified by the immediate-byte operand. Figure 5-14 shows one of the 256 possible shuffle operations. PSHUFW is useful, for example, in color imaging when computing alpha saturation of RGB values.
In this case, PSHUFW can replicate an alpha value in a register so that parallel comparisons with three RGB values can be performed.

[Figure 5-14. PSHUFW Shuffle Operation. Diagram of operand 1, operand 2, and the result, each shown as bits 63 to 0.]

The PSWAPD instruction swaps (reverses) the order of two 32-bit values in the second operand and writes each swapped value in the corresponding doubleword of the destination.
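The shuffle and swap operations can be sketched in C as follows. This is a minimal model rather than real MMX code. It assumes the immediate byte is read as four 2-bit fields, with bits 1:0 selecting the source word for result word 0, bits 3:2 for result word 1, and so on, which is what gives 256 possible orderings. The function names and the uint64_t register representation are hypothetical.

```c
#include <stdint.h>

/* Illustrative model only: a 64-bit MMX register is a uint64_t,
   with word 0 in bits 15:0. */

/* PSHUFW model: each 2-bit field of imm8 selects which source word
   is copied into the corresponding result word. */
static uint64_t pshufw(uint64_t src, unsigned imm8)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        unsigned sel = (imm8 >> (2 * i)) & 3;          /* selector for result word i */
        uint64_t word = (src >> (16 * sel)) & 0xFFFF;
        result |= word << (16 * i);
    }
    return result;
}

/* PSWAPD model: exchange the two doublewords of the source. */
static uint64_t pswapd(uint64_t src)
{
    return (src << 32) | (src >> 32);
}
```

If a pixel is held as four words with alpha in word 3, an immediate of FFh replicates that alpha value into all four words, which corresponds to the alpha-replication use mentioned above. An immediate of 1Bh reverses the word order, and E4h leaves it unchanged.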
Figure 5-15 shows a swap operation. PSWAPD is useful, for example, in complex-number multiplication in which the elements of one source operand must be swapped (see “Accumulation” on page 226 for details). PSWAPD supports independent source and result operands so that it can also perform a load function.

[Figure 5-15. PSWAPD Swap Operation. Diagram of operand 1, operand 2, and the result, each shown as bits 63 to 0.]

5.6.6 Arithmetic

The integer vector-arithmetic instructions perform an arithmetic operation on the elements of two source vectors. Arithmetic instructions that are not specifically named as unsigned perform signed two’s-complement arithmetic.

Addition

• PADDB: Packed Add Bytes
• PADDW: Packed Add Words
• PADDD: Packed Add Doublewords
• PADDQ: Packed Add Quadwords
• PADDSB: Packed Add with Saturation Bytes
• PADDSW: Packed Add with Saturation Words
• PADDUSB: Packed Add Unsigned with Saturation Bytes
• PADDUSW: Packed Add Unsigned with Saturation Words

The PADDB, PADDW, PADDD, and PADDQ instructions add each 8-bit (PADDB), 16-bit (PADDW), 32-bit (PADDD), or 64-bit (PADDQ) integer element in the second operand to the corresponding, same-sized integer element in the first operand.
The instructions then write the integer result of each addition to the corresponding, same-sized element of the destination. These instructions operate on both signed and unsigned integers. However, if the result overflows, only the low-order byte, word, doubleword, or quadword of each result is written to the destination. The PADDD instruction can be used together with PMADDWD (page 218) to implement dot products.

The PADDSB and PADDSW instructions perform additions analogous to the PADDB and PADDW instructions, except with saturation.
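The contrast between wraparound addition and saturating addition can be sketched on byte elements in C. The sketch below is an illustrative model under stated assumptions: the helper names are hypothetical, a 64-bit operand is represented by a uint64_t, and the packed helper simply applies the element operation to each of the eight byte lanes independently.

```c
#include <stdint.h>

/* PADDB element model: only the low-order 8 bits of the sum are kept. */
static uint8_t add_wrap_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* PADDSB element model: clamp the sum to the signed range -128..127. */
static int8_t add_sat_s8(int8_t a, int8_t b)
{
    int sum = a + b;
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* PADDUSB element model: clamp the sum to the unsigned range 0..255. */
static uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
}

/* PADDB packed model: eight independent byte lanes, no carry between lanes. */
static uint64_t paddb(uint64_t x, uint64_t y)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t a = (uint8_t)(x >> (8 * i));
        uint8_t b = (uint8_t)(y >> (8 * i));
        result |= (uint64_t)add_wrap_u8(a, b) << (8 * i);
    }
    return result;
}
```

Adding 200 and 100 as bytes gives 44 with wraparound but 255 with unsigned saturation. In the packed model, a carry out of one byte lane never reaches the next lane: 01FFh plus 0101h yields 0200h.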
For each result in the destination, if the result is larger than the largest, or smaller than the smallest, representable 8-bit (PADDSB) or 16-bit (PADDSW) signed integer, the result is saturated to the largest or smallest representable value, respectively.

The PADDUSB and PADDUSW instructions perform saturating additions analogous to the PADDSB and PADDSW instructions, except on unsigned integer elements.

Subtraction

• PSUBB: Packed Subtract Bytes
• PSUBW: Packed Subtract Words
• PSUBD: Packed Subtract Doublewords
• PSUBQ: Packed Subtract Quadword
• PSUBSB: Packed Subtract with Saturation Bytes
• PSUBSW: Packed Subtract with Saturation Words
• PSUBUSB: Packed Subtract Unsigned and Saturate Bytes
• PSUBUSW: Packed Subtract Unsigned and Saturate Words

The subtraction instructions perform operations analogous to the addition instructions.

The PSUBB, PSUBW, PSUBD, and PSUBQ instructions subtract each 8-bit (PSUBB), 16-bit (PSUBW), 32-bit (PSUBD), or 64-bit (PSUBQ) integer element in the second operand from the corresponding, same-sized integer element in the first operand.
The instructions then write the integer result of each subtraction to the corresponding, same-sized element of the destination. These instructions operate on both signed and unsigned integers. However, if the result underflows, only the low-order byte, word, doubleword, or quadword of each result is written to the destination.

The PSUBSB and PSUBSW instructions perform subtractions analogous to the PSUBB and PSUBW instructions, except with saturation.
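A per-element sketch of the word-sized subtraction variants, assuming hypothetical helper names and modeling a single 16-bit lane only, might look like this:

```c
#include <stdint.h>

/* PSUBW element model: only the low-order 16 bits of the difference are kept. */
static uint16_t sub_wrap_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)(a - b);
}

/* PSUBSW element model: clamp the difference to -32768..32767. */
static int16_t sub_sat_s16(int16_t a, int16_t b)
{
    int32_t diff = (int32_t)a - (int32_t)b;
    if (diff > INT16_MAX) return INT16_MAX;
    if (diff < INT16_MIN) return INT16_MIN;
    return (int16_t)diff;
}

/* PSUBUSW element model: an unsigned difference below zero becomes zero. */
static uint16_t sub_sat_u16(uint16_t a, uint16_t b)
{
    return a > b ? (uint16_t)(a - b) : 0;
}
```

Subtracting 10 from 5 gives FFFBh with wraparound and 0 with unsigned saturation, so the unsigned saturating form never produces a result that appears larger than its first operand.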
For each result in the destination, if the result is larger than the largest, or smaller than the smallest, representable 8-bit (PSUBSB) or 16-bit (PSUBSW) signed integer, the result is saturated to the largest or smallest representable value, respectively.

The PSUBUSB and PSUBUSW instructions perform saturating subtractions analogous to the PSUBSB and PSUBSW instructions, except on unsigned integer elements.

Multiplication

• PMULHW: Packed Multiply High Signed Word
• PMULLW: Packed Multiply Low Signed Word
• PMULHRW: Packed Multiply High Rounded Word
• PMULHUW: Packed Multiply High Unsigned Word
• PMULUDQ: Packed Multiply Unsigned Doubleword and Store Quadword

The PMULHW instruction multiplies each 16-bit signed integer value in the first operand by the corresponding 16-bit integer in the second operand, producing a 32-bit intermediate result.
The instruction then writes the high-order 16 bits of the 32-bit intermediate result of each multiplication to the corresponding word of the destination. The PMULLW instruction performs the same multiplication as PMULHW but writes the low-order 16 bits of the 32-bit intermediate result to the corresponding word of the destination.

The PMULHRW instruction performs the same multiplication as PMULHW but with rounding. After the multiplication, PMULHRW adds 8000h to the lower word of the doubleword result, thus rounding the high-order word, which is returned as the result.

The PMULHUW instruction performs the same multiplication as PMULHW but on unsigned operands.
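The four word-multiply variants described above differ only in signedness and in which half of the 32-bit product is kept. A minimal per-element sketch in C follows. The helper names are hypothetical, each function models one 16-bit lane, and results are returned as raw 16-bit patterns so that no signed shift is needed.

```c
#include <stdint.h>

/* 32-bit signed product of two words, viewed as a raw bit pattern. */
static uint32_t signed_product(int16_t a, int16_t b)
{
    return (uint32_t)((int32_t)a * (int32_t)b);
}

/* PMULHW element model: high-order 16 bits of the signed product. */
static uint16_t pmulhw_elem(int16_t a, int16_t b)
{
    return (uint16_t)(signed_product(a, b) >> 16);
}

/* PMULLW element model: low-order 16 bits of the signed product. */
static uint16_t pmullw_elem(int16_t a, int16_t b)
{
    return (uint16_t)signed_product(a, b);
}

/* PMULHRW element model: add 8000h to the product, then take the high word. */
static uint16_t pmulhrw_elem(int16_t a, int16_t b)
{
    return (uint16_t)((signed_product(a, b) + 0x8000u) >> 16);
}

/* PMULHUW element model: high-order 16 bits of the unsigned product. */
static uint16_t pmulhuw_elem(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}
```

Multiplying 4000h by 3 gives the product 0000C000h, so the truncating high word is 0 while the rounded high word is 1. The same bit pattern FFFEh multiplied by 3 gives a high word of FFFFh when treated as the signed value -2 and 0002h when treated as unsigned.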