Volume 1 Application Programming (794095), page 41
Figure 4-25. Arithmetic Operation on Vectors of Bytes

Addition.

• PADDB - Packed Add Bytes
• PADDW - Packed Add Words
• PADDD - Packed Add Doublewords
• PADDQ - Packed Add Quadwords
• PADDSB - Packed Add with Saturation Bytes
• PADDSW - Packed Add with Saturation Words
• PADDUSB - Packed Add Unsigned with Saturation Bytes
• PADDUSW - Packed Add Unsigned with Saturation Words

The PADDB, PADDW, PADDD, and PADDQ instructions add each packed 8-bit (PADDB), 16-bit (PADDW), 32-bit (PADDD), or 64-bit (PADDQ) integer element in the second operand to the corresponding, same-sized integer element in the first operand and write the integer result to the corresponding, same-sized element of the destination. Figure 4-25 shows a PADDB operation.
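The lane-wise behavior of the byte forms in the list above can be sketched in portable scalar C. This is only a model of the arithmetic, not the instructions themselves: the function names, the BYTE_LANES constant, and the in-place "first operand is also the destination" convention are assumptions made for illustration. The first function models a plain packed add (PADDB), the second a signed saturating add (PADDSB) that clamps each sum to the range of a signed byte.

```c
#include <stddef.h>
#include <stdint.h>

/* A 128-bit operand holds 16 byte lanes (hypothetical constant name). */
enum { BYTE_LANES = 16 };

/* Scalar model of PADDB: each lane is added independently and only the
   low-order 8 bits of each sum are kept. */
static void paddb_model(uint8_t dst[BYTE_LANES], const uint8_t src[BYTE_LANES])
{
    for (size_t i = 0; i < BYTE_LANES; ++i) {
        /* The sum is computed in int; the cast discards any carry. */
        dst[i] = (uint8_t)(dst[i] + src[i]);
    }
}

/* Scalar model of PADDSB: each signed sum is clamped to [-128, 127]. */
static void paddsb_model(int8_t dst[BYTE_LANES], const int8_t src[BYTE_LANES])
{
    for (size_t i = 0; i < BYTE_LANES; ++i) {
        int sum = dst[i] + src[i];          /* exact sum, fits in int */
        if (sum > INT8_MAX) sum = INT8_MAX; /* saturate high */
        if (sum < INT8_MIN) sum = INT8_MIN; /* saturate low */
        dst[i] = (int8_t)sum;
    }
}
```

For example, adding the byte values 200 and 100 with the first model leaves 44 in the lane (300 modulo 256), while adding the signed values 100 and 100 with the second model leaves 127.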
These instructions operate on both signed and unsigned integers. However, if the result overflows, the carry is ignored and only the low-order byte, word, doubleword, or quadword of each result is written to the destination. The PADDD instruction can be used together with PMADDWD (page 149) to implement dot products.

The PADDSB and PADDSW instructions add each 8-bit (PADDSB) or 16-bit (PADDSW) signed integer element in the second operand to the corresponding, same-sized signed integer element in the first operand and write the signed integer result to the corresponding, same-sized element of the destination. For each result in the destination, if the result is larger than the largest, or smaller than the smallest, representable 8-bit (PADDSB) or 16-bit (PADDSW) signed integer, the result is saturated to the largest or smallest representable value, respectively.

The PADDUSB and PADDUSW instructions perform saturating-add operations analogous to the PADDSB and PADDSW instructions, except on unsigned integer elements.

AMD64 Technology, 24592, Rev. 3.13, July 2007. 128-Bit Media and Scientific Programming, page 146.

Subtraction.

• PSUBB - Packed Subtract Bytes
• PSUBW - Packed Subtract Words
• PSUBD - Packed Subtract Doublewords
• PSUBQ - Packed Subtract Quadword
• PSUBSB - Packed Subtract with Saturation Bytes
• PSUBSW - Packed Subtract with Saturation Words
• PSUBUSB - Packed Subtract Unsigned and Saturate Bytes
• PSUBUSW - Packed Subtract Unsigned and Saturate Words

The subtraction instructions perform operations analogous to the addition instructions.

The PSUBB, PSUBW, PSUBD, and PSUBQ instructions subtract each 8-bit (PSUBB), 16-bit (PSUBW), 32-bit (PSUBD), or 64-bit (PSUBQ) integer element in the second operand from the corresponding, same-sized integer element in the first operand and write the integer result to the corresponding, same-sized element of the destination.
For vectors of n elements, the operation is:

    operand1[i] = operand1[i] - operand2[i]

where i = 0 to n - 1.

These instructions operate on both signed and unsigned integers. However, if the result underflows, the borrow is ignored and only the low-order byte, word, doubleword, or quadword of each result is written to the destination.

The PSUBSB and PSUBSW instructions subtract each 8-bit (PSUBSB) or 16-bit (PSUBSW) signed integer element in the second operand from the corresponding, same-sized signed integer element in the first operand and write the signed integer result to the corresponding, same-sized element of the destination. For each result in the destination, if the result is larger than the largest, or smaller than the smallest, representable 8-bit (PSUBSB) or 16-bit (PSUBSW) signed integer, the result is saturated to the largest or smallest representable value, respectively.

The PSUBUSB and PSUBUSW instructions perform saturating-subtract operations analogous to the PSUBSB and PSUBSW instructions, except on unsigned integer elements.

Multiplication.

• PMULHW - Packed Multiply High Signed Word
• PMULLW - Packed Multiply Low Signed Word
• PMULHUW - Packed Multiply High Unsigned Word
• PMULUDQ - Packed Multiply Unsigned Doubleword and Store Quadword

The PMULHW instruction multiplies each 16-bit signed integer value in the first operand by the corresponding 16-bit integer in the second operand, producing a 32-bit intermediate result. The instruction then writes the high-order 16 bits of the 32-bit intermediate result of each multiplication to the corresponding word of the destination. The PMULLW instruction performs the same multiplication as PMULHW but writes the low-order 16 bits of the 32-bit intermediate result to the corresponding word of the destination.

Figure 4-26 shows the PMULHW and PMULLW operations. The difference between the two is whether the high or low half of each intermediate-element result is copied to the destination result.

Figure 4-26. PMULxW Multiply Operation

The PMULHUW instruction performs the same multiplication as PMULHW but on unsigned operands. Without this instruction, it is difficult to perform unsigned integer multiplies using 128-bit media instructions. The instruction is useful in 3D rasterization, which operates on unsigned pixel values.

The PMULUDQ instruction, unlike the other PMULx instructions, preserves the full precision of results by multiplying only half of the source-vector elements. It multiplies the 32-bit unsigned integer values in the first (low-order) and third doublewords of the source operands, writes the full 64-bit result of the low-order multiply to the low-order quadword of the destination, and writes the corresponding 64-bit result of the high-order multiply to the high-order quadword of the destination. Figure 4-27 on page 149 shows a PMULUDQ operation.

Figure 4-27. PMULUDQ Multiply Operation

See "Shift" on page 152 for shift instructions that can be used to perform multiplication and division by powers of 2.

Multiply-Add. This instruction multiplies the elements of two source vectors and adds their intermediate results in a single operation.

• PMADDWD - Packed Multiply Words and Add Doublewords

The PMADDWD instruction multiplies each 16-bit signed value in the first operand by the corresponding 16-bit signed value in the second operand.
The instruction then adds the adjacent 32-bit intermediate results of each multiplication, and writes the 32-bit result of each addition into the corresponding doubleword of the destination. For vectors of n source elements (src) and m destination elements (dst), where n = 2m, the operation is:

    dst[j] = ((src1[i] * src2[i]) + (src1[i+1] * src2[i+1]))

where j = 0 to m - 1 and i = 2j.

PMADDWD thus performs four signed multiply-adds in parallel.
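The multiply-add formula above can be written out as a scalar C sketch for 128-bit operands (n = 8 words, m = 4 doublewords). The function name and the SRC_WORDS and DST_DWORDS constants are hypothetical, and this models only the arithmetic, not register or memory operand handling. The sum is formed in 64 bits so that the one input combination whose sum does not fit in a signed 32-bit value is handled explicitly rather than left to signed-overflow behavior in C.

```c
#include <stddef.h>
#include <stdint.h>

/* n = 8 source words and m = 4 destination doublewords (names assumed). */
enum { SRC_WORDS = 8, DST_DWORDS = 4 };

/* Scalar model of PMADDWD: dst[j] = src1[2j]*src2[2j] + src1[2j+1]*src2[2j+1]. */
static void pmaddwd_model(int32_t dst[DST_DWORDS],
                          const int16_t src1[SRC_WORDS],
                          const int16_t src2[SRC_WORDS])
{
    for (size_t j = 0; j < DST_DWORDS; ++j) {
        size_t i = 2 * j;
        /* Each product fits in 32 bits; the sum is held in 64 bits. */
        int64_t sum = (int64_t)src1[i] * src2[i]
                    + (int64_t)src1[i + 1] * src2[i + 1];
        /* The only sum above INT32_MAX is 2^31, reached when all four
           inputs are -32768; its low 32 bits are 8000_0000h (INT32_MIN). */
        dst[j] = (sum > INT32_MAX) ? INT32_MIN : (int32_t)sum;
    }
}
```

Summing the four doublewords that this model produces gives the dot product of the two 8-element word vectors, provided that final sum fits in 32 bits.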
Figure 4-28 on page 150 shows the operation.

Figure 4-28. PMADDWD Multiply-Add Operation

PMADDWD can be used with one source operand (for example, a coefficient) taken from memory and the other source operand (for example, the data to be multiplied by that coefficient) taken from an XMM register.
The instruction can also be used together with the PADDD instruction (page 146) to compute dot products. Scaling can be done, before or after the multiply, using a vector-shift instruction (page 152).

If all four of the 16-bit source operands used to produce a 32-bit multiply-add result have the value 8000h, the result is represented as 8000_0000h, because the most negative 16-bit value, 8000h, multiplied by itself equals 4000_0000h, and 4000_0000h added to 4000_0000h equals 8000_0000h. The result of multiplying two negative numbers should be a positive number, but 8000_0000h is the most negative 32-bit number rather than a positive number.

Average.

• PAVGB - Packed Average Unsigned Bytes
• PAVGW - Packed Average Unsigned Words

The PAVGx instructions compute the rounded average of each unsigned 8-bit (PAVGB) or 16-bit (PAVGW) integer value in the first operand and the corresponding, same-sized unsigned integer in the second operand and write the result in the corresponding, same-sized element of the destination.
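A scalar C sketch of the rounded average for bytes follows. It assumes that "rounded" means adding 1 to the sum before halving, that is (a + b + 1) >> 1 with an intermediate wide enough to hold the carry, which is how the instruction reference describes PAVGB and PAVGW. The function name and the AVG_LANES constant are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* A 128-bit operand holds 16 byte lanes (hypothetical constant name). */
enum { AVG_LANES = 16 };

/* Scalar model of PAVGB: unsigned average of each byte pair, rounded up
   when the sum is odd (assumed rounding rule, see the text above). */
static void pavgb_model(uint8_t dst[AVG_LANES], const uint8_t src[AVG_LANES])
{
    for (size_t i = 0; i < AVG_LANES; ++i) {
        /* The intermediate is wider than 8 bits, so 255 + 255 + 1 is exact. */
        unsigned sum = (unsigned)dst[i] + src[i] + 1u;
        dst[i] = (uint8_t)(sum >> 1);
    }
}
```

With this rule the average of 1 and 2 is 2 and the average of 10 and 20 is 15. The extra bit in the intermediate sum means the average of 255 and 255 is 255 rather than a wrapped value.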