Volume 1 Basic Architecture (794100), page 68
UNPCKHPD Instruction, High Unpack and Interleave Operation

Vol. 1 11-11 PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)

[Figure: before the operation, DEST holds X1, X0 and SRC holds Y1, Y0; after the operation, DEST holds Y0, X0.]
Figure 11-7. UNPCKLPD Instruction, Low Unpack and Interleave Operation

11.4.1.6 SSE2 Conversion Instructions

SSE2 conversion instructions (see Figure 11-8) support packed and scalar conversions between:
• Double-precision and single-precision floating-point formats
• Double-precision floating-point and doubleword integer formats
• Single-precision floating-point and doubleword integer formats

Conversion between double-precision and single-precision floating-point values: The following instructions convert operands between double-precision and single-precision floating-point formats. The operands being operated on are contained in XMM registers or memory (at most one operand can reside in memory; the destination is always an XMM register).

The CVTPS2PD (convert packed single-precision floating-point values to packed double-precision floating-point values) instruction converts two packed single-precision floating-point values to two double-precision floating-point values.

The CVTPD2PS (convert packed double-precision floating-point values to packed single-precision floating-point values) instruction converts two packed double-precision floating-point values to two single-precision floating-point values.
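
The two packed conversions described above can be sketched in C with the SSE2 intrinsics `_mm_cvtps_pd` and `_mm_cvtpd_ps` from `<emmintrin.h>`, which compilers typically map to CVTPS2PD and CVTPD2PS. This is a minimal illustration rather than code from the manual: the function names `widen_two_floats` and `narrow_two_doubles` are hypothetical, and an x86 target with SSE2 enabled is assumed.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE header) */

/* Hypothetical helper: convert the two low-order floats of src to doubles.
   _mm_cvtps_pd corresponds to CVTPS2PD; the two high-order floats are ignored. */
void widen_two_floats(const float src[4], double dst[2]) {
    __m128 ps = _mm_loadu_ps(src);      /* unaligned load of four floats */
    __m128d pd = _mm_cvtps_pd(ps);      /* two floats -> two doubles */
    _mm_storeu_pd(dst, pd);
}

/* Hypothetical helper: convert two doubles to two floats.
   _mm_cvtpd_ps corresponds to CVTPD2PS; the results land in the two
   low-order lanes and the two high-order lanes are set to zero. */
void narrow_two_doubles(const double src[2], float dst[4]) {
    __m128d pd = _mm_loadu_pd(src);     /* unaligned load of two doubles */
    __m128 ps = _mm_cvtpd_ps(pd);       /* two doubles -> two floats */
    _mm_storeu_ps(dst, ps);
}
```

Widening is always exact, because every single-precision value is representable in double precision. Narrowing may be inexact (for example, 0.1 has no exact single-precision representation).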
When a conversion is inexact, the result is rounded according to the rounding mode selected in the MXCSR register.

The CVTSS2SD (convert scalar single-precision floating-point value to scalar double-precision floating-point value) instruction converts a single-precision floating-point value to a double-precision floating-point value.

The CVTSD2SS (convert scalar double-precision floating-point value to scalar single-precision floating-point value) instruction converts a double-precision floating-point value to a single-precision floating-point value. When the conversion is inexact, the result is rounded according to the rounding mode selected in the MXCSR register.

[Figure: diagram of the conversion paths among Single-Precision Floating-Point (XMM/mem), Double-Precision Floating-Point (XMM/mem), 4 Doubleword Integer (XMM/mem), 2 Doubleword Integer (XMM/mem), 2 Doubleword Integer (MMX/mem), and Doubleword Integer (r32/mem) operands. Labels legible in the extracted text include CVTDQ2PD, CVTSD2SS, CVTPD2PS, CVTPI2PS, CVTSS2SD, and CVTPS2PD.]
Figure 11-8. SSE and SSE2 Conversion Instructions

Conversion between double-precision floating-point values and doubleword integers: The following instructions convert operands between double-precision floating-point and doubleword integer formats. Operands are housed in XMM registers, MMX registers, general registers, or memory (at most one operand can reside in memory; the destination is always an XMM, MMX, or general register).

The CVTPD2PI (convert packed double-precision floating-point values to packed doubleword integers) instruction converts two packed double-precision floating-point numbers to two packed signed doubleword integers, with the result stored in an MMX register. When rounding to an integer value, the source value is rounded according to the rounding mode in the MXCSR register. The CVTTPD2PI (convert with truncation packed double-precision floating-point values to packed doubleword integers) instruction is similar to the CVTPD2PI instruction except that truncation is used to round a source value to an integer value (see Section 4.8.4.2, “Truncation with SSE and SSE2 Conversion Instructions”).

The CVTPI2PD (convert packed doubleword integers to packed double-precision floating-point values) instruction converts two packed signed doubleword integers to two double-precision floating-point values.

The CVTPD2DQ (convert packed double-precision floating-point values to packed doubleword integers) instruction converts two packed double-precision floating-point numbers to two packed signed doubleword integers, with the result stored in the low quadword of an XMM register.
When rounding to an integer value, the source value is rounded according to the rounding mode selected in the MXCSR register. The CVTTPD2DQ (convert with truncation packed double-precision floating-point values to packed doubleword integers) instruction is similar to the CVTPD2DQ instruction except that truncation is used to round a source value to an integer value (see Section 4.8.4.2, “Truncation with SSE and SSE2 Conversion Instructions”).

The CVTDQ2PD (convert packed doubleword integers to packed double-precision floating-point values) instruction converts two packed signed doubleword integers located in the low-order doublewords of an XMM register to two double-precision floating-point values.

The CVTSD2SI (convert scalar double-precision floating-point value to doubleword integer) instruction converts a double-precision floating-point value to a doubleword integer, and stores the result in a general-purpose register.
When rounding to an integer value, the source value is rounded according to the rounding mode selected in the MXCSR register. The CVTTSD2SI (convert with truncation scalar double-precision floating-point value to doubleword integer) instruction is similar to the CVTSD2SI instruction except that truncation is used to round the source value to an integer value (see Section 4.8.4.2, “Truncation with SSE and SSE2 Conversion Instructions”).

The CVTSI2SD (convert doubleword integer to scalar double-precision floating-point value) instruction converts a signed doubleword integer in a general-purpose register to a double-precision floating-point number, and stores the result in an XMM register.

Conversion between single-precision floating-point and doubleword integer formats: These instructions convert between packed single-precision floating-point and packed doubleword integer formats.
Operands are housed in XMM registers, MMX registers, general registers, or memory (the latter for at most one source operand). The destination is always an XMM, MMX, or general register. These SSE2 instructions supplement conversion instructions (CVTPI2PS, CVTPS2PI, CVTTPS2PI, CVTSI2SS, CVTSS2SI, and CVTTSS2SI) introduced with SSE extensions.

The CVTPS2DQ (convert packed single-precision floating-point values to packed doubleword integers) instruction converts four packed single-precision floating-point values to four packed signed doubleword integers, with the source and destination operands in XMM registers or memory (the latter for at most one source operand). When the conversion is inexact, the result is rounded according to the rounding mode selected in the MXCSR register.
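
The difference between MXCSR-controlled rounding and truncation can be sketched in C with SSE2 intrinsics. The sketch assumes the MXCSR rounding mode is at its usual default (round to nearest, ties to even), and an x86 target with SSE2 enabled. The function names are hypothetical, and the intrinsics `_mm_cvtsd_si32`, `_mm_cvttsd_si32`, and `_mm_cvtps_epi32` are the ones compilers typically map to CVTSD2SI, CVTTSD2SI, and CVTPS2DQ.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: CVTSD2SI-style conversion.
   Rounds according to the current MXCSR rounding mode. */
int round_double_to_int(double x) {
    return _mm_cvtsd_si32(_mm_set_sd(x));
}

/* Hypothetical helper: CVTTSD2SI-style conversion.
   Always truncates toward zero, regardless of the MXCSR rounding mode. */
int truncate_double_to_int(double x) {
    return _mm_cvttsd_si32(_mm_set_sd(x));
}

/* Hypothetical helper: CVTPS2DQ-style conversion of four floats to four
   signed doubleword integers, rounded according to MXCSR. */
void round_four_floats(const float src[4], int dst[4]) {
    __m128i v = _mm_cvtps_epi32(_mm_loadu_ps(src));
    _mm_storeu_si128((__m128i *)dst, v);
}
```

Under the assumed default rounding mode, `round_double_to_int(2.5)` yields 2 and `round_double_to_int(3.5)` yields 4 (ties go to the even integer), while `truncate_double_to_int(2.9)` yields 2 and `truncate_double_to_int(-2.9)` yields -2.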
The CVTTPS2DQ (convert with truncation packed single-precision floating-point values to packed doubleword integers) instruction is similar to the CVTPS2DQ instruction except that truncation is used to round a source value to an integer value (see Section 4.8.4.2, “Truncation with SSE and SSE2 Conversion Instructions”).

The CVTDQ2PS (convert packed doubleword integers to packed single-precision floating-point values) instruction converts four packed signed doubleword integers to four packed single-precision floating-point numbers, with the source and destination operands in XMM registers or memory (the latter for at most one source operand). When the conversion is inexact, the result is rounded according to the rounding mode selected in the MXCSR register.

11.4.2 SSE2 64-Bit and 128-Bit SIMD Integer Instructions

SSE2 extensions add several 128-bit packed integer instructions to the IA-32 architecture. Where appropriate, a 64-bit version of each of these instructions is also provided.
The 128-bit versions of instructions operate on data in XMM registers; 64-bit versions operate on data in MMX registers. The instructions follow.

The MOVDQA (move aligned double quadword) instruction transfers a double quadword operand from memory to an XMM register or vice versa; or between XMM registers. The memory address must be aligned to a 16-byte boundary; otherwise, a general-protection exception (#GP) is generated.

The MOVDQU (move unaligned double quadword) instruction performs the same operations as the MOVDQA instruction, except that 16-byte alignment of a memory address is not required.

The PADDQ (packed quadword add) instruction adds two packed quadword integer operands or two single quadword integer operands, and stores the results in an XMM or MMX register, respectively. This instruction can operate on either unsigned or signed (two’s complement notation) integer operands.

The PSUBQ (packed quadword subtract) instruction subtracts two packed quadword integer operands or two single quadword integer operands, and stores the results in an XMM or MMX register, respectively.
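
The 128-bit forms of these four instructions can be sketched in C with SSE2 intrinsics. This is an illustrative sketch under stated assumptions: the function names are hypothetical, an x86 target with SSE2 is assumed, and the intrinsics shown are the ones compilers typically map to MOVDQU (`_mm_loadu_si128`, `_mm_storeu_si128`), MOVDQA (`_mm_load_si128`, `_mm_store_si128`), PADDQ (`_mm_add_epi64`), and PSUBQ (`_mm_sub_epi64`), although a compiler is free to select equivalent encodings.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: add two pairs of quadwords (PADDQ).
   The unaligned load/store intrinsics (MOVDQU) accept any address. */
void add_two_quadwords(const uint64_t a[2], const uint64_t b[2],
                       uint64_t dst[2]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, _mm_add_epi64(va, vb));
}

/* Hypothetical helper: subtract two pairs of quadwords (PSUBQ), a - b.
   The aligned load/store intrinsics (MOVDQA) require all three pointers
   to be 16-byte aligned; a misaligned address raises #GP. */
void sub_two_quadwords_aligned(const uint64_t *a, const uint64_t *b,
                               uint64_t *dst) {
    __m128i va = _mm_load_si128((const __m128i *)a);
    __m128i vb = _mm_load_si128((const __m128i *)b);
    _mm_store_si128((__m128i *)dst, _mm_sub_epi64(va, vb));
}
```

Each quadword lane is computed independently and wraps around on overflow, which is why the same instruction serves both unsigned and two's complement signed operands. Callers of the aligned variant can obtain suitable storage with, for example, `_Alignas(16)` in C11.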