Volume 1 Application Programming (794095), page 47
The high-order quadword of the destination XMM register is not modified.

The MINx and MAXx instructions are useful for clamping (saturating) values, such as color values in 3D geometry and rasterization.

Compare and Write rFLAGS
• COMISS—Compare Ordered Scalar Single-Precision Floating-Point
• COMISD—Compare Ordered Scalar Double-Precision Floating-Point
• UCOMISS—Unordered Compare Scalar Single-Precision Floating-Point
• UCOMISD—Unordered Compare Scalar Double-Precision Floating-Point

The COMISS instruction performs an ordered compare of the single-precision floating-point value in the low-order 32 bits of the first operand with the single-precision floating-point value in the low-order 32 bits of the second operand and sets the zero flag (ZF), parity flag (PF), and carry flag (CF) bits in the rFLAGS register to reflect the result of the compare. The OF, AF, and SF bits in rFLAGS are set to zero. The COMISD instruction performs an analogous operation on the double-precision floating-point values in the low-order 64 bits of the source operands.
The UCOMISS and UCOMISD instructions perform analogous, but unordered, compare operations. Figure 4-37 on page 174 shows a COMISD compare operation.

128-Bit Media and Scientific Programming
AMD64 Technology 24592—Rev. 3.13—July 2007

[Figure 4-37. COMISD Compare Operation: bits 63:0 of operand 1 and operand 2 (each shown as bits 127:0) are compared, and the result is written to rFLAGS.]

The difference between an ordered and unordered comparison has to do with the conditions under which a floating-point invalid-operation exception (IE) occurs. In an ordered comparison (COMISS or COMISD), an IE exception occurs if either of the source operands is either type of NaN (QNaN or SNaN).
In an unordered comparison, the exception occurs only if a source operand is an SNaN. For a description of NaNs, see “Floating-Point Number Representation” on page 127. For a description of exceptions, see “Exceptions” on page 177.

4.6.7 Logical

The vector-logic instructions perform Boolean logic operations, including AND, OR, and exclusive OR.

And
• ANDPS—Logical Bitwise AND Packed Single-Precision Floating-Point
• ANDPD—Logical Bitwise AND Packed Double-Precision Floating-Point
• ANDNPS—Logical Bitwise AND NOT Packed Single-Precision Floating-Point
• ANDNPD—Logical Bitwise AND NOT Packed Double-Precision Floating-Point

The ANDPS instruction performs a logical bitwise AND of the four packed single-precision floating-point values in the first operand and the corresponding four single-precision floating-point values in the second operand and writes the result in the destination. The ANDPD instruction performs an analogous operation on two packed double-precision floating-point values.
The ANDNPS and ANDNPD instructions invert the elements of the first source vector (creating a one’s complement of each element), AND them with the elements of the second source vector, and write the result to the destination.

Or
• ORPS—Logical Bitwise OR Packed Single-Precision Floating-Point
• ORPD—Logical Bitwise OR Packed Double-Precision Floating-Point

The ORPS instruction performs a logical bitwise OR of four single-precision floating-point values in the first operand and the corresponding four single-precision floating-point values in the second operand and writes the result in the destination.
The ORPD instruction performs an analogous operation on two packed double-precision floating-point values.

Exclusive Or
• XORPS—Logical Bitwise Exclusive OR Packed Single-Precision Floating-Point
• XORPD—Logical Bitwise Exclusive OR Packed Double-Precision Floating-Point

The XORPS instruction performs a logical bitwise exclusive OR of four single-precision floating-point values in the first operand and the corresponding four single-precision floating-point values in the second operand and writes the result in the destination.
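Because the logical instructions operate on raw bit patterns, a common use is manipulating the sign bit of floating-point values. The C sketch below illustrates this with compiler intrinsics. It assumes an x86 or x86-64 compiler that provides <xmmintrin.h>, and it assumes that _mm_and_ps, _mm_andnot_ps, _mm_or_ps, and _mm_xor_ps are compiled to ANDPS, ANDNPS, ORPS, and XORPS (typical, although a compiler may choose an equivalent encoding). The helper names abs_ps, negate_ps, copysign_ps, and logical_ops_demo are illustrative and do not come from this manual.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 or x86-64 target */

/* Absolute value of four floats. ANDNPS inverts the first source (the
   sign-bit mask) and ANDs it with the second, clearing bit 31 per element. */
static __m128 abs_ps(__m128 v)
{
    const __m128 sign_mask = _mm_set1_ps(-0.0f);  /* 0x80000000 in each element */
    return _mm_andnot_ps(sign_mask, v);           /* (~sign_mask) & v */
}

/* Negation. XORPS with the sign-bit mask flips bit 31 of each element. */
static __m128 negate_ps(__m128 v)
{
    return _mm_xor_ps(v, _mm_set1_ps(-0.0f));
}

/* Combine the magnitude of `mag` with the sign of `sign_src`:
   ANDPS isolates the signs, ANDNPS the magnitudes, ORPS merges them. */
static __m128 copysign_ps(__m128 mag, __m128 sign_src)
{
    const __m128 sign_mask = _mm_set1_ps(-0.0f);
    __m128 sign      = _mm_and_ps(sign_mask, sign_src);
    __m128 magnitude = _mm_andnot_ps(sign_mask, mag);
    return _mm_or_ps(sign, magnitude);
}

/* Small self-check that exercises the helpers on known inputs. */
static void logical_ops_demo(void)
{
    float out[4];

    _mm_storeu_ps(out, abs_ps(_mm_setr_ps(-1.5f, 2.0f, -0.0f, -8.0f)));
    assert(out[0] == 1.5f && out[1] == 2.0f && out[2] == 0.0f && out[3] == 8.0f);

    _mm_storeu_ps(out, negate_ps(_mm_setr_ps(1.0f, -2.0f, 3.0f, -4.0f)));
    assert(out[0] == -1.0f && out[1] == 2.0f && out[2] == -3.0f && out[3] == 4.0f);

    _mm_storeu_ps(out, copysign_ps(_mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f),
                                   _mm_setr_ps(-9.0f, 9.0f, -9.0f, 9.0f)));
    assert(out[0] == -1.0f && out[1] == 2.0f && out[2] == -3.0f && out[3] == 4.0f);
}
```

Calling logical_ops_demo() from a test program runs the checks. The double-precision forms would use the corresponding _pd intrinsics from <emmintrin.h> in the same way.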
The XORPD instruction performs an analogous operation on two packed double-precision floating-point values.

4.7 Instruction Effects on Flags

The STMXCSR and LDMXCSR instructions, described in “Save and Restore State” on page 156, read and write flags in the MXCSR register. For a description of the MXCSR register, see “MXCSR Register” on page 117.

The COMISS, COMISD, UCOMISS, and UCOMISD instructions, described in “Compare” on page 171, write flag bits in the rFLAGS register.
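The effect of the scalar compares on rFLAGS, together with the ordered/unordered distinction described under “Compare”, can be modeled in a few lines of portable C. This is only a sketch under stated assumptions: the ZF/PF/CF encoding of each outcome is not given in this section and is taken here from the instruction reference as commonly documented (unordered sets all three; greater-than clears all three; less-than sets CF; equal sets ZF), so it should be verified against that reference. The model also ignores exception masking in MXCSR. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Outcome of a modeled scalar single-precision compare. */
typedef struct {
    bool zf, pf, cf;  /* rFLAGS bits written by the compare */
    bool ie;          /* invalid-operation exception (IE) condition detected */
} compare_result;

/* NaN: exponent all ones and a non-zero fraction. */
static bool f32_is_nan(uint32_t bits)
{
    return (bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0;
}

/* SNaN: a NaN whose most significant fraction bit (bit 22) is clear. */
static bool f32_is_snan(uint32_t bits)
{
    return f32_is_nan(bits) && (bits & 0x00400000u) == 0;
}

static float f32_from_bits(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Model of COMISS (ordered = true) or UCOMISS (ordered = false).
   Operands are raw IEEE 754 bit patterns so that SNaN encodings are
   preserved exactly. Flag encoding is an assumption, see the lead-in. */
static compare_result model_compare_ss(uint32_t op1, uint32_t op2, bool ordered)
{
    compare_result r = { false, false, false, false };

    if (f32_is_nan(op1) || f32_is_nan(op2)) {
        r.zf = r.pf = r.cf = true;  /* unordered result */
        /* Ordered compare: IE on any NaN. Unordered compare: IE on SNaN only. */
        r.ie = ordered ? true : (f32_is_snan(op1) || f32_is_snan(op2));
        return r;
    }

    float a = f32_from_bits(op1);
    float b = f32_from_bits(op2);
    if (a == b) {
        r.zf = true;                /* equal */
    } else if (a < b) {
        r.cf = true;                /* first operand less than second */
    }                               /* greater than: ZF = PF = CF = 0 */
    return r;
}
```

For example, model_compare_ss(0x7FC00000u, 0x3F800000u, true) models COMISS with a QNaN first operand and reports the IE condition, while the same operands with ordered set to false (UCOMISS) do not.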
For a description of the rFLAGS register, see “Flags Register” on page 33.

4.8 Instruction Prefixes

Instruction prefixes, in general, are described in “Instruction Prefixes” on page 71. The following restrictions apply to the use of instruction prefixes with 128-bit media instructions.

4.8.1 Supported Prefixes

The following prefixes can be used with 128-bit media instructions:
• Address-Size Override—The 67h prefix affects only operands in memory. The prefix is ignored by all other 128-bit media instructions.
• Operand-Size Override—The 66h prefix is used to form the opcodes of certain 128-bit media instructions. The prefix is ignored by all other 128-bit media instructions.
• Segment Overrides—The 2Eh (CS), 36h (SS), 3Eh (DS), 26h (ES), 64h (FS), and 65h (GS) prefixes affect only operands in memory. In 64-bit mode, the contents of the CS, DS, ES, and SS segment registers are ignored.
• REP—The F2h and F3h prefixes do not function as repeat prefixes for 128-bit media instructions. Instead, they are used to form the opcodes of certain 128-bit media instructions. The prefixes are ignored by all other 128-bit media instructions.
• REX—The REX prefixes affect operands that reference a GPR or XMM register when running in 64-bit mode. They allow access to the full 64-bit width of any of the 16 extended GPRs and to any of the 16 extended XMM registers. The REX prefix also affects the FXSAVE and FXRSTOR instructions, in which it selects between two types of 512-byte memory-image format, as described in “Media and x87 Processor State” in Volume 2. The prefix is ignored by all other 128-bit media instructions.

4.8.2 Special-Use and Reserved Prefixes

The following prefixes are used as opcode bytes in some 128-bit media instructions and are reserved in all other 128-bit media instructions:
• Operand-Size Override—The 66h prefix.
• REP—The F2h and F3h prefixes.

4.8.3 Prefixes That Cause Exceptions

The following prefixes cause an exception:
• LOCK—The F0h prefix causes an invalid-opcode exception when used with 128-bit media instructions.

4.9 Feature Detection

Before executing 128-bit media instructions, software should determine whether the processor supports the technology by executing the CPUID instruction.
“Feature Detection” on page 74 describes how software uses the CPUID instruction to detect feature support. For full support of the 128-bit media instructions documented here, the following features require detection:
• SSE, indicated by EDX bit 25 returned by CPUID function 0000_0001h.
• SSE2, indicated by EDX bit 26 returned by CPUID function 0000_0001h.
• SSE3, indicated by ECX bit 0 returned by CPUID function 0000_0001h.
• SSE4A, indicated by ECX bit 6 returned by CPUID function 8000_0001h.
• FXSAVE and FXRSTOR, indicated by EDX bit 24 returned by CPUID functions 0000_0001h and 8000_0001h.
• Misaligned SSE memory-access mode, indicated by ECX bit 7 returned by CPUID function 8000_0001h.
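The bit positions listed above can be decoded with a small helper. The following C sketch is deliberately a pure function over register values that the caller has already obtained from CPUID, so it does not depend on any particular compiler's CPUID interface. The struct names, field names, and function names are illustrative and are not part of this manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register values returned by the two CPUID functions listed above. */
typedef struct {
    uint32_t std_ecx, std_edx;  /* CPUID function 0000_0001h */
    uint32_t ext_ecx, ext_edx;  /* CPUID function 8000_0001h */
} cpuid_regs;

/* One flag per feature in the list above. */
typedef struct {
    bool sse, sse2, sse3, sse4a, fxsr, misaligned_sse;
} media_features;

static bool test_bit(uint32_t reg, unsigned n)
{
    assert(n < 32);
    return ((reg >> n) & 1u) != 0;
}

static media_features decode_media_features(cpuid_regs r)
{
    media_features f;

    f.sse   = test_bit(r.std_edx, 25);
    f.sse2  = test_bit(r.std_edx, 26);
    f.sse3  = test_bit(r.std_ecx, 0);
    f.sse4a = test_bit(r.ext_ecx, 6);
    /* FXSAVE/FXRSTOR is reported in EDX bit 24 of both functions;
       this sketch accepts either location. */
    f.fxsr  = test_bit(r.std_edx, 24) || test_bit(r.ext_edx, 24);
    f.misaligned_sse = test_bit(r.ext_ecx, 7);
    return f;
}
```

With GCC or Clang, one way to fill in cpuid_regs is the __get_cpuid helper from <cpuid.h>; other toolchains have their own mechanisms. Keeping the decoding separate from the CPUID call makes the bit tests easy to check against fixed register values.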
See “Misaligned Exception Mask (MM)” on page 120 for further details on the misaligned SSE memory-access mode.

Software that runs in long mode should also check for the following support:
• Long Mode, indicated by EDX bit 29 returned by CPUID function 8000_0001h.

See “Processor Feature Identification” in Volume 2 for a full description of the CPUID instruction and its function codes.

In addition, the operating system must support the FXSAVE and FXRSTOR instructions (by having set CR4.OSFXSR = 1), and it may wish to support SIMD floating-point exceptions (by having set CR4.OSXMMEXCPT = 1). For details, see “System-Control Registers” in Volume 2.

4.10 Exceptions

Types of Exceptions. 128-bit media instructions can generate two types of exceptions:
• General-Purpose Exceptions, described below in “General-Purpose Exceptions”
• SIMD Floating-Point Exception, described below in “SIMD Floating-Point Exception Causes” on page 178

Relation to x87 Exceptions. Although the 128-bit media instructions and the x87 floating-point instructions each have certain exceptions with the same names, the exception-reporting and exception-handling methods used by the two instruction subsets are distinct and independent of each other.