Volume 1: Basic Architecture
It also describes exceptions that can be generated with the SSE and SSE2 instructions and gives guidelines for writing applications with SSE and SSE2 extensions.

For additional information about SSE2 extensions, see:
• Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volumes 2A & 2B, which provide a detailed description of individual SSE2 instructions.
• Chapter 12, “System Programming for Streaming SIMD Instruction Sets,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, which gives guidelines for integrating the SSE and SSE2 extensions into an operating system environment.

PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)

11.2 SSE2 PROGRAMMING ENVIRONMENT

Figure 11-1 shows the programming environment for SSE2 extensions. No new registers or other instruction execution state are defined with SSE2 extensions.

Figure 11-1. Streaming SIMD Extensions 2 Execution Environment (XMM registers: eight 128-bit; MXCSR register: 32 bits; MMX registers: eight 64-bit; general-purpose registers: eight 32-bit; EFLAGS register: 32 bits; address space: 0 to 2^32 − 1)

SSE2 instructions use the XMM registers, the MMX registers, and/or IA-32 general-purpose registers, as follows:
• XMM registers: These eight registers (see Figure 10-2) are used to operate on packed or scalar double-precision floating-point data. Scalar operations are operations performed on individual (unpacked) double-precision floating-point values stored in the low quadword of an XMM register. XMM registers are also used to perform operations on 128-bit packed integer data. They are referenced by the names XMM0 through XMM7.
• MXCSR register: This 32-bit register (see Figure 10-3) provides status and control bits used in floating-point operations. The denormals-are-zeros and flush-to-zero flags in this register provide a higher-performance alternative for handling denormal source operands and denormal (underflow) results.
For more information on the functions of these flags, see Section 10.2.3.4, “Denormals-Are-Zeros,” and Section 10.2.3.3, “Flush-To-Zero.”
• MMX registers: These eight registers (see Figure 9-2) are used to perform operations on 64-bit packed integer data. They are also used to hold operands for some operations performed between MMX and XMM registers. MMX registers are referenced by the names MM0 through MM7.
• General-purpose registers: The eight general-purpose registers (see Figure 3-5) are used along with the existing IA-32 addressing modes to address operands in memory. MMX and XMM registers cannot be used to address memory. The general-purpose registers are also used to hold operands for some SSE2 instructions. These registers are referenced by the names EAX, EBX, ECX, EDX, EBP, ESI, EDI, and ESP.
• EFLAGS register: This 32-bit register (see Figure 3-8) is used to record the results of some compare operations.

11.2.1 SSE2 in 64-Bit Mode and Compatibility Mode

In compatibility mode, SSE2 extensions function like they do in protected mode. In 64-bit mode, eight additional XMM registers are accessible. Registers XMM8 through XMM15 are accessed by using REX prefixes.

Memory operands are specified using the ModR/M and SIB encoding described in Section 3.7.5.

Some SSE2 instructions may be used to operate on general-purpose registers. Use the REX.W prefix to access 64-bit general-purpose registers.
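As a minimal sketch of SSE2 instructions that take a 64-bit general-purpose operand, the C fragment below uses compiler intrinsics from <emmintrin.h>. It assumes a 64-bit (x86-64) build, because these particular intrinsics are not provided in 32-bit mode; on GCC and Clang they typically compile to the REX.W-encoded forms of CVTSI2SD, CVTSD2SI, and MOVQ, although the exact encoding is the compiler's choice. The function names are hypothetical helpers, not part of any library.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: CVTSI2SD xmm, r64 converts a signed 64-bit integer
   into the low double of an XMM register; the high quadword comes from the
   first operand (here zero). x86-64 builds only. */
double int64_to_double(int64_t v)
{
    __m128d x = _mm_cvtsi64_sd(_mm_setzero_pd(), v);
    return _mm_cvtsd_f64(x);
}

/* Hypothetical helper: CVTSD2SI r64, xmm converts the low double to a signed
   64-bit integer, rounding according to MXCSR.RC (round-to-nearest-even by default). */
int64_t double_to_int64(double d)
{
    return _mm_cvtsd_si64(_mm_set_sd(d));
}

/* Hypothetical helper: MOVQ copies a 64-bit general-purpose value into the low
   quadword of an XMM register (zeroing the high quadword) and back again. */
int64_t movq_roundtrip(int64_t v)
{
    __m128i x = _mm_cvtsi64_si128(v);
    return _mm_cvtsi128_si64(x);
}
```

A value such as 5000000000 (larger than 2^32) survives these conversions only because the general-purpose operand is 64 bits wide; a 32-bit register form could not hold it.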
Note that if a REX prefix is used when it has no meaning, the prefix is ignored.

11.2.2 Compatibility of SSE2 Extensions with SSE, MMX Technology and x87 FPU Programming Environment

SSE2 extensions do not introduce any new state to the IA-32 execution environment beyond that of SSE. SSE2 extensions represent an enhancement of SSE extensions; they are fully compatible and share the same state information. SSE and SSE2 instructions can be executed together in the same instruction stream without the need to save state when switching between instruction sets.

XMM registers are independent of the x87 FPU and MMX registers, so SSE and SSE2 operations performed on XMM registers can be performed in parallel with x87 FPU or MMX technology operations (see Section 11.6.7, “Interaction of SSE/SSE2 Instructions with x87 FPU and MMX Instructions”).

The FXSAVE and FXRSTOR instructions save and restore the SSE and SSE2 states along with the x87 FPU and MMX states.

11.2.3 Denormals-Are-Zeros Flag

The denormals-are-zeros flag (bit 6 in the MXCSR register) was introduced into the IA-32 architecture with the SSE2 extensions.
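The effect of the denormals-are-zeros flag can be observed with a small C sketch. It assumes an x86 compiler that provides _mm_getcsr and _mm_setcsr (declared in <xmmintrin.h>, which <emmintrin.h> includes) and a processor that implements DAZ; loading an MXCSR value with an unsupported bit set raises a general-protection exception. The MXCSR_DAZ and MXCSR_FTZ macros and the addsd_under_mxcsr helper are hypothetical names introduced here. The helper temporarily loads a caller-supplied MXCSR value, performs one ADDSD, and restores the original value.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _mm_getcsr/_mm_setcsr */

/* Hypothetical names for MXCSR control bits. */
#define MXCSR_DAZ (1u << 6)   /* denormals-are-zeros (introduced with SSE2) */
#define MXCSR_FTZ (1u << 15)  /* flush-to-zero */

/* Hypothetical helper: computes a + b with ADDSD while MXCSR holds `mxcsr`,
   then restores the caller's MXCSR. The volatile temporaries discourage the
   compiler from folding the addition at compile time or moving it outside
   the window in which the requested MXCSR value is active. */
double addsd_under_mxcsr(double a, double b, unsigned int mxcsr)
{
    volatile double va = a, vb = b, result;
    unsigned int saved = _mm_getcsr();

    _mm_setcsr(mxcsr);
    result = _mm_cvtsd_f64(_mm_add_sd(_mm_set_sd(va), _mm_set_sd(vb)));
    _mm_setcsr(saved);

    return result;
}
```

With both DAZ and FTZ clear, the smallest positive denormal (2^-1074) passes through an addition with 0.0 unchanged; with DAZ set, ADDSD treats that source operand as zero and returns 0.0. Restoring the saved MXCSR keeps the mode change from leaking into unrelated code.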
See Section 10.2.3.4, “Denormals-Are-Zeros,” for a description of this flag.

11.3 SSE2 DATA TYPES

SSE2 extensions introduced one 128-bit packed floating-point data type and four 128-bit SIMD integer data types to the IA-32 architecture (see Figure 11-2).
• Packed double-precision floating-point: This 128-bit data type consists of two IEEE 64-bit double-precision floating-point values packed into a double quadword.
(See Figure 4-3 for the layout of a 64-bit double-precision floating-point value; refer to Section 4.2.2, “Floating-Point Data Types,” for a detailed description of double-precision floating-point values.)
• 128-bit packed integers: The four 128-bit packed integer data types can contain 16 byte integers, 8 word integers, 4 doubleword integers, or 2 quadword integers. (Refer to Section 4.6.2, “128-Bit Packed SIMD Data Types,” for a detailed description of the 128-bit packed integers.)

Figure 11-2. Data Types Introduced with the SSE2 Extensions (128-bit packed double-precision floating-point, elements in bits 127:64 and 63:0; 128-bit packed byte, word, doubleword, and quadword integers, bits 127:0)

All of these data types are operated on in XMM registers or memory. Instructions are provided to convert between these 128-bit data types and the 64-bit and 32-bit data types.

The address of a 128-bit packed memory operand must be aligned on a 16-byte boundary, except in the following cases:
• a MOVUPD instruction, which supports unaligned accesses
• scalar instructions that use an 8-byte memory operand that is not subject to alignment requirements

Figure 4-2 shows the byte order of 128-bit (double quadword) and 64-bit (quadword) data types in memory.

11.4 SSE2 INSTRUCTIONS

The SSE2 instructions are divided into four functional groups:
• Packed and scalar double-precision floating-point instructions
• 64-bit and 128-bit SIMD integer instructions
• 128-bit extensions of SIMD integer instructions introduced with the MMX technology and the SSE extensions
• Cacheability-control and instruction-ordering instructions

The following sections provide more information about each group.

11.4.1 Packed and Scalar Double-Precision Floating-Point Instructions

The packed and scalar double-precision floating-point instructions are divided into the following sub-groups:
• Data movement instructions
• Arithmetic instructions
• Comparison instructions
• Conversion instructions
• Logical instructions
• Shuffle instructions

The packed double-precision floating-point instructions perform SIMD operations similarly to the packed single-precision floating-point instructions (see Figure 11-3). Each source operand contains two double-precision floating-point values, and the destination operand contains the results of the operation (OP) performed in parallel on the corresponding values (X0 and Y0, and X1 and Y1) in each operand.

Figure 11-3. Packed Double-Precision Floating-Point Operations (sources X1:X0 and Y1:Y0; the destination receives X1 OP Y1 in the high quadword and X0 OP Y0 in the low quadword)

The scalar double-precision floating-point instructions operate on the low (least significant) quadwords of two source operands (X0 and Y0), as shown in Figure 11-4. The high quadword (X1) of the first source operand is passed through to the destination. The scalar operations are similar to the floating-point operations performed in x87 FPU data registers with the precision control field in the x87 FPU control word set for double precision (53-bit significand), except that x87 stack operations use a 15-bit exponent range for the result while SSE2 operations use an 11-bit exponent range.

See Section 11.6.8, “Compatibility of SIMD and x87 FPU Floating-Point Data Types,” for more information about obtaining compatible results when performing both scalar double-precision floating-point operations in XMM registers and in x87 FPU data registers.

Figure 11-4.