Volume 1 Basic Architecture (794100), page 62
XMM registers are referenced by the names XMM0 through XMM7.

[Figure 10-1. SSE Execution Environment: address space (0 to 2^32 - 1); XMM registers (eight 128-bit); MXCSR register (32 bits); MMX registers (eight 64-bit); general-purpose registers (eight 32-bit); EFLAGS register (32 bits)]

• MXCSR register: This 32-bit register (see Figure 10-3 and Section 10.2.3, “MXCSR Control and Status Register”) provides status and control bits used in SIMD floating-point operations.
• MMX registers: These eight registers (see Figure 9-2) are used to perform operations on 64-bit packed integer data. They are also used to hold operands for some operations performed between the MMX and XMM registers. MMX registers are referenced by the names MM0 through MM7.
• General-purpose registers: The eight general-purpose registers (see Figure 3-5) are used along with the existing IA-32 addressing modes to address operands in memory. (MMX and XMM registers cannot be used to address memory.) The general-purpose registers are also used to hold operands for some SSE instructions and are referenced as EAX, EBX, ECX, EDX, EBP, ESI, EDI, and ESP.
• EFLAGS register: This 32-bit register (see Figure 3-8) is used to record the results of some compare operations.

Vol. 1 10-3 PROGRAMMING WITH STREAMING SIMD EXTENSIONS (SSE)

10.2.1 SSE in 64-Bit Mode and Compatibility Mode

In compatibility mode, SSE extensions function like they do in protected mode.
In 64-bit mode, eight additional XMM registers are accessible. Registers XMM8-XMM15 are accessed by using REX prefixes. Memory operands are specified using the ModR/M, SIB encoding described in Section 3.7.5.

Some SSE instructions may be used to operate on general-purpose registers. Use the REX.W prefix to access 64-bit general-purpose registers. Note that if a REX prefix is used when it has no meaning, the prefix is ignored.

10.2.2 XMM Registers

Eight 128-bit XMM data registers were introduced into the IA-32 architecture with SSE extensions (see Figure 10-2). These registers can be accessed directly using the names XMM0 to XMM7; and they can be accessed independently from the x87 FPU and MMX registers and the general-purpose registers (that is, they are not aliased to any other of the processor’s registers).

[Figure 10-2. XMM Registers: eight 128-bit registers, XMM0 through XMM7, each with bits numbered from 127 down to 0]

SSE instructions use the XMM registers only to operate on packed single-precision floating-point operands.
SSE2 extensions expand the functions of the XMM registers to operate on packed or scalar double-precision floating-point operands and packed integer operands (see Section 11.2, “SSE2 Programming Environment,” and Section 12.1, “SSE3/SSSE3 Programming Environment and Data types”).

XMM registers can only be used to perform calculations on data; they cannot be used to address memory.
Addressing memory is accomplished by using the general-purpose registers.

Data can be loaded into XMM registers or written from the registers to memory in 32-bit, 64-bit, and 128-bit increments. When storing the entire contents of an XMM register in memory (128-bit store), the data is stored in 16 consecutive bytes, with the low-order byte of the register being stored in the first byte in memory.

10.2.3 MXCSR Control and Status Register

The 32-bit MXCSR register (see Figure 10-3) contains control and status information for SSE, SSE2, and SSE3 SIMD floating-point operations. This register contains:
• flag and mask bits for SIMD floating-point exceptions
• rounding control field for SIMD floating-point operations
• flush-to-zero flag that provides a means of controlling underflow conditions on SIMD floating-point operations
• denormals-are-zeros flag that controls how SIMD floating-point instructions handle denormal source operands

The contents of this register can be loaded from memory with the LDMXCSR and FXRSTOR instructions and stored in memory with STMXCSR and FXSAVE.

Bits 16 through 31 of the MXCSR register are reserved and are cleared on a power-up or reset of the processor; attempting to write a non-zero value to these bits, using either the FXRSTOR or LDMXCSR instructions, will result in a general-protection exception (#GP) being generated.

Figure 10-3. MXCSR Control/Status Register

  Bits     Name   Description
  31:16           Reserved
  15       FZ     Flush to Zero
  14:13    RC     Rounding Control
  12       PM     Precision Mask
  11       UM     Underflow Mask
  10       OM     Overflow Mask
  9        ZM     Divide-by-Zero Mask
  8        DM     Denormal Operation Mask
  7        IM     Invalid Operation Mask
  6        DAZ    Denormals Are Zeros*
  5        PE     Precision Flag
  4        UE     Underflow Flag
  3        OE     Overflow Flag
  2        ZE     Divide-by-Zero Flag
  1        DE     Denormal Flag
  0        IE     Invalid Operation Flag

* The denormals-are-zeros flag was introduced in the Pentium 4 and Intel Xeon processor.

10.2.3.1 SIMD Floating-Point Mask and Flag Bits

Bits 0 through 5 of the MXCSR register indicate whether a SIMD floating-point exception has been detected. They are “sticky” flags.
That is, after a flag is set, it remains set until explicitly cleared. To clear these flags, use the LDMXCSR or the FXRSTOR instruction to write zeroes to them.

Bits 7 through 12 provide individual mask bits for the SIMD floating-point exceptions. An exception type is masked if the corresponding mask bit is set, and it is unmasked if the bit is clear. These mask bits are set upon a power-up or reset. This causes all SIMD floating-point exceptions to be initially masked.

If LDMXCSR or FXRSTOR clears a mask bit and sets the corresponding exception flag bit, a SIMD floating-point exception will not be generated as a result of this change. The unmasked exception will be generated only upon the execution of the next SSE/SSE2/SSE3 instruction that detects the unmasked exception condition.

For more information about the use of the SIMD floating-point exception mask and flag bits, see Section 11.5, “SSE, SSE2, and SSE3 Exceptions,” and Section 12.8, “SSE3/SSSE3 Exceptions.”

10.2.3.2 SIMD Floating-Point Rounding Control Field

Bits 13 and 14 of the MXCSR register (the rounding control [RC] field) control how the results of SIMD floating-point instructions are rounded.
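The bit assignments described so far (flag bits 0 through 5, mask bits 7 through 12, the RC field in bits 13 and 14, and reserved bits 16 through 31) can be illustrated with a small decoder. The Python sketch below is not taken from the manual: it only manipulates an integer that stands in for an MXCSR image, and the names `decode_mxcsr`, `FLAG_NAMES`, `MASK_NAMES`, `RC_MODES`, and `POWER_UP` are hypothetical. The RC encodings and the assumption that every bit other than the mask bits is clear after reset should be checked against Section 4.8.4 and the reset documentation before being relied on.

```python
# Illustrative only: decodes an integer standing in for an MXCSR image.
# Nothing here reads or writes the real register.

FLAG_NAMES = ("IE", "DE", "ZE", "OE", "UE", "PE")   # bits 0..5
MASK_NAMES = ("IM", "DM", "ZM", "OM", "UM", "PM")   # bits 7..12

# Assumed RC encodings (to be confirmed against Section 4.8.4, "Rounding").
RC_MODES = {
    0b00: "round to nearest (even)",
    0b01: "round down (toward -infinity)",
    0b10: "round up (toward +infinity)",
    0b11: "round toward zero (truncate)",
}


def decode_mxcsr(value):
    """Split a 32-bit MXCSR image into the fields named in Figure 10-3."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("MXCSR image must fit in 32 bits")
    return {
        "flags": [n for i, n in enumerate(FLAG_NAMES) if (value >> i) & 1],
        "daz": bool((value >> 6) & 1),
        "masks": [n for i, n in enumerate(MASK_NAMES) if (value >> (7 + i)) & 1],
        "rc": RC_MODES[(value >> 13) & 0b11],
        "fz": bool((value >> 15) & 1),
        # Writing non-zero reserved bits with LDMXCSR/FXRSTOR raises #GP.
        "reserved_nonzero": (value >> 16) != 0,
    }


# Mask bits set at power-up (per the text); all other bits assumed clear.
POWER_UP = 0b111111 << 7

if __name__ == "__main__":
    print(hex(POWER_UP), decode_mxcsr(POWER_UP))
```

Because the flag bits are sticky, software that wants to observe the exceptions raised by one code region would first store a register image with bits 0 through 5 cleared, then decode the image read back afterwards.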
See Section 4.8.4, “Rounding,” for a description of the function and encoding of the rounding control bits.

10.2.3.3 Flush-To-Zero

Bit 15 (FZ) of the MXCSR register enables the flush-to-zero mode, which controls the masked response to a SIMD floating-point underflow condition. When the underflow exception is masked and the flush-to-zero mode is enabled, the processor performs the following operations when it detects a floating-point underflow condition:
• Returns a zero result with the sign of the true result
• Sets the precision and underflow exception flags

If the underflow exception is not masked, the flush-to-zero bit is ignored.

The flush-to-zero mode is not compatible with IEEE Standard 754. The IEEE-mandated masked response to underflow is to deliver the denormalized result (see Section 4.8.3.2, “Normalized and Denormalized Finite Numbers”). The flush-to-zero mode is provided primarily for performance reasons. At the cost of a slight precision loss, faster execution can be achieved for applications where underflows are common and rounding the underflow result to zero can be tolerated.

The flush-to-zero bit is cleared upon a power-up or reset of the processor, disabling the flush-to-zero mode.

10.2.3.4 Denormals-Are-Zeros

Bit 6 (DAZ) of the MXCSR register enables the denormals-are-zeros mode, which controls the processor’s response to a SIMD floating-point denormal operand condition. When the denormals-are-zeros flag is set, the processor converts all denormal source operands to a zero with the sign of the original operand before performing any computations on them.
The processor does not set the denormal-operand exception flag (DE), regardless of the setting of the denormal-operand exception mask bit (DM); and it does not generate a denormal-operand exception if the exception is unmasked.

The denormals-are-zeros mode is not compatible with IEEE Standard 754 (see Section 4.8.3.2, “Normalized and Denormalized Finite Numbers”). The denormals-are-zeros mode is provided to improve processor performance for applications such as streaming media processing, where rounding a denormal operand to zero does not appreciably affect the quality of the processed data.

The denormals-are-zeros flag is cleared upon a power-up or reset of the processor, disabling the denormals-are-zeros mode.

The denormals-are-zeros mode was introduced in the Pentium 4 and Intel Xeon processor with the SSE2 extensions; however, it is fully compatible with the SSE SIMD floating-point instructions (that is, the denormals-are-zeros flag affects the operation of the SSE SIMD floating-point instructions). In earlier IA-32 processors and in some models of the Pentium 4 processor, this flag (bit 6) is reserved. See Section 11.6.3, “Checking for the DAZ Flag in the MXCSR Register,” for instructions for detecting the availability of this feature.

Attempting to set bit 6 of the MXCSR register on processors that do not support the DAZ flag will cause a general-protection exception (#GP). See Section 11.6.6, “Guidelines for Writing to the MXCSR Register,” for instructions for preventing such general-protection exceptions by using the MXCSR_MASK value returned by the FXSAVE instruction.

10.2.4 Compatibility of SSE Extensions with SSE2/SSE3/MMX and the x87 FPU

The state (XMM registers and MXCSR register) introduced into the IA-32 execution environment with the SSE extensions is shared with SSE2 and SSE3 extensions. SSE/SSE2/SSE3 instructions are fully compatible; they can be executed together in the same instruction stream with no need to save state when switching between instruction sets.

XMM registers are independent of the x87 FPU and MMX registers, so SSE/SSE2/SSE3 operations performed on the XMM registers can be performed in parallel with operations on the x87 FPU and MMX registers (see Section 11.6.7, “Interaction of SSE/SSE2 Instructions with x87 FPU and MMX Instructions”).

The FXSAVE and FXRSTOR instructions save and restore the SSE/SSE2/SSE3 states along with the x87 FPU and MMX state.

10.3 SSE DATA TYPES

SSE extensions introduced one data type, the 128-bit packed single-precision floating-point data type, to the IA-32 architecture (see Figure 10-4).
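To make the shape of this data type concrete, the following Python sketch packs four single-precision values into a 16-byte image with the standard `struct` module, using the low-order-byte-first order that Section 10.2.2 gives for a 128-bit store. It is an illustration only. The helper names `pack_ps`, `unpack_ps`, and `as_u128` are hypothetical, and the sketch assumes that the first element occupies bits 31:0 of the register, and therefore the lowest memory address.

```python
import struct


def pack_ps(x0, x1, x2, x3):
    """Return the 16-byte memory image of four packed single-precision values.

    Assumption: x0 occupies bits 31:0 of the register, x3 bits 127:96.
    """
    return struct.pack("<4f", x0, x1, x2, x3)


def unpack_ps(image):
    """Recover the four single-precision elements from a 16-byte image."""
    if len(image) != 16:
        raise ValueError("expected exactly 16 bytes")
    return struct.unpack("<4f", image)


def as_u128(image):
    """View the memory image as one 128-bit integer (register contents)."""
    return int.from_bytes(image, "little")


if __name__ == "__main__":
    image = pack_ps(1.0, 2.0, 3.0, 4.0)
    print(image.hex())
    print(hex(as_u128(image)))
    print(unpack_ps(image))
```

Reading the 128-bit integer from its least significant end gives the elements in the same order as reading memory from the lowest address, which is the property the 128-bit store rule describes.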