Because SSE and SSE2 extensions share the same state and perform companion operations, these guidelines apply to both sets of extensions. Chapter 12 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, discusses the interface to the processor for context switching as well as other operating system considerations when writing code that uses SSE/SSE2/SSE3 extensions.

11.6.1 General Guidelines for Using SSE/SSE2 Extensions

The following guidelines describe how to take full advantage of the performance gains available with the SSE and SSE2 extensions:

• Ensure that the processor supports the SSE and SSE2 extensions.
• Ensure that your operating system supports the SSE and SSE2 extensions. (Operating system support for the SSE extensions implies support for the SSE2 extensions and vice versa.)
• Use stack and data alignment techniques to keep data properly aligned for efficient memory use.
• Use the non-temporal store instructions offered with the SSE and SSE2 extensions.
• Employ the optimization and scheduling techniques described in the Intel Pentium 4 Optimization Reference Manual (see Section 1.4, “Related Literature,” for the order number for this manual).

¹ SSE3 refers to ADDSUBPD, ADDSUBPS, HADDPD, HADDPS, HSUBPD, and HSUBPS; the only other SSE3 instruction that can raise floating-point exceptions is FISTTP, which can generate x87 FPU invalid-operation and inexact-result exceptions.

Vol. 1 11-27
PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)

11.6.2 Checking for SSE/SSE2 Support

Before an application attempts to use the SSE and/or SSE2 extensions, it should check that they are present on the processor:

1. Check that the processor supports the CPUID instruction.
   Bit 21 (the ID flag) of the EFLAGS register can be used to check the processor’s support for the CPUID instruction: if software can toggle this flag, the processor supports CPUID.
2. Check that the processor supports the SSE and/or SSE2 extensions (true if CPUID.01H:EDX.SSE[bit 25] = 1 and/or CPUID.01H:EDX.SSE2[bit 26] = 1).

The operating system must provide system-level support for handling SSE state and exceptions before an application can use the SSE and/or SSE2 extensions (see Chapter 12 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A).

If the processor attempts to execute an unsupported SSE or SSE2 instruction, the processor will generate an invalid-opcode exception (#UD). If an operating system does not provide adequate system-level support for SSE, executing an SSE or SSE2 instruction can also generate #UD.

11.6.3 Checking for the DAZ Flag in the MXCSR Register

The denormals-are-zero (DAZ) flag in the MXCSR register is available in most of the Pentium 4 processors and in the Intel Xeon processor, with the exception of some early steppings.
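The two support checks listed in Section 11.6.2 each reduce to a single-bit test. The following Python sketch assumes the EFLAGS values and the CPUID.01H:EDX value have already been captured by native code (Python cannot execute PUSHFD/POPFD or CPUID itself); the function names are hypothetical.

```python
from typing import Tuple

EFLAGS_ID_BIT = 21  # EFLAGS.ID; software can toggle it only if CPUID is supported
SSE_BIT = 25        # CPUID.01H:EDX.SSE
SSE2_BIT = 26       # CPUID.01H:EDX.SSE2


def cpuid_available(eflags_written: int, eflags_read_back: int) -> bool:
    """Step 1: eflags_written is the EFLAGS image written with bit 21 toggled;
    eflags_read_back is EFLAGS as read after that write. CPUID is supported
    if the toggled bit was retained."""
    id_mask = 1 << EFLAGS_ID_BIT
    return (eflags_written & id_mask) == (eflags_read_back & id_mask)


def sse_features(cpuid_01h_edx: int) -> Tuple[bool, bool]:
    """Step 2: return (SSE supported, SSE2 supported) from CPUID.01H:EDX."""
    sse = bool((cpuid_01h_edx >> SSE_BIT) & 1)
    sse2 = bool((cpuid_01h_edx >> SSE2_BIT) & 1)
    return sse, sse2
```

For example, sse_features(0x06000000) returns (True, True). A set bit reports only processor support; as the text notes, the operating system must also support SSE state and exceptions before the instructions can be used.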
To check for the presence of the DAZ flag in the MXCSR register, do the following:

1. Establish a 512-byte FXSAVE area in memory.
2. Clear the FXSAVE area to all 0s.
3. Execute the FXSAVE instruction, using the address of the first byte of the cleared FXSAVE area as the memory operand. See “FXSAVE—Save x87 FPU, MMX, SSE, and SSE2 State” in Chapter 3 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A, for a description of the FXSAVE instruction and the layout of the FXSAVE image.
4. Check the value in the MXCSR_MASK field in the FXSAVE image (bytes 28 through 31).
   — If the value of the MXCSR_MASK field is 00000000H, the DAZ flag and denormals-are-zero mode are not supported.
   — If the value of the MXCSR_MASK field is non-zero and bit 6 is set, the DAZ flag and denormals-are-zero mode are supported.

If the DAZ flag is not supported, then it is a reserved bit and attempting to write a 1 to it will cause a general-protection exception (#GP).
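The DAZ check above comes down to reading one little-endian doubleword out of the FXSAVE image. This Python sketch assumes the 512-byte image was produced in native code by executing FXSAVE on a zeroed buffer (FXSAVE requires its memory operand to be 16-byte aligned); Python cannot execute FXSAVE itself, and the helper names are hypothetical. It also applies the rules stated in Section 11.6.6: a zero field means the default mask 0000FFBFH, and AND-ing a candidate MXCSR value with the mask clears its reserved bits.

```python
import struct

FXSAVE_AREA_SIZE = 512
MXCSR_MASK_OFFSET = 28            # MXCSR_MASK field: bytes 28 through 31
DEFAULT_MXCSR_MASK = 0x0000FFBF   # used when the field reads back as 0
DAZ_BIT = 6


def mxcsr_mask_field(image: bytes) -> int:
    """Return the raw little-endian MXCSR_MASK field of a 512-byte FXSAVE image."""
    if len(image) != FXSAVE_AREA_SIZE:
        raise ValueError("expected a 512-byte FXSAVE image")
    (field,) = struct.unpack_from("<I", image, MXCSR_MASK_OFFSET)
    return field


def daz_supported(image: bytes) -> bool:
    """DAZ is supported only if the field is non-zero and bit 6 is set."""
    field = mxcsr_mask_field(image)
    return field != 0 and bool(field & (1 << DAZ_BIT))


def effective_mxcsr_mask(image: bytes) -> int:
    """A zero field means the processor reports the default mask 0000FFBFH."""
    field = mxcsr_mask_field(image)
    return field if field != 0 else DEFAULT_MXCSR_MASK


def sanitize_mxcsr(value: int, mask: int) -> int:
    """Clear every reserved bit (a 0 bit in the mask) before loading MXCSR."""
    return value & mask
```

For instance, with the default mask, sanitize_mxcsr(0x1FC0, 0xFFBF) returns 0x1F80: the DAZ bit is cleared while the exception-mask bits are kept.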
See Section 11.6.6, “Guidelines for Writing to the MXCSR Register,” for general guidelines for preventing general-protection exceptions when writing to the MXCSR register.

11.6.4 Initialization of SSE/SSE2 Extensions

The SSE and SSE2 state is contained in the XMM and MXCSR registers. Upon a hardware reset of the processor, this state is initialized as follows (see Table 11-2):

• All SIMD floating-point exceptions are masked (bits 7 through 12 of the MXCSR register are set to 1).
• All SIMD floating-point exception flags are cleared (bits 0 through 5 of the MXCSR register are set to 0).
• The rounding control is set to round to nearest (bits 13 and 14 of the MXCSR register are set to 00B).
• The flush-to-zero mode is disabled (bit 15 of the MXCSR register is set to 0).
• The denormals-are-zeros mode is disabled (bit 6 of the MXCSR register is set to 0).
  If the denormals-are-zeros mode is not supported, this bit is reserved and will be set to 0 on initialization.
• Each of the XMM registers is cleared (set to all zeros).

Table 11-2. SSE and SSE2 State Following a Power-up/Reset or INIT

Registers              Power-Up or Reset    INIT
XMM0 through XMM7      +0.0                 Unchanged
MXCSR                  1F80H                Unchanged

If the processor is reset by asserting the INIT# pin, the SSE and SSE2 state is not changed.

11.6.5 Saving and Restoring the SSE/SSE2 State

The FXSAVE instruction saves the x87 FPU, MMX, SSE, and SSE2 states (which include the contents of the eight XMM registers and the MXCSR register) in a 512-byte block of memory.
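FXSAVE stores MXCSR in bytes 24 through 27 of that 512-byte block (see the FXSAVE image layout in Volume 2A), so a saved image can be decoded field by field. The Python sketch below, with hypothetical helper names, reads that field and splits it according to the bit assignments listed in Section 11.6.4; it assumes the image was captured by native code, as Python cannot execute FXSAVE.

```python
import struct

MXCSR_OFFSET = 24  # MXCSR field: bytes 24 through 27 of the FXSAVE image


def mxcsr_from_image(image: bytes) -> int:
    """Read the little-endian MXCSR value from a 512-byte FXSAVE image."""
    if len(image) != 512:
        raise ValueError("expected a 512-byte FXSAVE image")
    (value,) = struct.unpack_from("<I", image, MXCSR_OFFSET)
    return value


def decode_mxcsr(mxcsr: int) -> dict:
    """Split MXCSR into the fields described in Section 11.6.4."""
    return {
        "exception_flags": mxcsr & 0x3F,          # bits 0-5
        "daz": bool(mxcsr & (1 << 6)),            # bit 6
        "exception_masks": (mxcsr >> 7) & 0x3F,   # bits 7-12
        "rounding_control": (mxcsr >> 13) & 0x3,  # bits 13-14; 00B = round to nearest
        "ftz": bool(mxcsr & (1 << 15)),           # bit 15
    }
```

Applied to the reset value 1F80H from Table 11-2, decode_mxcsr(0x1F80) reports all six exception masks set (0x3F), no exception flags, rounding control 00B, and both DAZ and FTZ disabled, matching the list in Section 11.6.4.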
The FXRSTOR instruction restores the saved SSE and SSE2 state from memory. See the FXSAVE instruction in Chapter 3 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A, for the layout of the 512-byte state block.

In addition to saving and restoring the SSE and SSE2 state, FXSAVE and FXRSTOR also save and restore the x87 FPU state (because MMX registers are aliased to the x87 FPU data registers, this includes saving and restoring the MMX state). For greater code efficiency, it is suggested that FXSAVE and FXRSTOR be substituted for the FSAVE, FNSAVE, and FRSTOR instructions in the following situations:

• When a context switch is being made in a multitasking environment
• During calls and returns from interrupt and exception handlers

In situations where the code is switching between x87 FPU and MMX technology computations (without a context switch or a call to an interrupt or exception handler), the FSAVE/FNSAVE and FRSTOR instructions are more efficient than the FXSAVE and FXRSTOR instructions.

11.6.6 Guidelines for Writing to the MXCSR Register

The MXCSR has several reserved bits, and attempting to write a 1 to any of these bits will cause a general-protection exception (#GP) to be generated.
To allow software to identify these reserved bits, the MXCSR_MASK value is provided. Software can determine this mask value as follows:

1. Establish a 512-byte FXSAVE area in memory.
2. Clear the FXSAVE area to all 0s.
3. Execute the FXSAVE instruction, using the address of the first byte of the cleared FXSAVE area as the memory operand. See “FXSAVE—Save x87 FPU, MMX, SSE, and SSE2 State” in Chapter 3 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A, for a description of FXSAVE and the layout of the FXSAVE image.
4. Check the value in the MXCSR_MASK field in the FXSAVE image (bytes 28 through 31).
   — If the value of the MXCSR_MASK field is 00000000H, then the MXCSR_MASK value is the default value of 0000FFBFH.
     Note that this value indicates that bit 6 of the MXCSR register is reserved; that is, the denormals-are-zero mode is not supported on the processor.
   — If the value of the MXCSR_MASK field is non-zero, the value in the field should be used as the MXCSR_MASK.

All bits set to 0 in the MXCSR_MASK value indicate reserved bits in the MXCSR register. Thus, if the MXCSR_MASK value is AND’d with a value to be written into the MXCSR register, the resulting value is guaranteed to have all its reserved bits set to 0, preventing the possibility of a general-protection exception being generated when the value is written to the MXCSR register.

For example, the default MXCSR_MASK value when 00000000H is returned in the FXSAVE image is 0000FFBFH. If software AND’s a value to be written to the MXCSR register with 0000FFBFH, bit 6 of the result (the DAZ flag) is guaranteed to be 0, which is the required setting to prevent general-protection exceptions on processors that do not support the denormals-are-zero mode.

To prevent general-protection exceptions, the MXCSR_MASK value should be AND’d with the value to be written into the MXCSR register in the following situations:

• Operating system routines that receive a parameter from an application program and then write that value to the MXCSR register (either with an FXRSTOR or LDMXCSR instruction)
• Any application program that writes to the MXCSR register and that needs to run robustly on several different IA-32 processors

Note that all bits in the MXCSR_MASK value that are set to 1 indicate features that are supported by the MXCSR register; they can be treated as feature flags for identifying processor capabilities.

11.6.7 Interaction of SSE/SSE2 Instructions with x87 FPU and MMX Instructions

The XMM registers and the x87 FPU and MMX registers represent separate execution environments, which has certain ramifications when executing SSE, SSE2, MMX, and x87 FPU instructions in the same code module or when mixing code modules that contain
these instructions:

• Those SSE and SSE2 instructions that operate only on XMM registers (such as the packed and scalar floating-point instructions and the 128-bit SIMD integer instructions) can be executed in the same instruction stream with 64-bit SIMD integer or x87 FPU instructions without any restrictions.