Volume 1 Basic Architecture (794100), page 87
GUIDELINES FOR WRITING X87 FPU EXCEPTION HANDLERS

If this is not done, and also the SMM code saves the x87 FPU state, AND an x87 FPU error handler is being used which relies on IGNNE# assertion, then (very rarely) the x87 FPU handler will nest inside itself and malfunction. The following example shows how this can happen.

Suppose that the x87 FPU exception handler includes the following sequence:

    FNSTSW save_sw      ; save the x87 FPU status word
                        ; using a no-wait x87 FPU instruction
    OUT    0F0H, AL     ; clears IRQ13 & activates IGNNE#
    . . . .
    FLDCW  new_cw       ; loads new CW ignoring x87 FPU errors,
                        ; since IGNNE# is assumed active; or any
                        ; other x87 FPU instruction that is not a no-wait
                        ; type will cause the same problem
    . . . .
    FCLEX               ; clear the x87 FPU error conditions & thus
                        ; turn off FERR# & reset the IGNNE# FF

The problem will only occur if the processor enters SMM between the OUT and the FLDCW instructions.
But if that happens, AND the SMM code saves the x87 FPU state using FNSAVE, then the IGNNE# Flip Flop will be cleared (because FNSAVE clears the x87 FPU errors and thus de-asserts FERR#). When the processor returns from SMM it will restore the x87 FPU state with FRSTOR, which will re-assert FERR#, but the IGNNE# Flip Flop will not get set.
Then when the x87 FPU error handler executes the FLDCW instruction, the active error condition will cause the processor to re-enter the x87 FPU error handler from the beginning. This may cause the handler to malfunction.

To avoid this problem, Intel recommends two measures:

1. Do not use the x87 FPU for calculations inside SMM code. (The normal power management, and sometimes security, functions provided by SMM have no need for x87 FPU calculations; if they are needed for some special case, use scaling or emulation instead.) This eliminates the need to do FNSAVE/FRSTOR inside SMM code, except when going into a 0 V suspend state (in which, in order to save power, the CPU is turned off completely, requiring its complete state to be saved).
2. The system should not call upon SMM code to put the processor into 0 V suspend while the processor is running x87 FPU calculations, or just after an interrupt has occurred.
Normal power management protocol avoids this by going into power down states only after timed intervals in which no system activity occurs.

D.3.6 Considerations When x87 FPU Shared Between Tasks

The IA-32 architecture allows speculative deferral of floating-point state swaps on task switches. This feature allows postponing an x87 FPU state swap until an x87 FPU instruction is actually encountered in another task. Since kernel tasks rarely use floating-point, and some applications do not use floating-point or use it infrequently, the amount of time saved by avoiding unnecessary stores of the floating-point state is significant. Speculative deferral of x87 FPU saves does, however, place an extra burden on the kernel in three key ways:

1. The kernel must keep track of which thread owns the x87 FPU, which may be different from the currently executing thread.
2. The kernel must associate any floating-point exceptions with the generating task. This requires special handling since floating-point exceptions are delivered asynchronously with other system activity.
3. There are conditions under which spurious floating-point exception interrupts are generated, which the kernel must recognize and discard.

D.3.6.1 Speculatively Deferring x87 FPU Saves, General Overview

In order to support multitasking, each thread in the system needs a save area for the general-purpose registers, and each task that is allowed to use floating-point needs an x87 FPU save area large enough to hold the entire x87 FPU stack and associated x87 FPU state such as the control word and status word.
(See Section 8.1.10, “Saving the x87 FPU’s State with FSTENV/FNSTENV and FSAVE/FNSAVE,” for a complete description of the x87 FPU save image.) If the processor and the operating system support Streaming SIMD Extensions, the save area should be large enough and aligned correctly to hold x87 FPU and Streaming SIMD Extensions state.

On a task switch, the general-purpose registers are swapped out to their save area for the suspending thread, and the registers of the resuming thread are loaded. The x87 FPU state does not need to be saved at this point.
If the resuming thread does not use the x87 FPU before it is itself suspended, then both a save and a load of the x87 FPU state have been avoided. It is often the case that several threads may be executed without any usage of the x87 FPU.

The processor supports speculative deferral of x87 FPU saves via interrupt 7 “Device Not Available” (DNA), used in conjunction with CR0 bit 3, the “Task Switched” bit (TS). (See “Control Registers” in Chapter 2 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.) Every task switch via the hardware-supported task switching mechanism (see “Task Switching” in Chapter 6 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A) sets TS. Multithreaded kernels that use software task switching [1] can set the TS bit by reading CR0, ORing a “1” into [2] bit 3, and writing back CR0.
Any subsequent floating-point instructions (now being executed in a new thread context) will fault via interrupt 7 before execution.

This allows a DNA handler to save the old floating-point context and reload the x87 FPU state for the current thread.
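The read-modify-write of CR0 described above can be sketched in C. This is only a user-space model under stated assumptions: `model_cr0`, `read_cr0`, `write_cr0`, `set_ts`, `clts` and `x87_would_fault` are hypothetical names, and the initial CR0 value is arbitrary. A real kernel would use the privileged MOV to/from CR0 and CLTS instructions instead of a variable. The model also ignores the EM and MP flags.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_EM (1u << 2)   /* emulation flag; not to be used in place of TS */
#define CR0_TS (1u << 3)   /* task-switched flag */

/* Hypothetical stand-in for CR0; the initial value (PG|ET|PE) is arbitrary. */
static uint32_t model_cr0 = 0x80000011u;

uint32_t read_cr0(void)        { return model_cr0; }
void     write_cr0(uint32_t v) { model_cr0 = v; }

/* Models CLTS: clears TS and leaves every other CR0 bit alone. */
void clts(void) { write_cr0(read_cr0() & ~CR0_TS); }

/* In this simplified model an x87 instruction takes interrupt 7 (DNA)
   exactly when TS is set. */
int x87_would_fault(void) { return (read_cr0() & CR0_TS) != 0; }

/* Software task switch step: read CR0, OR a 1 into bit 3, write CR0 back. */
void set_ts(void)
{
    write_cr0(read_cr0() | CR0_TS);
    assert(x87_would_fault());   /* postcondition: next x87 use will trap */
}
```

Calling `set_ts()` during a software task switch makes `x87_would_fault()` report a pending trap, and `clts()` removes it again without disturbing the other CR0 bits, including EM.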
The handler should clear the TS bit before exit using the CLTS instruction. On return from the handler the faulting thread will proceed with its floating-point computation.

Some operating systems save the x87 FPU context on every task switch, typically because they also change the linear address space between tasks. The problem and solution discussed in the following sections apply to these operating systems also.

D.3.6.2 Tracking x87 FPU Ownership

Since the contents of the x87 FPU may not belong to the currently executing thread, the thread identifier for the last x87 FPU user needs to be tracked separately. This is not complicated; the kernel should simply provide a variable to store the thread identifier of the x87 FPU owner, separate from the variable that stores the identifier for the currently executing thread.
This variable is updated in the DNA exception handler, and is used by the DNA exception handler to find the x87 FPU save areas of the old and new threads. A simplified flow for a DNA exception handler is then:

1. Use the “x87 FPU Owner” variable to find the x87 FPU save area of the last thread to use the x87 FPU.
2. Save the x87 FPU contents to the old thread’s save area, typically using an FNSAVE or FXSAVE instruction.
3. Set the x87 FPU Owner variable to identify the currently executing thread.
4. Reload the x87 FPU contents from the new thread’s save area, typically using an FRSTOR or FXRSTOR instruction.
5. Clear TS using the CLTS instruction and exit the DNA exception handler.

While this flow covers the basic requirements for speculatively deferred x87 FPU state swaps, there are some additional subtleties that need to be handled in a robust implementation.

[1] In a software task switch, the operating system uses a sequence of instructions to save the suspending thread’s state and restore the resuming thread’s state, instead of the single long non-interruptible task switch operation provided by the IA-32 architecture.
[2] Although CR0, bit 2, the emulation flag (EM), also causes a DNA exception, do not use the EM bit as a surrogate for TS. EM means that no x87 FPU is available and that floating-point instructions must be emulated. Using EM to trap on task switches is not compatible with the MMX technology. If the EM flag is set, MMX instructions raise the invalid opcode exception.

D.3.6.3 Interaction of x87 FPU State Saves and Floating-Point Exception Association

Recall these key points from earlier in this document: When considering floating-point exceptions across all implementations of the IA-32 architecture, and across all floating-point instructions, a floating-point exception can be initiated from any time during the excepting floating-point instruction, up to just before the next floating-point instruction.
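Returning to the simplified DNA handler flow of Section D.3.6.2, the five steps can be modeled in C as a rough sketch. Everything here is an assumption for illustration: the names `save_area`, `fpu_regs`, `fpu_owner`, `current_thread`, `cr0_ts`, `task_switch`, `dna_handler`, `x87_store` and `x87_load` are hypothetical, the fixed thread table is invented, and plain struct copies stand in for FNSAVE/FXSAVE and FRSTOR/FXRSTOR. The check for "no previous owner" is also an addition that the five-step flow does not spell out. The sketch deliberately omits the exception-association subtleties this section goes on to discuss.

```c
#include <assert.h>

#define NUM_THREADS     4     /* hypothetical fixed thread table */
#define FPU_IMAGE_BYTES 512   /* size of an FXSAVE image */

typedef struct { unsigned char bytes[FPU_IMAGE_BYTES]; } fpu_image;

/* Hypothetical kernel state, zero-initialized. */
static fpu_image save_area[NUM_THREADS]; /* one x87 FPU save area per thread */
static fpu_image fpu_regs;               /* stands in for the live x87 FPU */
static int fpu_owner = -1;               /* "x87 FPU Owner"; -1 means nobody */
static int current_thread = 0;
static int cr0_ts = 0;                   /* stands in for CR0.TS */

/* Software task switch: change threads and set TS; FPU state is not touched. */
void task_switch(int next)
{
    assert(next >= 0 && next < NUM_THREADS);
    current_thread = next;
    cr0_ts = 1;
}

/* Simplified DNA (interrupt 7) handler following the five listed steps. */
void dna_handler(void)
{
    if (fpu_owner >= 0)                      /* step 1: find old save area */
        save_area[fpu_owner] = fpu_regs;     /* step 2: models FNSAVE/FXSAVE */
    fpu_owner = current_thread;              /* step 3: record new owner */
    fpu_regs = save_area[current_thread];    /* step 4: models FRSTOR/FXRSTOR */
    cr0_ts = 0;                              /* step 5: models CLTS */
}

/* Model x87 instructions: trap first if TS is set, then touch the FPU. */
void x87_store(unsigned char value)
{
    if (cr0_ts) dna_handler();
    fpu_regs.bytes[0] = value;
}

unsigned char x87_load(void)
{
    if (cr0_ts) dna_handler();
    return fpu_regs.bytes[0];
}
```

In this model a thread that never calls `x87_store` or `x87_load` never triggers `dna_handler`, so `fpu_owner` keeps naming the last thread that actually used the FPU, which is the deferral the text describes.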