Volume 3A System Programming Guide, Part 1 (794103), page 79
Note that the code below can bitwise-OR the PACKAGE and CORE ID values because they have not been shifted right. The algorithm below assumes there is symmetry across the package boundary if more than one socket is populated in an MP system.

    // Bucketing PACKAGE and CORE IDs and computing processor mask for every core
    CoreNum = 1;
    CoreIDBucket[0] = PackageID[0] | CoreID[0];
    ProcessorMask = 1;
    CoreProcessorMask[0] = ProcessorMask;
    for (ProcessorNum = 1; ProcessorNum < NumStartedLPs; ProcessorNum++) {
        ProcessorMask <<= 1;
        for (i = 0; i < CoreNum; i++) {
            // we may be comparing bit-fields of logical processors residing in different
            // packages; the code below assumes package symmetry
            if ((PackageID[ProcessorNum] | CoreID[ProcessorNum]) == CoreIDBucket[i]) {
                CoreProcessorMask[i] |= ProcessorMask;
                break; // found in existing bucket, skip to next iteration
            }
        }
        if (i == CoreNum) {
            // Did not match any bucket, start new bucket
            CoreIDBucket[i] = PackageID[ProcessorNum] | CoreID[ProcessorNum];
            CoreProcessorMask[i] = ProcessorMask;
            CoreNum++;
        }
    }
    // CoreNum has the number of cores started in the OS
    // CoreProcessorMask[] array has the processor set of each core

7-46 Vol. 3 MULTIPLE-PROCESSOR MANAGEMENT

Other processor relationships, such as the processor mask of sibling cores, can be computed from set operations on PackageProcessorMask[] and CoreProcessorMask[].

The algorithm shown above can be applied to earlier generations of single-core IA-32 processors that support Hyper-Threading Technology and in situations where the deterministic cache parameter leaf is not supported (provided CPUID supports initial APIC ID). This is handled by ensuring that MaxCoresPerPackage() returns 1 in those situations.

7.11 MANAGEMENT OF IDLE AND BLOCKED CONDITIONS

When a logical processor in an MP system (including multi-core processors or processors supporting Hyper-Threading Technology) is idle (no work to do) or blocked (on a lock or semaphore), additional management of the core execution engine resource can be accomplished by using the HLT (halt), PAUSE, or the MONITOR/MWAIT instructions.

7.11.1 HLT Instruction

The HLT instruction stops the execution of the logical processor on which it is executed and places it in a halted state until further notice (see the description of the HLT instruction in Chapter 3 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A).
When a logical processor is halted, active logical processors continue to have full access to the shared resources within the physical package. Shared resources that were being used by the halted logical processor become available to active logical processors, allowing them to execute at greater efficiency. When the halted logical processor resumes execution, shared resources are again shared among all active logical processors. (See Section 7.11.6.3, “Halt Idle Logical Processors,” for more information about using the HLT instruction with processors supporting Hyper-Threading Technology.)

7.11.2 PAUSE Instruction

The PAUSE instruction can improve the performance of processors supporting Hyper-Threading Technology when executing “spin-wait loops” and other routines where one thread is accessing a shared lock or semaphore in a tight polling loop. When executing a spin-wait loop, the processor can suffer a severe performance penalty when exiting the loop because it detects a possible memory order violation and flushes the core processor’s pipeline. The PAUSE instruction provides a hint to
the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation and prevent the pipeline flush. In addition, the PAUSE instruction de-pipelines the spin-wait loop to prevent it from consuming execution resources excessively. (See Section 7.11.6.1, “Use the PAUSE Instruction in Spin-Wait Loops,” for more information about using the PAUSE instruction with IA-32 processors supporting Hyper-Threading Technology.)

7.11.3 Detecting Support for MONITOR/MWAIT Instructions

Streaming SIMD Extensions 3 introduced two instructions (MONITOR and MWAIT) to help multithreaded software improve thread synchronization.
In the initial implementation, MONITOR and MWAIT are available to software at ring 0. The instructions are conditionally available at levels greater than 0. Use the following steps to detect the availability of MONITOR and MWAIT:
• Use CPUID to query the MONITOR bit (CPUID.1.ECX[3] = 1).
• If CPUID indicates support, execute MONITOR inside a TRY/EXCEPT exception handler and trap for an exception.
  If an exception occurs, MONITOR and MWAIT are not supported at a privilege level greater than 0. See Example 7-4.

Example 7-4. Verifying MONITOR/MWAIT Support

    boolean MONITOR_MWAIT_works = TRUE;
    try {
        _asm {
            xor ecx, ecx
            xor edx, edx
            mov eax, MemArea
            monitor
        }
        // Use monitor
    } except (UNWIND) {
        // if we get here, MONITOR/MWAIT is not supported
        MONITOR_MWAIT_works = FALSE;
    }

7.11.4 MONITOR/MWAIT Instruction

Operating systems usually implement idle loops to handle thread synchronization.
In a typical idle-loop scenario, there could be several “busy loops” and they would use a set of memory locations. An impacted processor waits in a loop and polls a memory location to determine if there is available work to execute. The posting of work is typically a write to memory (the work-queue of the waiting processor). The time for initiating a work request and getting it scheduled is on the order of a few bus cycles.
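The polling pattern described above can be sketched in portable C. This is a minimal illustration, not an operating-system idle loop: the work_queue type and the post_work, poll_for_work, and cpu_relax functions are hypothetical names introduced here, the queue is assumed to have a single consumer, and the spin budget exists only so that the sketch terminates. On x86 with GCC or Clang, cpu_relax is assumed to emit the PAUSE instruction discussed in Section 7.11.2; on other targets it does nothing.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-processor work queue: one slot that another agent
 * fills by writing to memory. 0 means "no work posted". */
typedef struct {
    atomic_int pending;
} work_queue;

/* Spin-wait hint. Assumption: GCC/Clang on x86, where this builtin emits
 * PAUSE. On other compilers or targets it compiles to nothing. */
static inline void cpu_relax(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_ia32_pause();
#endif
}

/* Posting work is a store to the waiting processor's queue. */
static void post_work(work_queue *q, int item)
{
    assert(item != 0);              /* 0 is reserved for "empty" */
    atomic_store_explicit(&q->pending, item, memory_order_release);
}

/* Poll the queue up to max_spins times. Returns the posted item, or 0 if
 * nothing arrived within the spin budget. Single consumer assumed. */
static int poll_for_work(work_queue *q, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        int item = atomic_load_explicit(&q->pending, memory_order_acquire);
        if (item != 0) {
            /* Consume the item so the slot can be reused. */
            atomic_store_explicit(&q->pending, 0, memory_order_relaxed);
            return item;
        }
        cpu_relax();                /* tell the core this is a spin-wait */
    }
    return 0;
}
```

The loop reads the slot with a plain load and writes only when work is found, so an idle poller does not generate stores of its own to the polled location.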
From a resource sharing perspective (logical processors sharing execution resources), use of the HLT instruction in an OS idle loop is desirable but has implications. Executing the HLT instruction on an idle logical processor puts the targeted processor in a non-execution state. This requires another processor (when posting work for the halted logical processor) to wake up the halted processor using an inter-processor interrupt. The posting and servicing of such an interrupt introduces a delay in the servicing of new work requests.

In a shared memory configuration, exits from busy loops usually occur because of a state change applicable to a specific memory location; such a change tends to be triggered by writes to the memory location by another agent (typically a processor).

MONITOR/MWAIT complement the use of HLT and PAUSE to allow for efficient partitioning and un-partitioning of shared resources among logical processors sharing physical resources.
MONITOR sets up an effective address range that is monitored for write-to-memory activities; MWAIT places the processor in an optimized state (this may vary between different implementations) until a write to the monitored address range occurs.

In the initial implementation of MONITOR and MWAIT, they are available at CPL = 0 only.

Both instructions rely on the state of the processor’s monitor hardware.
The monitor hardware can be either armed (by executing the MONITOR instruction) or triggered (due to a variety of events, including a store to the monitored memory region). If, upon execution of MWAIT, the monitor hardware is in a triggered state, MWAIT behaves as a NOP and execution continues at the next instruction in the execution stream. The state of the monitor hardware is not architecturally visible except through the behavior of MWAIT.

Multiple events other than a write to the triggering address range can cause a processor that executed MWAIT to wake up.
These include events that would lead to voluntary or involuntary context switches, such as:
• External interrupts, including NMI, SMI, INIT, BINIT, MCERR, A20M#
• Voluntary transitions due to fast system call and far calls (occurring prior to issuing MWAIT but after setting the monitor)
• Faults, Aborts (including Machine Check)
• Architectural TLB invalidations including writes to CR0, CR3, CR4 and certain MSR writes; execution of LMSW (occurring prior to issuing MWAIT but after setting the monitor)

Power management related events (such as Thermal Monitor 2 or chipset driven STPCLK# assertion) will not cause the monitor event pending flag to be cleared. Faults will not cause the monitor event pending flag to be cleared.

Software should not allow for voluntary context switches in between MONITOR/MWAIT in the instruction flow.
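The armed and triggered states described above can be illustrated with a small software model. This is only a sketch for reasoning about the behavior: the real state is not architecturally visible, and the monitor_model type, the model_* functions, the choice to start in the triggered state, and the explicit range length are all assumptions made for the illustration rather than anything defined by the processor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The two states named in the text. */
typedef enum { MON_TRIGGERED, MON_ARMED } mon_state;

/* Hypothetical model of one logical processor's monitor hardware. */
typedef struct {
    mon_state state;
    uintptr_t base;   /* start of the monitored range */
    size_t    len;    /* length of the range (a model parameter) */
} monitor_model;

/* Modeling assumption: hardware that was never armed acts as triggered. */
static void model_init(monitor_model *m)
{
    m->state = MON_TRIGGERED;
    m->base = 0;
    m->len = 0;
}

/* MONITOR arms the hardware on an address range. */
static void model_monitor(monitor_model *m, uintptr_t addr, size_t len)
{
    m->base = addr;
    m->len = len;
    m->state = MON_ARMED;
}

/* A store triggers the hardware only if it lands in the armed range. */
static void model_store(monitor_model *m, uintptr_t addr)
{
    if (m->state == MON_ARMED && addr >= m->base && addr - m->base < m->len)
        m->state = MON_TRIGGERED;
}

/* An external interrupt is one of the other events that trigger it. */
static void model_interrupt(monitor_model *m)
{
    m->state = MON_TRIGGERED;
}

/* MWAIT: true if the processor would enter the optimized wait state,
 * false if the hardware is already triggered and MWAIT acts as a NOP. */
static bool model_mwait(const monitor_model *m)
{
    return m->state == MON_ARMED;
}
```

In this model, calling model_mwait after a store to the monitored range or after model_interrupt returns false, which corresponds to MWAIT falling through to the next instruction.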
Note that execution of MWAIT does not re-arm the monitor hardware. This means that MONITOR/MWAIT need to be executed in a loop. Also note that exits from the MWAIT state could be due to a condition other than a write to the triggering address; software should explicitly check the triggering
data location to determine if the write occurred. Software should also check the value of the triggering address following the execution of the MONITOR instruction (and prior to the execution of the MWAIT instruction). This check is to identify any writes to the triggering address that occurred during the course of MONITOR execution.

The address range provided to the MONITOR instruction must be of write-back caching type. Only write-back memory type stores to the monitored address range will trigger the monitor hardware. If the address range is not in memory of write-back type, the address monitor hardware may not be set up properly or the monitor hardware may not be armed. Software is also responsible for ensuring that
• Writes that are not intended to cause the exit of a busy loop do not write to a location within the address region being monitored by the monitor hardware,
• Writes intended to cause the exit of a busy loop are written to locations within the monitored address region.
Not doing so will lead to more false wakeups (an exit from the MWAIT state not due to a write to the intended data location).
These have negative performance implications. It might be necessary for software to use padding to prevent false wakeups. CPUID provides a mechanism for determining the size of data locations for monitoring as well as a mechanism for determining the size of the pad.

7.11.5 Monitor/Mwait Address Range Determination

To use the MONITOR/MWAIT instructions, software should know the length of the region monitored by the MONITOR/MWAIT instructions and the coherence line size for cache-snoop traffic in a multiprocessor system.
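As a hedged sketch of how software might turn CPUID output into a padded allocation size: the code below decodes register values that are assumed to have already been read from CPUID. It relies on CPUID.1:ECX[3] as the MONITOR feature bit (stated earlier in this section) and on CPUID leaf 05H reporting the smallest and largest monitor-line sizes in EAX[15:0] and EBX[15:0]. The function and type names are hypothetical, and rounding up to the largest line size is one possible padding policy, not the only one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CPUID.1:ECX[3] = 1 indicates MONITOR/MWAIT support. */
static bool monitor_supported(uint32_t leaf1_ecx)
{
    return ((leaf1_ecx >> 3) & 1u) != 0;
}

/* Monitor-line sizes in bytes, as reported by CPUID leaf 05H. */
typedef struct {
    uint32_t smallest;   /* EAX[15:0] */
    uint32_t largest;    /* EBX[15:0] */
} monitor_line_sizes;

/* Extract the two 16-bit size fields; upper bits are ignored. */
static monitor_line_sizes decode_leaf5(uint32_t eax, uint32_t ebx)
{
    monitor_line_sizes s = { eax & 0xFFFFu, ebx & 0xFFFFu };
    return s;
}

/* Round data_size up to a whole number of largest-size monitor lines so
 * that unrelated data does not share the monitored region. */
static size_t padded_size(size_t data_size, uint32_t largest_line)
{
    assert(largest_line != 0);
    return (data_size + largest_line - 1) / largest_line * largest_line;
}
```

For example, with both sizes reported as 64 bytes, padded_size(4, 64) yields 64 and padded_size(65, 64) yields 128. The padded object would also need to start on a monitor-line boundary for the padding to be effective. On GCC or Clang, the register values could be obtained with __get_cpuid from <cpuid.h>; that choice of toolchain is an assumption of this note.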