Volume 3A: System Programming Guide, Part 1 (794103), page 69
The hardware provides no resource that guarantees fairness to participating agents. It is the responsibility of software to manage the fairness of semaphores and exclusive locking functions.

MULTIPLE-PROCESSOR MANAGEMENT

The mechanisms for handling locked atomic operations have evolved with the complexity of IA-32 processors. More recent IA-32 processors (such as the Pentium 4, Intel Xeon, and P6 family processors) and Intel 64 provide a more refined locking mechanism than earlier processors. These mechanisms are described in the following sections.

7.1.1 Guaranteed Atomic Operations

The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will always be carried out atomically:

• Reading or writing a byte
• Reading or writing a word aligned on a 16-bit boundary
• Reading or writing a doubleword aligned on a 32-bit boundary

The Pentium processor (and newer processors since) guarantees that the following additional memory operations will always be carried out atomically:

• Reading or writing a quadword aligned on a 64-bit boundary
• 16-bit accesses to uncached memory locations that fit within a 32-bit data bus

The P6 family processors (and newer processors since) guarantee that the following additional memory operation will always be carried out atomically:

• Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line

Accesses to cacheable memory that are split across bus widths, cache lines, and page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors. The Intel Core 2 Duo, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and should be avoided.

7.1.2 Bus Locking

Intel 64 and IA-32 processors provide a LOCK# signal that is asserted automatically during certain critical memory operations to lock the system bus. While this output signal is asserted, requests from other processors or bus agents for control of the bus are blocked. Software can specify other occasions when the LOCK semantics are to be followed by prepending the LOCK prefix to an instruction.

In the case of the Intel386, Intel486, and Pentium processors, explicitly locked instructions will result in the assertion of the LOCK# signal. It is the responsibility of the hardware designer to make the LOCK# signal available in system hardware to control memory accesses among processors.
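
As a rough illustration of an explicitly locked instruction, the C11 sketch below performs an atomic read-modify-write on a shared counter. It assumes a C11 compiler that provides <stdatomic.h>. On x86 targets, compilers typically implement atomic_fetch_add with a LOCK-prefixed ADD or XADD, but that mapping is a compiler choice and is assumed here rather than guaranteed. The names event_count, record_event, and events_so_far are hypothetical and exist only for this example.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared counter that several processors may update. */
_Atomic uint32_t event_count = 0;

/*
 * Atomically add 1 to the counter and return the value it held before.
 * Assumption: on x86 the compiler emits a LOCK-prefixed XADD (or LOCK ADD
 * when the result is unused). The C standard only promises atomicity,
 * not a particular instruction.
 */
uint32_t record_event(void)
{
    return atomic_fetch_add(&event_count, 1u);
}

/* Read the current count with an atomic 32-bit load. */
uint32_t events_so_far(void)
{
    return atomic_load(&event_count);
}
```

Because the increment is a single locked read-modify-write, two processors calling record_event() at the same time cannot both observe the same previous value.
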

For the P6 and more recent processor families, if the memory area being accessed is cached internally in the processor, the LOCK# signal is generally not asserted; instead, locking is only applied to the processor’s caches (see Section 7.1.4, “Effects of a LOCK Operation on Internal Processor Caches”).

7.1.2.1 Automatic Locking

The operations on which the processor automatically follows the LOCK semantics are as follows:

• When executing an XCHG instruction that references memory.
• When setting the B (busy) flag of a TSS descriptor — The processor tests and sets the busy flag in the type field of the TSS descriptor when switching to a task. To ensure that two processors do not switch to the same task simultaneously, the processor follows the LOCK semantics while testing and setting this flag.
• When updating segment descriptors — When loading a segment descriptor, the processor will set the accessed flag in the segment descriptor if the flag is clear. During this operation, the processor follows the LOCK semantics so that the descriptor will not be modified by another processor while it is being updated. For this action to be effective, operating-system procedures that update descriptors should use the following steps:
  — Use a locked operation to modify the access-rights byte to indicate that the segment descriptor is not-present, and specify a value for the type field that indicates that the descriptor is being updated.
  — Update the fields of the segment descriptor. (This operation may require several memory accesses; therefore, locked operations cannot be used.)
  — Use a locked operation to modify the access-rights byte to indicate that the segment descriptor is valid and present.
• The Intel386 processor always updates the accessed flag in the segment descriptor, whether it is clear or not. The Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors only update this flag if it is not already set.
• When updating page-directory and page-table entries — When updating page-directory and page-table entries, the processor uses locked cycles to set the accessed and dirty flags in the page-directory and page-table entries.
• Acknowledging interrupts — After an interrupt request, an interrupt controller may use the data bus to send the interrupt vector for the interrupt to the processor.
  The processor follows the LOCK semantics during this time to ensure that no other data appears on the data bus when the interrupt vector is being transmitted.

7.1.2.2 Software Controlled Bus Locking

To explicitly force the LOCK semantics, software can use the LOCK prefix with the following instructions when they are used to modify a memory location. An invalid opcode exception (#UD) is generated when the LOCK prefix is used with any other instruction or when no write operation is made to memory (that is, when the destination operand is in a register).

• The bit test and modify instructions (BTS, BTR, and BTC).
• The exchange instructions (XADD, CMPXCHG, and CMPXCHG8B).
• The LOCK prefix is automatically assumed for the XCHG instruction.
• The following single-operand arithmetic and logical instructions: INC, DEC, NOT, and NEG.
• The following two-operand arithmetic and logical instructions: ADD, ADC, SUB, SBB, AND, OR, and XOR.

A locked instruction is guaranteed to lock only the area of memory defined by the destination operand, but may be interpreted by the system as a lock for a larger memory area.

Software should access semaphores (shared memory used for signalling between multiple processors) using identical addresses and operand lengths. For example, if one processor accesses a semaphore using a word access, other processors should not access the semaphore using a byte access.

NOTE
Do not implement semaphores using the WC memory type. Do not perform non-temporal stores to a cache line containing a location used to implement a semaphore.

The integrity of a bus lock is not affected by the alignment of the memory field. The LOCK semantics are followed for as many bus cycles as necessary to update the entire operand. However, it is recommended that locked accesses be aligned on their natural boundaries for better system performance:

• Any boundary for an 8-bit access (locked or otherwise).
• 16-bit boundary for locked word accesses.
• 32-bit boundary for locked doubleword accesses.
• 64-bit boundary for locked quadword accesses.

Locked operations are atomic with respect to all other memory operations and all externally visible events.
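
The guidance above on semaphores and alignment can be sketched in C11 as follows: a counting semaphore kept in one naturally aligned doubleword and always accessed through the same address with a 32-bit operand. This is a minimal sketch under stated assumptions, not a production primitive. The names csem_word, csem_try_acquire, csem_release, and is_naturally_aligned are hypothetical. It assumes a C11 compiler with <stdatomic.h>; on x86 the compare-exchange is typically emitted as LOCK CMPXCHG and the add as LOCK XADD or LOCK ADD, which is a compiler choice rather than a guarantee.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if addr lies on the natural boundary for an access of `size` bytes.
 * `size` must be a power of two (1, 2, 4, or 8). */
bool is_naturally_aligned(const void *addr, size_t size)
{
    return ((uintptr_t)addr & (size - 1)) == 0;
}

/* Hypothetical semaphore: one doubleword aligned on a 32-bit boundary.
 * Every access below uses this same address and a 32-bit operand. */
_Alignas(4) _Atomic uint32_t csem_word = 2;   /* two units available */

/* Try to take one unit; returns false if none are available. */
bool csem_try_acquire(void)
{
    uint32_t seen = atomic_load(&csem_word);
    while (seen != 0) {
        /* Locked compare-and-exchange: succeeds only if the count still
         * equals `seen`; on failure `seen` is refreshed and we retry. */
        if (atomic_compare_exchange_weak(&csem_word, &seen, seen - 1))
            return true;
    }
    return false;
}

/* Return one unit with a locked add. */
void csem_release(void)
{
    atomic_fetch_add(&csem_word, 1u);
}
```

Keeping the semaphore in a single aligned doubleword means no access to it is split across bus widths or cache lines, and no processor mixes byte and doubleword accesses to it.
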
Only instruction fetch and page table accesses can pass locked instructions. Locked instructions can be used to synchronize data written by one processor and read by another processor.

For the P6 family processors, locked operations serialize all outstanding load and store operations (that is, wait for them to complete). This rule is also true for the Pentium 4 and Intel Xeon processors, with one exception. Load operations that reference weakly ordered memory types (such as the WC memory type) may not be serialized.

Locked instructions should not be used to ensure that data written can be fetched as instructions.

NOTE
The locked instructions for the current versions of the Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors allow data written to be fetched as instructions. However, Intel recommends that developers who require the use of self-modifying code use a different synchronizing mechanism, described in the following sections.

7.1.3 Handling Self- and Cross-Modifying Code

The act of a processor writing data into a currently executing code segment with the intent of executing that data as code is called self-modifying code.