Volume 1 Application Programming
The processor can read memory out-of-order to prevent stalling instructions that are executed out-of-order.
• Speculative reads are allowed. A speculative read occurs when the processor begins executing a memory-read instruction before it knows whether the instruction’s result will actually be needed. For example, the processor can predict a branch to occur and begin executing instructions following the predicted branch, before it knows whether the prediction is valid. When one of the speculative instructions reads data from memory, the read itself is speculative.
• Reads can usually be reordered ahead of writes. Reads are generally given a higher priority by the processor than writes because instruction execution stalls if the read data required by an instruction is not immediately available.
  Allowing reads ahead of writes usually maximizes software performance.
• Reads can be reordered ahead of writes, except that a read cannot be reordered ahead of a prior write if the read is from the same location as the prior write. In this case, the read instruction stalls until the write instruction is committed. This is because the result of the write instruction is required by the read instruction for software to operate correctly.

General-Purpose Programming, AMD64 Technology, 24592, Rev. 3.13, July 2007

Some system devices might be sensitive to reads. Normally, applications do not have direct access to system devices, but instead call an operating-system service routine to perform the access on the application’s behalf. In this case, it is system software’s responsibility to enforce strong read-ordering.

Write Ordering.
Writes affect program order because they affect the state of software-visible resources. The rules governing write ordering are restrictive:
• Generally, out-of-order writes are not allowed. Write instructions executed out-of-order cannot commit (write) their result to memory until all previous instructions have completed in program order. The processor can, however, hold the result of an out-of-order write instruction in a private buffer (not visible to software) until that result can be committed to memory.
  System software can create non-cacheable write-combining regions in memory when the order of writes is known to not affect system devices.
  When writes are performed to write-combining memory, they can appear to complete out of order relative to other writes. See “Memory System” in Volume 2 for additional information.
• Speculative writes are not allowed. As with out-of-order writes, speculative write instructions cannot commit their result to memory until all previous instructions have completed in program order. Processors can hold the result in a private buffer (not visible to software) until the result can be committed.

3.9.2 Forcing Memory Order

Special instructions are provided for application software to force memory ordering in situations where such ordering is important. These instructions are:
• Load Fence: The LFENCE instruction forces ordering of memory loads (reads). All memory loads preceding the LFENCE (in program order) are completed prior to completing memory loads following the LFENCE.
  Memory loads cannot be reordered around an LFENCE instruction, but other non-serializing instructions (such as memory writes) can be reordered around the LFENCE.
• Store Fence: The SFENCE instruction forces ordering of memory stores (writes). All memory stores preceding the SFENCE (in program order) are completed prior to completing memory stores following the SFENCE.
  Memory stores cannot be reordered around an SFENCE instruction, but other non-serializing instructions (such as memory loads) can be reordered around the SFENCE.
• Memory Fence: The MFENCE instruction forces ordering of all memory accesses (reads and writes). All memory accesses preceding the MFENCE (in program order) are completed prior to completing any memory access following the MFENCE. Memory accesses cannot be reordered around an MFENCE instruction, but other non-serializing instructions that do not access memory can be reordered around the MFENCE.

Although they serve different purposes, other instructions can be used as read/write barriers when the order of memory accesses must be strictly enforced. These read/write barrier instructions force all prior reads and writes to complete before subsequent reads or writes are executed.
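
The three fence instructions listed above can be reached from C through compiler intrinsics. The following is a minimal sketch, not a definitive implementation: the `mailbox` structure, the `mailbox_publish` and `mailbox_try_read` functions, and the `LOAD_FENCE`, `STORE_FENCE`, and `MEMORY_FENCE` macros are hypothetical names introduced for illustration. It assumes a compiler that provides `_mm_lfence`, `_mm_sfence`, and `_mm_mfence` in `<immintrin.h>` on x86-64 targets; on other targets it falls back to C11 `atomic_thread_fence`, which is only a stand-in and not the AMD64 instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>            /* _mm_lfence, _mm_sfence, _mm_mfence */
#define LOAD_FENCE()   _mm_lfence()   /* LFENCE */
#define STORE_FENCE()  _mm_sfence()   /* SFENCE */
#define MEMORY_FENCE() _mm_mfence()   /* MFENCE */
#else
#include <stdatomic.h>            /* portable stand-ins (assumption) */
#define LOAD_FENCE()   atomic_thread_fence(memory_order_acquire)
#define STORE_FENCE()  atomic_thread_fence(memory_order_release)
#define MEMORY_FENCE() atomic_thread_fence(memory_order_seq_cst)
#endif

/* Hypothetical mailbox shared between a producer and a consumer. */
struct mailbox {
    volatile uint32_t payload;
    volatile uint32_t ready;      /* 0 = empty, 1 = payload is valid */
};

/* Producer: store the payload, fence, then store the flag, so the
 * payload store completes before the flag store. */
static void mailbox_publish(struct mailbox *mb, uint32_t value)
{
    assert(mb != NULL);
    mb->payload = value;
    STORE_FENCE();
    mb->ready = 1;
}

/* Consumer: load the flag, fence, then load the payload.
 * Returns 1 and writes *out if a payload was present, 0 otherwise. */
static int mailbox_try_read(struct mailbox *mb, uint32_t *out)
{
    assert(mb != NULL && out != NULL);
    if (mb->ready == 0)
        return 0;
    LOAD_FENCE();                 /* flag load completes before payload load */
    *out = mb->payload;
    MEMORY_FENCE();               /* payload load completes before flag store */
    mb->ready = 0;
    return 1;
}
```

The fence placement is illustrative. Under the write-ordering rules described earlier, stores to ordinary memory already commit in program order, so the store fence matters mainly when the destination is write-combining memory. Portable multithreaded code would normally use C11 atomics rather than `volatile` fields.
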
Unlike the fence instructions listed above, these other read/write barrier instructions alter the software-visible state. This makes these instructions less general and more difficult to use as read/write barriers than the fence instructions, although their use may reduce the total number of instructions executed. The following instructions are usable as read/write barriers:
• Serializing instructions: Serializing instructions force the processor to commit the serializing instruction and all previous instructions before the next instruction is fetched from memory. The serializing instructions available to applications are CPUID and IRET.
  A serializing instruction is committed when the following operations are complete:
  - The instruction has executed.
  - All registers modified by the instruction are updated.
  - All memory updates performed by the instruction are complete.
  - All data held in the write buffers have been written to memory. (Write buffers are described in “Write Buffering” on page 97.)
• I/O instructions: Reads from and writes to I/O-address space use the IN and OUT instructions, respectively.
  When the processor executes an I/O instruction, it orders it with respect to other loads and stores, depending on the instruction:
  - IN instructions (IN, INS, and REP INS) are not executed until all previous stores to memory and I/O-address space are complete.
  - Instructions following an OUT instruction (OUT, OUTS, or REP OUTS) are not executed until all previous stores to memory and I/O-address space are complete, including the store performed by the OUT.
• Locked instructions: A locked instruction is one that contains the LOCK instruction prefix.
  A locked instruction is used to perform an atomic read-modify-write operation on a memory operand, so it needs exclusive access to the memory location for the duration of the operation. Locked instructions order memory accesses in the following way:
  - All previous loads and stores (in program order) are completed prior to executing the locked instruction.
  - The locked instruction is completed before allowing loads and stores for subsequent instructions (in program order) to occur.
  Only certain instructions can be locked. See “Lock Prefix” in Volume 3 for a list of instructions that can use the LOCK prefix.

3.9.3 Caches

Depending on the instruction, operands can be encoded in the instruction opcode or located in registers, I/O ports, or memory locations.
An operand that is located in memory can actually be physically present in one or more locations within a system’s memory hierarchy.

Memory Hierarchy. A system’s memory hierarchy may have some or all of the following levels:
• Main Memory: Main memory is external to the processor chip and is the memory-hierarchy level farthest from the processor’s execution units.
  All physical-memory addresses are present in main memory, which is implemented using relatively slow, but high-density memory devices.
• External Caches: External caches are external to the processor chip, but are implemented using lower-capacity, higher-performance memory devices than system memory. The system uses external caches to hold copies of frequently-used instructions and data found in main memory.
  A subset of the physical-memory addresses can be present in the external caches at any time. A system can contain any number of external caches, or none at all.
• Internal Caches: Internal caches are present on the processor chip itself, and are the closest memory-hierarchy level to the processor’s execution units. Because of their presence on the processor chip, access to internal caches is very fast.