Volume 2: System Programming
Memory System (AMD64 Technology, 24593, Rev. 3.13, July 2007)

There are no constraints on the relative order of Store A and Load A in processor 0, or of Store B and Load B in processor 1.

If a very strong memory-ordering model is required that does not allow local store-load bypasses, an MFENCE instruction should be used between the store and the subsequent load, or a synchronizing instruction such as LOCK XCHG should be used for the store. This memory ordering is stronger than total store ordering.

    Processor 0          Processor 1
    Store A ← 1          Store B ← 1
    MFENCE               MFENCE
    Load r1 A            Load r3 B
    Load r2 B            Load r4 A

The MFENCE instruction ensures that any buffered stores are globally visible before the loads are allowed to execute, so the result r1 = 1, r2 = 0, r3 = 1, and r4 = 0 is not allowed.
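The effect of the fences in this example can be checked mechanically. The Python sketch below enumerates every interleaving of the two instruction sequences under a deliberately simplified store-buffer model. This model is an assumption made for illustration and is not the architecture's formal memory model. Each processor has a FIFO store buffer. A load takes the newest matching value from its own buffer (store-to-load forwarding) and otherwise reads memory. Buffered stores drain to memory in program order at arbitrary points. MFENCE cannot execute until its processor's buffer is empty. The names `outcomes`, `PLAIN`, `FENCED`, `FORBIDDEN` and the instruction tuples are hypothetical.

```python
"""Exhaustive search of a simplified store-buffer (TSO-like) model.

Hypothetical sketch: instructions are tuples such as ("store", "A", 1),
("load", "r1", "A") and ("mfence",). All memory locations start at 0.
"""


def _replace(items, index, value):
    """Return a copy of tuple `items` with position `index` set to `value`."""
    return items[:index] + (value,) + items[index + 1:]


def _load(buffer, memory, addr):
    """Forward from the newest buffered store to `addr`, else read memory."""
    for buf_addr, value in reversed(buffer):
        if buf_addr == addr:
            return value
    return dict(memory).get(addr, 0)


def outcomes(programs):
    """Return every final register assignment reachable in the model."""
    results = set()
    seen = set()

    def explore(pcs, buffers, memory, regs):
        state = (pcs, buffers, memory, regs)
        if state in seen:
            return
        seen.add(state)
        finished = True
        for cpu, program in enumerate(programs):
            buffer = buffers[cpu]
            if buffer:
                # The oldest buffered store of this CPU becomes globally visible.
                finished = False
                (addr, value), rest = buffer[0], buffer[1:]
                new_memory = tuple(sorted({**dict(memory), addr: value}.items()))
                explore(pcs, _replace(buffers, cpu, rest), new_memory, regs)
            if pcs[cpu] == len(program):
                continue
            finished = False
            op = program[pcs[cpu]]
            next_pcs = _replace(pcs, cpu, pcs[cpu] + 1)
            if op[0] == "store":
                # Stores enter the local store buffer, not memory.
                explore(next_pcs, _replace(buffers, cpu, buffer + (op[1:],)), memory, regs)
            elif op[0] == "load":
                value = _load(buffer, memory, op[2])
                explore(next_pcs, buffers, memory, tuple(sorted(regs + ((op[1], value),))))
            elif op[0] == "mfence" and not buffer:
                # MFENCE waits until every earlier store is globally visible.
                explore(next_pcs, buffers, memory, regs)
        if finished:
            results.add(regs)

    explore(tuple(0 for _ in programs), tuple(() for _ in programs), (), ())
    return results


PLAIN = [
    [("store", "A", 1), ("load", "r1", "A"), ("load", "r2", "B")],
    [("store", "B", 1), ("load", "r3", "B"), ("load", "r4", "A")],
]
FENCED = [
    [("store", "A", 1), ("mfence",), ("load", "r1", "A"), ("load", "r2", "B")],
    [("store", "B", 1), ("mfence",), ("load", "r3", "B"), ("load", "r4", "A")],
]
FORBIDDEN = (("r1", 1), ("r2", 0), ("r3", 1), ("r4", 0))

print(FORBIDDEN in outcomes(PLAIN))   # True: both loads can bypass the buffered stores
print(FORBIDDEN in outcomes(FENCED))  # False: MFENCE drains each buffer first
```

In this model, the forbidden result is reachable only without the fences. MFENCE forces each processor's store to become globally visible before that processor's own loads execute.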
Similarly, a LOCK XCHG would ensure the loads do not execute until its store operation is globally visible.

7.3 Memory Coherency and Protocol

Implementations that support caching support a cache-coherency protocol for maintaining coherency between main memory and the caches. The cache-coherency protocol is also used to maintain coherency between all processors in a multiprocessor system. The cache-coherency protocol supported by the AMD64 architecture is the MOESI (modified, owned, exclusive, shared, invalid) protocol. The states of the MOESI protocol are:

• Invalid: A cache line in the invalid state does not hold a valid copy of the data. Valid copies of the data can be either in main memory or in another processor's cache.
• Exclusive: A cache line in the exclusive state holds the most recent, correct copy of the data. The copy in main memory is also the most recent, correct copy of the data. No other processor holds a copy of the data.
• Shared: A cache line in the shared state holds the most recent, correct copy of the data. Other processors in the system may hold copies of the data in the shared state, as well. If no other processor holds it in the owned state, then the copy in main memory is also the most recent.
• Modified: A cache line in the modified state holds the most recent, correct copy of the data. The copy in main memory is stale (incorrect), and no other processor holds a copy.
• Owned: A cache line in the owned state holds the most recent, correct copy of the data. The owned state is similar to the shared state in that other processors can hold a copy of the most recent, correct data. Unlike the shared state, however, the copy in main memory can be stale (incorrect). Only one processor can hold the data in the owned state; all other processors must hold the data in the shared state.

Figure 7-2 on page 166 shows the general MOESI state transitions possible with various types of memory accesses.
This is a logical software view, not a hardware view, of how cache-line states transition. Instruction-execution activity and external-bus transactions can both be used to modify the cache MOESI state in multiprocessing or multi-mastering systems.

Figure 7-2. MOESI State Transitions

To maintain memory coherency, external bus masters (typically other processors with their own internal caches) need to acquire the most recent copy of data before caching it internally. That copy can be in main memory or in the internal caches of other bus-mastering devices.
When an external master has a cache read-miss or write-miss, it probes the other mastering devices to determine whether the most recent copy of data is held in any of their caches. If one of the other mastering devices holds the most recent copy, it provides it to the requesting device. Otherwise, the most recent copy is provided by main memory.

There are two general types of bus-master probes:

• Read probes indicate the external master is requesting the data for read purposes.
• Write probes indicate the external master is requesting the data for the purpose of modifying it.

Referring back to Figure 7-2 on page 166, the state transitions involving probes are initiated by other processors and external bus masters into the processor.
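The state definitions and probe behavior described so far can be captured in a small executable model. The Python sketch below tracks one cache line across several caches. The class `Line`, its methods, and the `State` enum are hypothetical names. The sketch assumes the conventional MOESI transitions:

• A read probe moves Modified to Owned and Exclusive to Shared.
• A write probe invalidates every other copy.
• A read miss fills the line in Exclusive when no other cache holds it, and in Shared otherwise.
• Any write leaves the writer in Modified.

Figure 7-2 remains the reference for the actual transitions. Write-backs, evictions, and implementation-specific behavior are not modeled.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    OWNED = "O"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


# States in which main memory holds a stale copy of the line.
DIRTY = {State.MODIFIED, State.OWNED}


class Line:
    """Hypothetical model of one cache line held by `n` caches plus main memory."""

    def __init__(self, n, memory_value=0):
        self.state = [State.INVALID] * n
        self.data = [None] * n
        self.memory = memory_value

    def _probe(self, requester, write):
        """Probe every other cache; return (most recent data, another copy remains)."""
        value, other_copy = self.memory, False
        for cpu, st in enumerate(self.state):
            if cpu == requester or st is State.INVALID:
                continue
            if st in DIRTY:
                value = self.data[cpu]  # memory is stale, so this cache supplies the data
            if write:
                # Write probe: the probed cache gives up its copy.
                self.state[cpu], self.data[cpu] = State.INVALID, None
            else:
                # Read probe: M -> O and E -> S; O and S are unchanged.
                if st is State.MODIFIED:
                    self.state[cpu] = State.OWNED
                elif st is State.EXCLUSIVE:
                    self.state[cpu] = State.SHARED
                other_copy = True
        return value, other_copy

    def read(self, cpu):
        if self.state[cpu] is State.INVALID:  # read miss: probe, then fill in E or S
            value, other_copy = self._probe(cpu, write=False)
            self.data[cpu] = value
            self.state[cpu] = State.SHARED if other_copy else State.EXCLUSIVE
        return self.data[cpu]  # read hit: no state change

    def write(self, cpu, value):
        if self.state[cpu] not in (State.EXCLUSIVE, State.MODIFIED):
            # Miss, or hit on a line other caches may share: invalidate other copies.
            # Assumption: a line holds one value, so a write miss needs no data fill.
            self._probe(cpu, write=True)
        self.data[cpu] = value
        self.state[cpu] = State.MODIFIED

    def check(self):
        """Assert the invariants implied by the MOESI state definitions."""
        held = [s for s in self.state if s is not State.INVALID]
        if State.MODIFIED in held or State.EXCLUSIVE in held:
            assert len(held) == 1, "M and E copies must be the only copy"
        assert held.count(State.OWNED) <= 1, "only one cache may own the line"
        if State.OWNED in held:
            assert all(s in (State.OWNED, State.SHARED) for s in held), "others must be S"
        copies = {d for s, d in zip(self.state, self.data) if s is not State.INVALID}
        assert len(copies) <= 1, "every valid copy holds the most recent data"
        if copies and not any(s in DIRTY for s in held):
            assert copies == {self.memory}, "clean copies must match main memory"


line = Line(3)
print(line.read(0), line.state[0].value)             # 0 E  (read miss, no other copy)
line.write(0, 42)                                    # E -> M without a probe
print(line.read(1), [s.value for s in line.state])   # 42 ['O', 'S', 'I']
line.write(2, 7)                                     # write miss: caches 0 and 1 invalidated
print([s.value for s in line.state])                 # ['I', 'I', 'M']
line.check()
```

In this sketch, only a Modified or Owned cache supplies data, because main memory is stale only in those states. That is a simplification: the text above allows any cache holding the most recent copy to provide it.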
Some read probes are initiated by devices that intend to cache the data. Others, such as those initiated by I/O devices, do not intend to cache the data. Some processor implementations do not change the data MOESI state if the read probe is initiated by a device that does not intend to cache the data.

State transitions involving read misses and write misses can cause the processor to generate probes into external bus masters and to read main memory.

Read hits do not cause a MOESI-state change. Write hits generally cause a MOESI-state change into the modified state. If the cache line is already in the modified state, a write hit does not change its state.

The specific operation of external-bus signals and transactions and how they influence a cache MOESI state are implementation dependent.
For example, an implementation could convert a write miss to a WB memory type into two separate MOESI-state changes. The first would be a read miss placing the cache line in the exclusive state. This would be followed by a write hit into the exclusive cache line, changing the cache-line state to modified.

7.3.1 Special Coherency Considerations

In some cases, data can be modified in a manner that is impossible for the memory-coherency protocol to handle due to the effects of instruction prefetching. In such situations, software must use serializing instructions and/or cache-invalidation instructions to guarantee that subsequent data accesses are coherent.

An example of this type of situation is a page-table update followed by accesses to the physical pages referenced by the updated page tables. The following sequence of events shows what can happen when software changes the translation of virtual-page A from physical-page M to physical-page N:

1. Software invalidates the TLB entry. The tables that translate virtual-page A to physical-page M are now held only in main memory. They are not cached by the TLB.
2. Software changes the page-table entry for virtual-page A in main memory to point to physical-page N rather than physical-page M.
3. Software accesses data in virtual-page A.

During Step 3, software expects the processor to access the data from physical-page N. However, it is possible for the processor to prefetch the data from physical-page M before the page table for virtual-page A is updated in Step 2.
This is because the physical-memory references for the page tables are different from the physical-memory references for the data. Because the physical-memory references are different, the processor does not recognize them as requiring coherency checking and believes it is safe to prefetch the data from virtual-page A, which is translated into a read from physical-page M. Similar behavior can occur when instructions are prefetched from beyond the page-table update instruction.

To prevent this problem, software must use an INVLPG or MOV CR3 instruction immediately after the page-table update to ensure that subsequent instruction fetches and data accesses use the correct virtual-page-to-physical-page translation. It is not necessary to perform a TLB invalidation operation preceding the table update.

7.4 Memory Types

The AMD64 architecture defines the following memory types:

• Uncacheable (UC): Reads from, and writes to, UC memory are not cacheable. Reads from UC memory cannot be speculative. Write-combining to UC memory is not allowed. Reads from UC memory cause the write buffers to be written to memory and invalidated.
  The UC memory type is useful for memory-mapped I/O devices where strict ordering of reads and writes is important.
• Cache Disable (CD): The CD memory type is a form of uncacheable memory type that occurs when caches are disabled (CR0.CD=1). With CD memory, it is possible for the address to be cached due to an earlier cacheable access, or due to two virtual addresses aliasing to a single physical address.
  For the L1 data cache and the L2 cache, reads from, and writes to, CD memory that hit the cache cause the cache line to be invalidated before accessing main memory. If the cache line is in the modified state, the line is written to main memory and then invalidated.
  For the L1 instruction cache, reads from CD memory that hit the cache read the cached instructions rather than access main memory. Reads that miss the cache access main memory and do not cause cache-line replacement.
• Write-Combining (WC): Reads from, and writes to, WC memory are not cacheable.