Volume 2 System Programming (794096), page 53
  Reads from WC memory can be speculative.
  Writes to this memory type can be combined internally by the processor and written to memory as a single write operation to reduce memory accesses. For example, four word writes to consecutive addresses can be combined by the processor into a single quadword write, resulting in one memory access instead of four.
  The WC memory type is useful for graphics-display memory buffers where the order of writes is not important.

• Write-Protect (WP): Reads from WP memory are cacheable and allocate cache lines on a read miss. Reads from WP memory can be speculative.
  Writes to WP memory that hit in the cache do not update the cache. Instead, all writes update memory (write to memory), and writes that hit in the cache invalidate the cache line. Write buffering of WP memory is allowed.
  The WP memory type is useful for shadowed-ROM memory where updates must be immediately visible to all devices that read the shadow locations.

AMD64 Technology    24593 Rev. 3.13 July 2007    Memory System

• Writethrough (WT): Reads from WT memory are cacheable and allocate cache lines on a read miss. Reads from WT memory can be speculative.
  All writes to WT memory update main memory, and writes that hit in the cache update the cache line (cache lines remain in the same state after a write that hits a cache line). Writes that miss the cache do not allocate a cache line. Write buffering of WT memory is allowed.

• Writeback (WB): Reads from WB memory are cacheable and allocate cache lines on a read miss. Cache lines can be allocated in the shared, exclusive, or modified states. Reads from WB memory can be speculative.
  All writes that hit in the cache update the cache line and place the cache line in the modified state. Writes that miss the cache allocate a new cache line and place the cache line in the modified state. Writes to main memory only take place during writeback operations.
  Write buffering of WB memory is allowed.
  The WB memory type provides the highest-possible performance and is useful for most software and data stored in system memory (DRAM).

Table 7-1 shows the memory access ordering possible for each memory type supported by the AMD64 architecture. Table 7-3 on page 171 shows the ordering behavior of various operations on various memory types in greater detail. Table 7-2 on page 170 shows the caching policy for the same memory types.

Table 7-1. Memory Access by Memory Type

                                Memory Type
Memory Access Allowed           UC/CD   WC    WP    WT    WB
Read    Out-of-Order            no      yes   yes   yes   yes
        Speculative             no      yes   yes   yes   yes
        Reorder Before Write    no      yes   yes   yes   yes
Write   Out-of-Order            no      yes   no    no    no
        Speculative             no      no    no    no    no
        Buffering               no      yes   yes   yes   yes
        Combining (1)           no      yes   no    yes   yes

Note:
1. Write-combining buffers are separate from write buffers.

Table 7-2. Caching Policy by Memory Type

                            Memory Type
Caching Policy              UC        CD        WC        WP        WT    WB
Read Cacheable              no        no        no        yes       yes   yes
Write Cacheable             no        no        no        no        yes   yes
Read Allocate               no        no        no        yes       yes   yes
Write Allocate              no        no        no        no        no    yes
Write Hits Update Memory    yes (2)   yes (1)   yes (2)   yes (3)   yes   no

Note:
1. For the L1 data cache and the L2 cache, if an access hits the cache, the cache line is invalidated. If the cache line is in the modified state, the line is written to main memory and then invalidated. For the L1 instruction cache, read hits access the cache rather than main memory.
2. The data is not cached, so a cache write hit cannot occur. However, memory is updated.
3. Write hits update memory and invalidate the cache line.

7.4.1 Memory Barrier Interaction with Memory Types

Memory types other than WB may allow weaker ordering in certain respects. When the ordering of memory accesses to differing memory types must be strictly enforced, software can use the LFENCE, MFENCE or SFENCE barrier instructions to force loads and stores to proceed in program order. Table 7-3 on page 171 summarizes the cases where a memory barrier must be inserted between two memory operations.

The table is read as follows: the ROW is the first memory operation in program order, followed by the COLUMN, which is the second memory operation in program order.
The footnotes indicate the rules for memory ordering.

Table 7-3. Memory Access Ordering Rules

Second Memory Operation (columns):
  1  Load (wp, wt, wb)
  2  Load (uc)
  3  Load (wc, wc+)
  4  Store (wp, wt, wb)
  5  Store (uc)
  6  Store (wc, wc+, non-temporal)
  7  Load/Store (io)
  8  Lock (atomic)
  9  Serialize instructions/Interrupts/Exceptions

First Memory Operation                        1       2       3       4       5       6       7       8       9
Load (wp, wt, wb)                             a       a       b (lf)  c       c       c       d       d       d
Load (uc)                                     a       a       b (lf)  c       c       c       d       d       d
Load (wc, wc+)                                a       a       b (lf)  c       c       c       d       d       d
Store (wp, wt, wb)                            e (mf)  f       e (mf)  g       g       h (sf)  d       d       d
Store (uc)                                    i       f       i       g       g       h (sf)  d       d       d
Store (wc, wc+, non-temporal)                 e (mf)  f       e (mf)  j (sf)  j (sf)  h (sf)  d       d       d
Load/Store (io)                               k       k       k       k       k       l       d, k    d, k    d, k
Lock (atomic)                                 k       k       k       k       k       k       d, k    d, k    d, k
Serialize instruction/Interrupts/Exceptions   l       l       l       l       l       l       d, l    d, l    d, l

a: A load (wp,wt,wb,wc,wc+) may pass a previous non-conflicting store (wp,wt,wb,wc,wc+).
b: A load (wc,wc+) may pass a previous load (wp,wt,wb,uc,wc,wc+). To ensure memory order, an LFENCE instruction must be inserted between the two loads.
c: A store (wp,wt,wb,uc,wc,wc+) may not pass a previous load (wp,wt,wb,uc,wc,wc+).
d: All previous loads and stores complete to memory or I/O space before a memory access for an I/O, locked or serializing instruction is issued.
e: A load (wp,wt,wb,wc,wc+) may pass a previous non-conflicting store (wp,wt,wb,wc,wc+). To ensure memory order, an MFENCE instruction must be inserted between the store and the load.
f: A load (uc) does not pass a previous store (wp,wt,wb,uc,wc,wc+).
g: A store (wp,wt,wb,uc) does not pass a previous store (wp,wt,wb,uc).
h: A store (wc,wc+) may pass a previous store (wp,wt,wb,uc) or non-conflicting store (wc,wc+). To ensure memory order, an SFENCE instruction must be inserted between these two stores. A store (wc,wc+) does not pass a previous conflicting store (wc,wc+).
i: A load (wp,wt,wb,wc,wc+) does not pass a previous store (uc).
j: A store (wp,wt,wb,uc) may pass a previous store (wc,wc+). To ensure memory order, an SFENCE instruction must be inserted between these two stores.
k: All loads and stores associated with the I/O and locked instructions complete to memory (no buffered stores) before a load or store from a subsequent instruction is issued.
l: All loads and stores complete to memory for the serializing instruction before the subsequent instruction fetch is issued.

7.5 Buffering and Combining Memory Writes

7.5.1 Write Buffering

Writes to memory (main memory and caches) can be stored internally by the processor in write buffers (also known as store buffers) before actually writing the data into a memory location. System performance can be improved by buffering writes, as shown in the following examples:
• When higher-priority memory transactions, such as reads, compete for memory access with writes, writes can be delayed in favor of reads, which minimizes or eliminates an instruction-execution stall due to a memory-operand read.
• When the memory is busy, buffering writes while the memory is busy removes the writes from the instruction-execution pipeline, which frees instruction-execution resources.

The processor manages the write buffer so that it is transparent to software.
Memory accesses check the write buffer, and the processor completes writes into memory from the buffer in program order. Also, the processor completely empties the write buffer by writing the contents to memory as a result of performing any of the following operations:
• SFENCE Instruction: Executing a store-fence (SFENCE) instruction forces all memory writes before the SFENCE (in program order) to be written into memory (or, for WB type, the cache) before memory writes that follow the SFENCE instruction. The memory-fence (MFENCE) instruction has a similar effect, but it forces the ordering of loads in addition to stores.
• Serializing Instructions: Executing a serializing instruction forces the processor to retire the serializing instruction (complete both instruction execution and result writeback) before the next instruction is fetched from memory.
• I/O Instructions: Before completing an I/O instruction, all previous reads and writes must be written to memory, and the I/O instruction must complete before completing subsequent reads or writes. Writes to I/O-address space (OUT instruction) are never buffered.
• Locked Instructions: A locked instruction (an instruction executed using the LOCK prefix) must complete after all previous reads and writes and before subsequent reads and writes. Locked writes are never buffered, although locked reads and writes are cacheable.
• Interrupts and Exceptions: Interrupts and exceptions are serializing events that force the processor to write all results from the write buffer to memory before fetching the first instruction from the interrupt or exception service routine.
• UC-Memory Reads: UC-memory reads are not reordered ahead of writes.

Write buffers can behave similarly to write-combining buffers because multiple writes may be collected internally before transferring the data to caches or main memory.
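The write-buffer behavior described in this section can be pictured with a small software model. The C sketch below is only an illustration under stated assumptions: the names wb_model, wb_store, wb_load, wb_drain_one and wb_fence, the eight-entry depth, and the 16-word memory are all hypothetical, and the manual does not specify how hardware implements the buffer. The model shows three properties taken from the text: loads check the buffer before memory, buffered writes reach memory in program order, and a flushing event such as SFENCE empties the buffer completely.

```c
#include <assert.h>
#include <stddef.h>

#define MEM_WORDS 16   /* hypothetical size of the modeled memory */
#define WB_SLOTS   8   /* hypothetical write-buffer depth */

struct wb_entry { size_t addr; int value; };

struct wb_model {
    int memory[MEM_WORDS];           /* stands in for main memory */
    struct wb_entry slot[WB_SLOTS];  /* pending writes, oldest first */
    size_t count;                    /* number of pending writes */
};

/* Retire the oldest pending write to memory, preserving program order. */
void wb_drain_one(struct wb_model *m)
{
    if (m->count == 0)
        return;
    m->memory[m->slot[0].addr] = m->slot[0].value;
    for (size_t i = 1; i < m->count; i++)
        m->slot[i - 1] = m->slot[i];
    m->count--;
}

/* Models SFENCE or another flushing event: empty the whole buffer. */
void wb_fence(struct wb_model *m)
{
    while (m->count > 0)
        wb_drain_one(m);
}

/* Buffer a write; if the buffer is full, retire the oldest write first. */
void wb_store(struct wb_model *m, size_t addr, int value)
{
    assert(addr < MEM_WORDS);
    if (m->count == WB_SLOTS)
        wb_drain_one(m);
    m->slot[m->count].addr = addr;
    m->slot[m->count].value = value;
    m->count++;
}

/* A load checks the buffer (newest entry first) before reading memory. */
int wb_load(const struct wb_model *m, size_t addr)
{
    assert(addr < MEM_WORDS);
    for (size_t i = m->count; i > 0; i--)
        if (m->slot[i - 1].addr == addr)
            return m->slot[i - 1].value;
    return m->memory[addr];
}
```

In this model, a load issued right after a store to the same address sees the new value even though the modeled memory is unchanged until wb_drain_one or wb_fence runs, which is the sense in which the buffer stays transparent to the program that issued the store.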
See the following section for a description of write combining.

7.5.2 Write Combining

Write-combining memory uses a different buffering scheme than write buffering described above. Writes to write-combining (WC) memory can be combined internally by the processor in a buffer for more efficient transfer to main memory at a later time. For example, 16 doubleword writes to consecutive memory addresses can be combined in the WC buffers and transferred to main memory as a single burst operation rather than as individual memory writes.

The following instructions perform writes to WC memory:
• MASKMOVDQU
• MASKMOVQ
• MOVNTDQ
• MOVNTI
• MOVNTPD
• MOVNTPS
• MOVNTQ

WC memory is not cacheable.
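As an illustration of how C code usually reaches one of these instructions, the sketch below uses compiler intrinsics rather than assembly. It assumes an x86-64 compiler that provides <emmintrin.h> (GCC, Clang and MSVC do); _mm_stream_si32 is the intrinsic that corresponds to MOVNTI, and _mm_sfence corresponds to SFENCE. The function name fill_streamed and the ready flag are hypothetical and do not come from the manual.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>   /* SSE2 intrinsics; also pulls in _mm_sfence */

/*
 * Hypothetical helper: fill buf[0..n-1] with value using non-temporal
 * stores, then set *ready so that a consumer knows the data is complete.
 */
void fill_streamed(int *buf, size_t n, int value, volatile int *ready)
{
    assert(buf != NULL && ready != NULL);

    for (size_t i = 0; i < n; i++)
        _mm_stream_si32(&buf[i], value);   /* MOVNTI: write-combining store */

    /* Table 7-3, rule j: a later ordinary store may pass earlier WC
       stores, so an SFENCE is placed before the flag is written. */
    _mm_sfence();

    *ready = 1;                            /* ordinary store */
}
```

Without the fence, the store to the flag could in principle become visible before the streamed data. The volatile flag keeps the sketch short; a real multi-threaded program would more likely use the language's atomic types for it.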
A WC buffer writes its contents only to main memory.

The size and number of WC buffers available is implementation dependent. The processor assigns an address range to an empty WC buffer when a WC-memory write occurs. The size and alignment of this address range is equal to the buffer size. All subsequent writes to WC memory that fall within this address range can be stored by the processor in the WC-buffer entry until an event occurs that causes the processor to write the WC buffer to main memory. After the WC buffer is written to main memory, the processor can assign a new address range on a subsequent WC-memory write.

Writes to consecutive addresses in WC memory are not required for the processor to combine them. The processor combines any WC memory write that falls within the active-address range for a buffer. Multiple writes to the same address overwrite each other (in program order) until the WC buffer is written to main memory.

It is possible for writes to proceed out of program order when WC memory is used. For example, a write to cacheable memory that follows a write to WC memory can be written into the cache before the WC buffer is written to main memory. For this reason, and the reasons listed in the previous paragraph, software that is sensitive to the order of memory writes should avoid using WC memory.

WC buffers are written to main memory under the same conditions as the write buffers, namely when:
• Executing a store-fence (SFENCE) instruction.
• Executing a serializing instruction.
• Executing an I/O instruction.
• Executing a locked instruction (an instruction executed using the LOCK prefix).
• An interrupt or exception occurs.

WC buffers are also written to main memory when:
• A subsequent non-write-combining operation has a write address that matches the WC-buffer active-address range.
• A write to WC memory falls outside the WC-buffer active-address range. The existing buffer contents are written to main memory, and a new address range is established for the latest WC write.

7.6 Memory Caches

The AMD64 architecture supports the use of internal and external caches.