After the writes are complete, the instruction invalidates all cache lines. This instruction operates on all caches in the memory hierarchy, including caches that are external to the processor.

INVD Instruction. The invalidate (INVD) instruction is used to invalidate all cache lines in all caches in the memory hierarchy. Unlike the WBINVD instruction, no modified cache lines are written to memory. The INVD instruction should only be used in situations where memory coherency is not required.

6.6.2 TLB Invalidation

INVLPG Instruction.
The invalidate TLB entry (INVLPG) instruction can be used to invalidate specific entries within the TLB. The source operand is a virtual-memory address that specifies the TLB entry to be invalidated. Invalidating a TLB entry does not remove the associated page-table entry from the data cache. See “Translation-Lookaside Buffer (TLB)” on page 139 for more information.
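The WBINVD, INVD, and INVLPG instructions are all privileged and normally appear only in kernel or hypervisor code. As a minimal sketch, assuming a GCC-compatible compiler and execution at CPL 0, such code might wrap WBINVD and INVLPG as follows (the wrapper names are illustrative, not part of any defined interface):

    /* Write back all modified cache lines, then invalidate all caches. */
    static inline void cache_wbinvd(void)
    {
        __asm__ __volatile__("wbinvd" : : : "memory");
    }

    /* Invalidate the TLB entry that maps one virtual address.  INVLPG takes
       a memory operand; only the address matters, the byte is not accessed. */
    static inline void tlb_invlpg(const void *va)
    {
        __asm__ __volatile__("invlpg %0"
                             : : "m"(*(const volatile char *)va)
                             : "memory");
    }

An INVD wrapper would look like the WBINVD one, but, as noted above, it discards modified data and is appropriate only when memory coherency is not required.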
7 Memory System

This chapter describes:
• Cache coherency mechanisms
• Cache control mechanisms
• Memory typing
• Memory mapped I/O
• Memory ordering rules
• Serializing instructions

Figure 7-1 on page 158 shows a conceptual picture of a processor and memory system, and how data and instructions flow between the various components. This diagram is not intended to represent a specific microarchitectural implementation but instead is used to illustrate the major memory-system components covered by this chapter.
Figure 7-1. Processor and Memory System (conceptual diagram showing main memory, the system bus interface, the L2 cache, the L1 instruction and data caches, the write-combining buffers, the write buffers, the load/store unit, and the execution units on the processor chip)

The memory-system components described in this chapter are shown as unshaded boxes in Figure 7-1. Those items are summarized in the following paragraphs.

Main memory is external to the processor chip and is the memory-hierarchy level farthest from the processor execution units.

Caches are the memory-hierarchy levels closest to the processor execution units. They are much smaller and much faster than main memory, and can be either internal or external to the processor chip. Caches contain copies of the most frequently used instructions and data.
By allowing fast access to frequently used data, software can run much faster than if it had to access that data from main memory.

Figure 7-1 shows three caches, all internal to the processor:
• L1 Data Cache—The L1 (level-1) data cache holds the data most recently read or written by the software running on the processor.
• L1 Instruction Cache—The L1 instruction cache is similar to the L1 data cache except that it holds only the instructions executed most frequently.
In some processor implementations, the L1 instruction cache can be combined with the L1 data cache to form a unified L1 cache.
• L2 Cache—The L2 (level-2) cache is usually several times larger than the L1 caches, but it is also slower. It is common for L2 caches to be implemented as a unified cache containing both instructions and data. Recently used instructions and data that do not fit within the L1 caches can reside in the L2 cache. The L2 cache can be exclusive, meaning it does not cache information contained in the L1 cache.
Conversely, inclusive L2 caches contain a copy of the L1-cached information.

Memory-read operations from cacheable memory first check the cache to see if the requested information is available. A read hit occurs if the information is available in the cache, and a read miss occurs if the information is not available. Likewise, a write hit occurs if the memory write can be stored in the cache, and a write miss occurs if it cannot be stored in the cache.

Caches are divided into fixed-size blocks called cache lines.
The cache allocates lines to correspond to regions in memory of the same size as the cache line, aligned on an address boundary equal to the cache-line size. For example, in a cache with 32-byte lines, the cache lines are aligned on 32-byte boundaries and byte addresses 0007h and 001Eh are both located in the same cache line. The size of a cache line is implementation dependent. Most implementations have either 32-byte or 64-byte cache lines.
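As an illustration of line alignment, the sketch below masks off the low address bits to obtain the line-aligned base address, confirming that 0007h and 001Eh share a 32-byte line, and also queries the implementation's L1 and L2 cache parameters through AMD's extended CPUID functions 8000_0005h and 8000_0006h. It assumes a GCC-compatible compiler providing <cpuid.h>; the bit-field positions shown follow AMD's CPUID documentation and should be verified against the processor's documentation.

    #include <stdio.h>
    #include <cpuid.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID Fn8000_0005, ECX: L1 data cache.  Bits 31:24 = size in
           Kbytes, bits 7:0 = line size in bytes (per AMD's CPUID spec). */
        if (__get_cpuid(0x80000005, &eax, &ebx, &ecx, &edx))
            printf("L1 data cache: %u KB, %u-byte lines\n",
                   (ecx >> 24) & 0xff, ecx & 0xff);

        /* CPUID Fn8000_0006, ECX: L2 cache.  Bits 31:16 = size in Kbytes,
           bits 7:0 = line size in bytes. */
        if (__get_cpuid(0x80000006, &eax, &ebx, &ecx, &edx))
            printf("L2 cache: %u KB, %u-byte lines\n",
                   (ecx >> 16) & 0xffff, ecx & 0xff);

        /* With 32-byte lines, masking off the low five address bits gives
           the same line base (0000h) for both example addresses. */
        unsigned int line_size = 32, a = 0x0007, b = 0x001E;
        printf("%04Xh -> line base %04Xh, %04Xh -> line base %04Xh\n",
               a, a & ~(line_size - 1), b, b & ~(line_size - 1));
        return 0;
    }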
The process of loading data into a cache is a cache-line fill. Even if only a single byte is requested, all bytes in a cache line are loaded from memory. Typically, a cache-line fill must remove (evict) an existing cache line to make room for the new line loaded from memory. This process is called cache-line replacement. If the existing cache line was modified before the replacement, the processor performs a cache-line writeback to main memory when it performs the cache-line fill.
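The sketch below models this behavior with a deliberately tiny direct-mapped cache so that fills, evictions, and writebacks of dirty lines are easy to trace. The line size, number of lines, and mapping are simplified assumptions chosen for illustration; they do not describe any particular implementation.

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    #define LINE_SIZE 32u    /* bytes per line (example value)          */
    #define NUM_LINES 4u     /* deliberately tiny, to force evictions   */

    struct line {
        bool valid;
        bool dirty;          /* set when the cached copy is modified    */
        unsigned int base;   /* line-aligned address held in this slot  */
    };

    static struct line cache[NUM_LINES];

    /* Touch one byte address; is_write marks the access as a store. */
    static void cache_access(unsigned int addr, bool is_write)
    {
        unsigned int base  = addr & ~(LINE_SIZE - 1);          /* align to line */
        unsigned int index = (base / LINE_SIZE) % NUM_LINES;   /* direct-mapped */
        struct line *l = &cache[index];

        if (l->valid && l->base == base) {
            printf("%05Xh: %s hit in line %u\n",
                   addr, is_write ? "write" : "read", index);
        } else {
            if (l->valid && l->dirty)
                printf("%05Xh: miss, write back line %u (base %05Xh), then fill\n",
                       addr, index, l->base);
            else
                printf("%05Xh: miss, fill line %u\n", addr, index);
            l->valid = true;     /* the cache-line fill replaces the old line */
            l->dirty = false;
            l->base  = base;
        }
        if (is_write)
            l->dirty = true;
    }

    int main(void)
    {
        memset(cache, 0, sizeof cache);
        cache_access(0x00007, false);  /* read miss: cache-line fill           */
        cache_access(0x0001E, true);   /* write hit: same line, becomes dirty  */
        cache_access(0x10007, false);  /* conflicting line: writeback, refill  */
        return 0;
    }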
Cache-line writebacks help maintain coherency (consistency) between the caches and main memory. Internally, the processor can also maintain cache coherency by internally probing (checking) the other caches and write buffers for a more recent version of the requested data. External devices can also check processor caches for more recent versions of data by externally probing the processor. Throughout this document, the term probe is used to refer to external probes, while internal probes are always qualified with the word internal.

Write buffers temporarily hold data writes when main memory or the caches are busy with other memory accesses.
The existence of write buffers is implementation dependent.

Implementations of the architecture can use write-combining buffers if the order and size of noncacheable writes to main memory is not important to the operation of software. These buffers can combine multiple, individual writes to main memory and transfer the data in fewer bus transactions.
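In practice, software most often engages the write-combining buffers through non-temporal (streaming) store instructions such as MOVNTI, which write around the caches, followed by an SFENCE to force any combined data out to memory before it is consumed. A minimal sketch, assuming a GCC-compatible compiler with SSE2 intrinsics:

    #include <emmintrin.h>   /* _mm_stream_si32 (SSE2); _mm_sfence (SSE) */
    #include <stddef.h>

    /* Fill a buffer with non-temporal stores.  Each store is a candidate for
       write combining, so consecutive stores to one line can reach memory in
       fewer, larger bus transactions. */
    static void fill_streaming(int *dst, int value, size_t count)
    {
        for (size_t i = 0; i < count; i++)
            _mm_stream_si32(&dst[i], value);   /* compiles to MOVNTI */

        /* Drain the write-combining buffers before any consumer is told
           that the buffer is ready. */
        _mm_sfence();
    }

Because the combined writes can reach memory in a different order than they were issued, this pattern is appropriate only when, as stated above, the order and size of the writes do not matter to the software.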
7.1 Single-Processor Memory Access Ordering

The flexibility with which memory accesses can be ordered is closely related to the flexibility with which a processor implementation can execute and retire instructions. Instruction execution creates results and status and determines whether or not the instruction causes an exception. Instruction retirement commits the results of instruction execution, in program order, to software-visible resources such as memory, caches, write-combining buffers, and registers, or it causes an exception to occur if instruction execution created one.

Implementations of the AMD64 architecture retire instructions in program order, but implementations can execute instructions in any order, subject only to data dependencies.
Implementations can also speculatively execute instructions—executing instructions before knowing they are needed. Internally, implementations manage data reads and writes so that instructions complete in order. However, because implementations can execute instructions out of order and speculatively, the sequence of memory accesses performed by the hardware can appear to be out of program order. The following sections describe the rules governing memory accesses to which processor implementations adhere. These rules may be further restricted, depending on the memory type being accessed. Further, these rules govern single processor operation; see “Multiprocessor Memory Access Ordering” on page 162 for multiprocessor ordering rules.
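When software needs an explicit ordering point, for example around accesses to a read-sensitive memory-mapped device as discussed in the next section, the instruction set provides the LFENCE, SFENCE, and MFENCE barrier instructions. The sketch below assumes GCC-style inline assembly; the device register addresses are hypothetical placeholders, not real platform addresses.

    #include <stdint.h>

    /* Hypothetical memory-mapped device registers; a real driver would obtain
       the addresses from the platform rather than hard-coding them. */
    #define DEV_STATUS ((volatile uint32_t *)0xFEB00000u)
    #define DEV_DATA   ((volatile uint32_t *)0xFEB00004u)

    static inline void load_fence(void)  { __asm__ __volatile__("lfence" ::: "memory"); }
    static inline void store_fence(void) { __asm__ __volatile__("sfence" ::: "memory"); }
    static inline void full_fence(void)  { __asm__ __volatile__("mfence" ::: "memory"); }

    /* Read the status register, then the data register.  The LFENCE forces the
       status read to complete before the data read can be issued, regardless of
       what ordering the mapping's memory type would otherwise permit. */
    static uint32_t read_device(void)
    {
        uint32_t status = *DEV_STATUS;
        load_fence();
        return (status & 1u) ? *DEV_DATA : 0u;
    }

SFENCE and MFENCE provide the corresponding store-only and combined load/store ordering points.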
7.1.1 Read Ordering

Generally, reads do not affect program order because they do not affect the state of software-visible resources other than register contents. However, some system devices might be sensitive to reads. In such a situation, software can map a read-sensitive device to a memory type that enforces strong read ordering, or use read/write barrier instructions to force strong read ordering.

For cacheable memory types, the following rules govern read ordering:
• Out-of-order reads are allowed to the extent that they can be performed transparently to software, such that the appearance of in-order execution is maintained.