Volume 3A: System Programming Guide, Part 1 (794103), page 76
They also support cross-modifying code, where on an MP system writes generated by one processor modify instructions cached or currently in flight on another. See Section 7.1.3, “Handling Self- and Cross-Modifying Code,” for a description of the requirements for self- and cross-modifying code in an IA-32 processor.

7.8.13 Implementation-Specific HT Technology Facilities

The following non-architectural facilities are implementation-specific in IA-32 processors supporting Hyper-Threading Technology:

• Caches
• Translation lookaside buffers (TLBs)
• Thermal monitoring facilities

The Intel Xeon processor MP implementation is described in the following sections.

7.8.13.1 Processor Caches

For processors supporting Hyper-Threading Technology, the caches are shared. Any cache manipulation instruction that is executed on one logical processor has a global effect on the cache hierarchy of the physical processor.
Note the following:

• WBINVD instruction: The entire cache hierarchy is invalidated after modified data is written back to memory. All logical processors are stopped from executing until after the write-back and invalidate operation is completed. A special bus cycle is sent to all caching agents.
• INVD instruction: The entire cache hierarchy is invalidated without writing back modified data to memory. All logical processors are stopped from executing until after the invalidate operation is completed. A special bus cycle is sent to all caching agents.
• CLFLUSH instruction: The specified cache line is invalidated from the cache hierarchy after any modified data is written back to memory and a bus cycle is sent to all caching agents, regardless of which logical processor caused the cache line to be filled.
• CD flag in control register CR0: Each logical processor has its own CR0 control register, and thus its own CD flag in CR0. The CD flags for the two logical processors are ORed together, such that when any logical processor sets its CD flag, the entire cache is nominally disabled.

7.8.13.2 Processor Translation Lookaside Buffers (TLBs)

In processors supporting Hyper-Threading Technology, data cache TLBs are shared. The instruction cache TLB is duplicated in each logical processor.

Entries in the TLBs are tagged with an ID that indicates the logical processor that initiated the translation.
This tag applies even for translations that are marked global using the page global feature for memory paging.

When a logical processor performs a TLB invalidation operation, only the TLB entries that are tagged for that logical processor are flushed. This protocol applies to all TLB invalidation operations, including writes to control registers CR3 and CR4 and uses of the INVLPG instruction.

7.8.13.3 Thermal Monitor

In a processor that supports Hyper-Threading Technology, logical processors share the catastrophic shutdown detector and the automatic thermal monitoring mechanism (see Section 13.5, “Thermal Monitoring and Protection”). Sharing results in the following behavior:

• If the processor’s core temperature rises above the preset catastrophic shutdown temperature, the processor core halts execution, which causes both logical processors to stop execution.
• When the processor’s core temperature rises above the preset automatic thermal monitor trip temperature, the clock speed of the processor core is automatically modulated, which affects the execution speed of both logical processors.

For software-controlled clock modulation, each logical processor has its own IA32_CLOCK_MODULATION MSR, allowing clock modulation to be enabled or disabled on a logical processor basis.
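
As a rough illustration of the per-logical-processor MSR described above, the following sketch composes and decodes an IA32_CLOCK_MODULATION value. The field layout used here (bit 4 as the enable flag, bits 3:1 as the duty cycle in 12.5% steps) and the MSR address 19AH come from the MSR listings elsewhere in the manual, not from this section, so treat them as assumptions to verify for a given processor. The function names are hypothetical, and the privileged WRMSR access itself is not shown.

```python
# Assumed layout of IA32_CLOCK_MODULATION (not stated in this section):
#   bit 4     on-demand clock modulation enable
#   bits 3:1  duty cycle, in steps of 12.5% (1 -> 12.5%, 7 -> 87.5%)
IA32_CLOCK_MODULATION = 0x19A  # assumed MSR address
ENABLE_BIT = 1 << 4
DUTY_SHIFT = 1
DUTY_MASK = 0b111 << DUTY_SHIFT


def encode_clock_modulation(duty_steps, enable=True):
    """Build an MSR value from a duty cycle step (1..7) and an enable flag."""
    if not 1 <= duty_steps <= 7:
        raise ValueError("duty_steps must be in the range 1..7")
    value = duty_steps << DUTY_SHIFT
    if enable:
        value |= ENABLE_BIT
    return value


def decode_clock_modulation(value):
    """Return (enabled, duty_cycle_percent) for an MSR value."""
    enabled = bool(value & ENABLE_BIT)
    steps = (value & DUTY_MASK) >> DUTY_SHIFT
    return enabled, steps * 12.5


if __name__ == "__main__":
    msr_value = encode_clock_modulation(4)  # request a 50% duty cycle
    print(hex(IA32_CLOCK_MODULATION), hex(msr_value),
          decode_clock_modulation(msr_value))
```

Because the MSR exists once per logical processor, system software would have to perform the write on each logical processor it wants to modulate.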
Typically, if software-controlled clock modulation is going to be used, the feature must be enabled for all the logical processors within a physical processor and the modulation duty cycle must be set to the same value for each logical processor. If the duty cycle values differ between the logical processors, the processor clock will be modulated at the highest duty cycle selected.

7.8.13.4 External Signal Compatibility

This section describes the constraints on external signals received through the pins of a processor supporting Hyper-Threading Technology and how these signals are shared between its logical processors.

• STPCLK#: A single STPCLK# pin is provided on the physical package of the Intel Xeon processor MP. External control logic uses this pin for power management within the system. When the STPCLK# signal is asserted, the processor core transitions to the stop-grant state, where instruction execution is halted but the processor core continues to respond to snoop transactions. Regardless of whether the logical processors are active or halted when the STPCLK# signal is asserted, execution is stopped on both logical processors and neither will respond to interrupts.
  In MP systems, the STPCLK# pins on all physical processors are generally tied together. As a result, this signal affects all the logical processors within the system simultaneously.
• LINT0 and LINT1 pins: A processor supporting Hyper-Threading Technology has only one set of LINT0 and LINT1 pins, which are shared between the logical processors. When one of these pins is asserted, both logical processors respond unless the pin has been masked in the APIC local vector tables for one or both of the logical processors.
  Typically in MP systems, the LINT0 and LINT1 pins are not used to deliver interrupts to the logical processors. Instead all interrupts are delivered to the local processors through the I/O APIC.
• A20M# pin: On an IA-32 processor, the A20M# pin is typically provided for compatibility with the Intel 286 processor. Asserting this pin causes bit 20 of the physical address to be masked (forced to zero) for all external bus memory accesses. Processors supporting Hyper-Threading Technology provide one A20M# pin, which affects the operation of both logical processors within the physical processor.

7.9 MULTI-CORE ARCHITECTURE

This section describes the architecture of Intel 64 and IA-32 processors supporting dual-core and quad-core technology.
The discussion is applicable to the Intel Pentium processor Extreme Edition, Pentium D, Intel Core Duo, Intel Core 2 Duo, Dual-core Intel Xeon processor, Intel Core 2 Quad processors, and quad-core Intel Xeon processors. Features vary across different microarchitectures and are detectable using CPUID.

In general, each processor core has dedicated microarchitectural resources identical to a single-processor implementation of the underlying microarchitecture without hardware multi-threading capability.
Each logical processor in a dual-core processor (whether supporting Hyper-Threading Technology or not) has its own APIC functionality, PAT, machine check architecture, debug registers and extensions. Each logical processor handles serialization instructions or self-modifying code on its own. Memory order is handled the same way as in Hyper-Threading Technology.

The topology of the cache hierarchy (with respect to whether a given cache level is shared by one or more processor cores or by all logical processors in the physical package) depends on the processor implementation. Software must use the deterministic cache parameter leaf of the CPUID instruction to discover the cache-sharing topology between the logical processors in a multi-threading environment.

7.9.1 Logical Processor Support

The topological composition of processor cores and logical processors in a multi-core processor can be discovered using CPUID. Within each processor core, one or more logical processors may be available.

System software must follow the required MP initialization sequences (see Section 7.5, “Multiple-Processor (MP) Initialization”) to recognize and enable logical processors. At runtime, software can enumerate those logical processors enabled by system software to identify the topological relationships between these logical processors.
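
The CPUID-based discovery described above can be sketched as a pair of register decoders. This is a minimal sketch, not a complete enumeration algorithm. It assumes the field positions given in the CPUID instruction reference: CPUID.1:EBX[23:16] for the maximum number of addressable logical processor IDs in a package (meaningful only when the HTT flag, CPUID.1:EDX[28], is set), and, in the deterministic cache parameter leaf (leaf 4), EAX[4:0] for cache type, EAX[7:5] for cache level, EAX[25:14] and EAX[31:26] for the sharing fields, each stored minus one. The function names and the sample register values are hypothetical. Executing CPUID itself is left out, so the decoders take raw register values as input.

```python
# Cache type encodings of CPUID leaf 4, EAX[4:0] (assumed from the CPUID
# instruction reference; this section only names the leaf).
CACHE_TYPES = {1: "data", 2: "instruction", 3: "unified"}


def decode_leaf1_ebx(ebx):
    """Return CPUID.1:EBX[23:16]: the maximum number of addressable
    logical processor IDs in the physical package."""
    return (ebx >> 16) & 0xFF


def decode_leaf4_eax(eax):
    """Decode EAX from one sub-leaf of CPUID leaf 4.

    Returns None when the cache type field is 0, which marks the end of
    the cache list. The two sharing fields are stored minus one and are
    upper bounds on ID space, not exact counts of enabled processors.
    """
    cache_type = eax & 0x1F
    if cache_type == 0:
        return None
    return {
        "type": CACHE_TYPES.get(cache_type, "reserved"),
        "level": (eax >> 5) & 0x7,
        "max_ids_sharing_cache": ((eax >> 14) & 0xFFF) + 1,
        "max_core_ids_in_package": ((eax >> 26) & 0x3F) + 1,
    }


def describe_caches(eax_values):
    """Decode sub-leaf EAX values in order, stopping at the null entry."""
    caches = []
    for eax in eax_values:
        info = decode_leaf4_eax(eax)
        if info is None:
            break
        caches.append(info)
    return caches


if __name__ == "__main__":
    # Illustrative values only: a private L1 data cache and an L2 shared
    # by two logical processors, followed by the terminating null entry.
    for cache in describe_caches([0x04000121, 0x04004143, 0x00000000]):
        print(cache)
```

A cache whose `max_ids_sharing_cache` is 1 is private to one logical processor. Larger values indicate a cache that may be shared, and software still has to compare APIC IDs to decide which logical processors actually share it.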
(See Section 7.10.4, “Identifying Topological Relationships in a MP System.”)

7.9.2 Memory Type Range Registers (MTRR)

MTRR is shared between two logical processors sharing a processor core if the physical processor supports Hyper-Threading Technology. MTRR is not shared between logical processors located in different cores or different physical packages.

The Intel 64 and IA-32 architectures require that all logical processors in an MP system use an identical MTRR memory map.
This gives software a consistent view of memory, independent of the processor on which it is running.

See Section 10.11, “Memory Type Range Registers (MTRRs).”

7.9.3 Performance Monitoring Counters

Performance counters and their companion control MSRs are shared between two logical processors sharing a processor core if the processor core supports Hyper-Threading Technology. They are not shared between logical processors in different cores or different physical packages.
As a result, software must manage the use of these resources, based on the topology of performance monitoring resources. Performance counter interrupts, events, and precise event monitoring support can be set up and allocated on a per-thread (per logical processor) basis.

See Section 18.16, “Performance Monitoring and Hyper-Threading Technology.”

7.9.4 IA32_MISC_ENABLE MSR

The IA32_MISC_ENABLE MSR (MSR address 1A0H) is shared between two logical processors sharing a processor core if the processor core supports Hyper-Threading Technology.
The MSR is not shared between logical processors in different cores or different physical packages. This means that the architectural features that this register controls are set the same for the logical processors in the same core.

7.9.5 Microcode Update Resources

Microcode update facilities are shared between two logical processors sharing a processor core if the physical package supports Hyper-Threading Technology.
They are not shared between logical processors in different cores or different physical packages. Either logical processor that has access to the microcode update facility can initiate an update.

Each logical processor has its own BIOS signature MSR (IA32_BIOS_SIGN_ID at MSR address 8BH). When a logical processor performs an update for the physical processor, the IA32_BIOS_SIGN_ID MSRs for resident logical processors are updated with identical information. If logical processors initiate an update simultaneously, the processor core provides the synchronization needed to ensure that only one update is performed at a time.

7.10 PROGRAMMING CONSIDERATIONS FOR HARDWARE MULTI-THREADING CAPABLE PROCESSORS

In a multi-threading environment, there may be certain hardware resources that are physically shared at some level of the hardware topology.