Volume 3B System Programming Guide, Part 2 (794104), page 10
These MSRs have the following properties:
• IA32_PMCx MSRs start at address 0C1H and occupy a contiguous block of MSR address space; the number of MSRs per logical processor is reported using CPUID.0AH.
• IA32_PERFEVTSELx MSRs start at address 186H and occupy a contiguous block of MSR address space. Each performance event select register is paired with a corresponding performance counter in the 0C1H address block.
• The bit width of an IA32_PMCx MSR is reported using the CPUID.0AH leaf. Bits beyond the width of the programmable counter are undefined, and are ignored when written to. In the initial implementation, the bit width for read operations is reported using CPUID; write operations are limited to the low 32 bits of registers.
• Bit field layout of IA32_PERFEVTSELx MSRs is defined architecturally.

See Figure 18-12 for the bit field layout of IA32_PERFEVTSELx MSRs.
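The addressing rule above, together with the IA32_PERFEVTSELx field positions described in the bit-field list of this section, can be sketched as a small encoder. This is a minimal illustration, not a driver: the function and parameter names are hypothetical, nothing here reads or writes an MSR, and treating INV as having no effect when CMASK is 0 is an assumption (the text only defines INV relative to the counter-mask comparison).

```python
# Base addresses stated in the text: IA32_PMCx at 0C1H, IA32_PERFEVTSELx at 186H.
IA32_PMC_BASE = 0x0C1
IA32_PERFEVTSEL_BASE = 0x186


def pmc_msr(x: int) -> int:
    """MSR address of IA32_PMCx (contiguous block starting at 0C1H)."""
    return IA32_PMC_BASE + x


def perfevtsel_msr(x: int) -> int:
    """MSR address of IA32_PERFEVTSELx, the register paired with IA32_PMCx."""
    return IA32_PERFEVTSEL_BASE + x


def encode_perfevtsel(event_select: int, umask: int, *, usr_mode: bool = False,
                      os_mode: bool = False, edge: bool = False,
                      pin_control: bool = False, apic_int: bool = False,
                      enable: bool = False, invert: bool = False,
                      cmask: int = 0) -> int:
    """Pack the fields of IA32_PERFEVTSELx into one integer (hypothetical helper)."""
    for name, value in (("event_select", event_select),
                        ("umask", umask), ("cmask", cmask)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    return (event_select               # bits 0-7:   event select
            | (umask << 8)             # bits 8-15:  unit mask
            | (int(usr_mode) << 16)    # bit 16:     USR
            | (int(os_mode) << 17)     # bit 17:     OS
            | (int(edge) << 18)        # bit 18:     E
            | (int(pin_control) << 19) # bit 19:     PC
            | (int(apic_int) << 20)    # bit 20:     INT
            | (int(enable) << 22)      # bit 22:     EN
            | (int(invert) << 23)      # bit 23:     INV
            | (cmask << 24))           # bits 24-31: CMASK


def counter_increment(event_count: int, cmask: int, invert: bool = False) -> int:
    """Per-cycle increment implied by the CMASK and INV descriptions."""
    if cmask == 0:
        # Assumption: INV has no effect when no counter mask is programmed.
        return event_count
    hit = event_count >= cmask
    if invert:
        hit = not hit
    return 1 if hit else 0


# Illustrative use: event select 3CH with UMASK 00H, counted in all rings.
value = encode_perfevtsel(0x3C, 0x00, usr_mode=True, os_mode=True, enable=True)
print(hex(perfevtsel_msr(0)), hex(value))
```

The event code 3CH with UMASK 00H is used here only as a sample input; the authoritative event encodings are those in Table 18-6.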
[Figure 18-12. Layout of IA32_PERFEVTSELx MSRs. Legend: INV: Invert counter mask; EN: Enable counters; INT: APIC interrupt enable; PC: Pin control; E: Edge detect; OS: Operating system mode; USR: User Mode; remaining bits are reserved.]

The bit fields are:
• Event select field (bits 0 through 7): Selects the event logic unit used to detect microarchitectural conditions (see Table 18-6, for a list of architectural events and their 8-bit codes). The set of values for this field is defined architecturally; each value corresponds to an event logic unit for use with an architectural performance event. The number of architectural events is queried using CPUID.0AH:EAX. A processor may support only a subset of pre-defined values.
• Unit mask (UMASK) field (bits 8 through 15): These bits qualify the condition that the selected event logic unit detects. Valid UMASK values for each event logic unit are specific to the unit. For each architectural performance event, its corresponding UMASK value defines a specific microarchitectural condition.
  A pre-defined microarchitectural condition associated with an architectural event may not be applicable to a given processor. The processor then reports only a subset of pre-defined architectural events. Pre-defined architectural events are listed in Table 18-6; support for pre-defined architectural events is enumerated using CPUID.0AH:EBX. Architectural performance events available in the initial implementation are listed in Table A-1.
• USR (user mode) flag (bit 16): Specifies that the selected microarchitectural condition is counted only when the logical processor is operating at privilege levels 1, 2 or 3. This flag can be used with the OS flag.
• OS (operating system mode) flag (bit 17): Specifies that the selected microarchitectural condition is counted only when the logical processor is operating at privilege level 0. This flag can be used with the USR flag.
• E (edge detect) flag (bit 18): Enables (when set) edge detection of the selected microarchitectural condition. The logical processor counts the number of deasserted to asserted transitions for any condition that can be expressed by the other fields. The mechanism does not permit back-to-back assertions to be distinguished.
  This mechanism allows software to measure not only the fraction of time spent in a particular state, but also the average length of time spent in such a state (for example, the time spent waiting for an interrupt to be serviced).
• PC (pin control) flag (bit 19): When set, the logical processor toggles the PMi pins and increments the counter when performance-monitoring events occur; when clear, the processor toggles the PMi pins when the counter overflows. The toggling of a pin is defined as assertion of the pin for a single bus clock followed by deassertion.
• INT (APIC interrupt enable) flag (bit 20): When set, the logical processor generates an exception through its local APIC on counter overflow.
• EN (Enable Counters) Flag (bit 22): When set, performance counting is enabled in the corresponding performance-monitoring counter; when clear, the corresponding counter is disabled. The event logic unit for a UMASK must be disabled by setting IA32_PERFEVTSELx[bit 22] = 0, before writing to IA32_PMCx.
• INV (invert) flag (bit 23): Inverts the result of the counter-mask comparison when set, so that both greater than and less than comparisons can be made.
• Counter mask (CMASK) field (bits 24 through 31): When this field is not zero, a logical processor compares this mask to the events count of the detected microarchitectural condition during a single cycle. If the event count is greater than or equal to this mask, the counter is incremented by one. Otherwise the counter is not incremented.
  This mask is intended for software to characterize microarchitectural conditions that can count multiple occurrences per cycle (for example, two or more instructions retired per clock; or bus queue occupations). If the counter-mask field is 0, then the counter is incremented each cycle by the event count associated with multiple occurrences.

18.12.2 Architectural Performance Monitoring Version 2

The enhanced features provided by architectural performance monitoring version 2 include the following:
• Fixed-function performance counter register and associated control register: Three of the architectural performance events are counted using three fixed-function MSRs (IA32_FIXED_CTR0 through IA32_FIXED_CTR2).
  Each fixed-function PMC can count only one architectural performance event. The fixed-function PMCs are configured by writing to bit fields in the MSR (IA32_FIXED_CTR_CTRL) located at address 38DH. Unlike configuring performance events for general-purpose PMCs (IA32_PMCx) via the UMASK field in IA32_PERFEVTSELx, programming IA32_FIXED_CTR_CTRL for fixed-function PMCs does not require any UMASK.
• Simplified event programming: The most frequent operations in programming performance events are enabling/disabling event counting and checking the status of counter overflows. Architectural performance monitoring version 2 provides three architectural MSRs:
  - IA32_PERF_GLOBAL_CTRL allows software to enable/disable event counting of all or any combination of fixed-function PMCs (IA32_FIXED_CTRx) or any general-purpose PMCs with a single WRMSR.
  - IA32_PERF_GLOBAL_STATUS allows software to query counter overflow conditions on any combination of fixed-function PMCs or general-purpose PMCs with a single RDMSR.
  - IA32_PERF_GLOBAL_OVF_CTRL allows software to clear counter overflow conditions on any combination of fixed-function PMCs or general-purpose PMCs with a single WRMSR.

18.12.2.1 Architectural Performance Monitoring Version 2 Facilities

The facilities provided by architectural performance monitoring version 2 can be queried from CPUID leaf 0AH by examining the content of register EDX:
• Bits 0 through 4 of CPUID.0AH.EDX indicate the number of fixed-function performance counters available per core.
• Bits 5 through 12 of CPUID.0AH.EDX indicate the bit-width of fixed-function performance counters. Bits beyond the width of the fixed-function counter are reserved and must be written as zeros.

NOTE
Early generations of processors based on Intel Core microarchitecture may report support for version 2 in CPUID.0AH:EDX but indicate incorrect information about version 2 facilities.

The IA32_FIXED_CTR_CTRL MSR includes multiple sets of 4-bit fields; each 4-bit field controls the operation of a fixed-function performance counter. Figure 18-13 shows the layout of 4-bit controls for each fixed-function PMC. Two sub-fields are currently defined within each control.
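The CPUID.0AH enumeration referred to throughout this section can be sketched as a decoder over raw register values. This is a minimal illustration under stated assumptions: the EAX field positions (version ID in bits 7:0, number of general-purpose counters in bits 15:8, counter width in bits 23:16, length of the EBX event vector in bits 31:24) come from the CPUID leaf definition rather than from this excerpt, and the EDX fields are taken as bits 4:0 for the fixed-function counter count and bits 12:5 for their width, since the two ranges cannot overlap. The function name and the sample register values are hypothetical; no CPUID instruction is executed.

```python
def decode_cpuid_0a(eax: int, edx: int) -> dict:
    """Split raw CPUID.0AH EAX/EDX values into named fields (hypothetical helper)."""
    return {
        "version": eax & 0xFF,                    # EAX[7:0]
        "num_gp_counters": (eax >> 8) & 0xFF,     # EAX[15:8]
        "gp_counter_width": (eax >> 16) & 0xFF,   # EAX[23:16]
        "ebx_vector_length": (eax >> 24) & 0xFF,  # EAX[31:24]
        "num_fixed_counters": edx & 0x1F,         # EDX[4:0]
        "fixed_counter_width": (edx >> 5) & 0xFF, # EDX[12:5]
    }


def counter_value_mask(width: int) -> int:
    """Mask covering the implemented bits of a counter of the given width."""
    return (1 << width) - 1


# Made-up register contents for illustration only, not read from hardware.
info = decode_cpuid_0a(eax=0x07280202, edx=0x00000503)
print(info)
print(hex(counter_value_mask(info["fixed_counter_width"])))
```

Software would normally check the version field first and trust the EDX fields only when it reports version 2 or higher, keeping in mind the note above about early processors.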
[Figure 18-13. Layout of IA32_FIXED_CTR_CTRL MSR. Cntr0: controls for IA32_FIXED_CTR0; Cntr1: controls for IA32_FIXED_CTR1; Cntr2: controls for IA32_FIXED_CTR2. Within each control, ENABLE is 0: disable; 1: OS; 2: User; 3: All ring levels, and PMI enables PMI on overflow. Remaining bits are reserved.]

The definitions of the bit fields are:
• Enable field (lowest 2 bits within each 4-bit control): When bit 0 is set, the corresponding fixed-function performance counter increments while the target condition associated with the architectural performance event occurs at ring 0. When bit 1 is set, the corresponding fixed-function performance counter increments while the target condition associated with the architectural performance event occurs at a ring greater than 0. Writing 0 to both bits stops the performance counter. Writing a value of 11B enables the counter to increment irrespective of privilege levels.
• PMI field (the fourth bit within each 4-bit control): When set, the logical processor generates an exception through its local APIC on overflow condition of the respective fixed-function counter.

The IA32_PERF_GLOBAL_CTRL MSR provides single-bit controls to enable counting of each performance counter. Figure 18-14 shows the layout of IA32_PERF_GLOBAL_CTRL. Writing 1 to each enable bit in IA32_PERF_GLOBAL_CTRL is equivalent to writing 1s to the enable bits for all privilege levels in the respective IA32_PERFEVTSELx or IA32_FIXED_CTR_CTRL.
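The 4-bit controls of IA32_FIXED_CTR_CTRL and the single-bit enables of IA32_PERF_GLOBAL_CTRL can be sketched as two small mask builders. This is a minimal illustration, not a definitive implementation: the helper names are hypothetical, and the IA32_PERF_GLOBAL_CTRL bit positions used here (bit i for IA32_PMCi, bit 32 + i for IA32_FIXED_CTRi) are an assumption about Figure 18-14, which is not reproduced in this excerpt. Nothing below writes an MSR.

```python
# Address stated in the text for the fixed-function control register.
IA32_FIXED_CTR_CTRL = 0x38D


def fixed_ctrl_field(index: int, *, os_mode: bool = False,
                     usr_mode: bool = False, pmi: bool = False) -> int:
    """4-bit control for IA32_FIXED_CTR<index>, shifted into position."""
    if index < 0:
        raise ValueError("counter index must be non-negative")
    field = 0
    if os_mode:
        field |= 0b0001  # enable bit 0: count at ring 0
    if usr_mode:
        field |= 0b0010  # enable bit 1: count at rings greater than 0
    if pmi:
        field |= 0b1000  # fourth bit: PMI on overflow
    return field << (4 * index)


def global_ctrl_mask(gp_counters, fixed_counters) -> int:
    """Enable mask for IA32_PERF_GLOBAL_CTRL.

    Assumed layout: bit i enables IA32_PMCi, bit 32 + i enables IA32_FIXED_CTRi.
    """
    mask = 0
    for i in gp_counters:
        mask |= 1 << i
    for i in fixed_counters:
        mask |= 1 << (32 + i)
    return mask


# Count on all three fixed-function counters at every ring level, with a PMI
# on overflow of counter 1 only.
ctrl = (fixed_ctrl_field(0, os_mode=True, usr_mode=True)
        | fixed_ctrl_field(1, os_mode=True, usr_mode=True, pmi=True)
        | fixed_ctrl_field(2, os_mode=True, usr_mode=True))
print(hex(IA32_FIXED_CTR_CTRL), hex(ctrl))
print(hex(global_ctrl_mask([0, 1], [0, 1, 2])))
```

Building the masks separately from the MSR writes keeps the encoding easy to check against the figures before any privileged code uses it.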