Volume 3B System Programming Guide, Part 2 (794104), page 12
One or more of these sub-fields may apply to specific events on an event-by-event basis. Details are listed in Table A-3 in Appendix A, “Performance-Monitoring Events.” In addition, the UMASK field may also contain a sub-field that qualifies detection by snoop response. Bits of the snoop response qualification sub-field are defined in Table 18-11.

Table 18-11. Bus Snoop Qualification Definitions within a Non-Architectural Umask

IA32_PERFEVTSELx MSRs
Bit Position 11:8    Description
Bit 11               HITM response
Bit 10               Reserved
Bit 9                HIT response
Bit 8                CLEAN response

There are also non-architectural events that support qualification of different types of snoop operation. The corresponding bit fields for snoop type qualification are listed in Table 18-12.

Table 18-12. Snoop Type Qualification Definitions within a Non-Architectural Umask

IA32_PERFEVTSELx MSRs
Bit Position 9:8     Description
Bit 9                CMP2I snoops
Bit 8                CMP2S snoops

18-52 Vol. 3
DEBUGGING AND PERFORMANCE MONITORING

A performance event can support at most one of the MESI, snoop response, and snoop type qualification sub-fields.

NOTE
Software must write known values to the performance counters prior to enabling the counters. The contents of general-purpose counters and fixed-function counters are undefined after INIT or RESET.

18.14.1 Fixed-function Performance Counters
Processors based on Intel Core microarchitecture provide three fixed-function performance counters.
Bits beyond the width of the fixed counter are reserved and must be written as zeros. Model-specific fixed-function performance counters on processors that support Architectural Perfmon version 1 are 40 bits wide.

Each fixed-function counter is dedicated to counting a predefined performance monitoring event. The performance monitoring events associated with fixed-function counters and the addresses of these counters are listed in Table 18-13.

Table 18-13. Association of Fixed-Function Performance Counters with Architectural Performance Events

Event Name               Fixed-Function PMC                     PMC Address
INSTR_RETIRED.ANY        MSR_PERF_FIXED_CTR0/IA32_FIXED_CTR0    309H
CPU_CLK_UNHALTED.CORE    MSR_PERF_FIXED_CTR1/IA32_FIXED_CTR1    30AH
CPU_CLK_UNHALTED.REF     MSR_PERF_FIXED_CTR2/IA32_FIXED_CTR2    30BH

Programming the fixed-function performance counters does not involve any of the IA32_PERFEVTSELx MSRs, and does not require specifying any event masks. Instead, MSR_PERF_FIXED_CTR_CTRL provides multiple sets of 4-bit fields; each 4-bit field controls the operation of one fixed-function performance counter (PMC). See Figure 18-17.
Two sub-fields are defined for each control (see Figure 18-17):

• Enable field (low 2 bits in each 4-bit control): When bit 0 is set, the corresponding fixed-function performance counter increments when the target condition associated with the architectural performance event occurs at ring 0. When bit 1 is set, the counter increments when the target condition occurs at a ring greater than 0. Writing 0 to both bits stops the performance counter. Writing 11B causes the counter to increment irrespective of privilege level.
• PMI field (fourth bit in each 4-bit control): When set, the logical processor generates a performance-monitoring interrupt (PMI) through its local APIC on an overflow condition of the respective fixed-function counter.

Figure 18-17. Layout of MSR_PERF_FIXED_CTR_CTRL MSR
(Cntr0, bits 3:0: controls for MSR_PERF_FIXED_CTR0. Cntr1, bits 7:4: controls for MSR_PERF_FIXED_CTR1. Cntr2, bits 11:8: controls for MSR_PERF_FIXED_CTR2. Within each control, EN occupies the low two bits (ENABLE: 0 = disable, 1 = OS, 2 = User, 3 = all ring levels) and PMI (enable PMI on overflow) is the fourth bit. Bits 63:12 are reserved.)

18.14.2 Global Counter Control Facilities
Processors based on Intel Core microarchitecture provide global counter control facilities that simplify the most frequent operations in programming performance events, i.e., enabling/disabling event counting and checking the status of counter overflows.
The following three MSRs provide these facilities:

• MSR_PERF_GLOBAL_CTRL allows software to enable/disable event counting for all or any combination of fixed-function PMCs (MSR_PERF_FIXED_CTRx) or general-purpose PMCs with a single WRMSR.
• MSR_PERF_GLOBAL_STATUS allows software to query counter overflow conditions on any combination of fixed-function PMCs (MSR_PERF_FIXED_CTRx) or general-purpose PMCs with a single RDMSR.
• MSR_PERF_GLOBAL_OVF_CTRL allows software to clear counter overflow conditions on any combination of fixed-function PMCs (MSR_PERF_FIXED_CTRx) or general-purpose PMCs with a single WRMSR.

MSR_PERF_GLOBAL_CTRL provides single-bit controls to enable counting in each performance counter (see Figure 18-18). Each enable bit in MSR_PERF_GLOBAL_CTRL is ANDed with the enable bits in the respective IA32_PERFEVTSELx or MSR_PERF_FIXED_CTR_CTRL MSR: a counter counts only when its global enable bit is set and it is also enabled in its own control register.

Figure 18-18. Layout of MSR_PERF_GLOBAL_CTRL MSR
(Bit 0: PMC0 enable. Bit 1: PMC1 enable. Bit 32: FIXED_CTR0 enable. Bit 33: FIXED_CTR1 enable. Bit 34: FIXED_CTR2 enable. All other bits are reserved.)

MSR_PERF_GLOBAL_STATUS provides single-bit status that software uses to query the overflow condition of each performance counter. The MSR also provides an additional status bit to indicate overflow conditions when counters are programmed for precise event based sampling (PEBS). MSR_PERF_GLOBAL_STATUS also provides a ‘sticky bit’ to indicate changes to the state of performance monitoring hardware (see Figure 18-19).
A value of 1 in bits 34:32, 1, or 0 indicates that an overflow condition has occurred in the associated counter.

Figure 18-19. Layout of MSR_PERF_GLOBAL_STATUS MSR
(Bit 0: PMC0 Overflow. Bit 1: PMC1 Overflow. Bits 32, 33, 34: FIXED_CTR0, FIXED_CTR1, FIXED_CTR2 Overflow. Bit 62: OvfBuffer. Bit 63: CondChgd. All other bits are reserved.)

When a performance counter is configured for PEBS, an overflow condition in the counter generates a performance-monitoring interrupt that signals a PEBS event. On a PEBS event, the processor stores data records in the buffer area (see Section 18.15.5), clears the counter overflow status, and sets the OvfBuffer bit in MSR_PERF_GLOBAL_STATUS.
MSR_PERF_GLOBAL_OVF_CTRL allows software to clear the overflow indicators for general-purpose or fixed-function counters with a single WRMSR (see Figure 18-20). Clear overflow indications when:

• Setting up new values in the event select and/or UMASK field for counting or sampling
• Reloading counter values to continue sampling
• Disabling event counting or sampling

Figure 18-20. Layout of MSR_PERF_GLOBAL_OVF_CTRL MSR
(Bit 0: PMC0 ClrOverflow. Bit 1: PMC1 ClrOverflow. Bits 32, 33, 34: FIXED_CTR0, FIXED_CTR1, FIXED_CTR2 ClrOverflow. Bit 62: ClrOvfBuffer. Bit 63: ClrCondChgd. All other bits are reserved.)

18.14.3 At-Retirement Events
Many non-architectural performance events are affected by the speculative nature of out-of-order execution.
A subset of non-architectural performance events on processors based on Intel Core microarchitecture is enhanced with a tagging mechanism (similar to that found in Intel NetBurst microarchitecture) that excludes contributions arising from speculative execution. The at-retirement events available in processors based on Intel Core microarchitecture do not require special MSR programming control (see Section 18.15.7, “At-Retirement Counting”), but they are limited to IA32_PMC0. See Table 18-14 for a list of events available to processors based on Intel Core microarchitecture.

Table 18-14. At-Retirement Performance Events for Intel Core Microarchitecture

Event Name                        UMask   Event Select
ITLB_MISS_RETIRED                 00H     C9H
MEM_LOAD_RETIRED.L1D_MISS         01H     CBH
MEM_LOAD_RETIRED.L1D_LINE_MISS    02H     CBH
MEM_LOAD_RETIRED.L2_MISS          04H     CBH
MEM_LOAD_RETIRED.L2_LINE_MISS     08H     CBH
MEM_LOAD_RETIRED.DTLB_MISS        10H     CBH

18.14.4 Precise Event Based Sampling (PEBS)
Processors based on Intel Core microarchitecture also support precise event based sampling (PEBS). This feature was introduced by processors based on Intel NetBurst microarchitecture.

PEBS uses a debug store mechanism and a performance monitoring interrupt to store a set of architectural state information for the processor (see Section 18.15.8). The information provides the architectural state of the instruction executed immediately after the instruction that caused the event.

In cases where the same instruction causes BTS and PEBS to be activated, PEBS is processed before BTS.
The PMI request is held until the processor completes processing of PEBS and BTS.

For processors based on Intel Core microarchitecture, events that support precise sampling are listed in Table 18-15. The procedure for detecting availability of PEBS is the same as described in Section 18.15.8.1.

Table 18-15. PEBS Performance Events for Intel Core Microarchitecture

Event Name                        UMask   Event Select
INSTR_RETIRED.ANY_P               00H     C0H
X87_OPS_RETIRED.ANY               FEH     C1H
BR_INST_RETIRED.MISPRED           00H     C5H
SIMD_INST_RETIRED.ANY             1FH     C7H
MEM_LOAD_RETIRED.L1D_MISS         01H     CBH
MEM_LOAD_RETIRED.L1D_LINE_MISS    02H     CBH
MEM_LOAD_RETIRED.L2_MISS          04H     CBH
MEM_LOAD_RETIRED.L2_LINE_MISS     08H     CBH
MEM_LOAD_RETIRED.DTLB_MISS        10H     CBH
18.14.4.1 Setting up the PEBS Buffer
For processors based on Intel Core microarchitecture, PEBS is available using IA32_PMC0 only. Use the following procedure to set up the processor and the IA32_PMC0 counter for PEBS:

1. Set up the precise event buffering facilities. Place values in the precise event buffer base, precise event index, precise event absolute maximum, precise event interrupt threshold, and precise event counter reset fields of the DS buffer management area. In processors based on Intel Core microarchitecture, PEBS records consist of 64-bit address entries. See Figure 18-27 to set up the precise event records buffer in memory.
2. Enable PEBS. Set the Enable PEBS on PMC0 flag (bit 0) in the IA32_PEBS_ENABLE MSR.
3. Set up the IA32_PMC0 performance counter and IA32_PERFEVTSEL0 for an event listed in Table 18-15.

18.14.4.2 Writing a PEBS Interrupt Service Routine
The PEBS facilities share the same interrupt vector and interrupt service routine (called the DS ISR) with the non-precise event-based sampling and BTS facilities. To handle PEBS interrupts, PEBS handler code must be included in the DS ISR.