Volume 3B System Programming Guide, Part 2 (794104), page 14
The records of architectural state provide additional information for use in performance tuning. Precise event-based sampling can be used to count only a subset of at-retirement events.

The following sections describe the MSRs and data structures used for performance monitoring in the Pentium 4 and Intel Xeon processors.

18.15.1 ESCR MSRs

The 45 ESCR MSRs (see Table 18-17) allow software to select specific events to be counted.
Each ESCR is usually associated with a pair of performance counters (see Table 18-17), and each performance counter has several ESCRs associated with it (allowing the events counted to be selected from a variety of events).

Figure 18-21 shows the layout of an ESCR MSR. The functions of the flags and fields are:

• USR flag, bit 2: When set, events are counted when the processor is operating at a current privilege level (CPL) of 1, 2, or 3. These privilege levels are generally used by application code and unprotected operating system code.
• OS flag, bit 3: When set, events are counted when the processor is operating at a CPL of 0.
  This privilege level is generally reserved for protected operating system code. (When both the OS and USR flags are set, events are counted at all privilege levels.)
• Tag enable, bit 4: When set, enables tagging of μops to assist in at-retirement event counting; when clear, disables tagging. See Section 18.15.7, “At-Retirement Counting.”
• Tag value field, bits 5 through 8: Selects a tag value to associate with a μop to assist in at-retirement event counting.
• Event mask field, bits 9 through 24: Selects events to be counted from the event class selected with the event select field.
• Event select field, bits 25 through 30: Selects a class of events to be counted. The events within this class that are counted are selected with the event mask field.

18-64 Vol. 3 DEBUGGING AND PERFORMANCE MONITORING

Figure 18-21. Event Selection Control Register (ESCR) for Pentium 4 and Intel Xeon Processors without HT Technology Support

When setting up an ESCR, the event select field is used to select a specific class of events to count, such as retired branches.
The event mask field is then used to select one or more of the specific events within the class to be counted. For example, when counting retired branches, four different events can be counted: branch not taken predicted, branch not taken mispredicted, branch taken predicted, and branch taken mispredicted. The OS and USR flags allow counts to be enabled for events that occur when operating system code and/or application code are being executed.
If neither the OS nor USR flag is set, no events will be counted.

The ESCRs are initialized to all 0s on reset. The flags and fields of an ESCR are configured by writing to the ESCR using the WRMSR instruction. Table 18-17 gives the addresses of the ESCR MSRs.

Writing to an ESCR MSR does not enable counting with its associated performance counter; it only selects the event or events to be counted.
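
The ESCR field layout listed for Figure 18-21 can be sketched in C as a helper that packs the fields into the value passed to WRMSR. This is a minimal illustration, not code from the manual: the function name `escr_encode` and the macro names are hypothetical, and the WRMSR itself (which requires privilege level 0) is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as listed for Figure 18-21 (names are hypothetical). */
#define ESCR_USR                (UINT32_C(1) << 2)
#define ESCR_OS                 (UINT32_C(1) << 3)
#define ESCR_TAG_ENABLE         (UINT32_C(1) << 4)
#define ESCR_TAG_VALUE_SHIFT    5   /* bits 5..8,   4 bits  */
#define ESCR_EVENT_MASK_SHIFT   9   /* bits 9..24,  16 bits */
#define ESCR_EVENT_SELECT_SHIFT 25  /* bits 25..30, 6 bits  */

/* Pack the ESCR fields into a 64-bit MSR value.
 * Bits 63:32 and the remaining low bits are reserved and left at 0. */
uint64_t escr_encode(uint32_t event_select, uint32_t event_mask,
                     uint32_t tag_value, int tag_enable, int os, int usr)
{
    /* Reject values that do not fit in their fields. */
    assert(event_select <= 0x3Fu);
    assert(event_mask <= 0xFFFFu);
    assert(tag_value <= 0xFu);

    uint32_t v = 0;
    v |= event_select << ESCR_EVENT_SELECT_SHIFT;
    v |= event_mask << ESCR_EVENT_MASK_SHIFT;
    v |= tag_value << ESCR_TAG_VALUE_SHIFT;
    if (tag_enable) v |= ESCR_TAG_ENABLE;
    if (os)         v |= ESCR_OS;
    if (usr)        v |= ESCR_USR;
    return v;
}
```

For example, `escr_encode(0x06, 0x000F, 0, 0, 1, 1)` returns 0x0C001E0C: event class 0x06 with the four low mask bits set, counted at all privilege levels. The class and mask values here are placeholders chosen for illustration; the actual encodings come from the event tables, not from this section.
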
The CCCR for the selected performance counter must also be configured. Configuration of the CCCR includes selecting the ESCR and enabling the counter.

18.15.2 Performance Counters

The performance counters, in conjunction with the counter configuration control registers (CCCRs), are used for filtering and counting the events selected by the ESCRs. The Pentium 4 and Intel Xeon processors provide 18 performance counters organized into 9 pairs.
A pair of performance counters is associated with a particular subset of events and ESCRs (see Table 18-17). The counter pairs are partitioned into four groups:

• The BPU group includes two performance counter pairs:
  - MSR_BPU_COUNTER0 and MSR_BPU_COUNTER1.
  - MSR_BPU_COUNTER2 and MSR_BPU_COUNTER3.
• The MS group includes two performance counter pairs:
  - MSR_MS_COUNTER0 and MSR_MS_COUNTER1.
  - MSR_MS_COUNTER2 and MSR_MS_COUNTER3.
• The FLAME group includes two performance counter pairs:
  - MSR_FLAME_COUNTER0 and MSR_FLAME_COUNTER1.
  - MSR_FLAME_COUNTER2 and MSR_FLAME_COUNTER3.
• The IQ group includes three performance counter pairs:
  - MSR_IQ_COUNTER0 and MSR_IQ_COUNTER1.
  - MSR_IQ_COUNTER2 and MSR_IQ_COUNTER3.
  - MSR_IQ_COUNTER4 and MSR_IQ_COUNTER5.

The MSR_IQ_COUNTER4 counter in the IQ group provides support for PEBS.

Alternate counters in each group can be cascaded: the first counter in one pair can start the first counter in the second pair and vice versa.
A similar cascading is possible for the second counters in each pair. For example, within the BPU group of counters, MSR_BPU_COUNTER0 can start MSR_BPU_COUNTER2 and vice versa, and MSR_BPU_COUNTER1 can start MSR_BPU_COUNTER3 and vice versa (see Section 18.15.6.6, “Cascading Counters”). The cascade flag in the CCCR register for the performance counter enables the cascading of counters.

Each performance counter is 40 bits wide (see Figure 18-22). The RDPMC instruction has been enhanced in the Pentium 4 and Intel Xeon processors to allow reading of either the full counter width (40 bits) or the low 32 bits of the counter. Reading the low 32 bits is faster than reading the full counter width and is appropriate in situations where the count is small enough to be contained in 32 bits.

The RDPMC instruction can be used by programs or procedures running at any privilege level and in virtual-8086 mode to read these counters.
The PCE flag in control register CR4 (bit 8) allows the use of this instruction to be restricted to only programs and procedures running at privilege level 0.

Figure 18-22. Performance Counter (Pentium 4 and Intel Xeon Processors)

The RDPMC instruction is not serializing or ordered with other instructions. Thus, it does not necessarily wait until all previous instructions have been executed before reading the counter.
Similarly, subsequent instructions may begin execution before the RDPMC instruction operation is performed.

Only the operating system, executing at privilege level 0, can directly manipulate the performance counters, using the RDMSR and WRMSR instructions. A secure operating system would clear the PCE flag during system initialization to disable direct user access to the performance-monitoring counters, but provide a user-accessible programming interface that emulates the RDPMC instruction.

Some uses of the performance counters require the counters to be preset before counting begins (that is, before the counter is enabled).
This can be accomplished by writing to the counter using the WRMSR instruction. To set a counter to a specified number of counts before overflow, enter a two's complement negative integer in the counter. The counter will then count from the preset value up to -1 and overflow.

Writing to a performance counter in a Pentium 4 or Intel Xeon processor with the WRMSR instruction causes all 40 bits of the counter to be written.

18.15.3 CCCR MSRs

Each of the 18 performance counters in a Pentium 4 or Intel Xeon processor has one CCCR MSR associated with it (see Table 18-17).
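
As an aside on the counter preset described at the end of Section 18.15.2, the two's complement arithmetic for a 40-bit counter can be sketched in C as follows. The function and macro names are hypothetical, and the sketch only computes the value to write; the WRMSR itself (privilege level 0) is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define PMC_WIDTH 40
#define PMC_MASK  ((UINT64_C(1) << PMC_WIDTH) - 1)

/* Value to write to a 40-bit counter so that it overflows after
 * exactly `count` events: the two's complement of `count`,
 * truncated to the counter width. */
uint64_t pmc_preset_for(uint64_t count)
{
    assert(count >= 1 && count <= PMC_MASK);
    return (~count + 1) & PMC_MASK;
}

/* Number of increments left before a counter holding `value`
 * wraps from all ones (-1) to 0. */
uint64_t pmc_events_until_overflow(uint64_t value)
{
    return ((~value) & PMC_MASK) + 1;
}
```

For instance, `pmc_preset_for(1000)` yields 0xFFFFFFFC18, which is -1000 in 40-bit two's complement; the counter then reaches -1 after 999 events and overflows on the 1000th.
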
The CCCRs control the filtering and counting of events as well as interrupt generation. Figure 18-23 shows the layout of a CCCR MSR. The functions of the flags and fields are as follows:

• Enable flag, bit 12: When set, enables counting; when clear, the counter is disabled. This flag is cleared on reset.
• ESCR select field, bits 13 through 15: Identifies the ESCR to be used to select events to be counted with the counter associated with the CCCR.
• Compare flag, bit 18: When set, enables filtering of the event count; when clear, disables filtering.
  The filtering method is selected with the threshold, complement, and edge flags.
• Complement flag, bit 19: Selects how the incoming event count is compared with the threshold value. When set, event counts that are less than or equal to the threshold value result in a single count being delivered to the performance counter; when clear, counts greater than the threshold value result in a count being delivered to the performance counter (see Section 18.15.6.2, “Filtering Events”).
  The complement flag is not active unless the compare flag is set.
• Threshold field, bits 20 through 23: Selects the threshold value to be used for comparisons. The processor examines this field only when the compare flag is set, and uses the complement flag setting to determine the type of threshold comparison to be made. The useful range of values that can be entered in this field depends on the type of event being counted (see Section 18.15.6.2, “Filtering Events”).
• Edge flag, bit 24: When set, enables rising-edge (false-to-true) detection of the threshold comparison output for filtering event counts; when clear, rising-edge detection is disabled.
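
The CCCR fields described so far can be sketched in C in the same way as the ESCR fields. This is a minimal illustration under stated assumptions: the names `cccr_encode` and `cccr_delivers_count` are hypothetical, only the fields listed above are encoded, and every other bit (including any reserved bits that the full register description may require to be set) is left at 0. The second function models only the threshold comparison as worded above, not edge detection.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as listed above for the CCCR (names are hypothetical). */
#define CCCR_ENABLE            (UINT32_C(1) << 12)
#define CCCR_ESCR_SELECT_SHIFT 13  /* bits 13..15, 3 bits */
#define CCCR_COMPARE           (UINT32_C(1) << 18)
#define CCCR_COMPLEMENT        (UINT32_C(1) << 19)
#define CCCR_THRESHOLD_SHIFT   20  /* bits 20..23, 4 bits */
#define CCCR_EDGE              (UINT32_C(1) << 24)

/* Pack the CCCR fields covered in this passage into an MSR value. */
uint64_t cccr_encode(uint32_t escr_select, int enable, int compare,
                     int complement, uint32_t threshold, int edge)
{
    assert(escr_select <= 0x7u);
    assert(threshold <= 0xFu);

    uint32_t v = 0;
    v |= escr_select << CCCR_ESCR_SELECT_SHIFT;
    v |= threshold << CCCR_THRESHOLD_SHIFT;
    if (enable)     v |= CCCR_ENABLE;
    if (compare)    v |= CCCR_COMPARE;
    if (complement) v |= CCCR_COMPLEMENT;
    if (edge)       v |= CCCR_EDGE;
    return v;
}

/* Model of the threshold comparison: returns 1 if an incoming event
 * count would result in a count being delivered to the counter.
 * Only meaningful when the compare flag is set. */
int cccr_delivers_count(uint64_t cccr, uint32_t incoming)
{
    assert(cccr & CCCR_COMPARE);
    uint32_t threshold = (uint32_t)(cccr >> CCCR_THRESHOLD_SHIFT) & 0xFu;
    if (cccr & CCCR_COMPLEMENT)
        return incoming <= threshold;  /* complement set: <= threshold */
    return incoming > threshold;       /* complement clear: > threshold */
}
```

With ESCR select 5, threshold 2, and the enable, compare, and edge flags set, `cccr_encode(5, 1, 1, 0, 2, 1)` returns 0x0124B000. The select value is a placeholder for illustration; the actual ESCR select encodings are given in the event tables.
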