Volume 3B System Programming Guide, Part 2
(Optional) Select rising edge filtering by setting the CCCR edge flag.

This setup procedure is continued in the next section, Section 18.15.6.3, “Starting Event Counting.”

Vol. 3 18-79
DEBUGGING AND PERFORMANCE MONITORING

Figure 18-30. Effects of Edge Filtering (signals shown: Processor Clock; Output from Threshold Filter; Counter Increments On Rising Edge (False-to-True))

18.15.6.3 Starting Event Counting

Event counting by a performance counter can be initiated in either of two ways. The typical way is to set the enable flag in the counter’s CCCR. Once the instruction that sets the enable flag executes, event counting begins and continues until it is stopped (see Section 18.15.6.5, “Halting Event Counting”).

The following procedural step shows how to start event counting. This step is a continuation of the setup procedure introduced in Section 18.15.6.2, “Filtering Events.”

9. To start event counting, use the WRMSR instruction to set the CCCR enable flag for the performance counter.

This setup procedure is continued in the next section, Section 18.15.6.4, “Reading a Performance Counter’s Count.”

The second way to start a counter is to use the cascade feature. Here, the overflow of one counter automatically starts its alternate counter (see Section 18.15.6.6, “Cascading Counters”).

18.15.6.4 Reading a Performance Counter’s Count

The performance counters in Pentium 4 and Intel Xeon processors can be read using either the RDPMC or the RDMSR instruction. The enhanced functions of the RDPMC instruction (including fast read) are described in Section 18.15.2, “Performance Counters.” These instructions can be used to read a performance counter while it is counting or when it is stopped.

The following procedural step shows how to read the event counter.
This step is a continuation of the setup procedure introduced in Section 18.15.6.3, “Starting Event Counting.”

10. To read a performance counter’s current event count, execute the RDPMC instruction with the counter number obtained from Table 18-17 as its operand.

This setup procedure is continued in the next section, Section 18.15.6.5, “Halting Event Counting.”

18.15.6.5 Halting Event Counting

After a performance counter has been started (enabled), it continues counting indefinitely.
If the counter overflows (goes one count past its maximum count), it wraps around and continues counting. When the counter wraps around, it sets its OVF flag to indicate that the counter has overflowed. The OVF flag is a sticky flag: it indicates that the counter has overflowed at least once since the OVF bit was last cleared.

To halt counting, the CCCR enable flag for the counter must be cleared. The following procedural step shows how to stop event counting. This step is a continuation of the setup procedure introduced in Section 18.15.6.4, “Reading a Performance Counter’s Count.”

11. To stop event counting, execute a WRMSR instruction to clear the CCCR enable flag for the performance counter.

To halt a cascaded counter (a counter that was started when its alternate counter overflowed), either clear the Cascade flag in the cascaded counter’s CCCR MSR or clear the OVF flag in the alternate counter’s CCCR MSR.

18.15.6.6 Cascading Counters

As described in Section 18.15.2, “Performance Counters,” eighteen performance counters are implemented in pairs. Nine pairs of counters and associated CCCRs are further organized as four blocks: BPU, MS, FLAME, and IQ (see Table 18-17).
The first three blocks contain two pairs each. The IQ block contains three pairs of counters (12 through 17) with associated CCCRs (MSR_IQ_CCCR0 through MSR_IQ_CCCR5). The first 8 counter pairs (0 through 15) can be programmed using ESCRs to detect performance monitoring events. Pairs of ESCRs in each of the four blocks allow many different types of events to be counted.
The cascade flag in the CCCR MSR allows nested monitoring of events to be performed by cascading one counter to a second counter located in another pair in the same block (see Figure 18-23 for the location of the flag).

Counters 0 and 1 form the first pair in the BPU block. Either counter 0 or counter 1 can be programmed to detect an event via MSR_MOB_ESCR0. Counters 0 and 2 can be cascaded in either order, as can counters 1 and 3. It is possible to set up four counters in the same block to cascade on two pairs of independent events. The same pairing applies to the subsequent blocks. Because the IQ block has two extra counters, cascading operates somewhat differently when counters 16 and 17 are involved.
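The block layout and the regular pairing rule described above can be expressed as two small helpers. This is a sketch, not an Intel-provided interface: the function names are hypothetical, the counter ranges per block (BPU 0 to 3, MS 4 to 7, FLAME 8 to 11, IQ 12 to 17) are inferred from the pair counts given earlier and assume the blocks are numbered in the order listed, and counters 16 and 17 are deliberately given no regular partner because of the IQ-block restrictions.

```python
from typing import Optional

# Block names in counter-number order (assumed to match Table 18-17).
BLOCKS = ("BPU", "MS", "FLAME", "IQ")


def counter_block(counter: int) -> str:
    """Return the name of the block that contains a counter (0-17)."""
    if not 0 <= counter <= 17:
        raise ValueError(f"no such counter: {counter}")
    # Two pairs (four counters) per block; counters 16 and 17 are the extra
    # IQ pair, so the block index is capped at 3.
    return BLOCKS[min(counter // 4, 3)]


def cascade_partner(counter: int) -> Optional[int]:
    """Regular cascade partner of a counter (0 with 2, 1 with 3, and so on).

    Either counter of such a pair may cascade into the other. Returns None
    for counters 16 and 17, which follow the IQ-block restrictions instead.
    """
    if not 0 <= counter <= 17:
        raise ValueError(f"no such counter: {counter}")
    if counter >= 16:
        return None
    # Toggling bit 1 pairs 0 with 2, 1 with 3, 4 with 6, ... inside a block.
    return counter ^ 2


if __name__ == "__main__":
    for c in range(18):
        print(f"counter {c:2d}: block {counter_block(c):5s} partner {cascade_partner(c)}")
```

Encoding the pairing as "flip bit 1" keeps every partner inside the same block of four, which matches the statement that the BPU pattern repeats in the subsequent blocks.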
In the IQ block, counter 16 can only be cascaded from counter 14 (not from counter 12); counter 14 cannot be cascaded from counter 16 using the CCCR cascade bit mechanism. Similar restrictions apply to counter 17.

Example 18-1. Counting Events

Assume a scenario where counter X is set up to count 200 occurrences of event A; then counter Y is set up to count 400 occurrences of event B. Each counter is set up to count a specific event and overflow to the next counter. In this example, counter X is preset to a count of -200 and counter Y to a count of -400; this setup causes the counters to overflow on the 200th and 400th counts, respectively.

Continuing this scenario, counter X is set up to count indefinitely and wrap around on overflow.
This is described in the basic performance counter setup procedure that begins in Section 18.15.6.1, “Selecting Events to Count.” Counter Y is set up with the cascade flag in its associated CCCR MSR set to 1 and its enable flag set to 0.

To begin the nested counting, the enable bit for counter X is set.
Once enabled, counter X counts until it overflows. At this point, counter Y is automatically enabled and begins counting. Thus counter X overflows after 200 occurrences of event A, and counter Y then starts, counting 400 occurrences of event B before overflowing. When performance counters are cascaded, counter Y would typically be set up to generate an interrupt on overflow.
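The nested counting in Example 18-1 can be simulated in software to make the interaction of the enable, cascade, and OVF flags concrete. This is a model, not hardware access: the `Counter` class and `run_example` function are hypothetical, the 40-bit counter width is an assumption taken from the counter description in Section 18.15.2, and a cascaded counter is modeled as counting while its own cascade flag and its alternate counter's OVF flag are both set, which is one reading of the halting rule in Section 18.15.6.5.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

COUNTER_BITS = 40  # assumed counter width (see Section 18.15.2)
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def preset(count: int) -> int:
    """Two's-complement preset (-count) that overflows on the count-th event."""
    return (-count) & COUNTER_MASK


@dataclass
class Counter:
    value: int = 0
    enable: bool = False   # CCCR enable flag
    cascade: bool = False  # CCCR cascade flag
    ovf: bool = False      # sticky CCCR OVF flag

    def active(self, alternate: Optional["Counter"] = None) -> bool:
        # Modeling assumption: a cascaded counter runs while its cascade flag
        # and its alternate counter's OVF flag are both set, so clearing
        # either one halts it.
        cascaded = self.cascade and alternate is not None and alternate.ovf
        return self.enable or cascaded

    def increment(self) -> bool:
        """Count one event; on wraparound set the sticky OVF flag, return True."""
        self.value = (self.value + 1) & COUNTER_MASK
        if self.value == 0:
            self.ovf = True
            return True
        return False


def run_example(events: List[str]) -> Tuple[Counter, Counter, Optional[int]]:
    """Feed a stream of 'A'/'B' events through the Example 18-1 setup."""
    x = Counter(value=preset(200))                # counts event A
    y = Counter(value=preset(400), cascade=True)  # counts event B; enable stays 0
    x.enable = True                               # begin the nested counting
    y_overflow_at = None
    for i, event in enumerate(events, start=1):
        if event == "A" and x.active():
            x.increment()
        elif event == "B" and y.active(alternate=x):
            if y.increment() and y_overflow_at is None:
                y_overflow_at = i
    return x, y, y_overflow_at


if __name__ == "__main__":
    # The first 50 B events arrive before X overflows, so Y ignores them.
    x, y, at = run_example(["B"] * 50 + ["A"] * 200 + ["B"] * 400)
    print(x.ovf, y.ovf, at)  # True True 650
```

Running it prints `True True 650`: the 50 B events that arrive before counter X overflows are ignored, X overflows on the 200th A event, and Y overflows on the 400th B event after that. The sketch stops at counter Y's overflow and leaves out the overflow interrupt that would normally be raised at that point.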
Interrupt generation on overflow is described in Section 18.15.6.8, “Generating an Interrupt on Overflow.”

The cascading counters mechanism can also be used to count a single event. Counting begins on one counter, then continues on the second counter after the first counter overflows. This technique doubles the number of event counts that can be recorded, since the contents of the two counters can be added together.

18.15.6.7 Extended Cascading

Extended cascading is a model-specific feature in the Intel NetBurst microarchitecture.
The feature is available to Pentium 4 and Intel Xeon processors with a family encoding of 15 and a model encoding greater than or equal to 2. This feature uses bit 11 in the CCCRs associated with the IQ block (see Table 18-19).

Table 18-19. CCCR Names and Bit Positions

CCCR Name:Bit Position    Bit Name        Description
MSR_IQ_CCCR1|2:11         Reserved
MSR_IQ_CCCR0:11           CASCNT4INTO0    Allow counter 4 to cascade into counter 0
MSR_IQ_CCCR3:11           CASCNT5INTO3    Allow counter 5 to cascade into counter 3
MSR_IQ_CCCR4:11           CASCNT5INTO4    Allow counter 5 to cascade into counter 4
MSR_IQ_CCCR5:11           CASCNT4INTO5    Allow counter 4 to cascade into counter 5

In these bit names and descriptions, counters are numbered within the IQ block: counter n is the counter controlled by MSR_IQ_CCCRn (counter 12 + n), so counter 4 is counter 16 and counter 5 is counter 17.

The extended cascading feature can be adapted to the sampling usage model for performance monitoring. However, it is known that performance counters do not generate a PMI in cascade mode or extended cascade mode due to an erratum.
This erratum applies to Pentium 4 and Intel Xeon processors with a model encoding of 2. For Pentium 4 and Intel Xeon processors with model encodings of 0 and 1, the erratum applies to processors with a stepping encoding greater than 09H.

Counters 16 and 17 in the IQ block are frequently used in precise event-based sampling or at-retirement counting of events indicating a stalled condition in the pipeline.
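The availability rule, the erratum conditions, and the bit 11 assignments in Table 18-19 can be gathered into a small Python sketch. All helper names are hypothetical, and the CCCR fields other than bit 11 (enable at bit 12, ESCR select at bits 15:13, bits 17:16 set to 11B, and the overflow-interrupt flag at bit 26) are assumptions based on the CCCR layout in Figure 18-23 rather than on this section.

```python
# Hypothetical helpers for extended cascading (Section 18.15.6.7).

# Table 18-19, bit 11 of the IQ-block CCCRs, using IQ-relative counter numbers
# (MSR_IQ_CCCRn controls counter 12 + n). Maps destination CCCR index to the
# counter allowed to cascade into it. Indexes 1 and 2 are absent: reserved.
EXT_CASCADE_SOURCE = {0: 4, 3: 5, 4: 5, 5: 4}
CCCR_CASCNT = 1 << 11

# Assumed CCCR fields from the layout in Figure 18-23 (not given in this section).
CCCR_ENABLE = 1 << 12
CCCR_ESCR_SELECT_SHIFT = 13          # bits 15:13
CCCR_ACTIVE_THREAD_ANY = 0b11 << 16  # bits 17:16 = 11B
CCCR_OVF_PMI = 1 << 26               # overflow-interrupt flag


def absolute_counter(iq_counter: int) -> int:
    """Convert an IQ-relative counter number (0-5) to a counter number (12-17)."""
    return 12 + iq_counter


def extended_cascade_supported(family: int, model: int) -> bool:
    """Family encoding 15 with model encoding 2 or later."""
    return family == 15 and model >= 2


def cascade_pmi_erratum_applies(model: int, stepping: int) -> bool:
    """No PMI in (extended) cascade mode: model 2, or model 0/1 above stepping 09H."""
    if model == 2:
        return True
    if model in (0, 1):
        return stepping > 0x09
    return False  # other models are not covered above; assumed unaffected


def iq_cccr_value(cccr_index: int, escr_select: int, *, enable: bool = False,
                  ext_cascade: bool = False, ovf_pmi: bool = False) -> int:
    """Assemble an IQ-block CCCR value from the fields listed above."""
    if not 0 <= cccr_index <= 5:
        raise ValueError(f"no such IQ CCCR: {cccr_index}")
    if ext_cascade and cccr_index not in EXT_CASCADE_SOURCE:
        raise ValueError("bit 11 is reserved in this CCCR")
    value = ((escr_select & 0x7) << CCCR_ESCR_SELECT_SHIFT) | CCCR_ACTIVE_THREAD_ANY
    if enable:
        value |= CCCR_ENABLE
    if ext_cascade:
        value |= CCCR_CASCNT
    if ovf_pmi:
        value |= CCCR_OVF_PMI
    return value


if __name__ == "__main__":
    # Enable flag left clear: the counter is started by extended cascading.
    print(hex(iq_cccr_value(0, 4, ext_cascade=True, ovf_pmi=True)))  # 0x4038800
```

Under those assumptions, the value 04038800H that Example 18-2 writes to MSR_IQ_CCCR0 decomposes into ESCR select 4, bits 17:16 set, the overflow-interrupt flag, and CASCNT4INTO0, with the enable flag clear, so counter 12 is started by counter 16 rather than by software.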
Neither counter 16 nor counter 17 can initiate the cascading of counter pairs using the cascade bit in a CCCR.

Extended cascading permits performance monitoring tools to use counters 16 and 17 to initiate cascading of two counters in the IQ block. Extended cascading from counters 16 and 17 is conceptually similar to cascading other counters, but instead of the CASCADE bit of a CCCR, one of the four CASCNTxINTOy bits is used.

Example 18-2. Scenario for Extended Cascading

A usage scenario for extended cascading is to sample instructions retired on logical processor 1 after the first 4096 instructions retired on logical processor 0. A procedure to program extended cascading in this scenario is outlined below:

1. Write the value 0 to counter 12.
2. Write the value 04000603H to MSR_CRU_ESCR0 (corresponding to selecting the NBOGNTAG and NBOGTAG event masks with qualification restricted to logical processor 1).
3. Write the value 04038800H to MSR_IQ_CCCR0.