Volume 3B: System Programming Guide, Part 2 (794104), page 79
PERFORMANCE-MONITORING EVENTS

Table A-3. Non-Architectural Performance Events in Processors Based on Intel Core Microarchitecture (Contd.)

| Event Num | Umask Value | Event Name | Definition | Description and Comment |
|---|---|---|---|---|
| | | SEG_RENAME_STALLS.ANY | Any (ES/DS/FS/GS) segment rename stall | This event counts the number of stalls due to the lack of renaming resources for the ES, DS, FS, and GS segment registers. If a segment is renamed but not retired and a second update to the same segment occurs, a stall occurs in the front-end of the pipeline until the renamed segment retires. |
| | | SEG_REG_RENAMES.ES | Segment renames - ES | This event counts the number of times the ES segment register is renamed. |
| D5H | 02H | SEG_REG_RENAMES.DS | Segment renames - DS | This event counts the number of times the DS segment register is renamed. |
| D5H | 04H | SEG_REG_RENAMES.FS | Segment renames - FS | This event counts the number of times the FS segment register is renamed. |
| D5H | 08H | SEG_REG_RENAMES.GS | Segment renames - GS | This event counts the number of times the GS segment register is renamed. |
| D5H | 0FH | SEG_REG_RENAMES.ANY | Any (ES/DS/FS/GS) segment rename | This event counts the number of times any of the four segment registers (ES/DS/FS/GS) is renamed. |
| DCH | 01H | RESOURCE_STALLS.ROB_FULL | Cycles during which the ROB full | This event counts the number of cycles when the number of instructions in the pipeline waiting for retirement reaches the limit the processor can handle. A high count for this event indicates that there are long latency operations in the pipe (possibly load and store operations that miss the L2 cache, and other instructions that depend on these cannot execute until the former instructions complete execution). In this situation new instructions cannot enter the pipe and start execution. |
| DCH | 02H | RESOURCE_STALLS.RS_FULL | Cycles during which the RS full | This event counts the number of cycles when the number of instructions in the pipeline waiting for execution reaches the limit the processor can handle. A high count of this event indicates that there are long latency operations in the pipe (possibly load and store operations that miss the L2 cache, and other instructions that depend on these cannot execute until the former instructions complete execution). In this situation new instructions cannot enter the pipe and start execution. |
| DCH | 04H | RESOURCE_STALLS.LD_ST | Cycles during which the pipeline has exceeded load or store limit or waiting to commit all stores | This event counts the number of cycles while resource-related stalls occur due to: • The number of load instructions in the pipeline reached the limit the processor can handle. The stall ends when a loading instruction retires. • The number of store instructions in the pipeline reached the limit the processor can handle. The stall ends when a storing instruction commits its data to the cache or memory. • There is an instruction in the pipe that can be executed only when all previous stores complete and their data is committed in the caches or memory. For example, the SFENCE and MFENCE instructions require this behavior. |
| DCH | 08H | RESOURCE_STALLS.FPCW | Cycles stalled due to FPU control word write | This event counts the number of cycles while execution was stalled due to writing the floating-point unit (FPU) control word. |
| DCH | 10H | RESOURCE_STALLS.BR_MISS_CLEAR | Cycles stalled due to branch misprediction | This event counts the number of cycles after a branch misprediction is detected at execution until the branch and all older micro-ops retire. During this time new micro-ops cannot enter the out-of-order pipeline. |
| DCH | 1FH | RESOURCE_STALLS.ANY | Resource related stalls | This event counts the number of cycles while resource-related stalls occur for any conditions described by the following events: RESOURCE_STALLS.ROB_FULL, RESOURCE_STALLS.RS_FULL, RESOURCE_STALLS.LD_ST, RESOURCE_STALLS.FPCW, RESOURCE_STALLS.BR_MISS_CLEAR. |
| E0H | 00H | BR_INST_DECODED | Branch instructions decoded | This event counts the number of branch instructions decoded. |
| E4H | 00H | BOGUS_BR | Bogus branches | This event counts the number of byte sequences that were mistakenly detected as taken branch instructions. This results in a BACLEAR event. This occurs mainly after task switches. |
| E6H | 00H | BACLEARS | BACLEARS asserted | This event counts the number of times the front end is resteered, mainly when the BPU cannot provide a correct prediction and this is corrected by other branch handling mechanisms at the front end. This can occur if the code has many branches such that they cannot be consumed by the BPU. Each BACLEAR asserted costs approximately 7 cycles of instruction fetch. The effect on total execution time depends on the surrounding code. |
| F0H | 00H | PREF_RQSTS_UP | Upward prefetches issued from DPL | This event counts the number of upward prefetches issued from the Data Prefetch Logic (DPL) to the L2 cache. A prefetch request issued to the L2 cache cannot be cancelled and the requested cache line is fetched to the L2 cache. |
| F8H | 00H | PREF_RQSTS_DN | Downward prefetches issued from DPL | This event counts the number of downward prefetches issued from the Data Prefetch Logic (DPL) to the L2 cache. A prefetch request issued to the L2 cache cannot be cancelled and the requested cache line is fetched to the L2 cache. |

A.3 PERFORMANCE MONITORING EVENTS FOR INTEL® CORE™ SOLO AND INTEL® CORE™ DUO PROCESSORS

Table A-4 lists non-architectural performance events for Intel Core Duo processors. If a non-architectural event requires qualification in core specificity, it is indicated in the comment column. Table A-4 also applies to Intel Core Solo processors; bits in the unit mask corresponding to core-specificity are reserved and should be 00B.
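
The Event Num and Umask Value columns of these tables are normally combined into one value that is written to an IA32_PERFEVTSELx register. The following is a minimal sketch of that packing, not a definitive implementation. It assumes the architectural register layout (event select in bits 7:0, unit mask in bits 15:8, USR in bit 16, OS in bit 17, EN in bit 22). It also assumes, from the companion performance-monitoring chapter rather than from this appendix, that core specificity on Intel Core Duo occupies bits 15:14 (01B for this core, 11B for all cores). The function and constant names are hypothetical.

```python
# Hypothetical helper: pack one table row into an IA32_PERFEVTSELx value.
# Bit positions are assumptions based on the architectural register layout;
# they are not stated in this appendix.
USR = 1 << 16   # count while running at privilege levels 1-3
OS = 1 << 17    # count while running at privilege level 0
EN = 1 << 22    # enable the counter

# Assumed Core Duo core-specificity encodings (register bits 15:14).
THIS_CORE = 0b01
ALL_CORES = 0b11


def perfevtsel(event_num, umask, core=0b00, flags=USR | OS | EN):
    """Return the 32-bit event-select value for one table row."""
    if not 0 <= event_num <= 0xFF or not 0 <= umask <= 0xFF:
        raise ValueError("event number and umask must each fit in 8 bits")
    if core not in (0b00, THIS_CORE, ALL_CORES):
        raise ValueError("unsupported core-specificity encoding")
    # Core specificity sits in the top two bits of the 8-bit unit mask.
    umask |= core << 6
    return flags | (umask << 8) | event_num


# RESOURCE_STALLS.ANY from Table A-3: Event Num DCH, Umask 1FH.
resource_stalls_any = perfevtsel(0xDC, 0x1F)
# L2_ADS from Table A-4 requires core specificity; count for this core only.
l2_ads_this_core = perfevtsel(0x21, 0x00, core=THIS_CORE)

print(hex(resource_stalls_any))  # 0x431fdc
print(hex(l2_ads_this_core))     # 0x434021
```

Leaving `core` at its default of 00B matches the Intel Core Solo case described above, where the core-specificity bits are reserved.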
Table A-4. Non-Architectural Performance Events in Intel Core Solo and Intel Core Duo Processors

| Event Num. | Event Mask Mnemonic | Umask Value | Description | Comment |
|---|---|---|---|---|
| 03H | LD_Blocks | 00H | Load operations delayed due to store buffer blocks. The preceding store may be blocked due to unknown address, unknown data, or conflict due to partial overlap between the load and store. | |
| 04H | SD_Drains | 00H | Cycles while draining store buffers | |
| 05H | Misalign_Mem_Ref | 00H | Misaligned data memory references (MOB splits of loads and stores). | |
| 06H | Seg_Reg_Loads | 00H | Segment register loads | |
| 07H | SSE_PrefNta_Ret | 00H | SSE software prefetch instruction PREFETCHNTA retired | |
| 07H | SSE_PrefT1_Ret | 01H | SSE software prefetch instruction PREFETCHT1 retired | |
| 07H | SSE_PrefT2_Ret | 02H | SSE software prefetch instruction PREFETCHT2 retired | |
| 07H | SSE_NTStores_Ret | 03H | SSE streaming store instruction retired | |
| 10H | FP_Comps_Op_Exe | 00H | FP computational Instruction executed. FADD, FSUB, FCOM, FMULs, MUL, IMUL, FDIVs, DIV, IDIV, FPREMs, FSQRT are included; but exclude FADD or FMUL used in the middle of a transcendental instruction. | |
| 11H | FP_Assist | 00H | FP exceptions experienced microcode assists | |
| 12H | Mul | 00H | Multiply operations (a speculative count, including FP and integer multiplies). | |
| 13H | Div | 00H | Divide operations (a speculative count, including FP and integer divisions). | |
| 14H | Cycles_Div_Busy | 00H | Cycles the divider is busy | |
| 21H | L2_ADS | 00H | L2 Address strobes | Requires core specificity |
| 22H | Dbus_Busy | 00H | Core cycle during which data bus was busy (increments by 4) | Requires core specificity |
| 23H | Dbus_Busy_Rd | 00H | Cycles data bus is busy transferring data to a core (increments by 4) | Requires core specificity |
| 24H | L2_Lines_In | 00H | L2 cache lines allocated | Requires core specificity and HW prefetch qualification |
| 25H | L2_M_Lines_In | 00H | L2 Modified-state cache lines allocated | Requires core specificity |
| 26H | L2_Lines_Out | 00H | L2 cache lines evicted | |
| 27H | L2_M_Lines_Out | 00H | L2 Modified-state cache lines evicted | Requires core specificity and HW prefetch qualification |
| 28H | L2_IFetch | Requires MESI qualification | L2 instruction fetches from instruction fetch unit (includes speculative fetches) | Requires core specificity |
| 29H | L2_LD | Requires MESI qualification | L2 cache reads | Requires core specificity |
| 2AH | L2_ST | Requires MESI qualification | L2 cache writes (includes speculation) | Requires core specificity |
| 2EH | L2_Rqsts | Requires MESI qualification | L2 cache reference requests | |
| 30H | L2_Reject_Cycles | Requires MESI qualification | Cycles L2 is busy and rejecting new requests. | Requires core specificity, HW prefetch qualification |
| 32H | L2_No_Request_Cycles | Requires MESI qualification | Cycles there is no request to access L2. | |
| 3AH | EST_Trans_All | 00H | Any Intel Enhanced SpeedStep® Technology transitions | |
| 3AH | EST_Trans_All | 10H | Intel Enhanced SpeedStep Technology frequency transitions | |
| 3BH | Thermal_Trip | C0H | Duration in a thermal trip based on the current core clock | Use edge trigger to count occurrence |
| 3CH | NonHlt_Ref_Cycles | 01H | Non-halted bus cycles | |
| 3CH | Serial_Execution_Cycles | 02H | Non-halted bus cycles of this core executing code while the other core is halted | |
| 40H | DCache_Cache_LD | Requires MESI qualification | L1 cacheable data read operations | |
| 41H | DCache_Cache_ST | Requires MESI qualification | L1 cacheable data write operations | |
| 42H | DCache_Cache_Lock | Requires MESI qualification | L1 cacheable lock read operations to invalid state | |
| 43H | Data_Mem_Ref | 01H | L1 data read and writes of cacheable and non-cacheable types | |
| 44H | Data_Mem_Cache_Ref | 02H | L1 data cacheable read and write operations | |
| 45H | DCache_Repl | 0FH | L1 data cache line replacements | |
| 46H | DCache_M_Repl | 00H | L1 data M-state cache line allocated | |
| 47H | DCache_M_Evict | 00H | L1 data M-state cache line evicted | |
| 48H | DCache_Pend_Miss | 00H | Weighted cycles of L1 miss outstanding | |
| 49H | Dtlb_Miss | 00H | Data references that missed TLB | |
| 4BH | SSE_PrefNta_Miss | 00H | PREFETCHNTA missed all caches | |
| 4BH | SSE_PrefT1_Miss | 01H | PREFETCHT1 missed all caches | |
| 4BH | SSE_PrefT2_Miss | 02H | PREFETCHT2 missed all caches | |
| 4BH | SSE_NTStores_Miss | 03H | SSE streaming store instruction missed all caches | |
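
Several rows of Table A-4 give "Requires MESI qualification" in place of a fixed umask, and the Thermal_Trip row asks for an edge trigger. The sketch below shows one way such qualifiers might be folded into an event-select value. It rests on assumptions taken from the companion performance-monitoring chapter rather than from this table: MESI qualification in register bits 11:8 (M = bit 11, E = bit 10, S = bit 9, I = bit 8), hardware-prefetch qualification in bits 13:12, core specificity in bits 15:14, and edge detect in bit 18. All names are hypothetical, and the USR, OS, and EN control flags are left out for brevity.

```python
# Assumed qualifier fields inside the 8-bit unit mask (register bits 15:8).
MESI_BITS = {"M": 0x08, "E": 0x04, "S": 0x02, "I": 0x01}    # umask bits 3:0
HW_PREFETCH = {"exclude": 0b00, "only": 0b01, "all": 0b11}  # umask bits 5:4
CORE = {"this": 0b01, "all": 0b11}                          # umask bits 7:6

EDGE = 1 << 18  # assumed edge-detect flag in IA32_PERFEVTSELx


def qualified_umask(mesi="", prefetch=None, core=None):
    """Build a unit mask from the qualifiers a table row asks for."""
    umask = 0
    for state in mesi:
        umask |= MESI_BITS[state]
    if prefetch is not None:
        umask |= HW_PREFETCH[prefetch] << 4
    if core is not None:
        umask |= CORE[core] << 6
    return umask


def event_select(event_num, umask, edge=False):
    """Combine event number, unit mask, and the optional edge flag."""
    value = (umask << 8) | event_num
    if edge:
        value |= EDGE
    return value


# L2_LD (29H): count reads in all four MESI states, for this core only.
l2_ld = event_select(0x29, qualified_umask("MESI", core="this"))
# Thermal_Trip (3BH, umask C0H): edge detect counts occurrences
# instead of the duration of each trip.
thermal_trips = event_select(0x3B, 0xC0, edge=True)

print(hex(l2_ld))          # 0x4f29
print(hex(thermal_trips))  # 0x4c03b
```

Under these assumptions, the Thermal_Trip umask of C0H is the same value that `qualified_umask(core="all")` produces, which is consistent with the top two umask bits carrying core specificity.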