Volume 3B System Programming Guide, Part 2
    Loads that miss the DTLB0 and hit the DTLB1 can incur a two-cycle penalty.

PERFORMANCE-MONITORING EVENTS
Table A-3. Non-Architectural Performance Events in Processors Based on Intel Core Microarchitecture (Contd.)

Event Num. | Umask Value | Event Name | Definition
    Description and Comment

08H | 08H | DTLB_MISSES.MISS_ST | DTLB misses due to store operations
    This event counts the number of Data Translation Lookaside Buffer (DTLB) misses due to store operations. This count includes misses detected as a result of speculative accesses. Address translation for store operations is performed in the DTLB1.

09H | 01H | MEMORY_DISAMBIGUATION.RESET | Memory disambiguation reset cycles
    This event counts the number of cycles during which memory disambiguation misprediction occurs. As a result, the execution pipeline is cleaned and execution of the mispredicted load instruction and all succeeding instructions restarts.
    This event occurs when the data address accessed by a load instruction collides with a preceding store, for a load that collides only infrequently and usually does not. Such mispredictions are rare and may carry a penalty of about 20 cycles.

09H | 02H | MEMORY_DISAMBIGUATION.SUCCESS | Number of loads successfully disambiguated
    This event counts the number of load operations that were successfully disambiguated. These loads are preceded by a store with an unknown address, but they are not blocked.

0CH | 01H | PAGE_WALKS.COUNT | Number of page walks executed
    This event counts the number of page walks executed due to either a DTLB or ITLB miss.
    The page walk duration, PAGE_WALKS.CYCLES, divided by the number of page walks is the average duration of a page walk. The average can hint whether most of the page walks are satisfied by the caches or cause an L2 cache miss.

0CH | 02H | PAGE_WALKS.CYCLES | Duration of page walks in core cycles
    This event counts the duration of page walks in core cycles. The paging mode in use typically affects the duration of page walks.
    Page walk duration divided by the number of page walks is the average duration of page walks. The average can hint at whether most of the page walks are satisfied by the caches or cause an L2 cache miss.

10H | 00H | FP_COMP_OPS_EXE | Floating point computational micro-ops executed
    This event counts the number of floating point computational micro-ops executed.

11H | 00H | FP_ASSIST | Floating point assists
    This event counts the number of floating point operations executed that required microcode assist intervention. Assists are required in the following cases:
    • Streaming SIMD Extensions (SSE) instructions:
      • Denormal input when the DAZ (Denormals Are Zeros) flag is off
      • Underflow result when the FTZ (Flush To Zero) flag is off
    • x87 instructions:
      • NaN or denormal values loaded to a register or used as input from memory
      • Division by 0
      • Underflow output

12H | 00H | MUL | Multiply operations executed
    This event counts the number of multiply operations executed. This includes integer as well as floating point multiply operations.

13H | 00H | DIV | Divide operations executed
    This event counts the number of divide operations executed. This includes integer divides, floating point divides and square root operations executed.

14H | 00H | CYCLES_DIV_BUSY | Cycles the divider is busy
    This event counts the number of cycles the divider is busy executing divide or square root operations. The divide can be integer, x87 or Streaming SIMD Extensions (SSE). The square root operation can be either x87 or SSE.

18H | 00H | IDLE_DURING_DIV | Cycles the divider is busy and all other execution units are idle
    This event counts the number of cycles the divider is busy (with a divide or a square root operation) and no other execution unit or load operation is in progress.
    Load operations are assumed to hit the L1 data cache. This event considers only micro-ops dispatched after the divider started operating.

19H | 00H | DELAYED_BYPASS.FP | Delayed bypass to FP operation
    This event counts the number of times floating point operations use data immediately after the data was generated by a non-floating point execution unit. Such cases result in one penalty cycle due to data bypass between the units.

19H | 01H | DELAYED_BYPASS.SIMD | Delayed bypass to SIMD operation
    This event counts the number of times SIMD operations use data immediately after the data was generated by a non-SIMD execution unit. Such cases result in one penalty cycle due to data bypass between the units.

19H | 02H | DELAYED_BYPASS.LOAD | Delayed bypass to load operation
    This event counts the number of delayed bypass penalty cycles that a load operation incurred.
    When load operations use data immediately after the data was generated by an integer execution unit, they may (depending on certain dynamic internal conditions) incur one penalty cycle due to delayed data bypass between the units.

21H | See Table 18-7 | L2_ADS.(Core) | Cycles L2 address bus is in use
    This event counts the number of cycles the L2 address bus is being used for accesses to the L2 cache or bus queue. It can count occurrences for this core or both cores.

23H | See Table 18-7 | L2_DBUS_BUSY_RD.(Core) | Cycles the L2 transfers data to the core
    This event counts the number of cycles during which the L2 data bus is busy transferring data from the L2 cache to the core. It covers all L1 cache misses (data and instruction) that hit the L2 cache.
    This event can count occurrences for this core or both cores.

24H | Combined mask from Table 18-7 and Table 18-9 | L2_LINES_IN.(Core, Prefetch) | L2 cache misses
    This event counts the number of cache lines allocated in the L2 cache. Cache lines are allocated in the L2 cache as a result of requests from the L1 data and instruction caches and the L2 hardware prefetchers to cache lines that are missing in the L2 cache.
    This event can count occurrences for this core or both cores. It can also count demand requests and L2 hardware prefetch requests together or separately.

25H | See Table 18-7 | L2_M_LINES_IN.(Core) | L2 cache line modifications
    This event counts whenever a modified cache line is written back from the L1 data cache to the L2 cache.
    This event can count occurrences for this core or both cores.

26H | See Table 18-7 and Table 18-9 | L2_LINES_OUT.(Core, Prefetch) | L2 cache lines evicted
    This event counts the number of L2 cache lines evicted.
    This event can count occurrences for this core or both cores. It can also count evictions due to demand requests and L2 hardware prefetch requests together or separately.

27H | See Table 18-7 and Table 18-9 | L2_M_LINES_OUT.(Core, Prefetch) | Modified lines evicted from the L2 cache
    This event counts the number of L2 modified cache lines evicted. These lines are written back to memory unless they also exist in a modified state in one of the L1 data caches.
    This event can count occurrences for this core or both cores. It can also count evictions due to demand requests and L2 hardware prefetch requests together or separately.

28H | Combined mask from Table 18-7 and Table 18-10 | L2_IFETCH.(Core, Cache Line State) | L2 cacheable instruction fetch requests
    This event counts the number of instruction cache line requests from the instruction fetch unit (IFU). It does not include fetch requests from uncacheable memory. It does not include ITLB miss accesses.
    This event can count occurrences for this core or both cores. It can also count accesses to cache lines at different MESI states.

29H | Combined mask from Table 18-7, Table 18-9, and Table 18-10 | L2_LD.(Core, Prefetch, Cache Line State) | L2 cache reads
    This event counts L2 cache read requests coming from the L1 data cache and L2 prefetchers.
    The event can count occurrences:
    • for this core or both cores
    • due to demand requests and L2 hardware prefetch requests together or separately
    • of accesses to cache lines at different MESI states

2AH | See Table 18-7 and Table 18-10 | L2_ST.(Core, Cache Line State) | L2 store requests
    This event counts all store operations that miss the L1 data cache and request the data from the L2 cache.
    The event can count occurrences for this core or both cores. It can also count accesses to cache lines at different MESI states.

2BH | See Table 18-7 and Table 18-10 | L2_LOCK.(Core, Cache Line State) | L2 locked accesses
    This event counts all locked accesses to cache lines that miss the L1 data cache.
    The event can count occurrences for this core or both cores. It can also count accesses to cache lines at different MESI states.

2EH | See Table 18-7, Table 18-9, and Table 18-10 | L2_RQSTS.(Core, Prefetch, Cache Line State) | L2 cache requests
    This event counts all completed L2 cache requests. This includes L1 data cache reads, writes, and locked accesses, L1 data prefetch requests, instruction fetches, and all L2 hardware prefetch requests.
    This event can count occurrences:
    • for this core or both cores
    • due to demand requests and L2 hardware prefetch requests together or separately
    • of accesses to cache lines at different MESI states
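Several of the L2 events above take a umask assembled from sub-fields defined in Table 18-7 (core specificity), Table 18-9 (hardware prefetch qualification) and Table 18-10 (MESI cache line state). The Python sketch below shows how such a umask might be packed. Those tables are not reproduced in this section, so the bit positions are an assumption to check against them: umask bits 7:6 for core specificity (01B = this core, 11B = all cores), bits 5:4 for prefetch qualification (00B = exclude hardware prefetch, 01B = hardware prefetch only, 11B = all), and bits 3:0 as a MESI mask with I, S, E, M in bits 0 to 3. This layout is consistent with the 41H umask listed for L2_RQSTS.SELF.DEMAND.I_STATE. The helper names `l2_umask` and `event_select` are hypothetical, and `event_select` fills in only the event number and umask, omitting the other event-select control bits.

```python
# Sketch only: sub-field encodings ASSUMED from Tables 18-7, 18-9 and 18-10.
CORE_THIS, CORE_ALL = 0b01, 0b11                 # core specificity, umask bits 7:6
PF_EXCLUDE_HW, PF_HW_ONLY, PF_ALL = 0b00, 0b01, 0b11  # prefetch qualification, bits 5:4
MESI_I, MESI_S, MESI_E, MESI_M = 0x1, 0x2, 0x4, 0x8   # cache line state mask, bits 3:0
MESI_ALL = MESI_M | MESI_E | MESI_S | MESI_I


def l2_umask(core: int, prefetch: int = PF_EXCLUDE_HW, mesi: int = 0) -> int:
    """Hypothetical helper: pack core, prefetch and MESI sub-fields into an 8-bit umask."""
    if core not in (CORE_THIS, CORE_ALL):
        raise ValueError("core specificity must be 01B (this core) or 11B (all cores)")
    if prefetch not in (PF_EXCLUDE_HW, PF_HW_ONLY, PF_ALL):
        raise ValueError("prefetch qualifier must be 00B, 01B or 11B")
    if not 0 <= mesi <= MESI_ALL:
        raise ValueError("MESI mask must fit in 4 bits")
    return (core << 6) | (prefetch << 4) | mesi


def event_select(event: int, umask: int) -> int:
    """Hypothetical helper: event number in bits 7:0, umask in bits 15:8.

    Control bits such as USR, OS and EN are deliberately left out.
    """
    return (umask << 8) | event


if __name__ == "__main__":
    # Demand requests from this core that find the line in the I (invalid) state.
    m = l2_umask(CORE_THIS, PF_EXCLUDE_HW, MESI_I)
    print(f"L2_RQSTS.SELF.DEMAND.I_STATE: umask={m:02X}H, select={event_select(0x2E, m):04X}H")
```

Under the same assumed layout, `l2_umask(CORE_THIS, PF_ALL)` gives 70H, which would select both demand and hardware prefetch requests from this core for an event that has no MESI sub-field, such as L2_LINES_IN.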
2EH | 41H | L2_RQSTS.SELF.DEMAND.I_STATE | L2 cache demand requests from this core that missed the L2
    This event counts all completed L2 cache demand requests from this core that miss the L2 cache.