Volume 3B System Programming Guide, Part 2 (794104), page 22
DEBUGGING AND PERFORMANCE MONITORING

Performance monitoring capabilities available to Pentium 4 and Intel Xeon processors with the same values (see Section 18.11 and Section 18.16) apply to the 64-bit Intel Xeon processor MP with an L3 cache.

The level 3 cache is connected between the system bus and IOQ through additional control logic. See Figure 18-33.

[Figure 18-33. Block Diagram of 64-bit Intel Xeon Processor MP with 8-MByte L3. Blocks shown: System Bus; iBUSQ and iSNPQ; 3rd Level Cache; iFSB; IOQ; Processor Core (Front end, Execution, Retirement, L1, L2).]

Additional performance monitoring capabilities and facilities unique to 64-bit Intel Xeon processor MP with an L3 cache are described in this section. The facility for monitoring events consists of a set of dedicated model-specific registers (MSRs), each dedicated to a specific event.
Programming of these MSRs requires using RDMSR/WRMSR instructions with 64-bit values.

The lower 32 bits of the MSRs at addresses 107CCH through 107D3H are treated as 32-bit performance counter registers. These performance counters can be accessed using the RDPMC instruction with indexes 18 through 25. The EDX register returns zero when reading these 8 PMCs.

The performance monitoring capabilities consist of four events. These are:

• IBUSQ event: This event detects the occurrence of microarchitectural conditions related to the iBUSQ unit.
It provides two MSRs: MSR_IFSB_IBUSQ0 and MSR_IFSB_IBUSQ1. Configure sub-event qualification and enable/disable functions using the high 32 bits of these MSRs. The low 32 bits act as a 32-bit event counter. Counting starts after software writes a non-zero value to one or more of the upper 32 bits. See Figure 18-34.

[Figure 18-34. MSR_IFSB_IBUSQx, Addresses: 107CCH and 107CDH. Fields in bits 63:32: Saturate, Fill_match, Eviction_match, L3_state_match, Snoop_match, Type_match, T1_match, T0_match, plus reserved bits. Bits 31:0: 32-bit event count.]

• ISNPQ event: This event detects the occurrence of microarchitectural conditions related to the iSNPQ unit. It provides two MSRs: MSR_IFSB_ISNPQ0 and MSR_IFSB_ISNPQ1. Configure sub-event qualifications and enable/disable functions using the high 32 bits of the MSRs.
The low 32 bits act as a 32-bit event counter. Counting starts after software writes a non-zero value to one or more of the upper 32 bits. See Figure 18-35.

[Figure 18-35. MSR_IFSB_ISNPQx, Addresses: 107CEH and 107CFH. Fields in bits 63:32: Saturate, L3_state_match, Snoop_match, Type_match, Agent_match, T1_match, T0_match, plus reserved bits. Bits 31:0: 32-bit event count.]

• EFSB event: This event can detect the occurrence of microarchitectural conditions related to the iFSB unit or system bus.
It provides two MSRs: MSR_EFSB_DRDY0 and MSR_EFSB_DRDY1. Configure sub-event qualifications and enable/disable functions using the high 32 bits of the 64-bit MSR. The low 32 bits act as a 32-bit event counter. Counting starts after software writes a non-zero value to one or more of the qualification bits in the upper 32 bits of the MSR. See Figure 18-36.

[Figure 18-36. MSR_EFSB_DRDYx, Addresses: 107D0H and 107D1H. Fields in bits 63:32: Saturate, Other, Own, plus reserved bits. Bits 31:0: 32-bit event count.]

• IBUSQ Latency event: This event accumulates weighted cycle counts for latency measurement of transactions in the iBUSQ unit.
The count is enabled by setting MSR_IFSB_CTL6[bit 26] to 1; the count freezes after software sets MSR_IFSB_CTL6[bit 26] to 0. MSR_IFSB_CNTR7 acts as a 64-bit event counter for this event. See Figure 18-37.

[Figure 18-37. MSR_IFSB_CTL6, Address: 107D2H; MSR_IFSB_CNTR7, Address: 107D3H. MSR_IFSB_CTL6 contains an Enable field and reserved bits; MSR_IFSB_CNTR7 bits 63:0 hold the 64-bit event count.]

18.20 PERFORMANCE MONITORING ON DUAL-CORE INTEL XEON PROCESSOR 7100 SERIES

The Dual-Core Intel Xeon processor 7100 series has a CPUID signature of family [0FH], model [06H] and a unified L3 cache shared between two cores. Each core in an Intel Xeon processor 7100 series supports Intel Hyper-Threading Technology, providing two logical processors per core.

The Intel Xeon processor 7100 series is based on Intel NetBurst microarchitecture, but the IOQ logic in each processor core is replaced with a Simple Direct Interface (SDI) logic.
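The family [0FH], model [06H] signature quoted above can be checked in software by decoding the value that CPUID leaf 01H returns in EAX. The following is a minimal sketch of that decoding, using the standard display-family and display-model rules. The function names and the sample EAX value are assumptions chosen for illustration; the code works on a plain integer and does not execute CPUID itself.

```python
def decode_signature(eax):
    """Decode a CPUID.01H:EAX value into (display family, display model, stepping)."""
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is added only when the base family field is 0FH.
    display_family = family + ext_family if family == 0xF else family

    # The extended model supplies the upper four bits for families 06H and 0FH.
    if family in (0x6, 0xF):
        display_model = (ext_model << 4) | model
    else:
        display_model = model
    return display_family, display_model, stepping


def has_family_0f_model_06(eax):
    """Return True when the signature decodes to family 0FH, model 06H."""
    family, model, _ = decode_signature(eax)
    return (family, model) == (0x0F, 0x06)


if __name__ == "__main__":
    sample_eax = 0x00000F64  # hypothetical EAX value; the stepping is arbitrary
    print(decode_signature(sample_eax), has_family_0f_model_06(sample_eax))
```

A matching signature is a necessary condition only: other processors can report the same family and model, so software would still need to confirm that the L3 monitoring MSRs are actually present.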
The L3 cache is connected between the system bus and the SDI through additional control logic. See Figure 18-38.

Almost all of the performance monitoring capabilities available to processors with the same CPUID signatures (see Section 18.11 and Section 18.16) apply to the Intel Xeon processor 7100 series. The IOQ_allocation and IOQ_active_entries events are not supported. Additional performance monitoring capabilities available to Intel Xeon processor 7100 series are described in this section.

[Figure 18-38. Block Diagram of Intel Xeon Processor 7100 Series. Blocks shown: System Bus; GBSQ, GSNPQ, GINTQ, ...; 3rd Level Cache (16 or 8-way); SDI with BSQ and SNPQ (one per core); two Processor Cores.]

The facility for monitoring events consists of a set of dedicated model-specific registers (MSRs). There are eight event select/counting MSRs that are dedicated to counting events associated with specified microarchitectural conditions. Programming of these MSRs requires using RDMSR/WRMSR instructions with 64-bit values. In addition, an MSR, MSR_EMON_L3_GL_CTL, provides a simplified interface to control freezing, resetting, and re-enabling operation of any combination of these event select/counting MSRs.

The eight MSRs dedicated to counting occurrences of specific conditions are further divided to count three sub-classes of microarchitectural conditions:

• Two MSRs (MSR_EMON_L3_CTR_CTL0 and MSR_EMON_L3_CTR_CTL1) are dedicated to counting GBSQ events.
Up to two GBSQ events can be programmed and counted simultaneously.
• Two MSRs (MSR_EMON_L3_CTR_CTL2 and MSR_EMON_L3_CTR_CTL3) are dedicated to counting GSNPQ events. Up to two GSNPQ events can be programmed and counted simultaneously.
• Four MSRs (MSR_EMON_L3_CTR_CTL4, MSR_EMON_L3_CTR_CTL5, MSR_EMON_L3_CTR_CTL6, and MSR_EMON_L3_CTR_CTL7) are dedicated to counting external bus operations.

The bit fields in each of the eight MSRs share the following common characteristics:

• Bits 63:32 are the event control field, which includes an event mask and other bit fields that control counter operation. The event mask field specifies details of the microarchitectural condition, and its definition differs across GBSQ, GSNPQ, and FSB.
• Bits 31:0 are the event count field. If the specified condition is met during each relevant clock domain of the event logic, the matched condition signals the counter logic to increment the associated event count field.
The lower 32 bits of these 8 MSRs at addresses 107CCH through 107D3H are treated as 32-bit performance counter registers. These performance counters can be accessed using the RDPMC instruction with indexes 18 through 25. The EDX register returns zero when reading these 8 PMCs.

18.20.1 GBSQ Event Interface

The layout of MSR_EMON_L3_CTR_CTL0 and MSR_EMON_L3_CTR_CTL1 is given in Figure 18-39.
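As a rough illustration of this layout, the sketch below composes a control value from the GBSQ attribute bit ranges listed in this section, separates a 64-bit MSR value into its control and count halves, and maps an MSR address to its RDPMC index. The function names (gbsq_control, split_msr, rdpmc_index) and the example attribute selection are assumptions made for illustration, not a validated event configuration. The code only manipulates integers and does not execute RDMSR, WRMSR, or RDPMC.

```python
# Bit positions of the GBSQ event mask attributes, as listed in this section.
# Each entry is (lowest bit, width in bits) within the 64-bit MSR.
GBSQ_FIELDS = {
    "agent_select":  (32, 4),   # bits 35:32
    "data_flow":     (36, 2),   # bits 37:36
    "type_match":    (38, 6),   # bits 43:38
    "snoop_match":   (44, 3),   # bits 46:44
    "l2_state":      (47, 7),   # bits 53:47
    "core_select":   (54, 2),   # bits 55:54
    "fill_eviction": (56, 2),   # bits 57:56
    "cross_snoop":   (58, 1),   # bit 58
}

FIRST_MSR = 0x107CC      # address of the first of the eight MSRs
FIRST_PMC_INDEX = 18     # RDPMC index that reads the low 32 bits of 107CCH


def gbsq_control(**attrs):
    """Return a 64-bit value with the given attributes set and a zero count."""
    value = 0
    for name, field_value in attrs.items():
        low_bit, width = GBSQ_FIELDS[name]  # raises KeyError for unknown names
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{name}={field_value:#x} does not fit in {width} bits")
        value |= field_value << low_bit
    return value


def split_msr(value):
    """Split a 64-bit MSR value into (control bits 63:32, count bits 31:0)."""
    return value >> 32, value & 0xFFFFFFFF


def rdpmc_index(msr_address):
    """Map an MSR address in 107CCH..107D3H to its RDPMC index (18..25)."""
    offset = msr_address - FIRST_MSR
    if not 0 <= offset < 8:
        raise ValueError("address is outside 107CCH..107D3H")
    return FIRST_PMC_INDEX + offset


if __name__ == "__main__":
    # Hypothetical selection: all four logical processors, demand transactions,
    # all transaction types, and all three snoop results.
    ctl = gbsq_control(agent_select=0xF, data_flow=0b01,
                       type_match=0x3F, snoop_match=0b111)
    print(hex(ctl), split_msr(ctl), rdpmc_index(0x107CD))
```

Keeping the bit ranges in one table means a range check happens in a single place, and the same table could be reused to decode a value read back from the MSR.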
Counting starts after software writes a non-zero value to one or more of the upper 32 bits.

[Figure 18-39. MSR_EMON_L3_CTR_CTL0/1, Addresses: 107CCH/107CDH. Bits 63:60: Reserved; bit 59: Saturate; bit 58: Cross_snoop; bits 57:56: Fill_eviction; bits 55:54: Core_select; bits 53:47: L2_state; bits 46:44: Snoop_match; bits 43:38: Type_match; bits 37:36: Data_flow; bits 35:32: Agent_select; bits 31:0: 32-bit event count.]

The event mask field (bits 58:32) consists of the following eight attributes:

• Agent_Select (bits 35:32): Each bit specifies a logical processor in the physical package. The lower two bits correspond to the two logical processors in the first processor core; the upper two bits correspond to the two logical processors in the second processor core. The 0FH encoding matches transactions from any logical processor.
• Data_Flow (bits 37:36): Bit 36 specifies demand transactions; bit 37 specifies prefetch transactions.
• Type_Match (bits 43:38): Specifies transaction types. If all six bits are set, the event count includes all transaction types.
• Snoop_Match (bits 46:44): The three bits specify (in ascending bit position) clean snoop result, HIT snoop result, and HITM snoop result, respectively.
• L2_State (bits 53:47): Each bit specifies an L2 coherency state.
• Core_Select (bits 55:54): The valid encodings are:
  - 00B: Match transactions from any core in the physical package
  - 01B: Match transactions from this core only
  - 10B: Match transactions from the other core in the physical package
  - 11B: Match transactions from both cores in the physical package
• Fill_Eviction (bits 57:56): The valid encodings are:
  - 00B: Match any transactions
  - 01B: Match transactions that fill L2
  - 10B: Match transactions that fill L2 without an eviction
  - 11B: Match transactions that fill L2 with an eviction
• Cross_Snoop (bit 58): The encodings are:
  - 0B: Match any transactions
  - 1B: Match cross snoop transactions

For each counting clock domain, if all eight attributes match, event logic signals to increment the event count field.

18.20.2 GSNPQ Event Interface

The layout of MSR_EMON_L3_CTR_CTL2 and MSR_EMON_L3_CTR_CTL3 is given in Figure 18-40. Counting starts after software writes a non-zero value to one or more of the upper 32 bits.

The event mask field (bits 58:32) consists of the following six attributes:

• Agent_Select (bits 37:32): Each of the lowest 4 bits specifies a logical processor in the physical package. The lowest two bits correspond to the two logical processors in the first processor core; the next two bits correspond to the two logical processors in the second processor core.