Volume 3B System Programming Guide, Part 2
A high count of this event is good, since each automatic addition performed by the decoder saves a micro-op from the execution units.
To maximize the number of ESP additions performed automatically by the decoder, choose instructions that implicitly use the ESP, such as PUSH, POP, CALL, and RET instructions whenever possible.

Event Num: B0H   Umask Value: 00H   Event Name: SIMD_UOPS_EXEC
Definition: SIMD micro-ops executed (excluding stores)
Description and Comment: This event counts all the SIMD micro-ops executed. It does not count MOVQ and MOVD stores from register to memory.

Event Num: B1H   Umask Value: 00H   Event Name: SIMD_SAT_UOP_EXEC
Definition: SIMD saturated arithmetic micro-ops executed
Description and Comment: This event counts the number of SIMD saturated arithmetic micro-ops executed.

Event Num: B3H   Umask Value: 01H   Event Name: SIMD_UOP_TYPE_EXEC.MUL
Definition: SIMD packed multiply micro-ops executed
Description and Comment: This event counts the number of SIMD packed multiply micro-ops executed.

Event Num: B3H   Umask Value: 02H   Event Name: SIMD_UOP_TYPE_EXEC.SHIFT
Definition: SIMD packed shift micro-ops executed
Description and Comment: This event counts the number of SIMD packed shift micro-ops executed.

Event Num: B3H   Umask Value: 04H   Event Name: SIMD_UOP_TYPE_EXEC.PACK
Definition: SIMD pack micro-ops executed
Description and Comment: This event counts the number of SIMD pack micro-ops executed.

Event Num: B3H   Umask Value: 08H   Event Name: SIMD_UOP_TYPE_EXEC.UNPACK
Definition: SIMD unpack micro-ops executed
Description and Comment: This event counts the number of SIMD unpack micro-ops executed.

Event Num: B3H   Umask Value: 10H   Event Name: SIMD_UOP_TYPE_EXEC.LOGICAL
Definition: SIMD packed logical micro-ops executed
Description and Comment: This event counts the number of SIMD packed logical micro-ops executed.

Event Num: B3H   Umask Value: 20H   Event Name: SIMD_UOP_TYPE_EXEC.ARITHMETIC
Definition: SIMD packed arithmetic micro-ops executed
Description and Comment: This event counts the number of SIMD packed arithmetic micro-ops executed.

PERFORMANCE-MONITORING EVENTS
Table A-3. Non-Architectural Performance Events in Processors Based on Intel Core Microarchitecture (Contd.)

Event Num: C0H   Umask Value: 00H   Event Name: INST_RETIRED.ANY_P
Definition: Instructions retired
Description and Comment: This event counts the number of instructions that retire execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction.
The counter continues counting during hardware interrupts, traps, and inside interrupt handlers.
INST_RETIRED.ANY_P is an architectural performance event.

Event Num: C0H   Umask Value: 01H   Event Name: INST_RETIRED.LOADS
Definition: Instructions retired, which contain a load
Description and Comment: This event counts the number of instructions retired that contain a load operation.

Event Num: C0H   Umask Value: 02H   Event Name: INST_RETIRED.STORES
Definition: Instructions retired, which contain a store
Description and Comment: This event counts the number of instructions retired that contain a store operation.

Event Num: C0H   Umask Value: 04H   Event Name: INST_RETIRED.OTHER
Definition: Instructions retired, with no load or store operation
Description and Comment: This event counts the number of instructions retired that do not contain a load or a store operation.

Event Num: C1H   Umask Value: 01H   Event Name: X87_OPS_RETIRED.FXCH
Definition: FXCH instructions retired
Description and Comment: This event counts the number of FXCH instructions retired.
Modern compilers generate more efficient code and are less likely to use this instruction. If you obtain a high count for this event, consider recompiling the code.

Event Num: C1H   Umask Value: FEH   Event Name: X87_OPS_RETIRED.ANY
Definition: Retired floating-point computational operations (precise event)
Description and Comment: This event counts the number of floating-point computational operations retired. It counts:
• floating-point computational operations executed by the assist handler
• sub-operations of complex floating-point instructions like transcendental instructions
This event does not count:
• floating-point computational operations that cause traps or assists.
• floating-point loads and stores.
When this event is captured with the precise event mechanism, the collected samples contain the address of the instruction that was executed immediately after the instruction that caused the event.

Event Num: C2H   Umask Value: 01H   Event Name: UOPS_RETIRED.LD_IND_BR
Definition: Fused load+op or load+indirect branch retired
Description and Comment: This event counts the number of retired micro-ops that fused a load with another operation.
This includes:
• Fusion of a load and an arithmetic operation, such as with the following instruction: ADD EAX, [EBX] where the content of the memory location specified by the EBX register is loaded, added to the EAX register, and the result is stored in EAX.
• Fusion of a load and a branch in an indirect branch operation, such as with the following instructions:
  • JMP [RDI+200]
  • RET
Fusion decreases the number of micro-ops in the processor pipeline. A high value for this event count indicates that the code is using the processor resources effectively.

Event Num: C2H   Umask Value: 02H   Event Name: UOPS_RETIRED.STD_STA
Definition: Fused store address + data retired
Description and Comment: This event counts the number of store address calculations that are fused with store data emission into one micro-op. Traditionally, each store operation required two micro-ops.
This event counts fusion of retired micro-ops only. Fusion decreases the number of micro-ops in the processor pipeline.
A high value for this event count indicates that the code is using the processor resources effectively.

Event Num: C2H   Umask Value: 04H   Event Name: UOPS_RETIRED.MACRO_FUSION
Definition: Retired instruction pairs fused into one micro-op
Description and Comment: This event counts the number of times CMP or TEST instructions were fused with a conditional branch instruction into one micro-op.
It counts fusion by retired micro-ops only.
Fusion decreases the number of micro-ops in the processor pipeline. A high value for this event count indicates that the code uses the processor resources more effectively.

Event Num: C2H   Umask Value: 07H   Event Name: UOPS_RETIRED.FUSED
Definition: Fused micro-ops retired
Description and Comment: This event counts the total number of retired fused micro-ops.
The counts include the following fusion types:
• Fusion of load operation with an arithmetic operation or with an indirect branch (counted by event UOPS_RETIRED.LD_IND_BR)
• Fusion of store address and data (counted by event UOPS_RETIRED.STD_STA)
• Fusion of CMP or TEST instruction with a conditional branch instruction (counted by event UOPS_RETIRED.MACRO_FUSION)
Fusion decreases the number of micro-ops in the processor pipeline. A high value for this event count indicates that the code is using the processor resources effectively.

Event Num: C2H   Umask Value: 08H   Event Name: UOPS_RETIRED.NON_FUSED
Definition: Non-fused micro-ops retired
Description and Comment: This event counts the number of micro-ops retired that were not fused.

Event Num: C2H   Umask Value: 0FH   Event Name: UOPS_RETIRED.ANY
Definition: Micro-ops retired
Description and Comment: This event counts the number of micro-ops retired.
The processor decodes complex macro instructions into a sequence of simpler micro-ops. Most instructions are composed of one or two micro-ops. Some instructions are decoded into longer sequences such as repeat instructions, floating-point transcendental instructions, and assists. In some cases micro-op sequences are fused or whole instructions are fused into one micro-op.
See other UOPS_RETIRED events for differentiating retired fused and non-fused micro-ops.

Event Num: C3H   Umask Value: 01H   Event Name: MACHINE_NUKES.SMC
Definition: Self-Modifying Code detected
Description and Comment: This event counts the number of times that a program writes to a code section. Self-modifying code causes a severe penalty in all Intel 64 and IA-32 processors.

Event Num: C3H   Umask Value: 04H   Event Name: MACHINE_NUKES.MEM_ORDER
Definition: Execution pipeline restart due to memory ordering conflict or memory disambiguation misprediction
Description and Comment: This event counts the number of times the pipeline is restarted due to either multi-threaded memory ordering conflicts or memory disambiguation misprediction.
A multi-threaded memory ordering conflict occurs when a store, which is executed in another core, hits a load that is executed out of order in this core but not yet retired. As a result, the load needs to be restarted to satisfy the memory ordering model.
See Chapter 7, “Multiple-Processor Management” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.
To count memory disambiguation mispredictions, use the event MEMORY_DISAMBIGUATION.RESET.

Event Num: C4H   Umask Value: 00H   Event Name: BR_INST_RETIRED.ANY
Definition: Retired branch instructions
Description and Comment: This event counts the number of branch instructions retired. This is an architectural performance event.

Event Num: C4H   Umask Value: 01H   Event Name: BR_INST_RETIRED.PRED_NOT_TAKEN
Definition: Retired branch instructions that were predicted not taken
Description and Comment: This event counts the number of branch instructions retired that were correctly predicted to be not-taken.

Event Num: C4H   Umask Value: 02H   Event Name: BR_INST_RETIRED.MISPRED_NOT_TAKEN
Definition: Retired branch instructions that were mispredicted not-taken
Description and Comment: This event counts the number of branch instructions retired that were mispredicted and not-taken.

Event Num: C4H   Umask Value: 04H   Event Name: BR_INST_RETIRED.PRED_TAKEN
Definition: Retired branch instructions that were predicted taken
Description and Comment: This event counts the number of branch instructions retired that were correctly predicted to be taken.

Event Num: C4H   Umask Value: 08H   Event Name: BR_INST_RETIRED.MISPRED_TAKEN
Definition: Retired branch instructions that were mispredicted taken
Description and Comment: This event counts the number of branch instructions retired that were mispredicted and taken.

Event Num: C4H   Umask Value: 0CH   Event Name: BR_INST_RETIRED.TAKEN
Definition: Retired taken branch instructions
Description and Comment: This event counts the number of branches retired that were taken.

Event Num: C5H   Umask Value: 00H   Event Name: BR_INST_RETIRED.MISPRED
Definition: Retired mispredicted branch instructions (precise event)
Description and Comment: This event counts the number of retired branch instructions that were mispredicted by the processor. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa.
This is an architectural performance event.

Event Num: C6H   Umask Value: 01H   Event Name: CYCLES_INT_MASKED
Definition: Cycles during which interrupts are disabled
Description and Comment: This event counts the number of cycles during which interrupts are disabled.

Event Num: C6H   Umask Value: 02H   Event Name: CYCLES_INT_PENDING_AND_MASKED
Definition: Cycles during which interrupts are pending and disabled
Description and Comment: This event counts the number of cycles during which there are pending interrupts but interrupts are disabled.

Event Num: C7H   Umask Value: 01H   Event Name: SIMD_INST_RETIRED.PACKED_SINGLE
Definition: Retired SSE packed-single instructions
Description and Comment: This event counts the number of SSE packed-single instructions retired.

Event Num: C7H   Umask Value: 02H   Event Name: SIMD_INST_RETIRED.SCALAR_SINGLE
Definition: Retired SSE scalar-single instructions
Description and Comment: This event counts the number of SSE scalar-single instructions retired.

Event Num: C7H   Umask Value: 04H   Event Name: SIMD_INST_RETIRED.PACKED_DOUBLE
Definition: Retired SSE2 packed-double instructions
Description and Comment: This event counts the number of SSE2 packed-double instructions retired.

Event Num: C7H   Umask Value: 08H   Event Name: SIMD_INST_RETIRED.SCALAR_DOUBLE
Definition: Retired SSE2 scalar-double instructions
Description and Comment: This event counts the number of SSE2 scalar-double instructions retired.

Event Num: C7H   Umask Value: 10H   Event Name: SIMD_INST_RETIRED.VECTOR
Definition: Retired SSE2 vector integer instructions
Description and Comment: This event counts the number of SSE2 vector integer instructions retired.

Event Num: C7H   Umask Value: 1FH   Event Name: SIMD_INST_RETIRED.ANY
Definition: Retired Streaming SIMD instructions (precise event)
Description and Comment: This event counts the overall number of SIMD instructions retired. To count each type of SIMD instruction separately, use the following events:
• SIMD_INST_RETIRED.PACKED_SINGLE
• SIMD_INST_RETIRED.SCALAR_SINGLE
• SIMD_INST_RETIRED.PACKED_DOUBLE
• SIMD_INST_RETIRED.SCALAR_DOUBLE
• SIMD_INST_RETIRED.VECTOR
When this event is captured with the precise event mechanism, the collected samples contain the address of the instruction that was executed immediately after the instruction that caused the event.

Event Num: C8H   Umask Value: 00H   Event Name: HW_INT_RCV
Definition: Hardware interrupts received

Event Num: C9H   Umask Value: 00H   Event Name: ITLB_MISS_RETIRED
Definition: Retired instructions that missed the ITLB
Description and Comment: This event counts the number of retired instructions that missed the ITLB when they were fetched.
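The Event Num and Umask Value columns above are the two fields that software writes into an event-select register. The Python sketch below shows one way they might be combined. It is an illustration only, under these assumptions: the bit positions are those of the IA32_PERFEVTSELx layout for architectural performance monitoring (they are not given in this table), the `rUUEE` string is the raw-event form accepted by the Linux `perf` tool, and the names `EVENTS`, `perfevtsel`, `perf_raw`, `combined_umask`, and `ratio` are hypothetical helpers introduced here.

```python
# Bit positions assumed from the IA32_PERFEVTSELx layout; this table does not state them.
UMASK_SHIFT = 8  # unit mask occupies bits 15:8, event select bits 7:0
USR = 1 << 16    # count at privilege levels 1-3
OS = 1 << 17     # count at privilege level 0
EN = 1 << 22     # enable the counter

# (event number, umask) pairs copied from the table rows above.
EVENTS = {
    "INST_RETIRED.LOADS": (0xC0, 0x01),
    "UOPS_RETIRED.LD_IND_BR": (0xC2, 0x01),
    "UOPS_RETIRED.STD_STA": (0xC2, 0x02),
    "UOPS_RETIRED.MACRO_FUSION": (0xC2, 0x04),
    "UOPS_RETIRED.FUSED": (0xC2, 0x07),
    "UOPS_RETIRED.NON_FUSED": (0xC2, 0x08),
    "UOPS_RETIRED.ANY": (0xC2, 0x0F),
    "BR_INST_RETIRED.ANY": (0xC4, 0x00),
    "BR_INST_RETIRED.MISPRED": (0xC5, 0x00),
}


def perfevtsel(name, user=True, kernel=True, enable=True):
    """Build an event-select register value for a named table event."""
    event_num, umask = EVENTS[name]
    value = event_num | (umask << UMASK_SHIFT)
    if user:
        value |= USR
    if kernel:
        value |= OS
    if enable:
        value |= EN
    return value


def perf_raw(name):
    """Return the raw event string (rUUEE: umask byte, then event byte)."""
    event_num, umask = EVENTS[name]
    return f"r{umask:02x}{event_num:02x}"


def combined_umask(*names):
    """OR together the umasks of several sub-events of one event number."""
    result = 0
    for name in names:
        result |= EVENTS[name][1]
    return result


def ratio(numerator, denominator):
    """Divide two counter readings, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    print(hex(perfevtsel("INST_RETIRED.LOADS")))  # 0x4301c0
    print(perf_raw("BR_INST_RETIRED.MISPRED"))    # r00c5
    fused = combined_umask("UOPS_RETIRED.LD_IND_BR",
                           "UOPS_RETIRED.STD_STA",
                           "UOPS_RETIRED.MACRO_FUSION")
    print(fused == EVENTS["UOPS_RETIRED.FUSED"][1])  # True
```

The last check reflects a pattern visible in the table values: the umask of UOPS_RETIRED.FUSED (07H) equals the bitwise OR of the three fusion sub-event umasks, and UOPS_RETIRED.ANY (0FH) adds the NON_FUSED bit. With counter readings in hand, `ratio` can express, for example, BR_INST_RETIRED.MISPRED over BR_INST_RETIRED.ANY as a misprediction rate, or UOPS_RETIRED.FUSED over UOPS_RETIRED.ANY as the share of fused micro-ops.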