Volume 3B: System Programming Guide, Part 2 (794104), page 6
DEBUGGING AND PERFORMANCE MONITORING

LBR MSR Branch Record Layout for the Pentium 4 and Intel Xeon Processor Family

Additional information is saved if an exception or interrupt occurs in conjunction with a branch instruction. If a branch instruction generates a trap type exception, two branch records are stored in the LBR stack: a branch record for the branch instruction followed by a branch record for the exception.

If a branch instruction generates a fault type exception, a branch record is stored in the LBR stack for the exception, but not for the branch instruction itself. Here, the location of the branch instruction can be determined from the CS and EIP registers in the exception stack frame that is written by the processor onto the stack.

If a branch instruction is immediately followed by an interrupt, a branch record is stored in the LBR stack for the branch instruction followed by a record for the interrupt.

18.6.3.1 LBR Stack and Intel® 64 Processors

In Intel 64 architecture, LBR MSRs are 64 bits wide. If IA-32e mode is disabled, only the lower 32 bits are accessible. If IA-32e mode is enabled, the processor writes 64-bit values into the MSR.

In 64-bit mode, last branch records store 64-bit addresses; in compatibility mode, the upper 32 bits of last branch records are cleared.

18.6.4 Monitoring Branches, Exceptions, and Interrupts

When the LBR flag in the MSR_DEBUGCTLA MSR is set, the processor automatically begins recording branch records for taken branches, interrupts, and exceptions (except for debug exceptions) in the LBR stack MSRs.

When the processor generates a debug exception (#DB), it automatically clears the LBR flag before executing the exception handler.
This action does not clear previously stored LBR stack MSRs. The branch records for the last four taken branches, interrupts, and/or exceptions are retained for analysis.

A debugger can use the linear addresses in the LBR stack to reset breakpoints in the breakpoint address registers (DR0 through DR3). This allows a backward trace from the manifestation of a particular bug toward its source.

If the LBR flag is cleared and the TR flag in the MSR_DEBUGCTLA MSR remains set, the processor will continue to update LBR stack MSRs. This is because BTM information must be generated from entries in the LBR stack (see 14.5.5). A #DB does not automatically clear the TR flag.

18.6.5 Single-Stepping on Branches, Exceptions, and Interrupts

When software sets both the BTF flag in the MSR_DEBUGCTLA MSR and the TF flag in the EFLAGS register, the processor generates a single-step debug exception the next time it takes a branch, services an interrupt, or generates an exception. This mechanism allows the debugger to single-step on control transfers caused by branches, interrupts, and exceptions. This “control-flow single stepping” helps isolate a bug to a particular block of code before instruction single-stepping further narrows the search. If the BTF flag is set when the processor generates a debug exception, the processor clears the BTF flag along with the TF flag.
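
The flag handling on a debug exception described in Sections 18.6.4 and 18.6.5 can be summarized in a small software model. The following Python sketch is illustrative only: `CpuModel` and its methods are hypothetical names, it does not touch real MSRs, and the bit positions are assumptions taken from the MSR_DEBUGCTLA and EFLAGS layouts rather than from this section, so they should be checked against the referenced figures.

```python
from dataclasses import dataclass

# Assumed bit positions; this section names the flags but not their bits.
LBR = 1 << 0        # MSR_DEBUGCTLA: last branch recording
BTF = 1 << 1        # MSR_DEBUGCTLA: single-step on branches
TR = 1 << 2         # MSR_DEBUGCTLA: branch trace messages
EFLAGS_TF = 1 << 8  # EFLAGS: trap flag


@dataclass
class CpuModel:
    """Hypothetical model of the flag state only, not a real MSR interface."""
    debugctl: int = 0
    eflags: int = 0

    def lbr_stack_updates(self) -> bool:
        # The LBR stack keeps updating while either LBR or TR is set.
        return bool(self.debugctl & (LBR | TR))

    def raise_debug_exception(self) -> None:
        # On #DB the processor clears LBR, and clears BTF along with TF.
        # TR is deliberately left untouched.
        self.debugctl &= ~(LBR | BTF)
        self.eflags &= ~EFLAGS_TF

    def rearm_control_flow_stepping(self) -> None:
        # What a debugger would do before resuming the program.
        self.debugctl |= BTF
        self.eflags |= EFLAGS_TF
```

In this model, a state with LBR, BTF, and TR set is left with only TR set after `raise_debug_exception()`, and `lbr_stack_updates()` still returns true because TR remains set.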
The debugger must reset the BTF and TF flags before resuming program execution to continue control-flow single stepping.

18.6.6 Branch Trace Messages

Setting the TR flag in the MSR_DEBUGCTLA (see Figure 18-5), IA32_DEBUGCTL (see Figure 18-7), or MSR_DEBUGCTLB (see Figure 18-9) MSR enables branch trace messages (BTMs). Thereafter, when the processor detects a branch, exception, or interrupt, it sends a branch record out on the system bus as a BTM. A debugging device that is monitoring the system bus can read these messages and synchronize operations with taken branch, interrupt, and exception events.

When interrupts or exceptions occur in conjunction with a taken branch, additional BTMs are sent out on the bus, as described in Section 18.6.4, “Monitoring Branches, Exceptions, and Interrupts.”

Setting this flag (BTS) alone will greatly reduce the performance of the processor. The CPL-qualified last branch recording mechanism can help mitigate the performance impact of logging branch trace messages. See Section 18.6.1, “CPL-Qualified Last Branch Recording Mechanism.”

Unlike the P6 family processors, the Pentium 4 and Intel Xeon processors can collect branch records in the LBR stack MSRs while at the same time sending BTMs out on the system bus when both the TR and LBR flags are set in the MSR_DEBUGCTLA MSR.

18.6.7 Last Exception Records

The Pentium 4 and Intel Xeon processors provide two 32-bit MSRs (the MSR_LER_TO_LIP and the MSR_LER_FROM_LIP MSRs) that duplicate the functions of the LastExceptionToIP and LastExceptionFromIP MSRs found in the P6 family processors.
The MSR_LER_TO_LIP and MSR_LER_FROM_LIP MSRs contain a branch record for the last branch that the processor took prior to an exception or interrupt being generated.

18.6.7.1 Last Exception Records and Intel 64 Architecture

In Intel 64 architecture, the MSRs that store last exception records are 64 bits wide. If IA-32e mode is disabled, only the lower 32 bits are accessible. If IA-32e mode is enabled, the processor writes 64-bit values into the MSR. In 64-bit mode, last exception records store 64-bit addresses; in compatibility mode, the upper 32 bits of last exception records are cleared.

18.6.8 Branch Trace Store (BTS)

A trace of taken branches, interrupts, and exceptions is useful for debugging code by providing a method of determining the decision path taken to reach a particular code location. The Pentium 4 and Intel Xeon processors provide a mechanism for capturing records of taken branches, interrupts, and exceptions and saving them in the last branch record (LBR) stack MSRs and/or sending them out onto the system bus as BTMs. The branch trace store (BTS) mechanism provides the additional capability of saving the branch records in a memory-resident BTS buffer, which is part of the DS save area. The BTS buffer can be configured to be circular so that the most recent branch records are always available, or it can be configured to generate an interrupt when the buffer is nearly full so that all the branch records can be saved. See Section 18.15.5, “DS Save Area.”

18.6.8.1 Detection of the BTS Facilities

The DS feature flag (bit 21) returned by the CPUID instruction indicates (when set) the availability of the DS mechanism in the processor, which supports the BTS (and PEBS) facilities. When this bit is set, the following BTS facilities are available:

• The BTS_UNAVAILABLE flag in the IA32_MISC_ENABLE MSR indicates (when clear) the availability of the BTS facilities, including the ability to set the BTS and BTINT bits in the MSR_DEBUGCTLA MSR.
• The IA32_DS_AREA MSR can be programmed to point to the DS save area.

18.6.8.2 Setting Up the DS Save Area

To save branch records with the BTS buffer, the DS save area must first be set up in memory as described in the following procedure.
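
Before starting the setup, software would normally confirm that the facilities from Section 18.6.8.1 are present. The following Python sketch expresses that check as a pure function over raw register values. It is a minimal illustration, not a driver: `bts_available` is a hypothetical name, the caller is assumed to have already obtained the CPUID feature flags and the IA32_MISC_ENABLE value by some privileged means, and the BTS_UNAVAILABLE bit position (bit 11) is an assumption not stated in this section.

```python
DS_FEATURE_BIT = 21        # CPUID DS feature flag, as given in the text
BTS_UNAVAILABLE_BIT = 11   # assumed position in IA32_MISC_ENABLE


def bts_available(cpuid_feature_flags: int, ia32_misc_enable: int) -> bool:
    """Return True when both detection conditions from the text hold.

    cpuid_feature_flags: raw feature-flag word returned by CPUID.
    ia32_misc_enable:    raw value read from the IA32_MISC_ENABLE MSR.
    """
    ds_supported = bool((cpuid_feature_flags >> DS_FEATURE_BIT) & 1)
    if not ds_supported:
        # Without DS, the BTS_UNAVAILABLE flag is not meaningful.
        return False
    bts_unavailable = bool((ia32_misc_enable >> BTS_UNAVAILABLE_BIT) & 1)
    return not bts_unavailable
```

Keeping the check free of hardware access makes it easy to exercise with synthetic register values; the privileged reads themselves are outside the scope of this sketch.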
See Section 18.6.8.3, “Setting Up the BTS Buffer,” and Section 18.15.8.3, “Setting Up the PEBS Buffer,” for instructions for setting up a BTS buffer and/or a PEBS buffer, respectively, in the DS save area:

1. Create the DS buffer management information area in memory (see Section 18.15.5, “DS Save Area,” and Section 18.15.5.1, “DS Save Area and IA-32e Mode Operation”). Also see the additional notes in this section.
2. Write the base linear address of the DS buffer management area into the IA32_DS_AREA MSR.
3. Set up the performance counter entry in the xAPIC LVT for fixed delivery and edge sensitive. See Section 8.5.1, “Local Vector Table.”
4. Establish an interrupt handler in the IDT for the vector associated with the performance counter entry in the xAPIC LVT.
5. Write an interrupt service routine to handle the interrupt. See Section 18.6.8.5, “Writing the DS Interrupt Service Routine.”

The following restrictions should be applied to the DS save area.

• The three DS save area sections should be allocated from a non-paged pool, and marked accessed and dirty. It is the responsibility of the operating system to keep the pages that contain the buffer present and to mark them accessed and dirty. The implication is that the operating system cannot do “lazy” page-table entry propagation for these pages.
• The DS save area can be larger than a page, but the pages must be mapped to contiguous linear addresses. The buffer may share a page, so it need not be aligned on a 4-KByte boundary. For performance reasons, the base of the buffer must be aligned on a doubleword boundary and should be aligned on a cache line boundary.
• It is recommended that the buffer size for the BTS buffer and the PEBS buffer be an integer multiple of the corresponding record sizes.
• The precise event records buffer should be large enough to hold the number of precise event records that can occur while waiting for the interrupt to be serviced.
• The DS save area should be in kernel space. It must not be on the same page as code, to avoid triggering self-modifying code actions.
• There are no memory type restrictions on the buffers, although it is recommended that the buffers be designated as WB memory type for performance considerations.
• Either the system must be prevented from entering A20M mode while the DS save area is active, or bit 20 of all addresses within buffer bounds must be 0.
• Pages that contain buffers must be mapped to the same physical addresses for all processes, such that any change to control register CR3 will not change the DS addresses.
• The DS save area is expected to be used only on systems with an enabled APIC. The LVT Performance Counter entry in the APIC must be initialized to use an interrupt gate instead of the trap gate.

18.6.8.3 Setting Up the BTS Buffer

Three flags in the MSR_DEBUGCTLA MSR (see Table 18-4), IA32_DEBUGCTL (see Figure 18-7), or MSR_DEBUGCTLB (see Figure 18-9) control the generation of branch records and storing of them in the BTS buffer; these are TR, BTS, and BTINT. The TR flag enables the generation of BTMs.
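
Several of the DS save area restrictions listed in Section 18.6.8.2 are purely arithmetic and can be checked before a buffer is handed to the processor. The following Python sketch covers the alignment, sizing, and A20M rules only. It is an illustration under stated assumptions: `check_buffer_layout` is a hypothetical helper, the record size is supplied by the caller because this section does not state it, and the 64-byte cache line default is an assumption.

```python
from typing import List


def check_buffer_layout(base: int, size: int, record_size: int,
                        cache_line: int = 64) -> List[str]:
    """Return a list of findings for a BTS or PEBS buffer layout.

    base, size:   buffer start address and length in bytes (size > 0).
    record_size:  bytes per record; caller-supplied, not given in the text.
    cache_line:   assumed cache line size in bytes.
    """
    findings = []
    # "Must" rule: doubleword (4-byte) alignment of the buffer base.
    if base % 4 != 0:
        findings.append("error: base is not doubleword aligned")
    # "Should" rule: cache line alignment of the buffer base.
    if base % cache_line != 0:
        findings.append("warning: base is not cache-line aligned")
    # Recommendation: size is an integer multiple of the record size.
    if size % record_size != 0:
        findings.append("warning: size is not a multiple of the record size")
    # A20M rule: bit 20 of every address in the buffer must be 0, unless
    # the system is prevented from entering A20M mode. Every address has
    # bit 20 clear exactly when both ends do and share one 2-MByte block.
    last = base + size - 1
    bit20_clear = ((base >> 20) & 1) == 0 and ((last >> 20) & 1) == 0
    if not (bit20_clear and (base >> 21) == (last >> 21)):
        findings.append("warning: bit 20 is set within buffer bounds")
    return findings
```

An empty list means none of the checked rules is violated. The remaining restrictions (non-paged allocation, accessed and dirty bits, identical mapping across processes, APIC configuration) depend on operating system state and are not modeled here.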