Volume 2 System Programming (794096), page 72
If this is the case, writes to bits [60:0] of this register are ignored and do not generate a fault. Software must check the Locked bit before writing into the thresholding register. This field is write-enabled by MSR C001_0015 Hardware Configuration Register[MCSTATUSWrEn].

LVT Offset (LVTOFF), bits 55–52.
This field specifies the address of the APIC LVT entry used to deliver the threshold counter interrupt. Software must initialize the APIC LVT entry before enabling the threshold counter to generate the APIC interrupt; otherwise, an undefined operation (#UD) exception may be generated.
APIC LVT address = (MCi_MISC[LvtOff] << 4) + 500h

Counter Enable (CNTE), bit 51. When set to 1, counting of implementation-dependent errors is enabled; otherwise, counting is disabled.

Interrupt Type (INTT), bits 50–49. The value of this field specifies the type of interrupt signaled when the value of the overflow bit changes from 0 to 1.
- 00b = No interrupt
- 01b = APIC-based interrupt
- 10b = Reserved
- 11b = Reserved

Overflow (OF), bit 48.
The value of this field is maintained through a warm reset. This bit is set by hardware when the error counter increments to its maximum implementation-supported value (from FFFEh to FFFFh when all 16 counter bits are implemented). This maximum value is defined as the threshold level. When the overflow bit is set, the interrupt selected by the interrupt type field is generated. Software must reset this bit to zero in the interrupt handler routine when it updates the error counter.

Error Counter (ERRCT), bits 47–32. This field is maintained through a warm reset.
The size of the threshold counter is implementation-dependent. Implementations with fewer than 16 bits fill the most significant unimplemented bits with zeros.
Software enumerates the counter bits to discover the size of the counter and the threshold level (the value reached when the counter increments to the maximum count implemented). Software sets the starting error count as follows:
Starting error count = threshold level - desired software error count to cause overflow
The error counter is incremented by hardware when errors for the associated error counter are logged. When this counter overflows, it stays at the maximum error count (with no rollover).

Block pointer for additional MISC registers (BLKP), bits 31–24. This field is only valid when the valid (VAL) bit is set.
When non-zero, this field is used to calculate a pointer to a contiguous MISC MSR block as follows:
MCi_MISC1 = (MCi_MISC0[BlkPtr] << 3) + C000_0400h
BlkPtr has the same value for all MCi_MISCj.
For more information, see the appropriate BIOS and Kernel Developer’s Guide for the processor for its implementation of the AMD64 architecture.

Machine Check Mechanism, AMD64 Technology, 24593, Rev. 3.13, July 2007

9.4 Initializing the Machine-Check Mechanism
Following a processor reset, all machine-check error-reporting enable bits are cleared. System software must enable these bits before machine-check errors can be reported. Generally, system software should initialize the machine-check mechanism using the following process:
• Execute the CPUID instruction and verify that the processor supports the machine-check exception (MCE) and machine-check registers (MCA).
  In the feature flags returned by CPUID function 0000_0001h, MCE is supported when EDX bit 7 is set to 1, and MCA is supported when EDX bit 14 is set to 1. Software should not proceed with initializing the machine-check mechanism if the machine-check registers are not supported.
• If the machine-check registers are supported, system software should take the following steps:
  - Check to see if the MCG_CTL_P bit in the MCG_CAP register is set to 1. If it is, then the MCG_CTL register is supported by the processor.
    If the MCG_CTL register is supported, software should set its enable bits to 1 for the machine-check features it uses. Software can load MCG_CTL with all 1s to enable all machine-check features.
  - Read the COUNT field from the MCG_CAP register to determine the number of error-reporting register banks supported by the processor. For each error-reporting register bank, software should set the enable bits to 1 in the MCi_CTL register for the error types it wants the processor to report. Software can load each MCi_CTL with all 1s to enable all error-reporting mechanisms.
    The error-reporting register banks are numbered from 0 to one less than the value found in the MCG_CAP.COUNT field. For example, if the COUNT field indicates five register banks are supported, they are numbered 0 to 4.
  - For each error-reporting register bank, software should clear all status fields in the MCi_STATUS register by writing all 0s to the register.
    It is possible that valid error status is already reported by the MCi_STATUS registers at the time software clears them.
    The status can reflect fatal errors recorded before a warm reset, or errors recorded during the system power-up and boot process. Before clearing the MCi_STATUS registers, software should examine their contents and log any errors found.
• As a final step in the initialization process, system software should enable the machine-check exception by setting CR4.MCE (bit 6) to 1.

9.5 Using Machine Check Features
System software can detect and handle machine-check errors using two methods:
• Software can periodically examine the machine-check status registers for reported errors, and log any errors found.
• Software can enable the machine-check exception (#MC). When an uncorrectable error occurs, the processor immediately transfers control to the machine-check exception handler. In this case, system software provides a machine-check exception handler that, at a minimum, logs detected errors. The exception handler can be designed for a specific processor implementation or can be generalized to work on multiple implementations.

9.5.1 Handling Machine Check Exceptions
The processor uses the interrupt control-transfer mechanism to invoke an exception handler after a machine-check exception occurs.
This requires system software to initialize the interrupt-descriptor table (IDT) with either an interrupt gate or a trap gate that references the interrupt handler. See “Legacy Protected-Mode Interrupt Control Transfers” on page 229 and “Long-Mode Interrupt Control Transfers” on page 239 for more information on interrupt control transfers.
At a minimum, the machine-check exception handler must be capable of logging errors for later examination. This can be a sufficient implementation for some handlers. More thorough exception handler implementations can analyze the error to determine whether it is unrecoverable or whether it can be recovered in software.
Machine-check exception handlers that attempt to recover from errors must be thorough in their analysis and their corrective actions. The following guidelines should be used when writing such a handler:
• All status registers in the error-reporting register banks must be examined to identify the cause or causes of the machine-check exception.
  Read the COUNT field from MCG_CAP to determine the number of status registers supported by the processor. The status registers are numbered from 0 to one less than the value found in the MCG_CAP.COUNT field. For example, if the COUNT field indicates five status registers are supported, they are named MC0_STATUS to MC4_STATUS.
• Check the valid bit in each status register (MCi_STATUS.VAL). The MCi_STATUS register does not need to be examined when its valid bit is clear.
• Check the valid MCi_STATUS registers to see if error recovery is possible. Error recovery is not possible when:
  - The processor-context corrupt bit (MCi_STATUS.PCC) is set to 1.
  - The error-overflow status bit (MCi_STATUS.OVER) is set to 1.
    This bit indicates that more than one machine-check error occurred, but only one error is reported by the status register.
  If error recovery is not possible, the handler should log the error information and return to the operating system.
• Check the MCi_STATUS.UC bit to see if the processor corrected the error.
  If UC=1, the processor did not correct the error, and the exception handler must correct the error before restarting the interrupted program. If the handler cannot correct the error, it should log the error information and return to the operating system.
• When identifying the error condition, portable exception handlers should examine only the MCi_STATUS register MCA error-code field.
  See “Error Codes” on page 260 for information on interpreting this field.
• If the MCG_STATUS.RIPV bit is set to 1, the interrupted program can be restarted reliably at the instruction-pointer address pushed onto the exception-handler stack. If RIPV=0, the interrupted program cannot be restarted reliably at that location, although it can be restarted at that location for debugging purposes.
• When logging errors, particularly those that are not recoverable, check the MCG_STATUS.EIPV bit to see if the instruction-pointer address pushed onto the exception-handler stack is related to the machine-check error.
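
The guidelines above can be sketched as a small triage routine. This is only an illustration, not the handler prescribed by this manual. It works on register values that the caller has already read with RDMSR, so it contains no privileged instructions. All function, type, and macro names are hypothetical. The bit positions are the commonly documented ones for MCG_CAP, MCG_STATUS, and MCi_STATUS and are assumed here; check them against the register descriptions for the target processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions; verify against the register descriptions. */
#define MCG_CAP_COUNT_MASK 0xFFULL      /* MCG_CAP[7:0]: number of banks   */
#define MCG_STATUS_RIPV    (1ULL << 0)  /* restart instruction pointer valid */
#define MCG_STATUS_EIPV    (1ULL << 1)  /* error instruction pointer valid */
#define MCI_STATUS_VAL     (1ULL << 63) /* a valid error is logged         */
#define MCI_STATUS_OVER    (1ULL << 62) /* a further error was not recorded */
#define MCI_STATUS_UC      (1ULL << 61) /* error not corrected by hardware */
#define MCI_STATUS_PCC     (1ULL << 57) /* processor context corrupt       */

/* Hypothetical verdicts, ordered from least to most severe. */
typedef enum {
    MC_BANK_IDLE = 0,        /* VAL clear: nothing to examine              */
    MC_BANK_CORRECTED,       /* valid error, corrected by the processor    */
    MC_BANK_NEEDS_SOFTWARE,  /* UC=1: the handler must correct the error   */
    MC_BANK_UNRECOVERABLE    /* PCC=1 or OVER=1: recovery is not possible  */
} mc_verdict;

/* Classify one MCi_STATUS value in the order the guidelines give. */
mc_verdict mc_classify_bank(uint64_t status)
{
    if (!(status & MCI_STATUS_VAL))
        return MC_BANK_IDLE;
    if (status & (MCI_STATUS_PCC | MCI_STATUS_OVER))
        return MC_BANK_UNRECOVERABLE;
    if (status & MCI_STATUS_UC)
        return MC_BANK_NEEDS_SOFTWARE;
    return MC_BANK_CORRECTED;
}

/* Examine banks 0 .. COUNT-1 and return the most severe verdict.
 * status[] holds n previously read MCi_STATUS values, indexed by bank. */
mc_verdict mc_triage(uint64_t mcg_cap, const uint64_t *status, size_t n)
{
    size_t count = (size_t)(mcg_cap & MCG_CAP_COUNT_MASK);
    mc_verdict worst = MC_BANK_IDLE;

    assert(status != NULL && count <= n);
    for (size_t i = 0; i < count; i++) {
        mc_verdict v = mc_classify_bank(status[i]);
        if (v > worst)
            worst = v;
    }
    return worst;
}

/* Restart is reliable only if RIPV=1 and no error is left uncorrected.
 * A handler that has itself repaired a MC_BANK_NEEDS_SOFTWARE error
 * may pass MC_BANK_CORRECTED instead. */
int mc_can_restart(uint64_t mcg_status, mc_verdict worst)
{
    return (mcg_status & MCG_STATUS_RIPV) != 0 && worst <= MC_BANK_CORRECTED;
}

/* For logging: is the pushed instruction pointer related to the error? */
int mc_ip_is_related(uint64_t mcg_status)
{
    return (mcg_status & MCG_STATUS_EIPV) != 0;
}
```

A real handler would read MCG_CAP, MCG_STATUS, and each MCi_STATUS with RDMSR, log every valid bank before deciding, and then use the verdict together with RIPV to choose between restarting the interrupted program and returning control to the operating system's error path.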