Intel 253668-032US User Manual

15-44   Vol. 3
MACHINE-CHECK ARCHITECTURE
mechanism to indicate the frequency of exceptions. A multiprocessing operating
system stores the identity of the processor node incurring the exception
using a unique identifier, such as the processor’s APIC ID (see Section
10.9, “Handling Interrupts”).
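As an illustration of keeping a per-processor exception count keyed by APIC ID, the following is a minimal sketch only. The structure names, the fixed table size, and the function mce_log_record() are hypothetical and are not defined by this manual; a real operating system would obtain the APIC ID from the processor and would protect the table against concurrent updates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table size chosen only for this illustration. */
#define MCE_LOG_MAX_CPUS 8

/* One record per processor node that has incurred a machine check. */
struct mce_log_entry {
    uint32_t apic_id;          /* unique identifier of the processor */
    uint32_t exception_count;  /* how often it has reported an exception */
    int      in_use;           /* nonzero once the slot is assigned */
};

struct mce_log {
    struct mce_log_entry entries[MCE_LOG_MAX_CPUS];
};

/*
 * Record one machine-check exception for the given APIC ID.
 * Returns the updated count for that processor, or 0 if the
 * table has no free slot left.
 */
uint32_t mce_log_record(struct mce_log *log, uint32_t apic_id)
{
    struct mce_log_entry *free_slot = NULL;

    for (size_t i = 0; i < MCE_LOG_MAX_CPUS; i++) {
        struct mce_log_entry *e = &log->entries[i];
        if (e->in_use && e->apic_id == apic_id) {
            return ++e->exception_count;
        }
        if (!e->in_use && free_slot == NULL) {
            free_slot = e;  /* remember the first unused slot */
        }
    }

    if (free_slot == NULL) {
        return 0;
    }
    free_slot->in_use = 1;
    free_slot->apic_id = apic_id;
    free_slot->exception_count = 1;
    return 1;
}
```

A zero-initialized struct mce_log is an empty table; each call then either bumps the count for a known APIC ID or claims a new slot.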
The basic algorithm given in Example 15-3 can be modified to provide more
robust recovery techniques. For example, software has the flexibility to
attempt recovery using information unavailable to the hardware. Specifically,
after logging, the machine-check exception handler can carefully analyze the
error-reporting registers when the error-logging routine reports an error
that does not allow execution to be restarted. These recovery techniques can
use external-bus-related, model-specific information provided with the error
report to localize the source of the error within the system and determine
the appropriate recovery strategy.
15.10.4   Machine-Check Software Handler Guidelines for Error Recovery
15.10.4.1   Machine-Check Exception Handler for Error Recovery
When writing a machine-check exception (MCE) handler to support software 
recovery from Uncorrected Recoverable (UCR) errors, consider the 
following: 
When IA32_MCG_CAP [24] is zero, there are no recoverable errors supported
and all machine checks are fatal exceptions. The logging of status and error
information is therefore a baseline implementation requirement.
When IA32_MCG_CAP [24] is 1, certain uncorrected errors called uncorrected 
recoverable (UCR) errors may be software recoverable. The handler can analyze 
the reported error information, and in some cases attempt to recover from the 
uncorrected error and continue execution.
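The two cases above can be sketched as a simple test of bit 24 in a value already read from IA32_MCG_CAP. This is a minimal sketch under stated assumptions: the enum and function names are hypothetical, and the caller is assumed to have obtained the register value with RDMSR at privilege level 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position in IA32_MCG_CAP referenced in the text above. */
#define MCG_CAP_RECOVERY_BIT 24

/* Hypothetical policy names used only for this illustration. */
enum mce_policy {
    MCE_POLICY_LOG_AND_HALT,  /* all machine checks are treated as fatal */
    MCE_POLICY_TRY_RECOVERY   /* UCR errors may be analyzed for recovery */
};

/* Returns true when IA32_MCG_CAP[24] is 1. */
bool mcg_cap_supports_recovery(uint64_t mcg_cap)
{
    return ((mcg_cap >> MCG_CAP_RECOVERY_BIT) & 1u) != 0;
}

/* Select the handler strategy from the capability value. */
enum mce_policy mce_select_policy(uint64_t mcg_cap)
{
    return mcg_cap_supports_recovery(mcg_cap) ? MCE_POLICY_TRY_RECOVERY
                                              : MCE_POLICY_LOG_AND_HALT;
}
```

In either case the handler still logs status and error information; the policy only decides whether recovery is attempted afterwards.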
For processors with DisplayFamily_DisplayModel encoding of 06H_EH and above, 
an MCA signal is broadcast to all logical processors in the system. Due to the
potentially shared machine check MSR resources among the logical processors 
on the same package/core, the MCE handler may be required to synchronize with 
the other processors that received a machine check error and serialize access to 
the machine check registers when analyzing, logging and clearing the 
information in the machine check registers.
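One possible way to serialize access to shared machine-check registers is a simple spin lock built on C11 atomics. The sketch below is illustrative only: the type mca_bank_lock and its functions are hypothetical names, and a production handler would likely also need a rendezvous of all processors that received the broadcast, a timeout, and a spin-wait hint.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock guarding registers shared by logical processors. */
typedef struct {
    atomic_flag held;
} mca_bank_lock;

#define MCA_BANK_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once to take the lock; returns true on success. */
bool mca_bank_trylock(mca_bank_lock *lock)
{
    /* test_and_set returns the previous value: false means we got it. */
    return !atomic_flag_test_and_set_explicit(&lock->held,
                                              memory_order_acquire);
}

/* Spin until the lock is taken. */
void mca_bank_lock_acquire(mca_bank_lock *lock)
{
    while (!mca_bank_trylock(lock)) {
        /* busy-wait; no timeout in this sketch */
    }
}

/* Release the lock so another logical processor can proceed. */
void mca_bank_unlock(mca_bank_lock *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

A handler following this pattern would call mca_bank_lock_acquire() before analyzing, logging, and clearing the shared registers, and mca_bank_unlock() afterwards.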
The VAL (valid) flag in each IA32_MCi_STATUS register indicates whether the 
error information in the register is valid. If this flag is clear, the registers in that 
bank do not contain valid error information and should not be checked.
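The VAL check can be sketched as a filter over status values that have already been read from each bank. VAL is bit 63 of IA32_MCi_STATUS; the function name and the status[] snapshot array are assumptions made for this illustration, and the code does not itself read any MSR.

```c
#include <stddef.h>
#include <stdint.h>

/* VAL (valid) flag: bit 63 of IA32_MCi_STATUS. */
#define MCI_STATUS_VAL (1ULL << 63)

/*
 * Collect the indices of banks whose status snapshot has VAL set.
 * status[] is a hypothetical array holding one IA32_MCi_STATUS value
 * per bank, read beforehand by the caller. valid_out[] must have room
 * for bank_count entries. Returns the number of valid banks found.
 */
size_t mca_collect_valid_banks(const uint64_t *status, size_t bank_count,
                               size_t *valid_out)
{
    size_t n = 0;

    for (size_t i = 0; i < bank_count; i++) {
        if (status[i] & MCI_STATUS_VAL) {
            valid_out[n++] = i;  /* only these banks are examined further */
        }
    }
    return n;
}
```

Banks that are not returned here are skipped entirely, since their registers do not hold valid error information.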
The MCE handler is primarily responsible for processing uncorrected errors. The 
UC flag in each IA32_MCi_STATUS register indicates whether the reported error