Decoding Machine Check Exception (MCE) output after a purple screen error (1005184)
- An ESX/ESXi host halts with a purple diagnostic screen.
- The purple diagnostic screen shows a message similar to:
Machine Check Exception: Unable to continue
Hardware (Machine) Error
PCPU: 1 hardware errors seen since boot (1 corrected by hardware)
- When extracting the logs from the core dump you see messages similar to:
ALERT: MCE: 171: Machine Check Exception: Bank x, Status nnnnnnnnnnn
MC:PCPUn B:x S:nnnnnnnnnnn M:mmmmmmmmmmmm: A:aaaaaaaaaaa
- On AMD systems you may see a message which indicates a hardware issue, but an MCE does not occur. The message is similar to:
vmkernel: 72:03:47:16.847 cpu4:14403)MCE: 978: MCE not recoverable but did not generate an exception.
The Machine Check Architecture (MCA) is a mechanism within a CPU that detects and reports hardware issues. When a problem is detected, the CPU raises a machine check exception (MCE). If an MCE occurs and a purple diagnostic screen is displayed, a hardware problem caused it; there is no other way to generate an MCE.
When the system has faulted with a purple screen, capture the screen output, reboot the server, and contact your hardware vendor. In the meantime, you can decode the information about the fault to get a better idea of what may be happening.
MCE: 215: CMCI on cpu1 bank8: Status:0xd000008000310080 Misc:0x0 Addr:0x0: Valid.Overflow.Err enabled.
MCE: 220: Status bits: "Memory Controller Error on Channel 0.
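The Status value in the sample output above can also be decoded by hand. The following is a minimal Python sketch, assuming the value follows the IA32_MCi_STATUS register layout described in the Intel manual (flag bits 63 to 57, model-specific error code in bits 31:16, MCA error code in bits 15:0, and memory controller errors encoded as 0000 0000 1MMM CCCC). The function name, dictionary keys, and flag labels are illustrative and are not part of any VMware tool; status values from AMD processors or other bank types may need different handling.

```python
# Flag bits of an MCi_STATUS value (assumption: Intel IA32_MCi_STATUS layout).
# The labels are illustrative; they loosely mirror the vmkernel wording.
STATUS_FLAGS = {
    63: "Valid",
    62: "Overflow",
    61: "Uncorrected",
    60: "Err enabled",
    59: "Misc valid",
    58: "Addr valid",
    57: "Context corrupt",
}


def decode_mce_status(status: int) -> dict:
    """Split a 64-bit machine check status value into its main fields."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    mca_code = status & 0xFFFF            # bits 15:0, architectural error code
    model_code = (status >> 16) & 0xFFFF  # bits 31:16, model-specific code
    result = {"flags": flags, "mca_code": mca_code, "model_code": model_code}

    # Memory controller errors use the pattern 0000 0000 1MMM CCCC,
    # where CCCC is the channel (0xF means "channel not specified").
    if (mca_code & 0xFF80) == 0x0080:
        channel = mca_code & 0xF
        result["memory_controller_channel"] = None if channel == 0xF else channel
    return result


if __name__ == "__main__":
    decoded = decode_mce_status(0xD000008000310080)
    print(decoded["flags"])
    print(hex(decoded["mca_code"]), decoded.get("memory_controller_channel"))
```

For the sample value, the set flag bits are 63, 62, and 60, and the low 16 bits are 0x0080, which is consistent with the "Valid.Overflow.Err enabled" and "Memory Controller Error on Channel 0" text in the log.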
For more information, see the documentation by the CPU manufacturers:
- Intel - http://www.intel.com/products/processor/manuals/
- Chapters 15 and 16 of the Intel 64 and IA-32 Architectures Software Developer's Manual.
- AMD - http://developer.amd.com/documentation/guides/pages/default.aspx
- Chapter 9 of the AMD64 Architecture Programmer's Manual Volume 2: System Programming
- Chapter 5 of the BIOS and Kernel Developer's Guide for AMD Athlon 64 and AMD Opteron Processors
Note: The preceding links were correct as of August 7, 2012. If you find that a link is broken, provide feedback and a VMware employee will update the link.
- Enabling serial-line logging for an ESX/ESXi host
- Collecting diagnostic information from an ESX or ESXi host that experiences a purple diagnostic screen
- Interpreting an ESX/ESXi host purple diagnostic screen
- Extracting the log file after an ESX or ESXi host fails with a purple screen error
- Determining VMware Software Version and Build Number