Overview
To understand the error message, you must be familiar with these terms:
- Context – A context is a collection of CPU-specific information that pertains to a specific process. The context includes the values of the CPU registers and memory management information.
- Context switch – A context switch occurs when an interrupt occurs: the system saves the current context and restores the context of another process.
- Translation Look-aside Buffer (TLB) – The TLB is a table of keys and values that improves the performance of virtual memory addressing. It is part of the memory management information included in the context.
When an interrupt occurs, a context switch must be performed. Prior to loading a new context and loading a new TLB, the current TLB needs to be flushed or invalidated. This type of purple diagnostic screen occurs when the physical CPU does not perform this flush for a prolonged period of time.
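The failure condition described above can be illustrated with a toy model. This is a minimal sketch only, not how ESX implements TLB invalidation: the `SimulatedCpu` class, the `invalidate_all` function, and the `max_polls` limit are hypothetical names introduced for illustration. The requester asks every CPU to flush its TLB and reports any CPU that never acknowledges.

```python
from dataclasses import dataclass


@dataclass
class SimulatedCpu:
    """Hypothetical stand-in for a physical CPU; not an ESX data structure."""
    number: int
    responsive: bool = True
    tlb_valid: bool = True

    def poll(self) -> bool:
        """Service a pending invalidate request; return True if acknowledged."""
        if self.responsive:
            self.tlb_valid = False  # stale translations are discarded
            return True
        return False  # a locked-up CPU never acknowledges


def invalidate_all(cpus, max_polls=3):
    """Ask every CPU to flush its TLB; return the numbers that never acked."""
    pending = {cpu.number: cpu for cpu in cpus}
    for _ in range(max_polls):  # assumed retry limit, standing in for a timeout
        for number in list(pending):
            if pending[number].poll():
                del pending[number]
        if not pending:
            break
    return sorted(pending)


cpus = [SimulatedCpu(n) for n in range(8)]
cpus[3].responsive = False  # simulate the misbehaving CPU from the example
failed = invalidate_all(cpus)
for number in failed:
    print(f"PCPU {number} locked up. Failed to ack TLB invalidate.")
```

In this model, a CPU that stays unresponsive for the whole polling window is the one named in the message, which mirrors the condition that triggers the purple diagnostic screen.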
The diagnostic information
This is an example of the diagnostic information that is included in the purple diagnostic screen:
VMware ESX Server [Releasebuild-52542]
PCPU 3 locked up. Failed to ack TLB invalidate.
gate=0x0 frame=0x343bd78 eip=0x61fafc cr2=0x0 cr3=0x13a83000 cr4=0x16c
eax=0x0 ebx=0x0 ecx=0x0 edx=0x0 es=0x0 ds=0x0
fs=0x0 gs=0x0 ebp=0x343bed4 esi=0x0 edi=0x0 err=0 ef=0x0
cpu 0 2673 vmm0:keys: cpu 1 2372 mks:dc02: CPU 2 1038 helper1-3: cpu 3 3012 vmm0:erpt:
cpu 4 3019 vmm0:keys: cpu 5 2652 vmm0:erpt: cpu 6 2832 vmm0:time: cpu 7 2394 vmm0:addc:
@BlueScreen: PCPU 3 locked up. Failed to ack TLB invalidate.
0x343bed4:[0x61fafc]_vLog+0x0(0x78cb60, 0x343bef0, 0x343bf10)
0x343bee4:[0x61fafc]_vLog+0x0(0x78cb60, 0x3, 0x1)
0x343bf10:[0x63fd00]TLBInvalidateFailed+0x90(0x1, 0xffffffff, 0x0)
0x343bf38:[0x640012]TLBDoInvalidate+0x27a(0xffffffff, 0xffffffff, 0x343bf74)
0x343bf48:[0x63fbb5]TLB_Flush+0x35(0x0, 0x0, 0x400)
0x343bf74:[0x65d878]XMapFlushDelayedUnmaps+0x70(0x0, 0x12130b4, 0x0)
0x343bfac:[0x6463e3]helpFunc+0x1ff(0x1, 0xc9256c, 0x0)
0x343bffc:[0x702bb8]CpuSched_StartWorld+0x11c(0x0, 0x0, 0x0)
0x343c000:[0x0](0x0, 0x0, 0x0)
VMK uptime: 210:15:14:32.718 TSC: 47315535316217757
cpu5:2602)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 3781 seconds. *may* be locked up
cpu5:2659)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 7621 seconds. *may* be locked up
cpu5:2644)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 15301 seconds. *may* be locked up
Starting coredump to disk Starting coredump to disk Dumping using slot 1 of 1... using slot 1 of 1... log
From the preceding example, you can:
- Identify the physical CPU that is misbehaving. In this example, it is physical CPU 3:
PCPU 3 locked up.
- See how long the system has waited for the PCPU to invalidate the TLB:
cpu5:2602)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 3781 seconds. *may* be locked up
cpu5:2659)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 7621 seconds. *may* be locked up
cpu5:2644)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 15301 seconds. *may* be locked up
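If the screen text has been captured to a file or string, the same two details can be extracted programmatically. The following is a minimal sketch, assuming the message formats match the example above exactly. The `summarize` function and the regular expressions are illustrative and are not part of any VMware tool.

```python
import re

# Patterns based on the example screen above; other builds may differ.
LOCKED_RE = re.compile(r"PCPU (\d+) locked up\. Failed to ack TLB invalidate\.")
HEARTBEAT_RE = re.compile(r"PCPU (\d+) didn't have a heartbeat for (\d+) seconds")


def summarize(screen_text):
    """Return (locked-up PCPU number or None, list of heartbeat gaps in seconds)."""
    match = LOCKED_RE.search(screen_text)
    pcpu = int(match.group(1)) if match else None
    waits = [
        int(seconds)
        for cpu, seconds in HEARTBEAT_RE.findall(screen_text)
        if pcpu is None or int(cpu) == pcpu
    ]
    return pcpu, waits


SAMPLE = """\
PCPU 3 locked up. Failed to ack TLB invalidate.
cpu5:2602)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 3781 seconds. *may* be locked up
cpu5:2659)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 7621 seconds. *may* be locked up
cpu5:2644)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 15301 seconds. *may* be locked up
"""

pcpu, waits = summarize(SAMPLE)
print(f"PCPU {pcpu}: longest heartbeat gap {max(waits)} seconds")
```

For the sample lines, this reports PCPU 3 with a longest heartbeat gap of 15301 seconds.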
The Failed to ack TLB invalidate error is caused by either a hardware or a software issue. For a list of articles that have Failed to ack TLB invalidate as one of their symptoms, see the Additional Information section.
If you cannot resolve the issue, collect diagnostic information from the VMware ESX host and submit a support request.
For more information, see: