Understanding a "Failed to ack TLB invalidate" purple diagnostic screen

Article ID: 324947

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
A purple diagnostic screen that reports information similar to:

  • PCPU 3 locked up. Failed to ack TLB invalidate.
    @BlueScreen: PCPU 3 locked up. Failed to ack TLB invalidate.

  • cpu34:9213)VMware ESXi 5.0.0 [Releasebuild-702118 x86_64] PCPU 18 locked up. Failed to ack TLB invalidate (total of 5 locked up, PCPU(s): 0,10,11,16,18).
    cpu34:9213)cr0=0x80010031 cr2=0x29bbd000 cr3=0x47aa000 cr4=0x2768

Note: If you encounter a purple diagnostic screen that does not match these symptoms, see Interpreting an ESX host purple diagnostic screen (1004250).
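
When comparing a screen or log line against the symptoms above, a small script can help. The following is a minimal Python sketch, not a VMware tool: the function name parse_lockup and the regular expression are illustrative assumptions based only on the two sample messages shown in this article.

```python
import re

# Matches the headline message. The optional group covers the
# "(total of N locked up, PCPU(s): ...)" summary seen in the second sample.
LOCKUP_RE = re.compile(
    r"PCPU (?P<pcpu>\d+) locked up\. Failed to ack TLB invalidate"
    r"(?: \(total of (?P<total>\d+) locked up, PCPU\(s\): (?P<all>[\d,]+)\))?"
)


def parse_lockup(line):
    """Return (reported_pcpu, locked_pcpus), or None if the line does not match."""
    m = LOCKUP_RE.search(line)
    if m is None:
        return None
    reported = int(m.group("pcpu"))
    if m.group("all"):
        # Ignore empty items in case the list has a stray comma.
        locked = [int(n) for n in m.group("all").split(",") if n]
    else:
        locked = [reported]
    return reported, locked


if __name__ == "__main__":
    sample = ("cpu34:9213)VMware ESXi 5.0.0 [Releasebuild-702118 x86_64] "
              "PCPU 18 locked up. Failed to ack TLB invalidate "
              "(total of 5 locked up, PCPU(s): 0,10,11,16,18).")
    print(parse_lockup(sample))
```

A line that returns None does not match these symptoms, in which case the general article referenced in the note above is the better starting point.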


Environment

VMware ESXi 4.0.x Installable
VMware ESX Server 3.0.x
VMware vSphere ESXi 6.5
VMware ESX Server 3.5.x
VMware vSphere ESXi 5.5
VMware ESXi 4.1.x Installable
VMware ESX 4.0.x
VMware vSphere ESXi 6.0
VMware ESXi 4.0.x Embedded
VMware vSphere ESXi 5.0
VMware vSphere ESXi 5.1
VMware ESXi 4.1.x Embedded
VMware ESX 4.1.x
VMware ESXi 3.5.x Embedded
VMware ESXi 3.5.x Installable

Resolution

Overview

To understand the error message, you must be familiar with these terms:
  • Context – A context is a collection of CPU-specific information that pertains to a specific process. The context includes the values of the CPU registers and memory management information.
  • Context switch – A context switch occurs when the CPU stops running one process and starts running another, for example in response to an interrupt. The system saves the current context and restores the context of the other process.
  • Translation Look-aside Buffer (TLB) – The TLB is a per-CPU cache of virtual-to-physical address translations that improves the performance of addressing virtual memory. It is part of the memory management information included in the context.

When a context switch is performed, the current TLB must be flushed, or invalidated, before the new context and its TLB contents are loaded. This type of purple diagnostic screen occurs when a physical CPU (PCPU) does not acknowledge a request to perform this flush for a prolonged period of time.
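
The idea of a flush request that is never acknowledged can be illustrated with a toy model. The following Python sketch is purely conceptual and is not how the VMkernel is implemented: the Pcpu class, the invalidate_all function, and the max_polls limit are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Pcpu:
    """Toy physical CPU with a TLB modeled as a dict (virtual page -> physical page)."""
    ident: int
    responsive: bool = True
    tlb: dict = field(default_factory=dict)

    def handle_invalidate(self):
        """Flush the TLB and return True (the ack), or return False if hung."""
        if not self.responsive:
            return False
        self.tlb.clear()
        return True


def invalidate_all(pcpus, max_polls=3):
    """Ask every PCPU to flush; return the ids that never acknowledged."""
    pending = {p.ident: p for p in pcpus}
    for _ in range(max_polls):
        for ident in list(pending):
            if pending[ident].handle_invalidate():
                del pending[ident]
        if not pending:
            break
    return sorted(pending)


if __name__ == "__main__":
    pcpus = [Pcpu(i, tlb={0x1000: 0x9000}) for i in range(4)]
    pcpus[3].responsive = False  # simulate a locked-up PCPU
    for ident in invalidate_all(pcpus):
        print(f"PCPU {ident} locked up. Failed to ack TLB invalidate.")
```

In this model the requester gives up after a fixed number of polls. On a real host, the equivalent of giving up is the purple diagnostic screen described in this article.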

The diagnostic information

This is an example of the diagnostic information that is included in the purple diagnostic screen:
VMware ESX Server [Releasebuild-52542]
PCPU 3 locked up. Failed to ack TLB invalidate.
gate=0x0 frame=0x343bd78 eip=0x61fafc cr2=0x0 cr3=0x13a83000 cr4=0x16c
eax=0x0 ebx=0x0 ecx=0x0 edx=0x0 es=0x0 ds=0x0
fs=0x0 gs=0x0 ebp=0x343bed4 esi=0x0 edi=0x0 err=0 ef=0x0
cpu 0 2673 vmm0:keys: cpu 1 2372 mks:dc02: CPU 2 1038 helper1-3: cpu 3 3012 vmm0:erpt:
cpu 4 3019 vmm0:keys: cpu 5 2652 vmm0:erpt: cpu 6 2832 vmm0:time: cpu 7 2394 vmm0:addc:
@BlueScreen: PCPU 3 locked up. Failed to ack TLB invalidate.
0x343bed4:[0x61fafc]_vLog+0x0(0x78cb60, 0x343bef0, 0x343bf10)
0x343bee4:[0x61fafc]_vLog+0x0(0x78cb60, 0x3, 0x1)
0x343bf10:[0x63fd00]TLBInvalidateFailed+0x90(0x1, 0xffffffff, 0x0)
0x343bf38:[0x640012]TLBDoInvalidate+0x27a(0xffffffff, 0xffffffff, 0x343bf74)
0x343bf48:[0x63fbb5]TLB_Flush+0x35(0x0, 0x0, 0x400)
0x343bf74:[0x65d878]XMapFlushDelayedUnmaps+0x70(0x0, 0x12130b4, 0x0)
0x343bfac:[0x6463e3]helpFunc+0x1ff(0x1, 0xc9256c, 0x0)
0x343bffc:[0x702bb8]CpuSched_StartWorld+0x11c(0x0, 0x0, 0x0)
0x343c000:[0x0](0x0, 0x0, 0x0)
VMK uptime: 210:15:14:32.718 TSC: 47315535316217757
cpu5:2602)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 3781 seconds. *may* be locked up
cpu5:2659)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 7621 seconds. *may* be locked up
cpu5:2644)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 15301 seconds. *may* be locked up
Starting coredump to disk Starting coredump to disk Dumping using slot 1 of 1... using slot 1 of 1... log
From the preceding example, you can:
  • Identify the physical CPU that is misbehaving. In this example, it is physical CPU 3:

    PCPU 3 locked up.

  • See how long the host has been waiting for the PCPU to invalidate the TLB:

    cpu5:2602)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 3781 seconds. *may* be locked up
    cpu5:2659)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 7621 seconds. *may* be locked up
    cpu5:2644)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 15301 seconds. *may* be locked up
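
The heartbeat durations are reported in seconds, which is hard to read at a glance. The following Python sketch pulls them out of the screen text and converts them to hours, minutes, and seconds. It is an illustration only: the function names heartbeat_gaps and format_duration are hypothetical, and the pattern is based solely on the Heartbeat lines in the example above.

```python
import re

# Pattern derived from the sample Heartbeat lines in this article.
HEARTBEAT_RE = re.compile(
    r"PCPU (?P<pcpu>\d+) didn't have a heartbeat for (?P<secs>\d+) seconds"
)


def heartbeat_gaps(screen_text):
    """Map each PCPU number to its list of missed-heartbeat durations in seconds."""
    gaps = {}
    for m in HEARTBEAT_RE.finditer(screen_text):
        gaps.setdefault(int(m.group("pcpu")), []).append(int(m.group("secs")))
    return gaps


def format_duration(seconds):
    """Render a number of seconds as H:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    text = (
        "cpu5:2602)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 3781 seconds. *may* be locked up\n"
        "cpu5:2659)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 7621 seconds. *may* be locked up\n"
        "cpu5:2644)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 15301 seconds. *may* be locked up\n"
    )
    for pcpu, secs in heartbeat_gaps(text).items():
        print(pcpu, [format_duration(s) for s in secs])
```

For the example above, the longest gap of 15301 seconds corresponds to 4 hours, 15 minutes, and 1 second without a heartbeat from PCPU 3.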

You must extract the logs that led up to the purple diagnostic screen and examine them for a potential cause. To extract the logs, see Extracting the log file after an ESX or ESXi host fails with a purple screen error (1006796).
A Failed to ack TLB invalidate purple diagnostic screen can be caused by either a hardware or a software issue. For a list of articles that include Failed to ack TLB invalidate as one of their symptoms, see the Additional Information section.
If you cannot resolve the issue, collect diagnostic information from the VMware ESX host and submit a support request.

Additional Information

If you have a Failed to ack TLB invalidate purple diagnostic screen that exactly matches the symptoms in VMware ESX 3.5, Patch ESX350-200904403-BG: Updates bnx2 Driver for Broadcom (1010128), follow the instructions in that article.

For more information regarding a common cause for the Failed to ack TLB invalidate purple diagnostic screen, see An ESXi 5.x host running on HP server fails with a purple diagnostic screen and the error: hpsa_update_scsi_devices or detect_controller_lockup_thread (2075978).
