PCIe hotplug: ESXi host may crash when PCIe NVMe device(s) are surprise hot-removed and hot-inserted back quickly (< 1 minute)

Article ID: 312022

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
In certain surprise hot-removal scenarios, VMware native NVMe hot plug might cause a purple screen of death (PSOD) if an NVMe drive is pulled out and reinserted within one minute. This applies to both vSphere and vSAN deployments, whether the reinserted drive is new or the one that was just removed.

Environment

VMware vSphere ESXi 8.0.x
VMware vSphere ESXi 7.0.0

Cause

After an NVMe drive is physically removed from the server, ESXi takes 1 minute to clean up the resources allocated for the drive. During that window, ESXi may still try to access the removed drive, which can trigger a non-maskable interrupt (NMI) from the server and lead to a PSOD in ESXi.

Resolution

Currently there is no resolution.

Workaround:
To work around this issue, wait at least 1 minute after removing the drive before reinserting (hot plugging) the new or existing NVMe drive into the PCIe slot.
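
For teams that script or checklist drive swaps, the wait can be tracked with a small operator-side timer. The following is a minimal sketch only, assuming a 60-second window taken from the 1-minute cleanup time described above. The class name `ReinsertionGuard`, its methods, and the slot label are hypothetical, and the code does not query or control ESXi in any way; it only records when a drive was pulled and reports how long to keep waiting.

```python
import time

# Assumption: 60 seconds, based on the 1-minute cleanup window in this article.
CLEANUP_WINDOW_SECONDS = 60.0


class ReinsertionGuard:
    """Hypothetical helper that tracks when the drive in each slot was removed."""

    def __init__(self, window=CLEANUP_WINDOW_SECONDS, clock=time.monotonic):
        self._window = window
        self._clock = clock          # injectable so the logic can be tested
        self._removed_at = {}        # slot label -> removal timestamp

    def record_removal(self, slot):
        """Call this at the moment the drive is pulled from the slot."""
        self._removed_at[slot] = self._clock()

    def seconds_remaining(self, slot):
        """Return how many seconds are left before reinsertion is advisable."""
        removed = self._removed_at.get(slot)
        if removed is None:
            return 0.0               # no removal recorded for this slot
        elapsed = self._clock() - removed
        return max(0.0, self._window - elapsed)

    def safe_to_reinsert(self, slot):
        """True once the full window has elapsed since the recorded removal."""
        return self.seconds_remaining(slot) == 0.0
```

An operator script might call `record_removal("slot-3")` when the drive is pulled and check `safe_to_reinsert("slot-3")` before the replacement goes in. Because the article recommends waiting 1 minute or longer, passing a larger `window` value is a reasonable way to add margin.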