Cannot remount a datastore after an unplanned permanent device loss (PDL)

Article ID: 323145

Products

VMware vSphere ESXi

Issue/Introduction

This article provides steps to resolve an issue where a datastore cannot be remounted after an unplanned permanent device loss (PDL).


Symptoms:
  • After a storage device has been unexpectedly unpresented from the storage array, you are unable to mount it again.
  • This issue occurs when a virtual machine was running on the device at the time the device went offline.
  • An ESXi host cannot mount the storage after the LUN is online again.
  • In the vmkernel.log file, you see entries similar to:

    cpu36:5590)Vol3: 1665: Error refreshing FD resMeta: Device is permanently unavailable
    cpu34:5590)VC: 1449: Device rescan time 165 msec (total number of devices 75)
    cpu34:5590)VC: 1452: Filesystem probe time 504 msec (devices probed 48 of 75)
    cpu38:5590)ScsiDevice: 4592: naa.6006016058201700354179be0c6fdf11 device :Open count > 0, cannot be brought online
    cpu34:5590)Vol3: 647: Couldn't read volume header from control: Invalid handle
    cpu34:5590)FSS: 4333: No FS driver claimed device 'control': Not supported
    cpu38:5590)ScsiDeviceIO: 2316: Cmd(0x4124c0ea2e80) 0x28, CmdSN 0x70509 to dev "naa.6006016058201700354179be0c6fdf11" failed H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

     

    Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.  
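
The signature above can be checked for directly in the log. The following is a minimal sketch, not an official tool: the function name pdl_blocked_devices is hypothetical, it assumes the message text matches the ScsiDevice excerpt shown above, and the log path in the usage comment is the usual ESXi location but may differ on your host.

```shell
# Hypothetical helper: print each device ID that vmkernel reports as
# "Open count > 0, cannot be brought online". Reads log text on stdin.
pdl_blocked_devices() {
    # Capture the token between "ScsiDevice: <line>:" and "device :Open count".
    sed -n 's/.*ScsiDevice: [0-9]*: \([^ ]*\) device :Open count > 0, cannot be brought online.*/\1/p' |
        sort -u   # the same device is usually logged many times
}

# Usage on an ESXi host (log path is an assumption):
#   pdl_blocked_devices < /var/log/vmkernel.log
```

Each ID printed is a candidate for the naa_id argument used in the Resolution steps.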
     



Environment

VMware vSphere ESXi 5.1
VMware vSphere 6.7.x
VMware vSphere ESXi 6.7
VMware vSphere ESXi 7.0
VMware vSphere 7.0.x
VMware vSphere ESXi 6.0
VMware vSphere ESXi 8.0
VMware vSphere ESXi 5.5
VMware vSphere ESXi 6.5

Resolution

To resolve this issue:

  1. Run this command to see the world that has the device open for the LUN:

    #esxcli storage core device world list -d naa_id

    For example:

    #esxcli storage core device world list -d naa.6006016058201700354179be0c6fdf11

    You see output similar to:

    Device World ID Open Count World Name
    ------------------------------------ -------- ---------- ----------
    naa.6006016058201700354179be0c6fdf11 2060 1 idle0


    If a VMFS volume is using the device indirectly, the world name includes the string idle0. If a virtual machine uses the device as an RDM, the virtual machine World ID is displayed. If any other process is using the raw device, the corresponding information is displayed.

    Notes:
     
    • If the host is not responding, run the esxcfg-scsidevs -m | grep naa_id command to get the corresponding datastore name.
    • Ensure that no virtual machine registered on the volume in a PDL state requires further steps. If you have a virtual machine in that state, attempting to Retry or Cancel an operation does not return the virtual machine World ID. Click Cancel, because the Retry operation cannot succeed unless the volume is remounted.
       
  2. Run this command to list all virtual machines running on the ESXi host and identify the virtual machine registered on that LUN:

    #esxcli vm process list
     
  3. To kill the virtual machine process, run this command with its World ID:

    #esxcli vm process kill --type=force --world-id=World_ID

    For example:

    #esxcli vm process kill --type=force --world-id=12131
     
  4. To clean up other processes that might be using this datastore:

    • Remove any templates from vCenter that are using this datastore.
    • Restart the management service (see Restarting the Management agents in ESXi):

      #/etc/init.d/hostd restart

    • Clean up the SIOC service (see KB 2011220).

  5. Rescan the storage using this command:

    #esxcfg-rescan -u vmhba#
     
  6. Run this command to see the device state:

    #esxcli storage core device list -d naa_id
     
  7. If the issue persists, reboot the ESXi host where the virtual machine was registered.

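Steps 1 through 3 involve reading two command outputs side by side. As an illustration only, the sketch below filters those outputs in the ESXi shell. The function names are hypothetical. The parsing assumes the column layout shown in step 1 and the field labels World ID, Display Name, and Config File in the esxcli vm process list output, so verify both against your host before relying on it.

```shell
# Hypothetical helper: from "esxcli storage core device world list" output on
# stdin, print "world_id open_count world_name" for each row below the
# two header lines. Only the first word of a world name is kept.
worlds_holding_device() {
    awk 'NR > 2 && NF >= 4 { print $2, $3, $4 }'
}

# Hypothetical helper: from "esxcli vm process list" output on stdin, print
# "world_id display_name" for every virtual machine whose Config File path
# contains the fragment given as $1. Assumes Config File is listed after
# World ID and Display Name within each virtual machine entry.
vm_worlds_on_datastore() {
    awk -v ds="$1" '
        $1 == "World"   && $2 == "ID:"   { wid = $3 }
        $1 == "Display" && $2 == "Name:" { name = $0
                                           sub(/^[ \t]*Display Name:[ \t]*/, "", name) }
        $1 == "Config"  && $2 == "File:" { if (index($0, ds) > 0) print wid, name }
    '
}

# Usage on an ESXi host (naa_id and the path fragment are placeholders):
#   esxcli storage core device world list -d naa_id | worlds_holding_device
#   esxcli vm process list | vm_worlds_on_datastore /vmfs/volumes/volume_id/
```

The sketch only reports World IDs and deliberately kills nothing; the kill command in step 3 is still run by hand for each World ID you have confirmed. Pass whatever path fragment actually appears in the Config File field on your host, which may be a volume identifier rather than the datastore name.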

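After the rescan in step 5, the device state check in step 6 can be reduced to a single value. This is a rough sketch under stated assumptions: the function names are hypothetical, and it assumes the device listing contains a line labeled Status: whose value is on for a usable device. Other values depend on the host and are not enumerated here.

```shell
# Hypothetical helper: print the value of the first "Status:" field found in
# "esxcli storage core device list -d naa_id" output read from stdin.
device_status() {
    awk '$1 == "Status:" { print $2; exit }'
}

# Hypothetical helper: succeed (exit code 0) only when the status is "on".
# Treating "on" as the healthy value is an assumption; confirm it on your host.
device_is_online() {
    [ "$(device_status)" = "on" ]
}

# Usage on an ESXi host (naa_id is a placeholder):
#   esxcli storage core device list -d naa_id | device_is_online \
#       && echo "device is online" \
#       || echo "device is still not online"
```

If the check still fails after steps 1 through 5, the reboot in step 7 remains the fallback.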
Additional Information

How to unmount a LUN or detach a datastore device from ESXi hosts
Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x and 6.x
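Cannot remount a datastore after an unplanned permanent device loss (PDL) (Japanese version)
Cannot remount a datastore after an unplanned permanent device loss (PDL) (Chinese version)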