ESXi host takes a long time to start during rescan of RDM LUNs


Article ID: 318851


Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
When RDMs are used as shared disk resources for a clustering solution such as WSFC, Red Hat High Availability Cluster, etc., you experience these symptoms:
  • ESXi hosts hosting secondary nodes may take a long time to start. The delay depends on the number of RDMs attached to the ESXi host.

    Note: For example, in a system with 10 RDMs used in a two-node WSFC or Red Hat High Availability cluster, restarting the ESXi host with the secondary node may take up to 30 minutes. With fewer RDMs, the restart time is shorter; with only three RDMs, for example, the restart takes approximately 10 minutes.
  • The ESXi host intermittently displays an error message on the Summary tab, and the vSphere Client may not be able to start:

    Cannot synchronize host hostname. Operation Timed out.
     
  • During startup, the host's console appears to stop after a message similar to:

    Loading module multiextent.


Environment

VMware vSphere ESXi 6.0
VMware vSphere ESXi 6.5
VMware vSphere ESXi 6.7
VMware vSphere ESXi 7.0.0
VMware vSphere ESXi 8.0

Cause

This issue occurs when virtual machines participating in a clustering solution such as WSFC or Red Hat High Availability Cluster use shared RDMs with SCSI reservations across hosts, and a virtual machine on another host is the active cluster node holding a SCSI reservation. During boot, the restarting ESXi host attempts to interrogate each LUN it discovers; commands sent to a LUN locked by another node's reservation fail and must time out, and this timeout is incurred for every reserved RDM.

The delay occurs at these steps:

Starting path claiming and SCSI device discovery

In the /var/log/vmkernel.log file of the restarting ESXi host, you see entries similar to:

vmkernel: 0:00:01:57.828 cpu0:4096)WARNING: ScsiCore: 1353: Power-on Reset occurred on naa.6006016045502500176a24d34fbbdf11
vmkernel: 0:00:01:57.830 cpu0:4096)VMNIX: VmkDev: 2122: Added SCSI device vml0:3:0 (naa.6006016045502500166a24d34fbbdf11)
vmkernel: 0:00:02:37.842 cpu3:4099)ScsiDeviceIO: 1672: Command 0x1a to device "naa.6006016045502500176a24d34fbbdf11" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0

Mounting the partition of the RDM LUNs

In the /var/log/vmkernel.log file of the restarting ESXi host, you see entries similar to:

vmkernel: 0:00:08:58.811 cpu2:4098)WARNING: ScsiCore: 1353: Power-on Reset occurred on naa.600601604550250083489d914fbbdf11
vmkernel: 0:00:08:58.814 cpu0:4096)VMNIX: VmkDev: 2122: Added SCSI device vml0:9:0 (naa.600601604550250082489d914fbbdf11)
vmkernel: 0:00:09:38.855 cpu2:4098)ScsiDeviceIO: 1672: Command 0x1a to device "naa.600601604550250083489d914fbbdf11" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
vmkernel: 0:00:09:38.855 cpu1:4111)ScsiDeviceIO: 4494: Could not detect setting of QErr for device naa.600601604550250083489d914fbbdf11. Error Failure.
vmkernel: 0:00:10:08.945 cpu1:4111)WARNING: Partition: 801: Partition table read from device naa.600601604550250083489d914fbbdf11 failed: I/O error
vmkernel: 0:00:10:08.945 cpu1:4111)ScsiDevice: 2200: Successfully registered device "naa.600601604550250083489d914fbbdf11" from plugin "NMP" of type 0
vmkernel: 47:02:52:19.382 cpu17:9624)WARNING: NMP: nmp_IsSupportedPResvCommand: Unsupported Persistent Reservation Command, service action 0 type 4
vmkernel: 47:02:52:19.383 cpu12:4108)WARNING: NMP: nmpUpdatePResvStateSuccess: Parameter List Length 54310000 for service action 0 is beyond the supported value 18
vmkernel: 47:02:52:21.383 cpu23:9621)WARNING: NMP: nmp_IsSupportedPResvCommand: Unsupported Persistent Reservation Command, service action 0 type 4

If you configure the setting on an existing VMFS LUN, you may see these entries in the /var/log/vmkernel.log file:

cpu4:10169)WARNING: Partition: 1273: Device "naa.XXXXXXXXXXXXXXXXXXXxxxxxxxxxxxxx" with a VMFS partition is marked perennially reserved. This is not supported and may lead to data loss.
You can safely ignore this warning for Clustered VMDK datastores. VMware Engineering is working on suppressing the message in a future release.
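
To confirm from the ESXi shell that reserved LUNs are causing the delay, you can search the vmkernel log for the messages shown above. This is a minimal sketch, assuming the default log location of /var/log/vmkernel.log; the patterns simply match the example entries in this article:

# Count command failures and reservation warnings logged during boot
grep -cE "failed H:0x5|Persistent Reservation" /var/log/vmkernel.log

# List the devices affected by the failing commands
grep "failed H:0x5" /var/log/vmkernel.log | grep -o "naa\.[0-9a-f]*" | sort -u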

Resolution

ESXi 6.x and ESXi 7.x Hosts

For ESXi 6.x and 7.x hosts, the command-line procedure for marking the RDMs as perennially reserved is described below. A scripted shell sketch and a PowerCLI sketch for applying the setting to many devices and hosts follow the procedure.

To mark the LUNs as perennially reserved:
  1. Determine which RDM LUNs are part of the WSFC, Red Hat High Availability Cluster, or other clustering solution. From the vSphere Client, select a virtual machine that has a mapping to the clustered RDM devices.
  2. Edit your virtual machine settings and navigate to the Mapped RAW LUNs; in this example, Hard disk 2.

     
  3. The Physical LUN field shows the device in use as the RDM, identified by its VML ID.

    Take note of the VML ID, which is a globally unique identifier for your shared device.
     
  4. Identify the naa.id for this VML ID by running this command: esxcli storage core device list

    For example:

    esxcli storage core device list

    naa.6589cfc000000a17ac02aae02067e747
       Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000a17ac02aae02067e747)
       Has Settable Display Name: true
       Size: 40960
       Device Type: Direct-Access
       Multipath Plugin: NMP
       Devfs Path: /vmfs/devices/disks/naa.6589cfc000000a17ac02aae02067e747
       Vendor: FreeNAS
       Model: iSCSI Disk
       Revision: 0123
       SCSI Level: 6

     
  5.    Is Pseudo: false
       Status: degraded
       Is RDM Capable: true
       Is Local: false
       Is Removable: false
       Is SSD: false
       Is VVOL PE: false
       Is Offline: false
       Is Perennially Reserved: false
       Queue Full Sample Size: 0
       Queue Full Threshold: 0
       Thin Provisioning Status: unknown
       Attached Filters:
       VAAI Status: supported
       Other UIDs: vml.010001000030303530353630313031303830310000695343534920
       Is Shared Clusterwide: true
       Is SAS: false
       Is USB: false
       Is Boot Device: false
       Device Max Queue Depth: 128
       No of outstanding IOs with competing worlds: 32
       Drive Type: unknown
       RAID Level: unknown
       Number of Physical Drives: unknown
       Protection Enabled: false
       PI Activated: false
       PI Type: 0
       PI Protection Mask: NO PROTECTION
       Supported Guard Types: NO GUARD SUPPORT
       DIX Enabled: false
       DIX Guard Type: NO GUARD SUPPORT
       Emulated DIX/DIF Enabled: false
  5. Use the esxcli command to mark the device as perennially reserved:

    esxcli storage core device setconfig -d naa.id --perennially-reserved=true

    For example:

    esxcli storage core device setconfig -d naa.6589cfc000000a17ac02aae02067e747 --perennially-reserved=true

    Note: For vSphere 7.x, see the Change Perennial Reservation Settings section of the vSphere Storage Guide.
     
  6. To verify that the device is perennially reserved, run this command:

    esxcli storage core device list -d naa.id

    In the output, look for the entry Is Perennially Reserved: true, which confirms that the device is marked as perennially reserved.

    For example:

    esxcli storage core device list -d naa.6589cfc000000a17ac02aae02067e747

    naa.6589cfc000000a17ac02aae02067e747
       Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000a17ac02aae02067e747)
       Has Settable Display Name: true
       Size: 40960
       Device Type: Direct-Access
       Multipath Plugin: NMP
       Devfs Path: /vmfs/devices/disks/naa.6589cfc000000a17ac02aae02067e747
       Vendor: FreeNAS
       Model: iSCSI Disk
       Revision: 0123
       SCSI Level: 6
       Is Pseudo: false
       Status: degraded
       Is RDM Capable: true
       Is Local: false
       Is Removable: false
       Is SSD: false
       Is VVOL PE: false
       Is Offline: false
       Is Perennially Reserved: true
       Queue Full Sample Size: 0
       Queue Full Threshold: 0
       Thin Provisioning Status: unknown
       Attached Filters:
       VAAI Status: supported
       Other UIDs: vml.010001000030303530353630313031303830310000695343534920
       Is Shared Clusterwide: true
       Is SAS: false
       Is USB: false
       Is Boot Device: false
       Device Max Queue Depth: 128
       No of outstanding IOs with competing worlds: 32
       Drive Type: unknown
       RAID Level: unknown
       Number of Physical Drives: unknown
       Protection Enabled: false
       PI Activated: false
       PI Type: 0
       PI Protection Mask: NO PROTECTION
       Supported Guard Types: NO GUARD SUPPORT
       DIX Enabled: false
       DIX Guard Type: NO GUARD SUPPORT
       Emulated DIX/DIF Enabled: false

     
  7. Repeat the procedure for each Mapped RAW LUN that participates in the clustering solution, such as WSFC or Red Hat High Availability Cluster. To handle many devices at once, see the scripted sketch after this procedure.

    Note: The configuration is permanently stored with the ESXi host and persists across restarts. To remove the perennially reserved flag, run this command:

    esxcli storage core device setconfig -d naa.id --perennially-reserved=false
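
    If many RDMs are involved, steps 3 through 6 can be scripted from the ESXi shell. The following is a minimal sketch rather than an official procedure: it assumes that the vml.* entries under /vmfs/devices/disks are symbolic links to their backing naa.* devices, and the VML IDs shown are placeholders for the values noted in step 3:

    # Replace the placeholder VML IDs with the ones noted from the VM settings
    for VML in vml.EXAMPLE1 vml.EXAMPLE2; do
        # Resolve the vml symlink to its backing naa device name
        DEV=$(basename "$(readlink "/vmfs/devices/disks/$VML")")
        echo "Marking $DEV (backing $VML) as perennially reserved"
        esxcli storage core device setconfig -d "$DEV" --perennially-reserved=true
        # Confirm the flag was applied
        esxcli storage core device list -d "$DEV" | grep "Is Perennially Reserved"
    done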

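The perennially reserved setting is stored per host, so it must be applied on every ESXi host that can access the shared LUN. The following PowerCLI sketch applies the setting across a cluster; it is a hedged example, assuming an existing Connect-VIServer session, and the cluster name and naa.id are placeholders for your environment:

# Placeholders: replace the cluster name and device ID with your own values
$naaId = "naa.6589cfc000000a17ac02aae02067e747"
foreach ($esxHost in Get-Cluster "MyCluster" | Get-VMHost) {
    # Get-EsxCli -V2 exposes the same esxcli namespace used in the steps above
    $esxcli = Get-EsxCli -VMHost $esxHost -V2
    $arguments = $esxcli.storage.core.device.setconfig.CreateArgs()
    $arguments.device = $naaId
    $arguments.perenniallyreserved = $true
    $esxcli.storage.core.device.setconfig.Invoke($arguments)
}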

Additional Information

VMware Skyline Health Diagnostics for vSphere - FAQ
How to detach a LUN device from ESXi hosts

For more information, see Obtaining LUN pathing information for ESX or ESXi hosts (1003973).
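
For a quick check of the paths to a single device, you can run this from the ESXi shell (substitute your own naa.id; the device ID below is the example used earlier in this article):

esxcli storage core path list -d naa.6589cfc000000a17ac02aae02067e747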

Note: The PowerCLI and esxcli commands are case sensitive. If the naa.id is specified in uppercase letters when issuing the command, a new device is added on the ESXi host.

The resolution steps in this article are also known to resolve storage devices reporting NMP errors similar to:
 
WARNING: NMP: nmp_DeviceRetryCommand:133:Device "naa.600601604ec0360065efeed9d265e411": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.

If you experience the symptoms described above with Clustered VMDK datastores, follow the same steps to resolve the issue.