Article ID: 318851
Issue/Introduction
Symptoms:
When RDMs are used as shared disk resources for a clustering solution such as WSFC, Red Hat High Availability Cluster, etc., you experience these symptoms:
- ESXi hosts hosting secondary nodes may take a long time to start. The delay depends on the number of RDMs attached to the ESXi host.
Note: For example, in a system with 10 RDMs used in a two-node WSFC or Red Hat High Availability Cluster, a restart of the ESXi host with the secondary node may take up to 30 minutes. With fewer RDMs the restart time is shorter; with only three RDMs, it is approximately 10 minutes.
- The ESXi host intermittently displays an error message on the Summary tab, and the vSphere Client may fail to connect:
Cannot synchronize host hostname. Operation Timed out.
- The boot screen appears to stall after a message similar to:
Loading module multiextent.
Environment
VMware vSphere ESXi 6.0
VMware vSphere ESXi 6.5
VMware vSphere ESXi 6.7
VMware vSphere ESXi 7.0.0
VMware vSphere ESXi 8.0
Cause
This issue occurs when virtual machines participating in a clustering solution such as WSFC or Red Hat High Availability Cluster use shared RDMs with SCSI reservations across hosts, and a virtual machine on another host is the active cluster node holding a SCSI reservation. During boot, the restarting ESXi host queries each device it discovers; commands sent to a reserved LUN fail and must time out before the host can move on.
The delay occurs at these steps:
Starting path claiming and SCSI device discovery
In the /var/log/vmkernel.log file of the restarting ESXi host, you see entries similar to:
vmkernel: 0:00:01:57.828 cpu0:4096)WARNING: ScsiCore: 1353: Power-on Reset occurred on naa.6006016045502500176a24d34fbbdf11
vmkernel: 0:00:01:57.830 cpu0:4096)VMNIX: VmkDev: 2122: Added SCSI device vml0:3:0 (naa.6006016045502500166a24d34fbbdf11)
vmkernel: 0:00:02:37.842 cpu3:4099)ScsiDeviceIO: 1672: Command 0x1a to device "naa.6006016045502500176a24d34fbbdf11" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0
Mounting the partition of the RDM LUNs
In the /var/log/vmkernel.log file of the restarting ESXi host, you see entries similar to:
vmkernel: 0:00:08:58.811 cpu2:4098)WARNING: ScsiCore: 1353: Power-on Reset occurred on naa.600601604550250083489d914fbbdf11
vmkernel: 0:00:08:58.814 cpu0:4096)VMNIX: VmkDev: 2122: Added SCSI device vml0:9:0 (naa.600601604550250082489d914fbbdf11)
vmkernel: 0:00:09:38.855 cpu2:4098)ScsiDeviceIO: 1672: Command 0x1a to device "naa.600601604550250083489d914fbbdf11" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
vmkernel: 0:00:09:38.855 cpu1:4111)ScsiDeviceIO: 4494: Could not detect setting of QErr for device naa.600601604550250083489d914fbbdf11. Error Failure.
vmkernel: 0:00:10:08.945 cpu1:4111)WARNING: Partition: 801: Partition table read from device naa.600601604550250083489d914fbbdf11 failed: I/O error
vmkernel: 0:00:10:08.945 cpu1:4111)ScsiDevice: 2200: Successfully registered device "naa.600601604550250083489d914fbbdf11" from plugin "NMP" of type 0
vmkernel: 47:02:52:19.382 cpu17:9624)WARNING: NMP: nmp_IsSupportedPResvCommand: Unsupported Persistent Reservation Command,service action 0 type 4
vmkernel: 47:02:52:19.383 cpu12:4108)WARNING: NMP: nmpUpdatePResvStateSuccess: Parameter List Length 54310000 for service action 0 is beyondthe supported value 18
vmkernel: 47:02:52:21.383 cpu23:9621)WARNING: NMP: nmp_IsSupportedPResvCommand: Unsupported Persistent Reservation Command,service action 0 type 4
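To check whether a restarting host is hitting these timeouts, you can search the vmkernel log for the entries above. A minimal sketch; it runs against a copied sample so it is self-contained, whereas on a live ESXi host you would point grep at /var/log/vmkernel.log directly:

```shell
# Sample vmkernel.log excerpt (stand-in for /var/log/vmkernel.log on a live host).
cat > /tmp/vmkernel.sample.log <<'EOF'
vmkernel: 0:00:01:57.828 cpu0:4096)WARNING: ScsiCore: 1353: Power-on Reset occurred on naa.6006016045502500176a24d34fbbdf11
vmkernel: 0:00:02:37.842 cpu3:4099)ScsiDeviceIO: 1672: Command 0x1a to device "naa.6006016045502500176a24d34fbbdf11" failed H:0x5 D:0x0 P:0x0
EOF

# Count commands that failed with host status H:0x5, which in this scenario
# indicates device discovery waiting on reserved LUNs.
grep -c 'failed H:0x5' /tmp/vmkernel.sample.log   # → 1
```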
If you configure the setting on an existing VMFS LUN, you may see these entries in the /var/log/vmkernel.log file:
cpu4:10169)WARNING: Partition: 1273: Device "naa.XXXXXXXXXXXXXXXXXXXxxxxxxxxxxxxx" with a VMFS partition is marked perennially reserved. This is not supported and may lead to data loss.
You can safely ignore this warning for Clustered VMDK datastores. VMware Engineering is working on suppressing this message in a future release.
Resolution
ESXi 6.x and ESXi 7.x Hosts
For all ESXi 6.x and 7.x hosts, the command-line (esxcli), vSphere Client, and PowerCLI methods of marking RDMs as perennially reserved are covered in the sections below:
To mark the LUNs as perennially reserved:
- Determine which RDM LUNs are part of the WSFC, Red Hat High Availability Cluster, or similar setup. From the vSphere Client, select a virtual machine that has a mapping to the clustered RDM devices.
- Edit your virtual machine settings and navigate to your Mapped RAW LUNs. In this example, Hard disk 2:
- The Physical disk field shows the device in use as an RDM (that is, its VML ID).
Take note of the VML ID, which is a globally unique identifier for the shared device.
- Identify the naa.id for this VML using this command: esxcli storage core device list
For example:
esxcli storage core device list
naa.6589cfc000000a17ac02aae02067e747
Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000a17ac02aae02067e747)
Has Settable Display Name: true
Size: 40960
Device Type: Direct-Access
Multipath Plugin: NMP
Devfs Path: /vmfs/devices/disks/naa.6589cfc000000a17ac02aae02067e747
Vendor: FreeNAS
Model: iSCSI Disk
Revision: 0123
SCSI Level: 6
Is Pseudo: false
Status: degraded
Is RDM Capable: true
Is Local: false
Is Removable: false
Is SSD: false
Is VVOL PE: false
Is Offline: false
Is Perennially Reserved: false
Queue Full Sample Size: 0
Queue Full Threshold: 0
Thin Provisioning Status: unknown
Attached Filters:
VAAI Status: supported
Other UIDs: vml.010001000030303530353630313031303830310000695343534920
Is Shared Clusterwide: true
Is SAS: false
Is USB: false
Is Boot Device: false
Device Max Queue Depth: 128
No of outstanding IOs with competing worlds: 32
Drive Type: unknown
RAID Level: unknown
Number of Physical Drives: unknown
Protection Enabled: false
PI Activated: false
PI Type: 0
PI Protection Mask: NO PROTECTION
Supported Guard Types: NO GUARD SUPPORT
DIX Enabled: false
DIX Guard Type: NO GUARD SUPPORT
Emulated DIX/DIF Enabled: false
- Use the esxcli command to mark the device as perennially reserved:
esxcli storage core device setconfig -d naa.id --perennially-reserved=true
For example:
esxcli storage core device setconfig -d naa.6589cfc000000a17ac02aae02067e747 --perennially-reserved=true
Note: For vSphere 7.x, see the Change Perennial Reservation Settings section of the vSphere Storage Guide.
- To verify that the device is perennially reserved, run this command:
esxcli storage core device list -d naa.id
In the output of the esxcli command, search for the entry Is Perennially Reserved: true. This shows that the device is marked as perennially reserved.
For example:
esxcli storage core device list -d naa.6589cfc000000a17ac02aae02067e747
naa.6589cfc000000a17ac02aae02067e747
Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000a17ac02aae02067e747)
Has Settable Display Name: true
Size: 40960
Device Type: Direct-Access
Multipath Plugin: NMP
Devfs Path: /vmfs/devices/disks/naa.6589cfc000000a17ac02aae02067e747
Vendor: FreeNAS
Model: iSCSI Disk
Revision: 0123
SCSI Level: 6
Is Pseudo: false
Status: degraded
Is RDM Capable: true
Is Local: false
Is Removable: false
Is SSD: false
Is VVOL PE: false
Is Offline: false
Is Perennially Reserved: true
Queue Full Sample Size: 0
Queue Full Threshold: 0
Thin Provisioning Status: unknown
Attached Filters:
VAAI Status: supported
Other UIDs: vml.010001000030303530353630313031303830310000695343534920
Is Shared Clusterwide: true
Is SAS: false
Is USB: false
Is Boot Device: false
Device Max Queue Depth: 128
No of outstanding IOs with competing worlds: 32
Drive Type: unknown
RAID Level: unknown
Number of Physical Drives: unknown
Protection Enabled: false
PI Activated: false
PI Type: 0
PI Protection Mask: NO PROTECTION
Supported Guard Types: NO GUARD SUPPORT
DIX Enabled: false
DIX Guard Type: NO GUARD SUPPORT
Emulated DIX/DIF Enabled: false
- Repeat the procedure for each Mapped RAW LUN that participates in the clustering solution (WSFC, Red Hat High Availability Cluster, and so on).
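Repeating the per-LUN step can be scripted with a simple loop. A minimal sketch, assuming you have already collected the clustered RDM device IDs; the naa IDs below are placeholders for your own, and the esxcli invocation is echoed as a dry run so the script can be reviewed before running it on a host:

```shell
# Placeholder naa IDs: replace with the clustered RDM devices identified earlier.
RDM_IDS="naa.6589cfc000000a17ac02aae02067e747 naa.600601604550250083489d914fbbdf11"

for id in $RDM_IDS; do
  # Dry run: print the command. On an ESXi host, remove the echo to apply it.
  echo esxcli storage core device setconfig -d "$id" --perennially-reserved=true
done
```

Each printed line is the exact command from the step above, so the output can be pasted into an ESXi shell once verified.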
Note: The configuration is permanently stored with the ESXi host and persists across restarts. To remove the perennially reserved flag, run this command:
esxcli storage core device setconfig -d naa.id --perennially-reserved=false
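To audit which devices currently carry the flag, you can filter the long esxcli storage core device list output down to just the Is Perennially Reserved lines. A self-contained sketch using a captured sample of that output (on a live host, pipe the esxcli command directly into awk instead of using the sample file):

```shell
# Sample of `esxcli storage core device list` output, abbreviated to two devices.
cat > /tmp/devlist.sample.txt <<'EOF'
naa.6589cfc000000a17ac02aae02067e747
   Is Perennially Reserved: true
naa.600601604550250083489d914fbbdf11
   Is Perennially Reserved: false
EOF

# Print each device ID followed by its perennially-reserved flag.
awk '/^naa\./ {dev=$1} /Is Perennially Reserved/ {print dev, $NF}' /tmp/devlist.sample.txt
```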
Additional Information
For more information, see:
- VMware Skyline Health Diagnostics for vSphere - FAQ
- How to detach a LUN device from ESXi hosts
- Obtaining LUN pathing information for ESX or ESXi hosts (1003973)
Note: The PowerCLI and esxcli commands are case sensitive. If the naa.id is specified in uppercase letters when issuing the command, a new device entry is added on the ESXi host.
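Because of this case sensitivity, one defensive habit is to normalize the ID to lowercase before building the command. A small sketch; the ID value is illustrative:

```shell
# An ID that was pasted in uppercase (illustrative value).
RAW_ID="NAA.6589CFC000000A17AC02AAE02067E747"

# Normalize to lowercase before passing it to esxcli, avoiding the
# accidental creation of a new device entry described in the note above.
DEVICE_ID=$(printf '%s' "$RAW_ID" | tr '[:upper:]' '[:lower:]')
echo "$DEVICE_ID"   # → naa.6589cfc000000a17ac02aae02067e747
```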
The resolution steps in this article are also known to resolve storage devices reporting NMP errors similar to:
WARNING: NMP: nmp_DeviceRetryCommand:133:Device "naa.600601604ec0360065efeed9d265e411": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
If you experience the symptoms described above with Clustered VMDK datastores, follow the same steps to resolve the issue.