ESX hosts lose storage when CLARiiON failover mode changes during storage processor reboot (1038308)

Symptoms

  • ESX hosts lose storage when the CLARiiON failover mode changes during a storage processor reboot.
  • During a scheduled reboot of a storage processor, the initiator failover mode of one or more servers changes from Active/Active mode (ALUA), failover mode 4, to Active/Passive mode (PNR), failover mode 1.

Resolution

This failover mode change can cause the ESX servers to lose storage completely during a second Storage Processor (SP) reboot.

In ESX 4.x, the ESX host agent tries to register the host with failover mode 1. It does this because the SP rebooted and the connection with the array was lost. When the array comes back up, the ESX host re-registers and must use a hard-coded failover mode of 1.

To work around this issue, present an actual LUN0 to the hosts in ALUA mode:

  1. In EMC Unisphere, disable the ArrayCommPath parameter for each ESX host.

    Note: Disabling ArrayCommPath for each host requires a reboot of each host.

  2. Present a 2GB or larger LUN0 to all ESX hosts.

    Note: This volume does not need to be formatted, but must be equal to or larger than 1.5GB.

  3. Rescan each ESX host and check whether it now sees LUN0 rather than LUNZ. If it does not, perform a rolling reboot of all ESX hosts.

Additional Information

A LUNZ is the pseudo LUN that is presented to all hosts prior to any real LUNs being assigned to the host (in the Storage Group). The very first LUN assigned to a host in the Storage Group is assigned the Host LUN Number (HLU) of zero (0) by default. If you manually assign the HLU and do not use zero (0), the host sees the LUNZ.
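
The rule above can be sketched in a few lines of Python. This is only an illustration of the logic as described in this article: the function name host_sees_lunz and the representation of a Storage Group as a plain list of HLU numbers are assumptions made for the example, not part of any EMC or VMware interface.

```python
def host_sees_lunz(host_lun_numbers):
    """Return True when no real LUN occupies HLU 0.

    host_lun_numbers is a hypothetical list of the HLU values assigned
    to one host in its Storage Group. Per the description above, the
    host sees LUNZ whenever HLU 0 is not taken by a real LUN.
    """
    return 0 not in set(host_lun_numbers)


if __name__ == "__main__":
    # No real LUNs assigned yet: the host sees LUNZ.
    print(host_sees_lunz([]))
    # HLUs assigned manually without using zero: the host still sees LUNZ.
    print(host_sees_lunz([1, 2, 5]))
    # A real LUN is presented as HLU 0: no LUNZ.
    print(host_sees_lunz([0, 1, 2]))
```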

LUNZ always presents as a PNR (A/P) device (barring an actual LUN 0 being presented). In Flare versions 4.30 and below, when ESX attempts to push a registration in PNR mode to the array, the array refuses to update the registration that already existed. Remaining volumes come up in ALUA mode and are fully functional.

Because the first LUN that becomes accessible to an ESX host is the LUNZ (barring an actual LUN 0 being presented), and the LUNZ is always in PNR mode, Navireg attempts to push a registration for the hosts in PNR (Active/Passive) mode to the array. Flare versions 4.30 and below prevent this from happening, because a registration already exists and has been modified from its defaults. Newer versions enable features where the registration can be controlled entirely from the host side.
 
To check if a LUN is a pseudo LUN from the ESX host command-line:

esxcfg-scsidevs -l -d <naaID>
 
For example, the output of running the command on device naa.50060160bb201bc550060160bb201bc5:
 
# esxcfg-scsidevs -l -d naa.50060160bb201bc550060160bb201bc5

naa.50060160bb201bc550060160bb201bc5
   Device Type: Direct-Access
   Size: 0 MB
   Display Name: DGC Fibre Channel Disk (naa.50060160bb201bc550060160bb201bc5)
   Multipath Plugin: NMP
   Console Device: /dev/sdi
   Devfs Path: /vmfs/devices/disks/naa.50060160bb201bc550060160bb201bc5
   Vendor: DGC Model: LUNZ Revis: 0430
   SCSI Level: 4 Is Pseudo: true Status: on
   Is RDM Capable: true Is Removable: false
   Is Local: false
   Other Names:
      vml.020000000050060160bb201bc550060160bb201bc54c554e5a2020
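
When many devices need to be checked, the same test can be scripted. The following is a minimal Python sketch that parses text in the format shown above and reports whether a device is a pseudo LUN. The helper names parse_scsidevs and is_pseudo_lun are hypothetical, the list of field names is taken only from the sample output in this article, and other ESX versions may print different fields. The sketch works on captured output text and does not run the command itself.

```python
import re

# Field names as they appear in the sample output; several can share one line.
FIELD_RE = re.compile(
    r"(Device Type|Size|Display Name|Multipath Plugin|Console Device|"
    r"Devfs Path|Vendor|Model|Revis|SCSI Level|Is Pseudo|Status|"
    r"Is RDM Capable|Is Removable|Is Local):"
)


def parse_scsidevs(text):
    """Parse captured `esxcfg-scsidevs -l` output into {device: {field: value}}."""
    devices = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new device block.
            current = devices.setdefault(line.strip(), {})
            continue
        if current is None:
            continue
        matches = list(FIELD_RE.finditer(line))
        for i, m in enumerate(matches):
            # A value runs until the next field name on the same line.
            end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
            current[m.group(1)] = line[m.end():end].strip()
    return devices


def is_pseudo_lun(fields):
    """True when the device block reports 'Is Pseudo: true'."""
    return fields.get("Is Pseudo", "").lower() == "true"


SAMPLE = """\
naa.50060160bb201bc550060160bb201bc5
   Device Type: Direct-Access
   Size: 0 MB
   Display Name: DGC Fibre Channel Disk (naa.50060160bb201bc550060160bb201bc5)
   Multipath Plugin: NMP
   Vendor: DGC Model: LUNZ Revis: 0430
   SCSI Level: 4 Is Pseudo: true Status: on
   Is RDM Capable: true Is Removable: false
   Is Local: false
"""

if __name__ == "__main__":
    for name, fields in parse_scsidevs(SAMPLE).items():
        print(name, fields.get("Model"), is_pseudo_lun(fields))
```

A device that reports Model LUNZ and Is Pseudo: true after the rescan indicates that the host is still seeing the LUNZ rather than a real LUN0.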

Update History

12/11/2012 - Added ESX/ESXi 4.1 and ESXi 5.0 to Products
