ESXi/ESX hosts in APD may appear Not Responding in vCenter Server (1030980)

Symptoms

An ESXi/ESX host with one or more LUNs in an All-Paths-Down (APD) condition may become unmanageable in vCenter Server:
  • The ESXi/ESX host appears as Disconnected or Not Responding in the vCenter Server inventory.
  • Virtual machines utilizing the LUNs in APD may become unresponsive.
  • Connecting to the ESXi/ESX host using the vSphere Client, vCLI, or PowerCLI fails.
  • Adding a host to vCenter Server fails with error:

    Failed to read resource pool tree from host

  • Connecting to the ESXi/ESX host using SSH is successful.
  • The vmware-hostd management service is running.
  • Connecting to the vmware-hostd management service using the vim-cmd or vmware-cmd command fails.
  • The last line in the hostd log at /var/log/vmware/hostd.log contains:

    verbose 'FSVolumeProvider'] RefreshVMFSVolumes called
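
The last symptom can be checked from an SSH session with a small helper. This is a minimal sketch, not an official tool: the function name is hypothetical, and it assumes tail and grep are available in the host shell.

```shell
# Return 0 if the last line of the hostd log mentions "RefreshVMFSVolumes called".
# The default path is the one quoted in the symptom list; pass another path to override it.
# (Function name is illustrative only.)
hostd_last_line_is_vmfs_refresh() {
    log=${1:-/var/log/vmware/hostd.log}
    tail -n 1 "$log" | grep -q 'RefreshVMFSVolumes called'
}

# Example use on a host:
#   hostd_last_line_is_vmfs_refresh && echo "hostd appears to be waiting on a VMFS refresh"
```

A match is only a hint. Confirm the APD condition with the validation steps in the Resolution section.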

Resolution

Validation

Determine whether there are any LUNs in an All-Paths-Down (APD) state on an ESXi/ESX host:

  1. Open a console to the ESXi/ESX host. For more information, see Unable to connect to an ESX host using Secure Shell (SSH) (1003807) or Using Tech Support Mode in ESXi 4.1 and ESXi 5.0 (1017910).
  2. Use the esxcfg-mpath command to obtain a list of all device paths, and filter by their State:

    # esxcfg-mpath --list-paths --device <device mpx/naa name> | grep -i state

    If you do not know the ID of the problem device, or the host has many devices, it may be more efficient to search all paths for dead entries:

    # esxcfg-mpath -b | grep -C 1 dead

  3. If some paths to a device report a State of dead while other paths to the same device are still up, perform a rescan to remove the stale device entries. For more information, see Performing a rescan of the storage on an ESXi/ESX host (1003988).
  4. If every path to a LUN reports a State of dead, then the LUN is in an All-Paths-Down state.
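
The decision in steps 3 and 4 can be sketched as a small filter over the path listing. This is an illustration only: the function name is hypothetical, and it assumes the listing prints one line per path of the form "State: <value>" and that awk is available in the host shell.

```shell
# Summarize the path states read from stdin.
# Assumption: each path contributes one line such as "   State: dead"
# (the assumed layout of `esxcfg-mpath --list-paths --device <id>` output).
# Prints APD (every path dead), PARTIAL (some dead), OK (none dead), or NOPATHS.
summarize_path_states() {
    awk '
        BEGIN { total = 0; dead = 0 }
        tolower($1) == "state:" {
            total++
            if (tolower($2) == "dead") dead++
        }
        END {
            if (total == 0)         print "NOPATHS"
            else if (dead == total) print "APD"
            else if (dead > 0)      print "PARTIAL"
            else                    print "OK"
        }
    '
}

# Example use on a host:
#   esxcfg-mpath --list-paths --device <device mpx/naa name> | summarize_path_states
```

A result of PARTIAL corresponds to step 3 (rescan to clear stale entries), and APD corresponds to step 4.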

Preemptive workaround

If the APD condition is noticed prior to any process opening a file on the affected VMFS datastores, the impending blocking I/O can be fast-failed by setting the advanced configuration option VMFS3.FailVolumeOpenIfAPD = 1 on ESXi/ESX 4.1. For more information, see Configuring advanced options for ESX/ESXi (1038578).
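
From the command line, the option can likely be set with esxcfg-advcfg. The sketch below assumes the option maps to the path /VMFS3/FailVolumeOpenIfAPD, following the usual esxcfg-advcfg naming; the function name and the RUN variable are hypothetical. By default the commands are only printed so they can be reviewed first.

```shell
# Set VMFS3.FailVolumeOpenIfAPD and read the value back.
# Assumption: the option is exposed to esxcfg-advcfg as /VMFS3/FailVolumeOpenIfAPD.
# RUN defaults to "echo" (dry run). Set RUN to an empty string to execute on a host.
set_fail_volume_open_if_apd() {
    value=${1:-1}
    ${RUN-echo} esxcfg-advcfg -s "$value" /VMFS3/FailVolumeOpenIfAPD   # set the value
    ${RUN-echo} esxcfg-advcfg -g /VMFS3/FailVolumeOpenIfAPD            # read it back
}

# Dry run (prints the two commands):   set_fail_volume_open_if_apd 1
# Execute on a host:                   RUN= set_fail_volume_open_if_apd 1
```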

When a dead path or an APD condition is noticed, individual HBAs can be rescanned using the following command:

# esxcfg-rescan -d vmhbaX

Note: Replace vmhbaX with the appropriate HBA, for example vmhba33.
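
To find which HBAs to rescan, the adapter names can be pulled from the dead-path lines of the brief path listing. This is a sketch under assumptions: the function name is hypothetical, and it assumes each path line begins with a runtime name such as vmhba33:C0:T1:L0 and contains the word dead.

```shell
# Print each HBA name (vmhbaN) once if it appears on a line reporting a dead path.
# Assumption: path lines in `esxcfg-mpath -b` output start with a runtime name
# like vmhba33:C0:T1:L0 and include the word "dead" when the path is down.
hbas_with_dead_paths() {
    awk '
        /dead/ && $1 ~ /^vmhba[0-9]+:/ {
            split($1, parts, ":")
            if (!(parts[1] in seen)) {
                seen[parts[1]] = 1
                print parts[1]
            }
        }
    '
}

# Example use on a host:
#   for hba in $(esxcfg-mpath -b | hbas_with_dead_paths); do
#       esxcfg-rescan -d "$hba"
#   done
```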

In ESX/ESXi 4.1 and later, all HBAs can be rescanned using the following command:

# esxcfg-rescan -A

Note: If a device is already in an APD condition with active I/O waiting for the device to return, setting VMFS3.FailVolumeOpenIfAPD does not cause the already-issued I/O to fail. It is necessary either to bring the LUN paths back up or to wait for the I/O to eventually fail.

For more information, see Virtual machines stop responding when any LUN on the host is in an all-paths-down (APD) condition (1016626).

To avoid the APD state on an ESXi/ESX host, be sure to use the correct method to unpresent the LUNs. For more information on the correct procedure for unpresenting LUNs, see Removing a LUN containing a datastore from VMware ESXi/ESX 4.x (1029786).

Update History

04/05/2013 - Added additional symptom
