"Lost access to volume" messages with vSAN

Products

VMware vSAN

Issue/Introduction

This article explains what "Lost access to volume" messages mean on vSAN clusters.

Symptoms:

In the /var/log/hostd.log file, you see entries similar to:

2015-07-02T02:00:11.675Z [4F1E1B70 info 'Vimsvc.ha-eventmgr'] Event 205 : Lost access to volume 54f89e21-4427e506-b968-a0369f519998 (XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
2015-07-02T02:00:37.055Z [4F480B70 info 'Vimsvc.ha-eventmgr'] Event 210 : Successfully restored access to volume 54f89e21-4427e506-b968-a0369f519998 (XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX) following connectivity issues.

In vCenter Server, you see an event similar to:

Lost access to volume 54f89e21-4427e506-b968-a0369f519998 (XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
In these messages, the ID shown in parentheses (XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX) does not refer to a volume or datastore but to the affected object on the vSAN datastore. The messages resemble those logged for VMFS volumes when latency is too high or access to the data is interrupted.
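
To judge how long access was lost, the "Lost access" and "Successfully restored access" entries can be paired per object. The following Python sketch is one possible way to do this and is not a VMware tool. It assumes the log lines follow the same layout as the hostd.log samples shown above; the regular expression and the names pair_events and SAMPLE are illustrative and may need adjusting for other ESXi builds.

```python
import re
from datetime import datetime

# Pattern modeled on the sample hostd.log entries in this article (an assumption);
# other ESXi versions may format these events differently.
EVENT_RE = re.compile(
    r"^(?P<ts>\S+Z) .*?"
    r"(?P<kind>Lost access to volume|Successfully restored access to volume) "
    r"(?P<volume>\S+) \((?P<obj>[^)]+)\)"
)


def parse_ts(ts):
    """Parse a timestamp such as 2015-07-02T02:00:11.675Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def pair_events(lines):
    """Pair each 'Lost access' entry with the next 'restored' entry for the same IDs.

    Returns (outages, pending): outages is a list of
    (volume_id, object_id, lost_at, seconds_until_restored) tuples, and
    pending maps (volume_id, object_id) to the time of a loss not yet restored.
    """
    pending = {}
    outages = []
    for line in lines:
        match = EVENT_RE.match(line.lstrip())
        if not match:
            continue
        key = (match["volume"], match["obj"])
        when = parse_ts(match["ts"])
        if match["kind"].startswith("Lost"):
            # Keep the earliest unresolved loss for this object.
            pending.setdefault(key, when)
        elif key in pending:
            lost_at = pending.pop(key)
            outages.append((key[0], key[1], lost_at, (when - lost_at).total_seconds()))
    return outages, pending


# The two sample entries from this article, used as test input.
SAMPLE = [
    "2015-07-02T02:00:11.675Z [4F1E1B70 info 'Vimsvc.ha-eventmgr'] Event 205 : "
    "Lost access to volume 54f89e21-4427e506-b968-a0369f519998 "
    "(XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX) due to connectivity issues. "
    "Recovery attempt is in progress and outcome will be reported shortly.",
    "2015-07-02T02:00:37.055Z [4F480B70 info 'Vimsvc.ha-eventmgr'] Event 210 : "
    "Successfully restored access to volume 54f89e21-4427e506-b968-a0369f519998 "
    "(XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX) following connectivity issues.",
]

if __name__ == "__main__":
    found, unresolved = pair_events(SAMPLE)
    for volume, obj, lost_at, seconds in found:
        print(f"{lost_at.isoformat()} object {obj}: access lost for {seconds:.3f} s")
    print(f"unresolved losses: {len(unresolved)}")
```

To analyze a real log, pass the lines of a copy of /var/log/hostd.log to pair_events instead of SAMPLE. Frequent or long outages on the same object are a reason to work through the checks in the Resolution section.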

Cause

Every ESXi host periodically writes and processes heartbeats on its VMFS file systems. When heartbeat updates fail because the underlying vSAN storage is unstable, the heartbeats time out and "Lost access to volume" is logged in the vmkernel.log file.

On vSAN-enabled clusters, each virtual machine's folder (called the "VM Namespace") uses a special form of VMFS file system to hold the files the virtual machine requires, such as the .vmx configuration file, .vmdk descriptor files, and vmware.log files.

This vSAN cluster instability can be caused by, among other things, a misbehaving disk anywhere in the vSAN environment (on any node), latency or instability on the vSAN network, or any form of congestion.

Resolution

Such issues can have several causes. Validate the following:

Run the "vSAN health check" to verify the state of vSAN and identify any existing issues. Administrators should run and review it regularly.
Verify that all virtual machines are compliant with their applied storage policy.
Verify that connectivity on the vSAN network is stable and performant.
Verify whether any disks in the vSAN environment are nearing the end of their lifespan (e.g. "Write Endurance" for any disk).

If issues persist, open a support case with VMware.

Additional Information

Understanding lost access to volume messages in ESXi 6.x/7.x (2136081)

Attachments

POLICY_COMPLIANCE_CHECK_4.PNG

POLICY_COMPLIANCE_2.PNG

POLICY_COMPLIANCE_3.PNG