ESXi host loses connectivity to a VMFS datastore

Article ID: 326437

Products

VMware vSphere ESXi

Issue/Introduction

This article provides steps to resolve this issue by disabling ATS heartbeat in the ESXi kernel, which reverts heartbeat-related activity to the legacy method.


Symptoms:
  • An ESXi host loses connectivity to a VMFS6 datastore, while using the VAAI ATS heartbeat in your environment.
  • In the /var/run/log/vmkernel.log file, you see the message:
ATS Miscompare detected between test and set HB images at offset XXX on vol YYY
 
Note: For additional symptoms and log entries, see the Additional Information section.
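
To check quickly whether a host is logging this message, you can count the matching lines in the log. The following is a minimal sketch, assuming a POSIX shell such as the ESXi shell; the function name count_ats_miscompare is illustrative and not part of any VMware tooling.

```shell
# Count "ATS Miscompare detected" lines in a vmkernel log.
# count_ats_miscompare is a hypothetical helper name, not a VMware tool.
count_ats_miscompare() {
    # Default to the log path named in the symptoms above.
    log_file="${1:-/var/run/log/vmkernel.log}"
    # -F treats the pattern as a fixed string; -c prints the number of matching lines.
    grep -F -c 'ATS Miscompare detected between test and set HB images' "$log_file"
}

# Usage on a host:
#   count_ats_miscompare
#   count_ats_miscompare /path/to/a/copied/vmkernel.log
```

A non-zero count only shows that the message was logged; it does not by itself identify the affected volume or the cause.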


Environment

VMware vSphere ESXi 5.5
VMware vSphere ESXi 6.7
VMware vSphere ESXi 5.1
VMware vSphere ESXi 7.0.0
VMware vSphere ESXi 6.5
VMware vSphere ESXi 6.0

Cause

The legacy heartbeat method uses plain SCSI reads and writes, with the VMware ESXi kernel handling validation. The new method (ATS heartbeat) offloads the validation step to the storage system. This is similar to other VAAI-related offloads.

This optimization significantly increases the volume of ATS commands that the ESXi kernel issues to the storage system, which in turn increases the load on the storage system. Under certain circumstances, a VMFS heartbeat using ATS may fail with a false ATS miscompare, which causes the ESXi kernel to verify its access to VMFS datastores again. This leads to the "Lost access to datastore" messages.

Note: For VMFS5/VMFS6 datastores, the ATS heartbeat setting is on by default.

Resolution

Important note for vSAN clusters: If the host is part of a vSAN cluster, ATS heartbeat must NOT be disabled. Disabling it can negatively impact the cluster and its data accessibility!

To resolve this issue, revert the heartbeat to non-ATS mechanisms by disabling ATS heartbeat on ALL hosts sharing the datastore where these errors are seen.
 
Notes:
  • If you suspect that ATS Heartbeat may be causing an issue with array workload or IO responsiveness, engage with your storage vendor to determine if they recommend disabling this function.
  • This change disables or enables use of the ATS primitive for creating or updating the VMFS heartbeat. It does not change the ATS primitive configuration itself.
  • These operations can be safely performed online, while the storage is in use.
  • Disabling ATS heartbeat:
      • Does not impact acquiring a heartbeat (HB) slot (starting heartbeat).
      • Affects periodic/routine heartbeat updates.
      • Reverts the heartbeat-related activity in the ESXi host to plain SCSI reads and writes for updating its heartbeat on VMFS datastores.
     

Process to Revert Heartbeat to non-ATS mechanisms:

For VMFS5 and VMFS6 datastores:

To disable ATS heartbeat, run either the CLI command or the PowerCLI command:

Command line:
esxcli system settings advanced set -i 0 -o /VMFS3/UseATSForHBOnVMFS5

PowerCLI:
Get-AdvancedSetting -Entity VMHost-Name -Name VMFS3.UseATSForHBOnVMFS5 | Set-AdvancedSetting -Value 0 -Confirm:$false

To enable ATS heartbeat, run either the CLI command or the PowerCLI command:

Command line:
esxcli system settings advanced set -i 1 -o /VMFS3/UseATSForHBOnVMFS5

PowerCLI:
Get-AdvancedSetting -Entity VMHost-Name -Name VMFS3.UseATSForHBOnVMFS5 | Set-AdvancedSetting -Value 1 -Confirm:$false
 
Notes:
  • This change takes effect immediately without reboot.
  • This change does not affect ESXi host OS.
  • The root node of these options is /VMFS3 regardless of the VMFS version. The last character of the option name indicates the VMFS version. Per the procedure above, UseATSForHBOnVMFS5 is the option used for both VMFS5 and VMFS6 datastores.
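
Because the setting must be changed on all hosts sharing the datastore, it can help to generate the exact command once and reuse it. The following is a minimal sketch, assuming a POSIX shell; the function name ats_hb_command and the host names in the usage comment are hypothetical, and the SSH loop assumes SSH access to each host is enabled.

```shell
# Print the esxcli command that disables (0) or enables (1) ATS heartbeat.
# ats_hb_command is a hypothetical helper name; it only prints the command
# and does not change any setting itself.
ats_hb_command() {
    case "$1" in
        0|1)
            printf 'esxcli system settings advanced set -i %s -o /VMFS3/UseATSForHBOnVMFS5\n' "$1"
            ;;
        *)
            echo "usage: ats_hb_command 0|1" >&2
            return 1
            ;;
    esac
}

# Usage from an administration workstation (esx01 and esx02 are hypothetical
# host names; do not run this against hosts that are part of a vSAN cluster):
#   for h in esx01 esx02; do
#       ssh root@"$h" "$(ats_hb_command 0)"
#   done
```

Printing the command rather than running it keeps the helper safe to try anywhere and makes the applied change easy to review before it is sent to a host.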

To review the changed options, run this command:

    esxcli system settings advanced list -o /VMFS3/UseATSForHBOnVMFS

    For example:
    esxcli system settings advanced list -o /VMFS3/UseATSForHBOnVMFS3

    You see output similar to:
    Path: /VMFS3/UseATSForHBOnVMFS3
    Type: integer
    Int Value: 0 <--- Check this value
    Default Int Value: 0
    Min Value: 0
    Max Value: 1
    String Value:
    Default String Value:
    Valid Characters:
    Description: Use ATS for HB on ATS supported VMFS3 volumes
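
To check the value from a script instead of reading the full listing, the current integer value can be extracted from this output. The following is a minimal sketch, assuming a POSIX shell and the output layout shown above; the function name get_int_value is hypothetical.

```shell
# Print the current integer value from "esxcli system settings advanced list"
# output read on standard input. get_int_value is a hypothetical helper name.
get_int_value() {
    # Match only the line whose first non-blank text is "Int Value:", which
    # skips "Default Int Value:". Keep only the leading digits so that any
    # trailing annotation on the line is ignored.
    sed -n 's/^[[:space:]]*Int Value:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Usage on a host (option name taken from the disable/enable commands above):
#   esxcli system settings advanced list -o /VMFS3/UseATSForHBOnVMFS5 | get_int_value
```

A result of 0 means ATS heartbeat is disabled for that option, and 1 means it is enabled.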

    Reverting VMFS heartbeat activity in this way is preferred over globally disabling VAAI or ATS on applicable storage systems. Although some storage systems require the heartbeat-related activity to be reverted to the legacy method, they still handle non-heartbeat ATS commands normally, and ATS provides dramatic performance and scale benefits even when it should not be used for VMFS heartbeats.


    Additional Information

    You experience these additional symptoms:
     
    • In the /var/run/log/vobd.log file and Virtual Center Events, you see the VOB message:
      Lost access to volume <uuid><volume name> due to connectivity issues. Recovery attempt is in progress and the outcome will be reported shortly
    • In the /var/log/vmkernel.log file, you see similar error messages indicating an ATS miscompare:
      2015-11-20T22:12:47.194Z cpu13:33467)ScsiDeviceIO: 2645: Cmd(0x439dd0d7c400) 0x89, CmdSN 0x2f3dd6 from world 3937473 to dev "naa.50002ac0049412fa" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0.
       
    • You may also see:
      • Hosts disconnecting from vCenter Server.
      • Virtual machines hanging on I/O operations.
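
The fields in the ScsiDeviceIO line above can be read against the SCSI standards: opcode 0x89 is COMPARE AND WRITE (the command ATS uses), D:0x2 is the CHECK CONDITION device status, and sense data 0xe 0x1d 0x0 is sense key MISCOMPARE with additional sense code MISCOMPARE DURING VERIFY OPERATION. The following is a minimal sketch for finding such lines, assuming a POSIX shell and the log line layout shown above; the function name find_ats_miscompare_cmds is hypothetical.

```shell
# Print vmkernel log lines where a COMPARE AND WRITE (0x89) command failed
# with MISCOMPARE sense data (0xe 0x1d 0x0).
# find_ats_miscompare_cmds is a hypothetical helper name.
find_ats_miscompare_cmds() {
    # Default to the log path named in the symptom above.
    log_file="${1:-/var/log/vmkernel.log}"
    # Each grep -F narrows the lines using a fixed string taken from the
    # sample message: the logging component, the opcode, then the sense data.
    grep -F 'ScsiDeviceIO' "$log_file" \
        | grep -F ' 0x89, ' \
        | grep -F 'Valid sense data: 0xe 0x1d 0x0'
}

# Usage on a host:
#   find_ats_miscompare_cmds | wc -l
```

The device identifier in each matching line (for example, the naa. value) shows which device returned the miscompare.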

    Note: These symptoms are seen in connection with the use of VAAI ATS heartbeat with storage arrays supplied by several different vendors.

    If the datastores are on IBM Storwize or SAN Volume Controller, the ATS heartbeat must be disabled per IBM's recommendation. For more information, see the IBM advisory Host Disconnects Using VMware vSphere 5.5.0 Update 2 and vSphere 6.0.
     

    Disclaimer: VMware is not responsible for the reliability of any data, opinions, advice, or statements made on third-party websites. Inclusion of such links does not imply that VMware endorses, recommends, or accepts any responsibility for the content of such sites.
