vSAN Health Check in vSphere Update Manager (VUM)

Article ID: 326617

Products

VMware vSAN

Issue/Introduction

In vSphere 6.7 U1, we introduced vSAN Health Check to identify upgrade issues when VUM is used to remediate hosts in a vSAN cluster.

First, VUM performs a pre-remediation health check on the cluster to catch problems that would cause the upgrade to fail. After each host is upgraded, it performs another health check before letting the host exit maintenance mode, to ensure that the newly upgraded host won't impact the vSAN cluster. Once that host has exited maintenance mode, it checks again to make sure the cluster is healthy and ready for the remaining hosts to be upgraded. If vSAN Health detects any issue during VUM remediation, the upgrade workflow is interrupted to minimize the impact on the vSAN cluster.
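The gating order described above can be sketched as a small loop. This is only an illustration of the sequence, not VUM's implementation: the function name and the four callables (`cluster_healthy`, `host_healthy`, `upgrade_host`, `exit_maintenance_mode`) are hypothetical placeholders that a caller would supply, and the sketch assumes the check made after one host exits maintenance mode is the same check that gates the next host.

```python
from typing import Callable, Iterable, List


def rolling_remediate(
    hosts: Iterable[str],
    cluster_healthy: Callable[[], bool],
    host_healthy: Callable[[str], bool],
    upgrade_host: Callable[[str], None],
    exit_maintenance_mode: Callable[[str], None],
) -> List[str]:
    """Upgrade hosts one at a time, stopping at the first failed health check.

    All callables are hypothetical hooks supplied by the caller.
    Returns the hosts that were upgraded and returned to service.
    """
    completed: List[str] = []
    for host in hosts:
        # Gate 1: cluster-level check before touching this host. For the
        # first host this is the pre-remediation check; for later hosts it
        # confirms the cluster is healthy after the previous host rejoined.
        if not cluster_healthy():
            break
        upgrade_host(host)
        # Gate 2: check the upgraded host before it leaves maintenance mode.
        if not host_healthy(host):
            break
        exit_maintenance_mode(host)
        completed.append(host)
    return completed
```

In this sketch a failed check simply stops the loop, which mirrors the idea that the workflow is interrupted so that the remaining hosts are left untouched.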

For the upgrade use case, we select only the health tests that are closely related to the upgrade and that help determine whether VUM remediation is risky.

Here is the VUM health checklist (some tests may be skipped depending on the stage or deployment):
  • Hardware compatibility
    • Controller disk group mode is VMware certified (com.vmware.vsan.health.test.controllerdiskmode)
    • Controller is VMware certified for ESXi release (com.vmware.vsan.health.test.controllerreleasesupport)
    • SCSI controller is VMware certified (com.vmware.vsan.health.test.controlleronhcl)
    • Controller driver is VMware certified (com.vmware.vsan.health.test.controllerdriver)
    • Controller firmware is VMware certified (com.vmware.vsan.health.test.controllerfirmware)
    • vSAN HCL DB up-to-date (com.vmware.vsan.health.test.hcldbuptodate)
  • Network
    • All hosts have a vSAN vmknic configured (com.vmware.vsan.health.test.vsanvmknic)
    • Hosts with connectivity issues (com.vmware.vsan.health.test.hostconnectivity)
    • vSAN cluster partition (com.vmware.vsan.health.test.clusterpartition)
    • vSAN: Basic (unicast) connectivity check (com.vmware.vsan.health.test.network.smallping)
    • vSAN: MTU check (ping with large packet size) (com.vmware.vsan.health.test.network.largeping)
    • Network latency check (com.vmware.vsan.health.test.hostlatencycheck)
    • vMotion: Basic (unicast) connectivity check (com.vmware.vsan.health.test.network.vmotionpingsmall)
    • vMotion: MTU check (ping with large packet size) (com.vmware.vsan.health.test.network.vmotionpinglarge)
  • Physical disk
    • Component metadata health (com.vmware.vsan.health.test.componentmetadata)
    • Memory pools (heaps) (com.vmware.vsan.health.test.lsomheap)
    • Memory pools (slabs) (com.vmware.vsan.health.test.lsomslab)
    • Operation health (com.vmware.vsan.health.test.physdiskoverall)
  • Encryption
    • vCenter and all hosts are connected to Key Management Servers (com.vmware.vsan.health.test.kmsconnection)
Note: vSAN Health Check is performed only against vSAN clusters in which all hosts are connected to vCenter Server and all hosts support the vSAN Health Check feature (ESXi version >6.0 U2).
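
The condition in the note can be expressed as a simple predicate. The sketch below is a hypothetical model, not a vSphere API: `HostInfo` and its fields are invented for illustration, the `supports_vsan_health` flag is assumed to have been derived elsewhere from the host's ESXi version, and treating an empty host list as ineligible is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class HostInfo:
    """Hypothetical summary of one host in the cluster."""
    name: str
    connected_to_vcenter: bool
    supports_vsan_health: bool  # assumed to be derived from the ESXi version


def health_check_applies(hosts: Iterable[HostInfo]) -> bool:
    """Return True only if every host is connected and supports vSAN Health."""
    host_list = list(hosts)
    # Assumption: a cluster with no hosts is treated as not eligible.
    if not host_list:
        return False
    return all(
        h.connected_to_vcenter and h.supports_vsan_health for h in host_list
    )
```

A single disconnected or unsupported host makes the whole cluster ineligible in this model, which matches the "all hosts" wording of the note.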


Environment

VMware vSAN 6.7.x

Resolution

If the VUM pre-remediation check or a remediation task fails on vSAN health, navigate to the vSAN health UI and identify the issues. After resolving them, trigger another remediation task.
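
When several tests fail at once, grouping them by the categories of the checklist above can make triage easier. The following is a minimal sketch under stated assumptions: the test IDs are copied from this article, but `group_failures` is a hypothetical helper, and how the list of failed IDs is obtained (for example, read by hand from the vSAN health UI) is outside its scope.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

PREFIX = "com.vmware.vsan.health.test."

# Test IDs from the VUM health checklist in this article, by category.
VUM_HEALTH_TESTS: Dict[str, List[str]] = {
    "Hardware compatibility": [
        PREFIX + "controllerdiskmode",
        PREFIX + "controllerreleasesupport",
        PREFIX + "controlleronhcl",
        PREFIX + "controllerdriver",
        PREFIX + "controllerfirmware",
        PREFIX + "hcldbuptodate",
    ],
    "Network": [
        PREFIX + "vsanvmknic",
        PREFIX + "hostconnectivity",
        PREFIX + "clusterpartition",
        PREFIX + "network.smallping",
        PREFIX + "network.largeping",
        PREFIX + "hostlatencycheck",
        PREFIX + "network.vmotionpingsmall",
        PREFIX + "network.vmotionpinglarge",
    ],
    "Physical disk": [
        PREFIX + "componentmetadata",
        PREFIX + "lsomheap",
        PREFIX + "lsomslab",
        PREFIX + "physdiskoverall",
    ],
    "Encryption": [
        PREFIX + "kmsconnection",
    ],
}


def group_failures(failed_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group failed test IDs by checklist category (hypothetical helper)."""
    category_of = {
        test_id: category
        for category, ids in VUM_HEALTH_TESTS.items()
        for test_id in ids
    }
    grouped: Dict[str, List[str]] = defaultdict(list)
    for test_id in failed_ids:
        # IDs that are not part of the VUM checklist go into their own bucket.
        grouped[category_of.get(test_id, "Not in VUM checklist")].append(test_id)
    return dict(grouped)
```

For example, passing the IDs for the cluster partition and KMS connection tests would return one entry under "Network" and one under "Encryption".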