VMware vSAN upgrade best practices

Article ID: 326927

Products

VMware vSAN

Issue/Introduction

This article describes prescriptive best practices for upgrading vSAN clusters.

Environment

VMware vSAN 5.5.x
VMware vSAN 8.0.x
VMware vSAN 7.0.x
VMware vSAN 6.x

Resolution

Successful vSAN cluster upgrades depend both on prerequisite steps that prepare the cluster for the upgrade and on adherence to certain recommendations during the upgrade process.

Before starting a vSAN Cluster upgrade

Before starting the vSAN upgrade process, ensure that the following requirements are met:

  1. The vSphere environment is up to date:
    • The vCenter Server managing the hosts must be at an equal or higher version than the ESXi hosts it manages. Keeping vCenter and ESXi on matching versions is advisable, because mismatched versions can lead to communication issues between the two, as described in KB vCenter version to ESXi version (68174). Refer to the KBs Build numbers and versions of VMware ESXi/ESX and Build numbers and versions of VMware vCenter Server to determine supported vCenter/ESXi version combinations.
    • All hosts should be running the same build of ESXi before the vSAN cluster upgrade is started. Only uniform ESXi host versions across the cluster ensure efficient vSAN functionality.
    • If the ESXi host versions do not match, patch the hosts to the same build before upgrading.
  2. All checks in the health plugin of the vSAN cluster must show green (vSAN cluster > Monitor > vSAN > Skyline Health > Test). All vSAN components (disks, DOM objects, network, etc.) should be healthy.
    • No disk should be failed or absent
    • This can be determined via Cluster -> Configure -> vSAN -> Disk Management
  3. The controller's driver/firmware combination should match its HCL entry and should also be supported with the target version of ESXi.
  4. There should NOT be any inaccessible vSAN objects.
    • This can be verified with the vSAN Health Service in vSAN 6.0 and above, or with the Ruby vSphere Console (RVC, deprecated).
  5. There should not be any active resync at the start of the upgrade process.
    • Some resync activity is expected during the upgrade process, as data needs to be synchronized following host reboots. The administrator must wait until the resync finishes before putting the next host into maintenance mode.
  6. Ensure that there are no known compatibility issues between your current vSAN version and the desired target vSAN version. For information on upgrade requirements, see vSAN upgrade requirements (2145248).
    • Check the upgrade path for compatibility.
    • If required, update the vSAN cluster to the required build before undertaking the upgrade process to avoid compatibility concerns.
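
The checklist above can be expressed as a simple readiness gate. The following is a minimal sketch, not a VMware tool: the `Host` and `ClusterState` classes, their field names, and the `upgrade_blockers` function are all hypothetical, and the values are assumed to have been collected by hand or by your own tooling (for example from Skyline Health and Disk Management). It does not query vCenter or ESXi.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Host:
    name: str
    version: Tuple[int, ...]  # ESXi version, e.g. (7, 0, 3)
    build: int                # ESXi build number


@dataclass
class ClusterState:
    """Hypothetical snapshot of the values the checklist asks you to verify."""
    vcenter_version: Tuple[int, ...]
    hosts: List[Host]
    failed_health_checks: List[str] = field(default_factory=list)
    failed_or_absent_disks: int = 0
    inaccessible_objects: int = 0
    resyncing_objects: int = 0


def upgrade_blockers(state: ClusterState) -> List[str]:
    """Return the reasons the cluster is not ready; an empty list means ready."""
    blockers = []

    # vCenter must be at an equal or higher version than every host it manages.
    behind = [h.name for h in state.hosts if h.version > state.vcenter_version]
    if behind:
        blockers.append("vCenter is older than: " + ", ".join(behind))

    # All hosts should run the same ESXi build before the upgrade starts.
    builds = sorted({h.build for h in state.hosts})
    if len(builds) > 1:
        blockers.append("mixed ESXi builds: " + ", ".join(map(str, builds)))

    # Every health check must be green and no disk may be failed or absent.
    if state.failed_health_checks:
        blockers.append("health checks not green: "
                        + ", ".join(state.failed_health_checks))
    if state.failed_or_absent_disks:
        blockers.append(f"{state.failed_or_absent_disks} failed or absent disk(s)")

    # No inaccessible objects and no active resync at the start.
    if state.inaccessible_objects:
        blockers.append(f"{state.inaccessible_objects} inaccessible object(s)")
    if state.resyncing_objects:
        blockers.append(f"{state.resyncing_objects} object(s) still resyncing")

    return blockers
```

Version tuples are compared element by element, so use the same number of elements for vCenter and ESXi versions. HCL and upgrade-path compatibility are not modeled here and still need to be checked against the referenced KB articles.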

ESXi Host preparation

  • Ensure you choose the right maintenance mode option. When you move a host into maintenance mode in vSAN, you have three options to choose from:
    • Ensure availability:
      • If you select Ensure availability, vSAN allows you to move the host into maintenance mode faster than Full data migration and ensures access to the virtual machines in the environment. 
    • Full data migration:
      • vSAN evacuates all data to other hosts in the cluster. This evacuation mode results in the largest amount of data transfer and consumes the most time and resources. 
    • No data migration:
      • If you select No data migration, vSAN does not evacuate any data from this host. If you power off or remove the host from the cluster, some virtual machines might become inaccessible. This is not a safe option.
  • Exit maintenance mode and resync
    • When the ESXi host is upgraded and moved out of maintenance mode, a resync will occur if the upgrade took more than 60 minutes, which is the default resync delay timer. You can monitor the resync in the vSphere Client.
    • Ensure this resync is complete before moving on to the next host. The resync occurs because the updated host can now contribute to the vSAN datastore again. It is vital to wait until the resync is complete to ensure there is no data loss.
  • For stretched vSAN clusters, upgrade the witness host after the physical nodes, unless the exception in the note below applies. Also, see Witness appliance upgrade to vSphere 7.0 or higher with caution if the witness is being upgraded from 6.x to 7.x or a higher version.
Note: As of version 7.0 and higher, or if the target upgrade version is 7.0 or higher, the witness node should be upgraded before the physical nodes.
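
The host-by-host procedure above (decide the witness position, upgrade one host, wait for resync, move on) can be sketched as follows. This is an illustrative outline only: `upgrade_order`, `wait_for_resync`, and the `get_resyncing_count` callback are hypothetical names, and the callback is assumed to return the number of objects still resyncing from whatever monitoring source you use.

```python
import time
from typing import Callable, List, Optional


def upgrade_order(data_hosts: List[str], witness: Optional[str],
                  target_major: int) -> List[str]:
    """Order hosts for a rolling upgrade, placing the witness per the note above."""
    if witness is None:
        # Not a stretched cluster: only the data hosts are upgraded.
        return list(data_hosts)
    if target_major >= 7:
        # Target version 7.0 or higher: witness before the physical nodes.
        return [witness] + list(data_hosts)
    # Earlier targets: witness after the physical nodes.
    return list(data_hosts) + [witness]


def wait_for_resync(get_resyncing_count: Callable[[], int],
                    poll_seconds: int = 60,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Block until no objects are resyncing; return how many polls had to wait."""
    waits = 0
    while get_resyncing_count() > 0:
        sleep(poll_seconds)
        waits += 1
    return waits
```

A driver loop would call `wait_for_resync` after each host exits maintenance mode and before the next host in `upgrade_order(...)` enters it. The `sleep` parameter exists only so the loop can be exercised without real delays.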

After starting a vSAN Cluster upgrade:

After beginning the upgrade process, there are a few items to keep in mind:

1. Once you start an upgrade of a vSAN cluster, complete it as soon as possible, preferably within a week. Mixed versions of ESXi in the same cluster, especially across major releases, are not a supported configuration and can cause performance issues and cluster instability, because different code versions are communicating with each other within the same cluster. Mixed versions are supported ONLY during an upgrade, which is typically expected to complete within 24-48 hours for clusters of fewer than 32 hosts. For large clusters of 32-64 hosts, a typical upgrade should complete within 48-72 hours.

2. Do NOT attempt to upgrade a cluster by introducing new versions to the cluster and migrating workloads.
  • If introducing new host(s) during a cluster upgrade, ensure no disk groups are present or created on them until all hosts have been upgraded to the same ESXi version, to prevent potential vSAN network partitions. Complete the upgrade of the cluster as per #1 above before creating any disk groups on the newly added host(s).
3. If you are adding or replacing disks in the midst of an upgrade, ensure that they are formatted with the appropriate legacy on-disk format version, if applicable. For more information, see How to format vSAN Disk Groups with a legacy format version and Failure to promote CMMDs version resulting in vSAN cluster to become partitioned during upgrade.

Failure to adhere to these best practices may result in one or more of the following issues:
  • Unexpected network partitions
  • Unexpected loss of data availability
  • vSAN cluster instability
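
For planning purposes, the upgrade windows and the mixed-version rule described above can be captured in a few lines. This is a rough sketch with hypothetical function names; the hour ranges are taken directly from the guidance in this article, and the version strings are assumed to be supplied by you.

```python
from typing import Iterable, Tuple


def expected_window_hours(host_count: int) -> Tuple[int, int]:
    """Typical (min, max) hours to complete the upgrade, by cluster size."""
    if host_count < 1:
        raise ValueError("host_count must be positive")
    if host_count < 32:
        return (24, 48)
    if host_count <= 64:
        return (48, 72)
    raise ValueError("no guidance is given for clusters above 64 hosts")


def is_mixed_version(host_versions: Iterable[str]) -> bool:
    """True while hosts in the cluster report more than one ESXi version."""
    return len(set(host_versions)) > 1


def may_create_disk_groups(host_versions: Iterable[str]) -> bool:
    """Disk groups on newly added hosts wait until all hosts match."""
    return not is_mixed_version(host_versions)
```

For example, `expected_window_hours(16)` yields the 24-48 hour range, and `may_create_disk_groups` stays false for as long as any host still reports a different version.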


Additional Information