Enabling vSphere HA might fail or never complete on hosts with ESXi 7.0u2c/u2d and 7.0u3/u3a

Article ID: 318765

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  1. HA cannot be enabled on a newly added or moved host in any of the following cases:
    • A host with ESXi 7.0u3/u3a is added or moved to an HA-enabled VUM cluster.
    • A host with ESXi 7.0u3/u3a is added or moved to a non-HA-enabled VUM cluster, and HA is later enabled on the cluster.
    • A host with ESXi 7.0u2c/u2d or ESXi 7.0u3/u3a is added or moved to an HA-enabled vLCM cluster.
    • A host with ESXi 7.0u2c/u2d or ESXi 7.0u3/u3a is added or moved to a non-HA-enabled vLCM cluster, and HA is later enabled on the cluster.
    • Multiple hosts (at least one with ESXi 7.0u3c and at least one with ESXi 7.0u2c/u2d or ESXi 7.0u3/u3a) are added to an HA-enabled vLCM cluster, and maintenance mode is exited on the hosts.
    • A previously disconnected host with ESXi 7.0u3/u3a is reconnected in an HA-enabled VUM cluster.
    • A previously disconnected host with ESXi 7.0u2c/u2d or ESXi 7.0u3/u3a is reconnected in an HA-enabled vLCM cluster.
  2. When enabling HA, installation of the HA agent (FDM) on the host fails.
  3. In some cases, the "Remediate HA" task completes, but HA is not enabled on the host and the host's HA state is not healthy.
  4. The HA status for the ESXi host is stuck at "Configuration Error" with the description "An error occurred when vCenter Server attempted to initialize the vSphere HA Agent running on the host."
  5. Sometimes, the HA status for the ESXi host is stuck at "HA Agent Unreachable" with the description "The vSphere HA Agent on the host cannot be reached."
  6. In some cases, the "Configuring vSphere HA" task fails with "Cannot complete the configuration of the vSphere HA agent on the host. Applying HA VIBs on the cluster encountered failure".
  7. In some cases, the "Remediate HA" task fails with "A general system error occurred: Installing HA components failed on the cluster: domain-c100".
  8. Migrating or powering on VMs on the newly added host in the HA-enabled cluster fails.
  9. Image Apply on the HA-enabled vLCM cluster fails because HA is unhealthy.
  10. When the host is moved or added to a vLCM cluster, the following entries appear in /var/run/log/lifecycle.log on the ESXi host:
    • lifecycle.log
      • 2022-01-10T15:55:07Z lifecycle: 308060: ImageProfile:844 INFO Adding VIB VMware_bootbank_vmware-fdm_7.0.3-19012297 to ImageProfile (Updated) ESXi-7.0U2d-18538813-standard
         
      • 2022-01-10T15:55:07Z lifecycle: 308060: imagemanagerctl:147 ERROR [ValueError] Expected 1 component, found 2
      • 2022-01-10T15:55:07Z lifecycle: 308060: imagemanagerctl:152 ERROR Traceback (most recent call last):
      • 2022-01-10T15:55:07Z lifecycle: 308060: imagemanagerctl:152 ERROR   File "/usr/lib/vmware/lifecycle/bin/imagemanagerctl.py", line 665, in components
      • 2022-01-10T15:55:07Z lifecycle: 308060: imagemanagerctl:152 ERROR     Transaction(initInstallers=False).InstallComponentsFromSources(
      • 2022-01-10T15:55:07Z lifecycle: 308060: imagemanagerctl:152 ERROR   File "/lib64/python3.8/site-packages/vmware/esximage/Transaction.py", line 532, in InstallComponentsFromSources
      • 2022-01-10T15:55:07Z lifecycle: 308060: imagemanagerctl:152 ERROR     _checkComponentDowngrades(curProfile, newProfile,
      • 2022-01-10T15:55:07Z lifecycle: 308060: imagemanagerctl:152 ERROR   File "/lib64/python3.8/site-packages/vmware/esximage/Transaction.py", line 2264, in _checkComponentDowngrades
      • 2022-01-10T15:55:07Z lifecycle: 308060: imagemanagerctl:152 ERROR     compDowngrades = curProfile.GetCompsDowngradeInfo(newProfile)
      • 2022-01-10T15:55:07Z lifecycle: 308060: imagemanagerctl:152 ERROR   File "/lib64/python3.8/site-packages/vmware/esximage/ImageProfile.py", line 2376, in GetCompsDowngradeInfo
      • 2022-01-10T15:55:07Z lifecycle: 308060: imagemanagerctl:152 ERROR     curComp = self.components.GetComponent(name)
      • 2022-01-10T15:55:07Z lifecycle: 308060: imagemanagerctl:152 ERROR   File "/lib64/python3.8/site-packages/vmware/esximage/Bulletin.py", line 1243, in GetComponent
      • 2022-01-10T15:55:07Z lifecycle: 308060: imagemanagerctl:152 ERROR     raise ValueError('Expected 1 component, found %u'
      • 2022-01-10T15:55:07Z lifecycle: 308060: imagemanagerctl:152 ERROR ValueError: Expected 1 component, found 2
  11. When the host is moved or added to a VUM cluster, the following entries appear in /var/run/log/esxupdate.log on the ESXi host:
    • esxupdate.log
      • 2022-01-10T17:37:04Z esxupdate: 136643: esxupdate: ERROR: An unexpected exception was caught:
      • 2022-01-10T17:37:04Z esxupdate: 136643: esxupdate: ERROR: Traceback (most recent call last):
      • 2022-01-10T17:37:04Z esxupdate: 136643: esxupdate: ERROR:   File "/usr/sbin/esxupdate", line 216, in main
      • 2022-01-10T17:37:04Z esxupdate: 136643: esxupdate: ERROR:     cmd.Run()
      • 2022-01-10T17:37:04Z esxupdate: 136643: esxupdate: ERROR:   File "/lib64/python3.8/site-packages/vmware/esx5update/Cmdline.py", line 153, in Run
      • 2022-01-10T17:37:04Z esxupdate: 136643: esxupdate: ERROR:   File "/lib64/python3.8/site-packages/vmware/esximage/Transaction.py", line 965, in InstallVibsFromSources
      • 2022-01-10T17:37:04Z esxupdate: 136643: esxupdate: ERROR:     inst, removed, exitstate = self._installVibs(curprofile,
      • 2022-01-10T17:37:04Z esxupdate: 136643: esxupdate: ERROR:   File "/lib64/python3.8/site-packages/vmware/esximage/Transaction.py", line 1207, in _installVibs
      • 2022-01-10T17:37:04Z esxupdate: 136643: esxupdate: ERROR:     hasConfigDowngrade = checkFdmConfigDowngrade(curProfile, newProfile)
      • 2022-01-10T17:37:04Z esxupdate: 136643: esxupdate: ERROR:   File "/lib64/python3.8/site-packages/vmware/esximage/Transaction.py", line 1122, in checkFdmConfigDowngrade
      • 2022-01-10T17:37:04Z esxupdate: 136643: esxupdate: ERROR:     compDowngrades = curProfile.GetCompsDowngradeInfo(newProfile)
      • 2022-01-10T17:37:04Z esxupdate: 136643: esxupdate: ERROR:   File "/lib64/python3.8/site-packages/vmware/esximage/ImageProfile.py", line 2416, in GetCompsDowngradeInfo
      • 2022-01-10T17:37:04Z esxupdate: 136643: esxupdate: ERROR:     curComp = self.components.GetComponent(name)
      • 2022-01-10T17:37:04Z esxupdate: 136643: esxupdate: ERROR:   File "/lib64/python3.8/site-packages/vmware/esximage/Bulletin.py", line 1276, in GetComponent
      • 2022-01-10T17:37:04Z esxupdate: 136643: esxupdate: ERROR:     raise ValueError('Expected 1 component, found %u'
      • 2022-01-10T17:37:04Z esxupdate: 136643: esxupdate: ERROR: ValueError: Expected 1 component, found 2
  12. The environment has an HA-enabled VUM cluster that is transitioned to vLCM:
    1. The cluster contains a vSphere 7.0u2c/u2d/u3/u3a host that has both the i40en and i40enu components installed.
    2. The user transitions the cluster to vLCM.
    3. After the transition, HA invokes the Apply HA API, which fails.
    4. Because of this failure, HA raises alarms, but vSphere HA remains functional.
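
The log excerpts in items 10 and 11 share one signature: a ValueError reporting "Expected 1 component, found 2". The following is a minimal Python sketch, not a VMware tool, for spotting that signature in a copy of lifecycle.log or esxupdate.log. The function name and the regular expression are illustrative assumptions; only the error text comes from the excerpts above.

```python
import re

# Error text taken from the lifecycle.log and esxupdate.log excerpts above.
# The pattern skips the "raise ValueError('... found %u'" source line because
# it requires a number after "found".
SIGNATURE = re.compile(r"ValueError\W+Expected 1 component, found (\d+)")


def find_duplicate_component_errors(log_text):
    """Return (line_number, found_count) for each log line with the signature."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        match = SIGNATURE.search(line)
        if match:
            hits.append((number, int(match.group(1))))
    return hits


if __name__ == "__main__":
    # Three lines copied from the excerpts above.
    sample = "\n".join([
        "2022-01-10T15:55:07Z lifecycle: 308060: imagemanagerctl:147 ERROR [ValueError] Expected 1 component, found 2",
        "2022-01-10T15:55:07Z lifecycle: 308060: imagemanagerctl:152 ERROR     raise ValueError('Expected 1 component, found %u'",
        "2022-01-10T17:37:04Z esxupdate: 136643: esxupdate: ERROR: ValueError: Expected 1 component, found 2",
    ])
    for line_number, count in find_duplicate_component_errors(sample):
        print(f"line {line_number}: found {count} components")
```

A hit only indicates that the log contains this error; confirm the cause by checking which VIBs are installed on the host.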


Environment

VMware vSphere ESXi 7.0.x
VMware vSphere ESXi 7.0.3
VMware vSphere ESXi 7.0.2

Cause

  1. This issue occurs when both the i40en and i40enu VIBs for the inbox Intel network driver are installed on the host.

  2. In vSphere 7.0 Update 2, the inbox driver was renamed from i40en to i40enu. Starting with vSphere 7.0 Update 3, the name changes back to i40en.

  3. With both VIBs present, the HA VIB installation finds two components where it expects one (the "Expected 1 component, found 2" error in the logs above) and fails on the affected host.
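
One way to check a host for this condition is to look for both VIB names in the output of `esxcli software vib list`. The Python sketch below parses a saved copy of that output. It assumes the usual layout (a header row, a dashed separator row, and the VIB name in the first column). The sample text is illustrative only, and its version strings are placeholders rather than real build numbers.

```python
def installed_vib_names(vib_list_output):
    """Collect the first column (VIB name) from `esxcli software vib list` text."""
    names = set()
    for line in vib_list_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if not fields or fields[0] == "Name" or set(fields[0]) == {"-"}:
            continue
        names.add(fields[0])
    return names


def has_conflicting_i40_drivers(vib_list_output):
    """True when both the i40en and i40enu VIBs are listed."""
    return {"i40en", "i40enu"} <= installed_vib_names(vib_list_output)


# Illustrative output only: the version column holds placeholders.
SAMPLE_OUTPUT = """\
Name    Version              Vendor  Acceptance Level  Install Date
------  -------------------  ------  ----------------  ------------
i40en   placeholder-version  VMW     VMwareCertified   2022-01-10
i40enu  placeholder-version  VMW     VMwareCertified   2022-01-10
"""

if __name__ == "__main__":
    print(has_conflicting_i40_drivers(SAMPLE_OUTPUT))
```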

Resolution

This issue is resolved in ESXi 7.0 Update 3c.

Workaround:

For a vLCM cluster, use any one of the following workarounds on the affected hosts:

  1. Remove the host from the vLCM cluster, upgrade it to ESXi 7.0u3c via VUM, then add or move the host back to the vLCM cluster.
  2. If you do not want to upgrade the hosts, perform a manual cleanup:
    • Disable vSphere HA on the cluster that contains the affected hosts.
    • On each affected host:
      1. Put the ESXi host in maintenance mode.
      2. Run the following command to remove the obsolete i40enu VIB:
         esxcli software vib remove --vibname=i40enu
      3. Once the VIB has been removed, reboot the ESXi host.
    • After this has been done for all affected hosts in the cluster, re-enable HA on the cluster.
  3. Disable vSphere HA on the cluster, make sure the cluster has a valid desired image of ESXi 7.0u3c, and run Image Apply to upgrade the affected hosts to ESXi 7.0u3c.

This should turn HA status back to healthy.
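
For the manual cleanup in workaround 2, the per-host order matters: maintenance mode, VIB removal, then reboot. The Python sketch below only builds that ordered plan as text and executes nothing. Only the `esxcli software vib remove` command comes from this article. The maintenance-mode and reboot commands are assumptions about one way to perform those steps from the ESXi shell, and the host names are hypothetical. Disabling and re-enabling vSphere HA on the cluster happens in vCenter Server and is not part of the plan.

```python
def build_cleanup_plan(hosts):
    """Return ordered (host, command) pairs for the manual cleanup; nothing is run."""
    per_host_commands = [
        # Assumption: one way to enter maintenance mode from the ESXi shell.
        "esxcli system maintenanceMode set --enable true",
        # Command given in the workaround above.
        "esxcli software vib remove --vibname=i40enu",
        # Assumption: one way to reboot the host from the ESXi shell.
        "esxcli system shutdown reboot --reason 'Removed obsolete i40enu VIB'",
    ]
    return [(host, command) for host in hosts for command in per_host_commands]


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for host, command in build_cleanup_plan(["esx01.example.com", "esx02.example.com"]):
        print(f"{host}: {command}")
```

Keeping the plan as data makes it easy to review before anyone runs the commands by hand on each host.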

Note: For a VUM cluster, remediate the affected hosts to ESXi 7.0u3c using a rollup upgrade or an ISO baseline upgrade.

Note: The following applies to the scenario described in item 12 of the Symptoms section.

  1. Do not proceed with a vCenter Server upgrade (to any release) at this point; doing so will leave HA completely non-functional.
  2. Upgrade the ESXi hosts to 7.0u3c.
  3. Update the 7.0u2c/u2d/u3/u3a hosts to 7.0u3c via vLCM first.
  4. Proceed with the vCenter Server upgrade only after step 2 is complete.
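
The ordering constraint above can be expressed as a simple pre-check: the vCenter Server upgrade waits until no host remains on an affected release. The Python sketch below assumes the caller supplies a mapping of host name to a release label such as "7.0u3a". The label format, the host names, and the function name are hypothetical and not output from any VMware API.

```python
# Releases named in item 12 of the Symptoms section.
AFFECTED_RELEASES = {"7.0u2c", "7.0u2d", "7.0u3", "7.0u3a"}


def hosts_blocking_vcenter_upgrade(host_releases):
    """Return the names of hosts still on an affected release, sorted."""
    blocking = []
    for host, release in host_releases.items():
        # Normalize labels such as "7.0 U2d" to "7.0u2d" before comparing.
        if release.lower().replace(" ", "") in AFFECTED_RELEASES:
            blocking.append(host)
    return sorted(blocking)


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    inventory = {"esx01": "7.0u3c", "esx02": "7.0 U2d", "esx03": "7.0u3a"}
    remaining = hosts_blocking_vcenter_upgrade(inventory)
    if remaining:
        print("Upgrade these hosts to 7.0u3c first:", ", ".join(remaining))
    else:
        print("No affected hosts remain.")
```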