Performing a Reconfigure for VMware HA operation on a master node causes an unexpected virtual machine failover (2017778)

Symptoms

  • When you perform a Reconfigure for VMware HA operation on the master node in an HA cluster, an unexpected virtual machine failover occurs for the virtual machines running on that master node.

  • The vCenter Server events tab displays messages similar to:

    vCenter Server is disconnected from a master HA agent running on host <master hostname> in HA_DRS_Cluster in Datacenter - vSphere HA agent on <master hostname> in cluster HA_DRS_Cluster in Datacenter is disabled

    The vSphere HA availability state of the host <master hostname> in cluster in HA_DRS_Cluster in Datacenter has changed to Uninitialized

    The vSphere HA availability state of the host <slave hostname> in cluster in HA_DRS_Cluster in Datacenter has changed to Election

    vSphere HA unsuccessfully failed over <virtual machine> on <slave hostname> in cluster HA_DRS_Cluster in Datacenter. vSphere HA will retry if the maximum number of attempts has not been exceeded. Reason: The operation is not allowed in the current state.

Cause

When the master HA host is manually reconfigured for HA, it causes the remaining slaves to enter an election to find a new master host.

The newly elected master places the virtual machines that were running on the old master in an unknown power state, and waits for up to 10 seconds for notification that the virtual machines on the old master are powered on and running.

If the old master does not become a slave within that 10-second interval, the new master assumes that the virtual machines are down, and attempts to restart them. This causes a false failover event to occur, and consequently the failover task fails because the virtual machines were never powered off. The virtual machines remain unaffected in this scenario.
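The timing behavior described above can be modeled with a short Python sketch. This is a simplified illustration, not the actual vSphere HA agent logic: the function name and the rejoin-time parameter are hypothetical, and only the 10-second interval comes from this article.

```python
# Simplified model of the monitor period described in the Cause section.
# This is NOT the HA agent's code; all names here are hypothetical.

DEFAULT_MONITOR_PERIOD_SECONDS = 10  # wait interval stated in this article


def false_failover_attempted(rejoin_seconds,
                             monitor_period_seconds=DEFAULT_MONITOR_PERIOD_SECONDS):
    """Return True if the new master would try to restart VMs that are still running.

    rejoin_seconds is the assumed time the old master needs to come back
    as a slave after Reconfigure for VMware HA.
    """
    return rejoin_seconds > monitor_period_seconds


if __name__ == "__main__":
    # Compare a fast and a slow rejoin against a 10- and a 30-second period.
    for rejoin in (8, 15):
        for period in (10, 30):
            result = false_failover_attempted(rejoin, period)
            print(f"rejoin={rejoin}s period={period}s false_failover={result}")
```

In this model, an old master that needs 15 seconds to rejoin triggers a false failover attempt with a 10-second period but not with a 30-second period.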

Resolution

To resolve this issue, increase the monitor period:

  1. In vCenter, right-click the cluster and select Edit Settings.
  2. Click vSphere HA and then Advanced Options.
  3. Add a new option and set its value to 30 (the monitor period described in the Cause section is 10 seconds):

    das.config.fdm.policy.unknownStateMonitorPeriod = 30

  4. Disable and re-enable HA settings of the cluster.
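For environments that script cluster changes, the advanced option could also be set through the vSphere API. The following is a minimal sketch that assumes the pyVmomi library and a cluster object (`vim.ClusterComputeResource`) already obtained from an authenticated session. The helper names are hypothetical, and the sketch has not been validated against any particular vCenter Server version.

```python
# Sketch only: sets the HA advanced option on a cluster via pyVmomi.
# Helper names are hypothetical; obtaining `cluster` is left to the caller.

OPTION_KEY = "das.config.fdm.policy.unknownStateMonitorPeriod"


def monitor_period_option(seconds):
    """Return the (key, value) pair for the advanced option; the value is a string."""
    if int(seconds) <= 0:
        raise ValueError("monitor period must be a positive number of seconds")
    return OPTION_KEY, str(int(seconds))


def apply_monitor_period(cluster, seconds=30):
    """Submit a cluster reconfiguration that sets the monitor period.

    `cluster` is assumed to be a vim.ClusterComputeResource. Returns the
    vCenter task; the caller is responsible for waiting on it.
    """
    # Imported lazily so monitor_period_option() works without pyVmomi installed.
    from pyVmomi import vim

    key, value = monitor_period_option(seconds)
    spec = vim.cluster.ConfigSpecEx(
        dasConfig=vim.cluster.DasConfigInfo(
            option=[vim.option.OptionValue(key=key, value=value)]
        )
    )
    # modify=True applies the spec as an incremental change to the cluster.
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```

This sketch covers only the advanced option. Disabling and re-enabling HA on the cluster is still required afterward.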

Impact/Risks

Increasing the monitor period also increases the time to start virtual machine failovers by the same amount (in this case, by 20 seconds) when a master node stops during an actual HA failure.
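The added delay is the difference between the new and the old monitor period. A minimal Python sketch of that arithmetic, using a hypothetical helper name and the 10-second period from this article as the baseline:

```python
# Arithmetic from the Impact/Risks section; the helper name is hypothetical.

BASELINE_MONITOR_PERIOD_SECONDS = 10  # period described in this article


def added_failover_delay(new_period_seconds,
                         old_period_seconds=BASELINE_MONITOR_PERIOD_SECONDS):
    """Return the extra seconds before failover starts after a real master failure."""
    return new_period_seconds - old_period_seconds


print(added_failover_delay(30))  # prints 20
```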
