Simulating VMware High Availability failover (2056634)

Purpose

This article describes several methods to simulate VMware High Availability (HA) failover in your environment for cluster testing purposes.

Note: This article assumes that you are working with an HA-enabled cluster in vCenter Server consisting of two ESXi/ESX hosts, where the Management Network is uplinked in a redundant vmnic configuration.
 
Caution: VMware recommends you perform this simulation only in a Testing or Development environment. DO NOT perform this simulation in a Production environment.

Resolution

Select one of the methods below according to the version of vSphere deployed in your environment and the type of failure you want to simulate.

Method 1

For a vSphere 4.x environment where you are running HA based on AAM and have two redundant NICs for the Management Network, you can physically disconnect the patch cables through which these physical NICs are uplinked.

Alternatively, you can issue a command to your switch software to disconnect the ports. Either approach simulates a host isolation event because vCenter Server can no longer communicate with the host. The hosts in the cluster also run the AAM agent, which is designed to monitor the uptime of neighboring hosts in the cluster. If the master host of the cluster detects that the host you have disconnected is isolated, it restarts that host's virtual machines on surviving hosts in the cluster. Ensure that your HA cluster settings have the appropriate Host Isolation Response setting, as this type of host outage is considered to be a Network Isolation.

Method 2

In vSphere 5.x, HA is provided by the Fault Domain Manager (FDM) agent deployed on each of the HA cluster hosts. FDM uses both Network and Datastore Heartbeats to determine the availability of a host and to determine the type of host failure, whether that is a physically failed host or a Network Isolation type of failure. The FDM agents on secondary hosts report uptime information to the master host's FDM agent. The master host communicates with vCenter Server to report the uptime of itself and all secondary hosts.
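
The way the two heartbeat types combine can be pictured with a small shell sketch. It is a simplified model based only on the description above, not the FDM agent's actual logic; the function name `classify_host` and the state labels are hypothetical.

```shell
#!/bin/sh
# Simplified, hypothetical model of how a master host might classify a
# secondary host from the two heartbeat types. This is not FDM code.
#   $1 = network heartbeat received?   (yes/no)
#   $2 = datastore heartbeat observed? (yes/no)
classify_host() {
    case "$1:$2" in
        yes:*)  echo "available" ;;
        no:yes) echo "isolated-or-partitioned" ;;  # host alive, network path lost
        no:no)  echo "failed" ;;                   # no sign of life on either channel
        *)      echo "unknown"; return 1 ;;
    esac
}

classify_host no yes
classify_host no no
```

In this model, once datastore heartbeating is turned off, a host that stops sending network heartbeats can no longer be told apart from one that has failed outright.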

For example, consider two ESXi/ESX hosts, each with two vmnics in a redundant NIC team serving Management Network traffic. These hosts also share a single datastore. You want the virtual machines to fail over to the surviving host in the cluster.

To prepare the environment for failover simulation:

  1. Log in to the vCenter Server with the vSphere Client.
  2. Edit the Cluster Settings.
  3. Under vSphere HA settings, change the Datastore heartbeat to None. Ensure no datastores are selected from the available list, and select Select only from my preferred datastores.

To disrupt the communication between a single host and vCenter Server, you can physically disconnect the patch cables through which these physical NICs are uplinked. Alternatively, you can issue a command to your switch software to disconnect the ports.

When network communication between the host and the master host (or between the master host and vCenter Server, if this host is the master) is disrupted, vCenter Server waits for the timeout period during which it receives no communication from the host it is managing, and then declares the host Isolated. This causes all virtual machines to register and restart on the surviving host.
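
To see when a virtual machine actually drops off the network and comes back during the test, a simple ping watcher can be run from a separate workstation. This is a minimal sketch, not part of the documented procedure: the function names are hypothetical, and it assumes a Linux-style `ping` where `-c` is the packet count and `-W` is the timeout in seconds.

```shell
#!/bin/sh
# Hypothetical watcher: report when a test VM stops and resumes answering ping.
# Assumes a Linux-style ping (-c = packet count, -W = timeout in seconds).
INTERVAL="${INTERVAL:-1}"   # seconds to wait between probes

# probe HOST: succeed if HOST answers a single ping.
probe() {
    ping -c 1 -W 1 "$1" >/dev/null 2>&1
}

# watch_vm HOST COUNT: probe HOST COUNT times and print each state change.
watch_vm() {
    host=$1
    remaining=$2
    last=""
    while [ "$remaining" -gt 0 ]; do
        if probe "$host"; then now="up"; else now="down"; fi
        if [ "$now" != "$last" ]; then
            echo "$(date '+%H:%M:%S') $host is $now"
            last=$now
        fi
        remaining=$((remaining - 1))
        sleep "$INTERVAL"
    done
}
```

For example, `watch_vm 192.0.2.10 600` probes a placeholder address about once per second for 600 probes. The gap between the "is down" and "is up" timestamps gives a rough measure of how long the restart took.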

Method 3

As mentioned in Methods 1 and 2, disconnecting the network to forcibly disrupt communication between master and secondary hosts is one option for simulating HA failover. However, to simulate a power-outage or hardware-fault type of failure, hard power off the host, either physically or by using a remote management application such as KVM, DRAC, iLO, or RAS.

Method 4

Note: Use of this method may require re-installation of ESXi/ESX if the kernel module is not properly disabled/re-enabled. When disabling the kernel module for the physical NIC, you lose all remote management through the ESXi Service Console, and can only remotely manage the host through KVM, DRAC, iLO, or RAS. Be sure to have physical access to the host if a remote management application is not available.

Method 4 also simulates a network isolation, but it does so by disabling the physical NIC (vmnic) driver module in the VMkernel instead of physically disconnecting a patch cable or interrupting connectivity at the physical switch layer.

First, determine which module is in use by the physical NIC by using one of these articles, depending on your installed vSphere version:

Next, disable the module by running the command for your version:
  • For vSphere 5.x: esxcli system module set --enabled=false --module=module_name
  • For vSphere 4.x: esxcfg-module -d module_name
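
Because a module that is not re-enabled can leave the host unmanageable, it can help to review the full disable and re-enable sequence before running anything. The sketch below is an illustration rather than part of the documented procedure. The wrapper and function names are hypothetical, `module_name` remains a placeholder, and the re-enable and NIC-listing commands are assumptions based on the `--enabled` and `--module` options of `esxcli system module set`, the `-d` and `-e` options of `esxcfg-module`, and `esxcli network nic list`. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: prints each command unless DRY_RUN=0 is set,
# so the sequence can be reviewed before it is run on a test host.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# vSphere 5.x: show each vmnic together with the driver module it uses.
list_nics_5x() { run esxcli network nic list; }

# vSphere 5.x: disable or re-enable a module. $1 = module name.
disable_module_5x() { run esxcli system module set --enabled=false --module="$1"; }
enable_module_5x()  { run esxcli system module set --enabled=true --module="$1"; }

# vSphere 4.x equivalents. $1 = module name.
disable_module_4x() { run esxcfg-module -d "$1"; }
enable_module_4x()  { run esxcfg-module -e "$1"; }

# module_name is a placeholder; substitute the module your vmnics use.
list_nics_5x
disable_module_5x module_name
enable_module_5x module_name
```

The enabled setting generally applies the next time the module is loaded, so a host reboot may be needed before the change takes effect; verify this on your own build before relying on it.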

Impact/Risks

Caution: VMware recommends you perform this simulation only in a Testing or Development environment. DO NOT perform this simulation in a Production environment.

This Article Replaces

2056849
