Handling split-brain scenarios in vSAN Stretched Cluster (2135952)

Purpose

This article provides guidance on how to manage a split-brain scenario in a vSAN Stretched Cluster, in particular when two copies of a virtual machine are running.

Resolution

Preferred and Secondary sites

vSAN stretched clusters, a feature available with vSAN 6.1/vSphere 6.0U1 and later, have the concept of a preferred site and a secondary site. The preferred site is the site that the witness components (located on the witness host) bind to if the inter-site link between the data sites fails, in other words, when a split-brain scenario arises. The preferred site is therefore the site where all the virtual machines have access to the majority of their components, and it is also the site where all virtual machines run after an inter-site link failure of that nature. If this situation arises, vSphere HA restarts the virtual machines that were located on the secondary site on the preferred site.
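
The majority rule described above can be illustrated with a minimal sketch. It assumes a deliberately simplified model that is not taken from vSAN itself: a single object with one data copy per site plus one witness component, each carrying one vote. The names VOTES and has_quorum are hypothetical.

```python
# Simplified, assumed model: one vote per data copy and one for the witness.
VOTES = {"preferred": 1, "secondary": 1, "witness": 1}


def has_quorum(partition):
    """Return True if the members of this partition hold a strict majority of votes."""
    total = sum(VOTES.values())
    held = sum(VOTES[member] for member in partition)
    return held * 2 > total


# Inter-site link failure: the witness binds to the preferred site.
preferred_partition = {"preferred", "witness"}
secondary_partition = {"secondary"}

print("preferred keeps access:", has_quorum(preferred_partition))
print("secondary keeps access:", has_quorum(secondary_partition))
```

Under these assumptions, only the partition that contains the witness holds a majority, which is why the virtual machines keep running on the preferred site.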

How split-brain ghost virtual machines occur

In the current version of vSAN stretched cluster, if there is an inter-site link failure and a split-brain situation arises, the virtual machines located on the secondary site are restarted by vSphere HA on the preferred site. However, because the original virtual machines on the secondary site no longer have access to the underlying storage, they cannot be powered off. This results in a ghost copy of the virtual machine. The ghost virtual machine has no access to the storage, yet continues to run on the secondary site. In other (non-vSAN) stretched cluster deployments, for example vMSC, the VM Component Protection (VMCP) feature manages this situation and halts the ghost virtual machines. However, this feature is not supported in vSAN environments.

Workaround script to halt ghost vSAN virtual machines

As a workaround, VMware provides a script that can be run on the ESXi hosts at the secondary site to stop the ghost virtual machines.

Note: The killInaccessibleVms.py script is included in ESXi 6.0 patch ESXi600-201601001 and later installations. For more information, see VMware ESXi 6.0, Patch Release ESXi600-201601001 (2135114). If you are running this version of ESXi or later, you do not need to download the script. The script is located in the /usr/lib/vmware/vsan/bin/ directory.
 
The script must be present on each ESXi host on the secondary site. In the event of a split-brain, the administrator can run it on each of these ESXi hosts. The script checks whether virtual machines are inaccessible. If a virtual machine is determined to be inaccessible, the administrator is prompted to remove it.
 
Note: A similar situation can arise on the preferred site if both the link between the two data sites and the link between the preferred site and the witness host are lost. In such a case, the witness pairs up with the secondary site and the virtual machines on the preferred site lose access to the vSAN datastore. vSphere HA then starts all of the virtual machines on the secondary site. Thus, the virtual machines on the preferred site become ghosts and the script workaround must be applied on the ESXi hosts on the preferred site.
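
The two failure cases described in this section can be summarized as a small decision helper. This is only a sketch of the reasoning in this article, not part of the killInaccessibleVms.py script. The function name ghost_site and its parameters are hypothetical, and the sketch assumes the link between the secondary site and the witness host stays up.

```python
def ghost_site(inter_site_link_up, preferred_witness_link_up):
    """Return the site whose hosts are expected to hold ghost VMs, or None.

    Assumption: the secondary site can always reach the witness host.
    """
    if inter_site_link_up:
        # The data sites can still see each other: no split-brain.
        return None
    if preferred_witness_link_up:
        # The witness binds to the preferred site; the secondary site
        # loses access to storage and its running VMs become ghosts.
        return "secondary"
    # The preferred site is cut off from both the secondary site and the
    # witness, so the witness pairs with the secondary site instead.
    return "preferred"


site = ghost_site(inter_site_link_up=False, preferred_witness_link_up=False)
print("Run the workaround script on hosts at site:", site)
```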

Running the workaround script 

To run the script that halts ghost virtual machines:
  1. Download the attached file 2135952_killInaccessibleVms.zip.
  2. Extract the script from the zip file and copy the killInaccessibleVms.py file to persistent non-vSAN storage on the ESXi host you are working on.

    Note: If required, you can copy the killInaccessibleVms.py script to the ESXi installation disk, but it will not persist through a host reboot.

  3. Connect to the ESXi host shell and change directory to the location of the script. For more information, see Using ESXi Shell in ESXi 5.x and 6.0 (2004746).
  4. To run the killInaccessibleVms.py script, run this command:

    python killInaccessibleVms.py

  5. Follow the prompts from the killInaccessibleVms.py script to halt the ghost virtual machines.
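
Before step 4, it can help to confirm where the script actually lives on a given host. The following is a minimal sketch, not a VMware-provided tool: it checks the bundled directory named in the note earlier in this article and then any extra directories you pass in. The function name find_script and the extra_dirs parameter are hypothetical, and the code avoids newer Python syntax on the assumption that the interpreter on the host may be an older version.

```python
import os

# Directory named in this article for ESXi600-201601001 and later.
BUNDLED_DIR = "/usr/lib/vmware/vsan/bin"
SCRIPT_NAME = "killInaccessibleVms.py"


def find_script(extra_dirs=()):
    """Return the first existing path to the script, or None if not found.

    extra_dirs: hypothetical list of directories the script was copied to.
    """
    for directory in [BUNDLED_DIR] + list(extra_dirs):
        candidate = os.path.join(directory, SCRIPT_NAME)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    path = find_script()
    if path is None:
        print("Script not found; download it as described in step 1.")
    else:
        print("Run: python " + path)
```

If the script was copied manually, pass that directory in extra_dirs so the same check covers hosts running builds that do not bundle it.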

Additional Information

For more information about vSAN stretched clusters, see the vSAN 6.1 Stretched Cluster Guide.

Update History

08/15/2016 - Added VSAN 6.2 to Products. Added note about killInaccessibleVms.py script included in ESXi 6.0 patch ESXi600-201601001 and later.
