When the active server fails, split-brain detection may prevent failover in certain situations

Purpose

This article explains how, in certain situations, split-brain detection may prevent failover when the active server fails.

Resolution

In the following environment, split-brain detection, by design, prevents failover to the passive server and protects data integrity when the VMware channel fails. This prevents the passive server from becoming active while the Primary server is still active.

Scenario

Primary Server
192.168.1.1

Secondary Server
192.168.2.1

Split-brain detection on the Secondary server is set to monitor the IP address 192.168.1.1, and the Primary server is configured to monitor 192.168.2.1.

Exception

If a BSOD or a similar failure occurs on the active server, so that Windows has failed but the principal (Public) network adapter is still responding, a failover does not occur.

Impact: Clients cannot access the protected application, the Secondary server is still passive, and the Primary is displaying the BSOD.

VMware vCenter Server Heartbeat split-brain detection can still ping the Primary server's principal (Public) network IP address because the NIC remains contactable on the network despite the BSOD. Failover does not occur because the passive server cannot take over the active server's IP address while that address is still in use on the network. This is by design, and manual intervention is required.
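The behavior described above can be illustrated with a minimal sketch. This is a simplified model of the decision and not VMware vCenter Server Heartbeat code; the names `PassiveView` and `failover_allowed` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassiveView:
    """What the passive server can observe (hypothetical model, not a Heartbeat API)."""
    channel_connected: bool         # VMware channel to the active server is up
    peer_public_ip_pingable: bool   # active server's principal (Public) IP answers ping


def failover_allowed(view: PassiveView) -> bool:
    """Return True only when the passive server may safely become active."""
    if view.channel_connected:
        # The channel is healthy, so the active server is presumed alive.
        return False
    # The channel is down: fail over only if the public IP is silent as well.
    return not view.peer_public_ip_pingable


# BSOD case from this article: channel lost, but the NIC still answers ping.
bsod = PassiveView(channel_connected=False, peer_public_ip_pingable=True)
# Primary powered off: channel lost and the IP no longer answers.
powered_off = PassiveView(channel_connected=False, peer_public_ip_pingable=False)

print(failover_allowed(bsod))         # False
print(failover_allowed(powered_off))  # True
```

In this model, the BSOD case blocks failover for the same reason as a true split-brain condition: from the passive server's point of view, the two are indistinguishable.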

Recovery

To recover from a Windows server blue screen, first confirm that the following conditions apply:

  1. The network IP address is still visible on the network.
  2. VMware vCenter Server Heartbeat is configured to check the principal (Public) IP address for split-brain avoidance.
  3. Failover has not occurred.

The following manual steps are required to restore client connectivity to the application. Data integrity has not been compromised by split-brain syndrome.
  1. Shut down the failed Primary (active) server. Do not restart it: a restarted Primary server returns as passive, and VMware vCenter Server Heartbeat shuts down if no servers configured as an active-passive pair are present.
  2. Once the main network IP address is no longer visible on the network, VMware vCenter Server Heartbeat initiates a failover to the passive Secondary server.
  3. Observe the Secondary server as it becomes active on the network. The protected application starts normally if all other dependent services are available.
  4. Network clients now have access to the application.
  5. Unplug the Primary server from the network and start the server.
  6. Confirm that the Primary server is now passive. By default, VMware vCenter Server Heartbeat starts as passive following a system failure.
  7. Reconnect the network cable and allow the VMware vCenter Server Heartbeat servers to Verify and Synchronize.
  8. When the Primary server has completed Verify and Synchronize and is confirmed as operational, initiate a switchback to the original Primary-active and Secondary-passive mode.
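After shutting down the failed Primary server (step 1), an administrator may want to confirm that its IP address has really stopped answering before expecting the failover in step 2. The following is a rough sketch that relies on the operating system's `ping` command. The helper names `ping_once` and `wait_until_silent` are hypothetical and are not part of VMware vCenter Server Heartbeat.

```python
import platform
import subprocess
import time
from typing import Callable


def ping_once(ip: str) -> bool:
    """Send one ICMP echo request using the operating system's ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # Rough check only: on some systems the exit code may be 0 even when a
    # router answers "destination unreachable" on behalf of the target.
    return result.returncode == 0


def wait_until_silent(
    ip: str,
    ping: Callable[[str], bool] = ping_once,
    attempts: int = 10,
    delay: float = 1.0,
    needed: int = 3,
) -> bool:
    """Return True once `needed` consecutive pings fail, False if attempts run out."""
    misses = 0
    for _ in range(attempts):
        if ping(ip):
            misses = 0          # the address answered, so start counting again
        else:
            misses += 1
            if misses >= needed:
                return True
        time.sleep(delay)
    return False


# Example call with the Primary address from the scenario above:
#   wait_until_silent("192.168.1.1")
```

Requiring several consecutive misses avoids treating a single dropped packet as proof that the address has left the network.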

Safeguards

To help reduce the downtime in this scenario, configure the alerting system to warn administrators when the channel has disconnected.
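The alerting itself is configured in the product. Purely as an illustration of the idea, the sketch below raises one alert each time the channel changes from connected to disconnected. The function `channel_alerts` and the sample data format are assumptions made for this example and do not reflect how VMware vCenter Server Heartbeat implements alerting.

```python
from typing import Iterable, List, Tuple


def channel_alerts(samples: Iterable[Tuple[str, bool]]) -> List[str]:
    """Return one alert per connected-to-disconnected transition.

    samples: (timestamp, channel_connected) pairs in time order.
    """
    alerts = []
    previous = True  # assume the channel starts out connected
    for timestamp, connected in samples:
        if previous and not connected:
            alerts.append(
                f"{timestamp}: VMware channel disconnected; check the active server"
            )
        previous = connected
    return alerts


observations = [
    ("10:00", True),
    ("10:01", False),  # channel drops: first alert
    ("10:02", False),  # still down: no repeated alert
    ("10:03", True),
    ("10:04", False),  # drops again: second alert
]
for line in channel_alerts(observations):
    print(line)
```

Alerting on the transition rather than on every failed sample keeps administrators from being flooded while the channel stays down.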

Additional Information

vCSHB-Ref-240
