Recovering from a failover

Purpose

This article provides information about the failover process and recovering from a failover.

Resolution

Failover Process

When the passive server detects that the active server is no longer running properly, it assumes the role of the active server by initiating a failover and takes the following steps:

  1. It applies any intercepted updates that are currently saved in the passive server queue.

    The size of the passive server queue influences the length of time it takes to complete the failover process.

  2. The passive server changes its role and mode of operation from passive to active.

    The server’s principal (public) identity is enabled. This principal (public) IP address can only be enabled on one of the two servers at any time. When the public identity is enabled, any clients that were connected to the server before the failover are able to reconnect.

  3. The newly active server starts intercepting updates to the protected data. Any updates to the protected data are saved in the local active server queue.
  4. The now active server starts all protected applications. The applications are able to use the replicated application data to recover, then accept re-connections from any clients.

    At this stage, the originally active server is offline. The originally passive server has taken over the role of the active server and is running the protected applications. As the originally active server stopped abruptly, the protected applications may have lost some data held in the active server queue.
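
The order of the four failover steps can be illustrated with a small simulation. This is only a sketch under stated assumptions: the Server class, its fields, and the fail_over function are hypothetical names invented for illustration and are not part of VMware vCenter Server Heartbeat. The sketch shows why a longer passive server queue lengthens the failover: every queued update is applied before the role changes.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical model of one server in the pair (not a Heartbeat API)."""
    name: str
    role: str = "passive"                          # "active" or "passive"
    public_identity_enabled: bool = False
    queue: deque = field(default_factory=deque)    # intercepted updates not yet applied
    data: dict = field(default_factory=dict)       # replicated protected data
    applications_running: bool = False


def fail_over(passive: Server) -> list:
    """Walk the passive server through the four failover steps in order."""
    log = []

    # Step 1: apply every update still waiting in the passive server queue.
    applied = 0
    while passive.queue:
        key, value = passive.queue.popleft()
        passive.data[key] = value
        applied += 1
    log.append(f"applied {applied} queued updates")

    # Step 2: change role and enable the principal (public) identity.
    passive.role = "active"
    passive.public_identity_enabled = True
    log.append("role changed to active, public identity enabled")

    # Step 3: the queue is now empty and acts as the local active server
    # queue for newly intercepted updates.
    log.append("intercepting updates to protected data")

    # Step 4: start the protected applications on the replicated data.
    passive.applications_running = True
    log.append("protected applications started")
    return log


secondary = Server("Secondary", queue=deque([("a", 1), ("b", 2), ("a", 3)]))
for line in fail_over(secondary):
    print(line)
```

Updates that never reached the passive server queue before the active server stopped are not modeled here; they correspond to the data that may be lost in a failover.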

Recovering from a Failover

This recovery scenario is based on VMware vCenter Server in a default configuration with the Primary server as active and the Secondary server as passive.

A failover has occurred and the Secondary server is now running as the active server.

  1. Check your event logs on both servers to determine the cause of the failover. If you are unsure how to do this, use the VMware Log Collector tool to collect information and send the output to VMware Support. For more information, see Retrieving the VMware vCenter Server Heartbeat Logs and other useful information for support purposes (1008124).

    If any of the following has occurred on the Primary server, a switchover back to the Primary server may not be possible until other important actions are carried out. VMware vCenter Server Heartbeat must not be restarted until these issues have been resolved:
    • Hard Disk Failure - Disk may need replacing.
    • Power Failure - Power may need to be restored to the Primary server.
    • Virus - Server must be cleaned of all viruses before starting VMware vCenter Server Heartbeat.
    • Communications - Physical network hardware may need to be replaced.
    • Blue Screen - Cause must be determined and resolved.

  2. Run the Server Configuration wizard and ensure the server is set to Primary and passive. Click Finish to accept the changes.
  3. Disconnect the channel network cables or disable the network card.
  4. Resolve the problem identified in step 1 (see the list of possible failures above).
  5. Reboot this server, then reconnect the channel network cables or re-enable the network card.
  6. After the reboot, check that the Taskbar icon now reflects the changes by showing P / - (Primary and passive).
  7. On the Secondary active server or from a remote client, launch the VMware vCenter Server Heartbeat Console and confirm that the Secondary server is reporting as active.

If the Secondary server is not displaying as active:

  1. If the VMware vCenter Server Heartbeat Console is unable to connect remotely, try running it locally. If you are still unable to connect locally, check that the service is running in the Service Control Manager. If it is not running, check the event logs for a cause.
  2. Run the Server Configuration wizard and ensure that the server is set to Secondary and active. Click Finish to accept the changes.
  3. Determine if the protected application is accessible from clients. If it is, start VMware vCenter Server Heartbeat on the Secondary server.

    If the application is not accessible, check the application logs to determine why the application is not running.

  4. Run the Server Configuration wizard and check that the server is set to Secondary and active.
  5. Click Finish to accept any changes.

    At this stage, you are now ready to start VMware vCenter Server Heartbeat on the Secondary active server.

    Note: The data on this server is the most up to date, and this server is also the live server on your network. When VMware vCenter Server Heartbeat starts, it overwrites all the protected data (configured in the File Filter list) on the Primary passive server. If there were problems starting the applications on the Secondary active server, they must be investigated. If the applications are running, proceed to the next step.

  6. Start VMware vCenter Server Heartbeat on the Secondary active server and check that the Taskbar icon now reflects the correct status by showing S / A (Secondary and active).
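
The Taskbar icon codes used in the steps above can be read with a small lookup. This is a minimal sketch: describe_status is a hypothetical helper, not a Heartbeat tool, and it maps only the letters that appear in this article. Only the combinations P / - and S / A are confirmed here; the other combinations the lookup accepts are an assumption.

```python
# Letters taken from the two status codes shown in this article.
ROLES = {"P": "Primary", "S": "Secondary"}
MODES = {"A": "active", "-": "passive"}


def describe_status(code: str) -> str:
    """Translate a code such as 'P / -' into a readable role and mode."""
    role_code, mode_code = (part.strip() for part in code.split("/"))
    return f"{ROLES[role_code]} and {MODES[mode_code]}"


print(describe_status("P / -"))
print(describe_status("S / A"))
```

An unrecognized letter raises a KeyError rather than guessing at a meaning.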

Additional Information

Do not confuse a failover with a switchover. A switchover is a controlled switch (initiated from the VMware vCenter Server Heartbeat Console) between the Primary and Secondary servers. A failover may happen when one or more of the following have failed on the active server: application, power, hardware, or communications. The passive server counts a preconfigured number of missed heartbeats before beginning a failover. When that number is reached, it automatically assumes the active role and starts the protected applications.
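
The missed-heartbeat trigger can be sketched as a simple counter. This is an illustration only, not the product's implementation: HeartbeatMonitor, max_missed, and record are hypothetical names, the threshold of 3 is an arbitrary example value, and the sketch assumes that only consecutive misses count (a received heartbeat resets the counter), which this article does not state.

```python
class HeartbeatMonitor:
    """Hypothetical counter of consecutive missed heartbeats."""

    def __init__(self, max_missed: int):
        if max_missed < 1:
            raise ValueError("max_missed must be at least 1")
        self.max_missed = max_missed   # preconfigured threshold (assumed name)
        self.missed = 0

    def record(self, heartbeat_received: bool) -> bool:
        """Record one heartbeat interval; return True when failover should begin."""
        if heartbeat_received:
            self.missed = 0            # assumption: a received heartbeat resets the count
        else:
            self.missed += 1
        return self.missed >= self.max_missed


monitor = HeartbeatMonitor(max_missed=3)
for received in [True, False, False, True, False, False, False]:
    if monitor.record(received):
        print("threshold reached: begin failover")
```

In this run the threshold is reached only on the last interval, after three misses in a row.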
 
vCSHB-Ref-820
