HCX - NE appliances in HA mode experience intermittent failover

Article ID: 321573

Products

VMware HCX

Issue/Introduction

This article describes a known issue with HCX NE appliances running in HA mode and how to recover from it.


Symptoms:

HCX Network Extension (NE) Appliance VMs running in High Availability (HA) mode may experience intermittent Standby to Active failover.
 
 


Cause

The Active and Standby NE appliance VMs in an HA pair periodically exchange BFD heartbeat packets to verify communication between the two appliances and to maintain the HA relationship.
When an NE appliance is running version 4.8.0, the BFD software process may become blocked during runtime, so BFD heartbeats are no longer processed. This results in a restart of the BFD software process and causes an HA failover event between the Active and Standby NE appliances at both sites.
 
IMPORTANT: Based on the current implementation, the BFD software process may take some time to become fully blocked, and the condition does not occur frequently.
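
To illustrate why unprocessed heartbeats lead to a failover, the following is a minimal sketch of a generic BFD-style detection timer, not HCX's actual implementation. In standard BFD, a peer is declared down when no heartbeat has been processed for the interval multiplied by a detect multiplier. The class name HeartbeatMonitor and the values used (1 second interval, multiplier of 3) are assumptions chosen for illustration; this article does not state the timers HCX uses.

```python
class HeartbeatMonitor:
    """Generic BFD-style liveness check (illustrative, not HCX code)."""

    def __init__(self, interval_s: float, detect_mult: int):
        # Detection time = heartbeat interval x detect multiplier.
        self.detect_time_s = interval_s * detect_mult
        self.last_seen_s = None

    def on_heartbeat(self, now_s: float) -> None:
        # Called each time a heartbeat from the peer is processed.
        self.last_seen_s = now_s

    def peer_is_down(self, now_s: float) -> bool:
        # Before the first heartbeat there is nothing to time out.
        if self.last_seen_s is None:
            return False
        return (now_s - self.last_seen_s) > self.detect_time_s


# Hypothetical timeline: heartbeats are processed at t=0, 1, 2,
# then processing stops (as it would if the process were blocked).
monitor = HeartbeatMonitor(interval_s=1.0, detect_mult=3)
for t in (0.0, 1.0, 2.0):
    monitor.on_heartbeat(t)

still_up_at_4s = not monitor.peer_is_down(4.0)   # 2.0 s of silence
down_at_5_5s = monitor.peer_is_down(5.5)         # 3.5 s of silence
```

In this model, the peer is still considered up after 2 seconds without a processed heartbeat and is declared down once the silence exceeds the 3 second detection time, which is the point at which an HA pair would fail over.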

Resolution

This issue is fixed in HCX version 4.8.2.

Workaround:
If HCX Manager and the NE appliances have already been upgraded to version 4.8.0, use one of the following approaches to avoid HA failover and the resulting network glitch over the extended datapath:
  • Identify the Standby NE appliance VM for a given HA group and power it off from the vCenter UI to avoid intermittent failover.
Note: Perform this operation on both the Source and Destination sites.
Or,
  • Disable High Availability (HA) and run the NE appliance in Standalone mode.
Note: All existing Network Extensions must first be un-extended from the HA pair and then re-extended on a Standalone NE appliance. This unstretch/re-stretch workflow incurs some downtime.
IMPORTANT: Once the extensions are running on a Standalone NE appliance, a similar step will be needed to return to HA mode after the appliances are upgraded to a version that contains the fix.
 
Alternatively,
  • If an HCX Manager upgrade is planned because of system or migration requirements, DO NOT upgrade the NE appliances in the Service Mesh (SM) to 4.8.0 if Network Extension HA needs to remain functional.
  • If HCX Connector/Cloud Manager has already been upgraded to 4.8.0, DO NOT upgrade NE appliances running in HA mode to version 4.8.0.
Note: The IX (Interconnect) appliance can be upgraded to version 4.8.0 without any issues.
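
The upgrade guidance above can be summarized as a small decision helper. This is a sketch for illustration only: the function name upgrade_blocked_by_this_issue and its parameters are hypothetical and are not part of any HCX API or CLI. It encodes only what this article states, namely that NE appliances in HA mode should not be upgraded to 4.8.0; it makes no claim about any other version or issue.

```python
AFFECTED_VERSION = "4.8.0"  # the only version this article names as affected


def upgrade_blocked_by_this_issue(appliance_type: str,
                                  ha_enabled: bool,
                                  target_version: str) -> bool:
    """Return True when this article advises against the appliance upgrade.

    appliance_type: "NE" (Network Extension) or "IX" (Interconnect).
    ha_enabled:     whether the appliance runs in an HA pair.
    target_version: the version the appliance would be upgraded to.
    """
    if target_version != AFFECTED_VERSION:
        # The article only warns about upgrading to 4.8.0.
        return False
    # Only NE appliances running in HA mode are affected; IX and
    # Standalone NE appliances are not blocked by this issue.
    return appliance_type == "NE" and ha_enabled


ne_ha_blocked = upgrade_blocked_by_this_issue("NE", True, "4.8.0")
ix_blocked = upgrade_blocked_by_this_issue("IX", False, "4.8.0")
```

Here ne_ha_blocked is True and ix_blocked is False, matching the guidance that IX appliances can be upgraded to 4.8.0 while NE appliances in HA mode should not be.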

Additional Information

Impact/Risks:
  • During HA failover, a network glitch may be observed while the Standby appliance takes over as Active and becomes ready to forward data.
  • The extended datapath over a Standalone NE appliance is not impacted.
  • Migration services are not impacted.