tmpfs partition on DLRs and Edges configured with HA in NSX 6.3.6 and NSX 6.4.1 can fill up, preventing any configuration changes

Article ID: 327287

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
Any addition, deletion, or modification of configuration on the NSX Edge or DLR fails.

NSX Manager logs begin reporting the following messages for the edge/DLR:

2018-07-16 08:03:34.361 AEST  INFO messagingTaskExecutor-7 VseRpcResponseHandler:111 - Received empty response for request f3244e8c-918c-49d0-8df9-40b95aca4c88 from appliance: 501d5055-8dbd-2b7a-888b-d4845c55249c, vm vm-2879
2018-07-16 08:03:34.362 AEST ERROR http-nio-127.0.0.1-7441-exec-7976 BaseRestController:452 - REST API failed : 'null'
java.lang.NullPointerException
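
To see which appliances are logging this symptom, the "Received empty response" entries can be pulled out of an NSX Manager log. The following is a minimal Python sketch, assuming log lines follow the format of the sample above. The function name `find_empty_responses` and the regular expression are illustrative and are not part of any NSX tooling.

```python
import re

# Pattern modelled on the sample log line in this article; the exact
# format in other NSX Manager builds is an assumption.
EMPTY_RESPONSE = re.compile(
    r"Received empty response for request (?P<request>[0-9a-f-]+) "
    r"from appliance: (?P<appliance>[0-9a-f-]+), vm (?P<vm>vm-\d+)"
)


def find_empty_responses(lines):
    """Return a dict mapping each VM ID to its count of empty responses."""
    counts = {}
    for line in lines:
        match = EMPTY_RESPONSE.search(line)
        if match:
            vm = match.group("vm")
            counts[vm] = counts.get(vm, 0) + 1
    return counts
```

Passing the lines of a log file to `find_empty_responses` gives a per-VM count that can be compared against the list of Edges and DLRs configured in HA.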
 

The /run/vmware/vshield/cmdOut/ha.cid.debug file grows until it fills the temporary file system (tmpfs) of DLR and Edge VMs.

The issue affects only the Edges and DLRs configured in HA.
The ha.cid.debug file is created only after an HA event (such as an HA failover or split-brain), and it can take approximately four weeks for the file to fill the tmpfs on a DLR or compact Edge. The issue may therefore surface weeks after the original HA event.
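
The time to fill can be roughly projected from two size observations of the file. The sketch below is illustrative only: it assumes linear growth, which this article does not state, and the function name `days_until_full` and all its parameters are hypothetical.

```python
def days_until_full(size_then, size_now, days_between, free_bytes):
    """Project the days until free_bytes is consumed, assuming linear growth.

    size_then and size_now are two observed file sizes in bytes, taken
    days_between days apart. Returns None when the file is not growing.
    """
    if days_between <= 0:
        raise ValueError("days_between must be positive")
    growth_per_day = (size_now - size_then) / days_between
    if growth_per_day <= 0:
        return None
    return free_bytes / growth_per_day
```

For example, a file that grew by 7,000,000 bytes over 7 days, with 21,000,000 bytes still free, would be projected to fill the remaining space in 21 days. These numbers are made up for illustration and are not measurements from an Edge.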

 
No datapath impact is expected when the file system is full; only new configuration changes fail for the affected Edges.


Resolution

This issue is resolved in: 

VMware NSX for vSphere 6.3.7, available at VMware Downloads
VMware NSX for vSphere 6.4.2, available at VMware Downloads.

Workaround: Reboot the active Edge VM to return it to a working state.

Additional Information

The file ha.cid.debug does not exist until an HA event happens, i.e. a failover or split-brain recovery.
If you deploy a fresh Edge with HA enabled, the file will not exist until you trigger a failover.
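
Whether the file exists, and how full its file system is, can be checked with a few standard-library calls. This is a minimal sketch, not a supported procedure: the path comes from this article, the function name `debug_file_report` is hypothetical, and whether a Python interpreter is available on a given Edge appliance is an assumption.

```python
import os
import shutil

# Path taken from this article.
DEBUG_FILE = "/run/vmware/vshield/cmdOut/ha.cid.debug"


def debug_file_report(path=DEBUG_FILE):
    """Return None if the file is absent, else its size and file system usage."""
    if not os.path.exists(path):
        return None  # no HA event has created the file yet
    usage = shutil.disk_usage(os.path.dirname(path) or ".")
    used_percent = round(100 * usage.used / usage.total, 1) if usage.total else 0.0
    return {
        "file_bytes": os.path.getsize(path),
        "fs_total_bytes": usage.total,
        "fs_used_percent": used_percent,
    }
```

A result of `None` matches a freshly deployed HA Edge that has not yet failed over. A large `file_bytes` value together with a high `fs_used_percent` suggests the condition described above.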
The /run/vmware/vshield/cmdOut/ha.cid.Out file fills with the following log message on Edge appliances:
This rsync lacks old-style --compress due to its external zlib.  Try -zz. Continuing without compression.

This issue is seen in NSX for vSphere 6.3.6 and in 6.4.x releases earlier than 6.4.2.