One or more ESXi Transport Nodes show "Unknown" Node status in the NSX-T Manager UI
Article ID: 318308
Products
VMware NSX Networking
Issue/Introduction
Symptoms:
NSX-T version 3.1.0 or 3.1.1.
ESXi version 7.0 Update 2 or above.
One or more ESXi Transport Nodes show "Unknown" Node status in the NSX-T Manager UI.
ESXi host logs (nsx-syslog) display message(s) similar to:
2021-05-23T02:40:03Z nsx-sha: NSX 2104585 - [nsx@6876 comp="nsx-esx" subcomp="nsx-sha" username="root" level="WARNING" invalid="true"] Exit SHA process as continuously encountering OSError - [Errno28] No space left on device, trace:Traceback (most recent call last): File "/usr/lib/vmware/netopa/lib/python/sha/contrib/metric/utils/_command.py", line 33, in run_command output = ForkServer.check_output( File usr/lib/vmware/netopa/lib/python/sha/forkserver/_fork_server.py", line 871, in check_output raise e OSError: [Errno 28] No space left on device ^@
No dataplane impact is observed, but the "Unknown" Node status may prevent upgrades because health checks fail.
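A minimal sketch for checking a host for the log signature shown above. It counts nsx-sha lines that contain "No space left on device". The function name `count_sha_enospc` is hypothetical, and the default log path is an assumption about where nsx-syslog is written on the host; pass a different file as the first argument if the log lives elsewhere.

```shell
#!/bin/sh
# Hypothetical helper (not part of NSX): count nsx-sha log lines that report
# "No space left on device".
# The default path below is an assumption; pass another file as $1 if needed.
count_sha_enospc() {
    log_file="${1:-/var/run/log/nsx-syslog.log}"
    # The first grep keeps only nsx-sha lines; the second counts the ENOSPC ones.
    # grep -c prints 0 and exits non-zero when nothing matches, so "|| true"
    # keeps the function from reporting a failure in that case.
    grep 'nsx-sha' "$log_file" 2>/dev/null | grep -c 'No space left on device' || true
}
```

Calling `count_sha_enospc` with no argument reads the assumed default path. A result greater than 0 only indicates that the message appears in the log; it does not by itself confirm the Node status shown in the NSX-T Manager UI.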
Environment
VMware NSX-T Data Center
VMware NSX-T Data Center 3.x
Cause
This issue is caused by a memory leak in the SHA (System Health Agent) process on the ESXi host. SHA reports information to the NSX Manager, such as NSX services status, hyperbus status, and uplink status. When the SHA service stops running because of the memory leak, the ESXi host status is shown as "Unknown" in the NSX Manager UI and other status reports to the NSX Manager fail. This issue does not impact the dataplane; it affects only the reporting from the ESXi host to the NSX Manager.
Resolution
This issue is resolved in NSX-T 3.1.2, available at VMware Downloads.
Workaround:
To work around the issue, restart the netopa service on the ESXi host using the following command:
/etc/init.d/netopad restart
Note: This is only a temporary workaround and the issue will occur again.
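The restart could be made conditional on the log signature from the Symptoms section. The sketch below is illustrative, not a VMware-provided script: `restart_sha_if_affected`, `LOG_FILE`, and `RESTART_CMD` are hypothetical names, and the default log path is an assumption. Only the restart command itself comes from this article.

```shell
#!/bin/sh
# Illustrative sketch (not a VMware tool): restart netopad only when the
# nsx-sha "No space left on device" signature is present in the log.
# LOG_FILE and RESTART_CMD are hypothetical variables; the default log path
# is an assumption, the restart command is the one given in this article.
LOG_FILE="${LOG_FILE:-/var/run/log/nsx-syslog.log}"
RESTART_CMD="${RESTART_CMD:-/etc/init.d/netopad restart}"

restart_sha_if_affected() {
    if grep 'nsx-sha' "$LOG_FILE" 2>/dev/null | grep -q 'No space left on device'; then
        echo "Signature found in $LOG_FILE; running: $RESTART_CMD"
        # Left unquoted on purpose so the command and its argument are split.
        $RESTART_CMD
    else
        echo "Signature not found in $LOG_FILE; nothing restarted"
    fi
}
```

Setting `RESTART_CMD="echo restart skipped"` before calling `restart_sha_if_affected` gives a dry run. Because the function scans the whole log file, an entry from before an earlier restart will still match, so it is a sketch for a one-off check rather than something to schedule.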