One or more ESXi Transport Nodes show "Unknown" Node status in the NSX-T Manager UI

Article ID: 318308

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • NSX-T version 3.1.0 or 3.1.1.
  • ESXi version 7.0 Update 2 or above.
  • One or more ESXi Transport Nodes show "Unknown" Node status in the NSX-T Manager UI.
  • ESXi host logs (nsx-syslog) display message(s) similar to:
2021-05-23T02:40:03Z nsx-sha: NSX 2104585 - [nsx@6876 comp="nsx-esx" subcomp="nsx-sha" username="root" level="WARNING" invalid="true"] Exit SHA process as continuously encountering OSError - [Errno28] No space left on device, trace:Traceback (most recent call last):   File "/usr/lib/vmware/netopa/lib/python/sha/contrib/metric/utils/_command.py", line 33, in run_command     output = ForkServer.check_output(   File "/usr/lib/vmware/netopa/lib/python/sha/forkserver/_fork_server.py", line 871, in check_output     raise e OSError: [Errno 28] No space left on device
  • No dataplane impact is observed, but the "Unknown" Node status may prevent upgrades because health checks fail.
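
The log symptom above can be checked from the ESXi shell. The following is a minimal sketch, not an official diagnostic. It assumes the NSX syslog is at /var/log/nsx-syslog.log (an assumed default; pass a different path if your host logs elsewhere). The function name find_sha_enospc is hypothetical.

```shell
# Hypothetical helper: print nsx-sha log lines where SHA exited on
# "No space left on device".
# The default log path is an assumption; pass another path as the first argument.
find_sha_enospc() {
    log_file="${1:-/var/log/nsx-syslog.log}"
    # A line must mention nsx-sha, the SHA exit, and the ENOSPC text, in that order.
    grep 'nsx-sha.*Exit SHA process.*No space left on device' "$log_file"
}
```

Running find_sha_enospc with no argument prints any matching lines. It returns a non-zero status when no line matches.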


Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 3.x

Cause

This issue is caused by a memory leak in the SHA (System Health Agent) process on the ESXi host. SHA reports information such as NSX services status, hyperbus status, and uplink status to the NSX Manager. When the SHA service stops running because of the memory leak, the ESXi host status is shown as "Unknown" in the NSX Manager UI and other status reports to the NSX Manager fail. The issue affects only the host's reporting to the Manager, not the dataplane.

Resolution

This issue is resolved in NSX-T 3.1.2, available at VMware Downloads.

Workaround:
To work around the issue, restart the netopa service on the ESXi host with the following command. This is only a temporary workaround, and the issue will occur again:
# /etc/init.d/netopad restart
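
The restart can be wrapped so that the service state is reported straight afterwards. This is a minimal sketch under stated assumptions: the function name restart_netopad is hypothetical, and the use of a "status" argument to the init script is an assumption that should be verified on your ESXi build.

```shell
# Hypothetical wrapper around the workaround: restart netopad, then query its status.
# Assumes the init script accepts a "status" argument; verify this on your host.
restart_netopad() {
    init_script="${1:-/etc/init.d/netopad}"
    # Stop here if the restart itself fails.
    "$init_script" restart || { echo "netopad restart failed" >&2; return 1; }
    # Report the service state after the restart.
    "$init_script" status
}
```

Run restart_netopad as root on the affected host. Because the workaround is temporary, the Node status can return to "Unknown" until the host is upgraded to a fixed release.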