Understanding and troubleshooting Message Bus in VMware NSX for vSphere 6.x (2133897)
Refer to this article when a communication issue between the NSX Manager and the ESXi hosts causes these symptoms:
- Publishing firewall rules fails
- Some ESXi hosts do not have the VDR/LIF information that was configured through the NSX Manager
Message Bus Overview
The NSX Manager web application and the NSX components on ESXi hosts communicate with each other through a RabbitMQ broker process that runs on the same virtual machine as the NSX Manager web application. The communication protocol is AMQP (Advanced Message Queuing Protocol), and the channel is secured using SSL. On an ESXi host, the VSFWD (vShield Firewall Daemon) process establishes and maintains the SSL connection to the broker, and sends and receives messages on behalf of other components, which talk to it through IPC. For more information, see the VMware NSX for vSphere (NSX-V) Network Virtualization Guide.
Determining a Message Bus issue
If using NSX Manager 6.2.0 or higher, check the ESXi host's message bus status on the host preparation page in the NSX Manager User Interface (UI). If this status is marked Red, then there is an issue with the Message Bus. If the status is marked Green, then the Message Bus is healthy.
If using an older version of NSX Manager, check the system events for the suspected ESXi host in the NSX Manager User Interface (UI). If there is a Message Bus issue, expect to see an event with critical severity and event code 391002.
Note: It takes up to 6 minutes for this event to be emitted after a communication failure. Wait for about 6 minutes after you notice the suspicious symptom before checking. If no such event is found, then the issue is not related to the Message Bus.
When communication is restored, you should expect to see an event with informational severity and event code 391001.
Note: It takes up to 3 minutes for this event to be emitted after communication is restored.
If you need the mo-id of the suspected ESXi host (for example, host-123), log in as root on the ESXi host using either SSH or the direct console and run this command:
esxcfg-advcfg -g /UserVars/RmqHostID
Troubleshooting Message Bus
Once you determine that there is a Message Bus issue on the ESXi host, you can do further troubleshooting:
- Verify that VSFWD is running on the ESXi host.
Note: The watchdog script launches the process automatically and restarts it if it terminates for an unknown reason.
Run this command on each ESXi host in the cluster:
ps |grep vsfwd
You see output similar to:
ps |grep vsfwd
107557 107557 vsfwd /usr/lib/vmware/vsfw/vsfwd
107574 107557 vsfwd /usr/lib/vmware/vsfw/vsfwd
107575 107557 vsfwd /usr/lib/vmware/vsfw/vsfwd
107576 107557 vsfwd /usr/lib/vmware/vsfw/vsfwd
107577 107557 vsfwd /usr/lib/vmware/vsfw/vsfwd
107578 107557 vsfwd /usr/lib/vmware/vsfw/vsfwd
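The check above can be scripted when many hosts need to be inspected. The following is a minimal sketch, not a supported tool: the function name vsfwd_world_count is hypothetical, and it assumes that awk is available in the shell where the ps output is processed and that the last column of each ps line is the command path, as in the sample output above.

```shell
# Hypothetical helper: reads `ps` output on stdin and prints the number of
# lines whose last column is the vsfwd binary path.
# Comparing the last column exactly avoids counting a `grep vsfwd` line.
vsfwd_world_count() {
    awk '$NF == "/usr/lib/vmware/vsfw/vsfwd" { n++ } END { print n + 0 }'
}
```

For example, `ps | vsfwd_world_count` prints 0 when VSFWD is not running on the host.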
- Verify the VSFWD connectivity to the RabbitMQ broker. Run this command on ESXi hosts to see a list of connections from the vsfwd process on the ESXi host to the NSX Manager.
esxcli network ip connection list |grep 5671
Note: Ensure that port 5671 is open for communication in every external firewall in the environment. There should be at least two connections on port 5671. There can be more if NSX Edge virtual machines are deployed on the ESXi host, because they also establish connections to the RMQ broker.
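The minimum of two connections can also be checked with a small script. This is a sketch under stated assumptions: the function names are hypothetical, and it assumes that each connection appears on one line containing an address that ends in :5671 followed by whitespace, with the state shown as ESTABLISHED.

```shell
# Hypothetical helper: reads `esxcli network ip connection list` output on
# stdin and prints how many established connections involve port 5671.
# The trailing whitespace in the pattern keeps ports such as 56711 from matching.
count_rmq_connections() {
    awk '/:5671[ \t]/ && /ESTABLISHED/ { n++ } END { print n + 0 }'
}

# Succeeds when at least two such connections exist, the minimum this
# article expects.
has_min_rmq_connections() {
    [ "$(count_rmq_connections)" -ge 2 ]
}
```

For example, `esxcli network ip connection list | has_min_rmq_connections || echo "fewer than two connections on port 5671"` prints a warning when the host has fewer than two established connections.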
- Verify that VSFWD is configured. Run this command on the ESXi hosts:
esxcfg-advcfg -g /UserVars/RmqIpAddress
The preceding command should display the NSX Manager IP address.
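To compare the configured address with the expected NSX Manager IP address in a script, the value can be extracted from the command output. This is a sketch only: the function name advcfg_value is hypothetical, and it assumes the output has the form "Value of <Name> is <value>". Confirm the format on your hosts before relying on it.

```shell
# Hypothetical helper: reads `esxcfg-advcfg -g <option>` output on stdin
# and prints only the value.
# Assumption: the output looks like "Value of RmqIpAddress is 192.0.2.10".
advcfg_value() {
    sed -n 's/^Value of [^ ]* is //p'
}
```

For example, `esxcfg-advcfg -g /UserVars/RmqIpAddress | advcfg_value` prints only the address, which can then be compared with the NSX Manager IP address.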
- If you are using a host profile for this ESXi host, verify that the RabbitMQ configuration is not set in the host profile.
For more information, see:
- Deploying VXLAN through Auto Deploy and VMware NSX for vSphere 6.x (2092871)
- Distributed Firewall (DFW) rules fail to process traffic even after successfully publishing the rules in VMware NSX for vSphere 6.x (2125901)
- Check whether the RabbitMQ credentials of the ESXi host are out of sync with the NSX Manager. Download the NSX Manager Tech Support Logs. For more information, see Collecting diagnostic information for VMware NSX for vSphere 6.x (2074678). After gathering the logs, search all of them for entries similar to:
PLAIN login refused: user 'uw-host-420' - invalid credentials.
Note: Replace host-420 with the mo-id of the suspected host.
If such entries are found in the logs for the suspected ESXi host, resynchronize the message bus.
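The log search can be sketched as a small function that you run on the machine where the Tech Support Logs were extracted. The function name find_rmq_login_failures and its two arguments (the mo-id and the directory of extracted logs) are hypothetical, and it assumes a grep that supports recursive search (-r).

```shell
# Hypothetical helper: searches all files under a directory for the
# "invalid credentials" entry of one host.
#   $1 = mo-id of the suspected host, for example host-420
#   $2 = directory that contains the extracted NSX Manager logs
# -F treats the pattern as a fixed string; the closing quote after the
# mo-id keeps host-42 from matching host-420.
find_rmq_login_failures() {
    mo_id=$1
    log_dir=$2
    grep -rF "PLAIN login refused: user 'uw-${mo_id}' - invalid credentials" "$log_dir"
}
```

The function prints the matching lines and returns a non-zero status when no entry is found, so it can be used directly in an if statement.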
Notes: Before resynchronizing the message bus, ensure that:
- You use basic authentication with NSX Manager web credentials, such as the admin user or any vCenter Server user that has been granted NSX privileges.
- The headers Content-Type: application/xml and Accept: application/xml are set.
You can use any REST client to make the call.
For more information on how to make API calls to the NSX Manager, see the Using the NSX REST API section in the VMware NSX for vSphere API Guide.
To resynchronize the message bus, use the NSX Manager REST API.
Note: To better understand the issue, collect the logs immediately after the Message Bus is resynchronized.