Understanding and troubleshooting Message Bus in VMware NSX for vSphere 6.x (2133897)

Purpose

This article explains the Message Bus in VMware NSX for vSphere 6.x, describes how to troubleshoot it, and shows how to verify whether a Message Bus malfunction is causing an issue.

Refer to this article when a communication issue between the NSX Manager and the ESXi hosts is responsible for these symptoms:
  • Publishing firewall rules fails
  • Some of the ESXi hosts do not have the VDR/LIF information configured through the NSX Manager

Resolution

Message Bus Overview

The NSX Manager web application and NSX components on ESXi hosts communicate with each other through a RabbitMQ broker process that runs on the same virtual machine as the NSX Manager web application. The communication protocol is AMQP (Advanced Message Queuing Protocol), and the channel is secured using SSL. On an ESXi host, the VSFWD (vShield Firewall Daemon) process establishes and maintains the SSL connection to the broker. It sends and receives messages on behalf of other components, which talk to it through IPC.

For more information, see the VMware NSX for vSphere (NSX-V) Network Virtualization Guide.

Determining a Message Bus issue

If using NSX Manager 6.2.0 or higher, check the ESXi host's message bus status on the host preparation page in the NSX Manager User Interface (UI). If this status is marked Red, then there is an issue with the Message Bus. If the status is marked Green, then the Message Bus is healthy.

If using an older version of NSX Manager, check the system events for the suspected ESXi host on the NSX Manager User Interface (UI). If there is a Message Bus issue, you should expect to see an event with critical severity and event code 391002.

Note: It can take up to 6 minutes for this event to be emitted after a communication failure. Wait about 6 minutes after you first notice the symptom before checking. If no such event is found after that time, the issue is not related to the Message Bus.

When communication is restored, you should expect to see an event with informational severity and event code 391001.

Note: It can take up to 3 minutes for this event to be emitted after communication is restored.

If you need the mo-id of the suspected ESXi host (for example, host-123), log in as root on the ESXi host using either SSH or the direct console and run this command:

esxcfg-advcfg -g /UserVars/RmqHostID
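
The event codes above lend themselves to a simple decision rule. The following Python sketch is illustrative only: it assumes the system events for one host have already been exported as a chronological list of integer event codes (a hypothetical format, not an NSX API), and the function name is made up for this example.

```python
# Event codes taken from this article.
MESSAGE_BUS_DOWN = 391002      # critical: communication failure
MESSAGE_BUS_RESTORED = 391001  # informational: communication restored


def message_bus_state(event_codes):
    """Return the state implied by the most recent Message Bus event.

    event_codes is assumed to be in chronological order (oldest first).
    Codes other than the two above are ignored.
    """
    state = "no message bus event"
    for code in event_codes:
        if code == MESSAGE_BUS_DOWN:
            state = "down"
        elif code == MESSAGE_BUS_RESTORED:
            state = "restored"
    return state
```

Because the failure event can take up to 6 minutes to appear, a result of "no message bus event" is meaningful only after that time has passed.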

Troubleshooting Message Bus

Once you determine that there is a Message Bus issue on the ESXi host, you can do further troubleshooting:

  1. Verify that VSFWD is running on the ESXi host.

    Note: The watchdog script launches the process automatically and restarts it if it terminates for an unknown reason.

    Run this command on each ESXi host in the cluster:

    ps |grep vsfwd

    You see output similar to:

    ps |grep vsfwd

    107557 107557 vsfwd /usr/lib/vmware/vsfw/vsfwd
    107574 107557 vsfwd /usr/lib/vmware/vsfw/vsfwd
    107575 107557 vsfwd /usr/lib/vmware/vsfw/vsfwd
    107576 107557 vsfwd /usr/lib/vmware/vsfw/vsfwd
    107577 107557 vsfwd /usr/lib/vmware/vsfw/vsfwd
    107578 107557 vsfwd /usr/lib/vmware/vsfw/vsfwd


  2. Verify the VSFWD connectivity to the RabbitMQ broker. Run this command on each ESXi host to list the connections from the vsfwd process to the NSX Manager:

    esxcli network ip connection list |grep 5671


    Note: Ensure that port 5671 is open for communication on any external firewall in the environment. There should be at least two connections on port 5671. There can be more if NSX Edge virtual machines are deployed on the ESXi host, because they also establish connections to the RMQ broker.
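
The connection check in step 2 can also be scripted. The following Python sketch is a rough illustration, not a VMware tool. It assumes that the output of the esxcli command above has been captured as text and that the foreign address and state appear in the fifth and sixth whitespace-separated columns. The function name, sample output, and IP addresses are invented for the example; adjust the column positions if the output on your ESXi version differs.

```python
def count_broker_connections(esxcli_output, manager_ip, port=5671):
    """Count established TCP connections to manager_ip:port.

    esxcli_output is the captured text of
    'esxcli network ip connection list' (column layout is an assumption).
    """
    target = f"{manager_ip}:{port}"
    count = 0
    for line in esxcli_output.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and non-TCP rows.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign_address, state = fields[4], fields[5]
        if foreign_address == target and state == "ESTABLISHED":
            count += 1
    return count


# Invented sample output, for illustration only.
SAMPLE_OUTPUT = """\
Proto  Recv Q  Send Q  Local Address      Foreign Address    State        World ID  CC Algo  World Name
tcp         0       0  192.0.2.51:23251   192.0.2.15:5671    ESTABLISHED     35854  newreno  vsfwd
tcp         0       0  192.0.2.51:46567   192.0.2.15:5671    ESTABLISHED     35854  newreno  vsfwd
tcp         0       0  192.0.2.51:443     192.0.2.20:51234   ESTABLISHED     34567  newreno  rhttpproxy
"""

# The article expects at least two connections on port 5671.
healthy = count_broker_connections(SAMPLE_OUTPUT, "192.0.2.15") >= 2
```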

  3. Verify if VSFWD is configured. Run this command on the ESXi hosts:

    esxcfg-advcfg -g /UserVars/RmqIpAddress

    The preceding command should display the NSX Manager IP address.

  4. If you are using a host profile for this ESXi host, verify that the RabbitMQ configuration is not set in the host profile.

    For more information, see:


  5. Verify whether the RabbitMQ credentials of the ESXi host are out of sync with the NSX Manager. Download the NSX Manager Tech Support Logs. For more information, see Collecting diagnostic information for VMware NSX for vSphere 6.x (2074678). After gathering the logs, search all of them for entries similar to:

    PLAIN login refused: user 'uw-host-420' - invalid credentials.

    Note: Replace host-420 with the mo-id of the suspected host.

    If such entries are found in the logs for the suspected ESXi host, resynchronize the message bus.

    Notes: Before resynchronizing, ensure that:

    •     You use basic authentication with NSX Manager web credentials, such as the admin user or any vCenter Server user granted NSX privileges.
    •     The headers Content-type: application/xml and Accept: application/xml are set.

    You can use a REST client such as:

    •     https://addons.mozilla.org/en-US/firefox/addon/restclient
    •     https://chrome.google.com/webstore/detail/postman-rest-client/fdmmgilgnpjigdojojpjoooidkmcomcm?hl=en
    •     curl

    For more information on how to make API calls to the NSX Manager, see the Using the NSX REST API section in the VMware NSX for vSphere API Guide.

    To resynchronize the message bus, use this REST API call:

    Request:

    POST https://NSX_Manager_IP/api/2.0/nwfabric/configure?action=synchronize

    Request Body:

    <nwFabricFeatureConfig>
      <featureId>com.vmware.vshield.vsm.messagingInfra</featureId>
      <resourceConfig>
        <resourceId>{HOST/CLUSTER MOID}</resourceId>
      </resourceConfig>
    </nwFabricFeatureConfig>


    Note: To better understand the issue, collect the logs immediately after the Message Bus is resynchronized.
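
Step 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an official client: the helper names, the sample log lines, and the placeholder manager address and credentials are hypothetical. The sketch extracts host mo-ids from refused-login entries and builds (but does not send) the resynchronization request shown above.

```python
import base64
import re
import urllib.request
from xml.sax.saxutils import escape

# Matches the log entry format quoted in step 5 and captures the mo-id.
REFUSED_LOGIN = re.compile(
    r"PLAIN login refused: user 'uw-(host-\d+)' - invalid credentials"
)


def hosts_with_refused_logins(log_text):
    """Return the sorted, de-duplicated mo-ids found in refused-login entries."""
    return sorted(set(REFUSED_LOGIN.findall(log_text)))


def build_resync_body(resource_id):
    """Build the XML request body for a host or cluster mo-id."""
    return (
        "<nwFabricFeatureConfig>"
        "<featureId>com.vmware.vshield.vsm.messagingInfra</featureId>"
        "<resourceConfig>"
        f"<resourceId>{escape(resource_id)}</resourceId>"
        "</resourceConfig>"
        "</nwFabricFeatureConfig>"
    )


def build_resync_request(manager, user, password, resource_id):
    """Prepare the POST request with basic authentication and XML headers."""
    url = f"https://{manager}/api/2.0/nwfabric/configure?action=synchronize"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        url, data=build_resync_body(resource_id).encode("utf-8"), method="POST"
    )
    request.add_header("Content-Type", "application/xml")
    request.add_header("Accept", "application/xml")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Invented log lines, for illustration only.
SAMPLE_LOG = """\
rabbitmq: PLAIN login refused: user 'uw-host-420' - invalid credentials.
rabbitmq: accepting AMQP connection
rabbitmq: PLAIN login refused: user 'uw-host-77' - invalid credentials.
rabbitmq: PLAIN login refused: user 'uw-host-420' - invalid credentials.
"""

# Placeholder manager address and credentials (hypothetical).
requests_to_send = [
    build_resync_request("nsx-manager.example.com", "admin", "password", moid)
    for moid in hosts_with_refused_logins(SAMPLE_LOG)
]
```

To send a prepared request, pass it to urllib.request.urlopen. The client must trust the NSX Manager certificate for the connection to succeed.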
