Transport Node (Edge or hypervisor) connectivity to NSX Manager is Down

Article ID: 330538

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • Transport Node (Edge or hypervisor) connectivity to NSX Manager is Down.
  • The NSX Manager "publish_fqdns" setting is set to true:
GET /api/v1/configs/management
{
  "publish_fqdns" : true, <---
  "_revision": 1
}
  • NSX Manager logs (syslog.log) display messages indicating that reverse DNS queries failed while "publish_fqdns" was set to true, similar to:
2019-09-09T09:23:55.369Z ERROR ExecutorChannel-1814785934 BrokerManager - SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP2119" subcomp="manager"] Reverse DNS lookup for IP 192.168.120.12 failed due to timeout or missing DNS configuration
  • Transport Node logs (syslog.log) repeatedly display message(s) similar to:
<179>1 2019-10-01T08:48:20.030750Z edge01.corp.local NSX 1554 - [nsx@6876 comp="nsx-edge" subcomp="mpa" tid="17006" level="ERROR" errorCode="MPA111"] Heartbeat message: configuration_hash mismatch new [d5c9aa2842c58ddf14119dcee626a4948eb5c4ff] -> old [798331249130c958689ede2c53748cb2b41d0e17]
<179>1 2019-10-01T08:48:20.030768Z edge01.corp.local NSX 1554 - [nsx@6876 comp="nsx-edge" subcomp="mpa" tid="17006" level="ERROR" errorCode="MPA110"] Missed or failed heartbeat. Signal ERROR to FE & Restart
<182>1 2019-10-01T08:48:20.030862Z edge01.corp.local NSX 1554 - [nsx@6876 comp="nsx-edge" subcomp="mpa" tid="1554" level="INFO"] Error/Cleanup handler. Signaling FE to restart
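
To check quickly whether a collected syslog.log contains the messages listed above, a small script can count the three error codes that appear in the samples (MP2119, MPA110, MPA111). This is a minimal sketch, not a supported tool: the function name count_error_codes is hypothetical, and the log file location depends on how the logs were collected.

```python
import re
from collections import Counter

# Error codes taken from the log samples in this article.
ERROR_CODES = ("MP2119", "MPA110", "MPA111")
_PATTERN = re.compile(r'errorCode="(' + "|".join(ERROR_CODES) + r')"')


def count_error_codes(lines):
    """Count occurrences of the known error codes in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = _PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical usage against a collected log file:
#   with open("syslog.log", errors="replace") as handle:
#       print(count_error_codes(handle))
```

Repeated MPA110/MPA111 entries on a Transport Node together with MP2119 entries on an NSX Manager match the symptom pattern described here.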


Environment

VMware NSX-T Data Center

Cause

When "publish_fqdns" is set to true, each NSX Manager performs reverse DNS queries to resolve the FQDNs of the other NSX Managers. If an NSX Manager cannot resolve these queries at that point, the result is a configuration mismatch between the NSX Manager and the Transport Nodes connecting to it.
This configuration mismatch causes the connection between the Transport Node and the NSX Manager to flap and eventually results in the Transport Node being disconnected from the NSX Manager.

Resolution

To resolve the issue:

1. Ensure that every NSX Manager can successfully perform forward and reverse DNS lookups for all NSX Managers (including itself).
Example:
root@nsx-mngr-01:~# nslookup 192.168.120.10 
Server:         192.168.110.10
Address:        192.168.110.10#53

10.120.168.192.in-addr.arpa     name = nsx-mngr-01.corp.local.


root@nsx-mngr-01:~# nslookup 192.168.120.11 
Server:         192.168.110.10
Address:        192.168.110.10#53

11.120.168.192.in-addr.arpa     name = nsx-mngr-02.corp.local.


root@nsx-mngr-01:~# nslookup 192.168.120.12
Server:         192.168.110.10
Address:        192.168.110.10#53

12.120.168.192.in-addr.arpa     name = nsx-mngr-03.corp.local.
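
The lookups in step 1 can also be scripted. The sketch below does a reverse lookup for an IP address and then a forward lookup of the returned name, and reports whether the two agree. It uses Python's standard socket module, which goes through the system resolver (and may therefore also consult local hosts files), so treat it as a convenience check and not a replacement for nslookup. The function name check_dns and its injectable resolver parameters are hypothetical and exist only for this illustration.

```python
import socket


def check_dns(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return (ok, detail) for a reverse-then-forward DNS check of one IP.

    reverse and forward default to the system resolver and can be
    swapped out for testing.
    """
    try:
        fqdn = reverse(ip)[0]
    except OSError as exc:  # socket.herror and socket.gaierror are OSError subclasses
        return False, f"reverse lookup failed: {exc}"
    try:
        addresses = forward(fqdn)[2]
    except OSError as exc:
        return False, f"forward lookup of {fqdn} failed: {exc}"
    if ip not in addresses:
        return False, f"{fqdn} resolves to {addresses}, not {ip}"
    return True, fqdn


# Example usage with the manager addresses from this article:
#   for ip in ("192.168.120.10", "192.168.120.11", "192.168.120.12"):
#       print(ip, check_dns(ip))
```

Run the check from each NSX Manager (or from a host that uses the same DNS servers), because the problem described here depends on what each manager itself can resolve.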


2. Set "publish_fqdns" to false and then back to true using the following REST API calls. Use GET to read the current "_revision", then send one PUT for each change:
GET /api/v1/configs/management
PUT /api/v1/configs/management
Example Request:
{
  "publish_fqdns" : true,
  "_revision": 0 <-- must match the "_revision" value returned by the GET call
}
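
The GET/PUT sequence in step 2 can be sketched in Python as follows. This is a minimal illustration under several assumptions: the manager address and credentials are placeholders, HTTP basic authentication is assumed to be enabled for the API, the manager's TLS certificate is assumed to be trusted by the client, and all function names are hypothetical. Only the endpoint path and the two request fields come from this article. Each change re-reads the current "_revision" before sending the PUT, so the revision always matches the latest GET.

```python
import base64
import json
import urllib.request

API_PATH = "/api/v1/configs/management"


def basic_auth(user, password):
    """Build an HTTP basic authentication header value."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return "Basic " + token


def http_json(method, url, auth_header, body=None):
    """Send one JSON request and return the decoded response (None if empty)."""
    data = json.dumps(body).encode() if body is not None else None
    request = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Authorization": auth_header, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        raw = response.read()
    return json.loads(raw.decode()) if raw else None


def set_publish_fqdns(base_url, auth_header, value, send=http_json):
    """GET the current revision, then PUT the new publish_fqdns value."""
    url = base_url + API_PATH
    current = send("GET", url, auth_header)
    payload = {"publish_fqdns": value, "_revision": current["_revision"]}
    return send("PUT", url, auth_header, payload)


def toggle_publish_fqdns(base_url, auth_header, send=http_json):
    """Set publish_fqdns to false, then back to true, as described in step 2."""
    set_publish_fqdns(base_url, auth_header, False, send)
    return set_publish_fqdns(base_url, auth_header, True, send)


# Hypothetical usage (placeholder host and credentials):
#   auth = basic_auth("admin", "password")
#   toggle_publish_fqdns("https://nsx-mngr-01.corp.local", auth)
```

Run this only after the DNS lookups in step 1 succeed on every NSX Manager; otherwise re-enabling "publish_fqdns" can reproduce the same mismatch.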