
Troubleshooting NSX for vSphere 6.x Edge appliance (2140009)


Purpose

This article provides information about troubleshooting the VMware NSX for vSphere 6.x Edge appliance.

Resolution

IMPORTANT: This knowledge base article is no longer being updated. For the most up-to-date information, see the latest version of the NSX Troubleshooting Guide.

Validate that each troubleshooting step below is true for your environment. Each step provides instructions or a link to a document that helps eliminate possible causes and take corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Do not skip a step.

Installation and Upgrade Issues

Configuration Issues

Firewall Issues

  • If you are seeing inactivity timeout issues, with applications idle for long periods, increase the inactivity timeout settings using the REST API (the globalConfig example later in this article shows the relevant timeout elements). For more information, see vCNS/NSX Edge Firewall TCP Timeout Values (2101275).
  • Starting with VMware NSX for vSphere 6.2.3, the default TCP established timeout has been increased from 3600 seconds to 21600 seconds.

Edge Firewall Packet Drop Issues

If you are experiencing packet drops:

  1. Check the firewall rules table with the show firewall command. The usr_rules table displays configured rules. Also collect the show ipset command output.

  2. Check for an incrementing value of the DROP invalid rule counter in the POST_ROUTING section of the show firewall command output. Typical suspected causes include asymmetric routing issues or TCP-based applications that have been inactive for more than one hour. Further evidence of asymmetric routing issues includes:

    • Ping works in one direction and fails in the other direction
    • Ping works, while TCP does not work


  3. Enable logging on a particular firewall rule using REST API or the Edge user interface and monitor the logs with the show log follow command.
     
    Note: If logs are not seen, enable logging on DROP Invalid Rule using the REST API.
     
    URL : https://NSX_Manager_IP/api/4.0/edges/{edgeId}/firewall/config/global

    PUT Method
    Input representation
    <globalConfig>   <!-- Optional -->
    <tcpPickOngoingConnections>false</tcpPickOngoingConnections>   <!-- Optional. Defaults to false -->
    <tcpAllowOutOfWindowPackets>false</tcpAllowOutOfWindowPackets>    <!-- Optional. Defaults to false -->
    <tcpSendResetForClosedVsePorts>true</tcpSendResetForClosedVsePorts>    <!-- Optional. Defaults to true -->
    <dropInvalidTraffic>true</dropInvalidTraffic>    <!-- Optional. Defaults to true -->
    <logInvalidTraffic>true</logInvalidTraffic>     <!-- Optional. Defaults to false -->
    <tcpTimeoutOpen>30</tcpTimeoutOpen>       <!-- Optional. Defaults to 30 -->
    <tcpTimeoutEstablished>3600</tcpTimeoutEstablished>   <!-- Optional. Defaults to 3600 -->
    <tcpTimeoutClose>30</tcpTimeoutClose>   <!-- Optional. Defaults to 30 -->
    <udpTimeout>60</udpTimeout>             <!-- Optional. Defaults to 60 -->
    <icmpTimeout>10</icmpTimeout>           <!-- Optional. Defaults to 10 -->
    <icmp6Timeout>10</icmp6Timeout>           <!-- Optional. Defaults to 10 -->
    <ipGenericTimeout>120</ipGenericTimeout>    <!-- Optional. Defaults to 120 -->
    </globalConfig>
    Output representation
    No payload
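
    If you prefer to drive this call from a script, the following is a minimal Python sketch of the PUT above using the requests library. The manager address, credentials, and Edge ID are placeholders for your environment; it sends the full globalConfig shown above with logInvalidTraffic set to true, so the remaining elements keep their documented defaults.

    # Minimal sketch (assumptions: reachable NSX Manager, valid API
    # credentials, and a valid Edge ID; not an official VMware tool).
    import requests

    NSX_MANAGER = "https://NSX_Manager_IP"   # placeholder
    EDGE_ID = "edge-0"                       # placeholder
    AUTH = ("admin", "password")             # placeholder credentials

    # Full globalConfig from the article, with logging of invalid traffic on.
    payload = """<globalConfig>
    <tcpPickOngoingConnections>false</tcpPickOngoingConnections>
    <tcpAllowOutOfWindowPackets>false</tcpAllowOutOfWindowPackets>
    <tcpSendResetForClosedVsePorts>true</tcpSendResetForClosedVsePorts>
    <dropInvalidTraffic>true</dropInvalidTraffic>
    <logInvalidTraffic>true</logInvalidTraffic>
    <tcpTimeoutOpen>30</tcpTimeoutOpen>
    <tcpTimeoutEstablished>3600</tcpTimeoutEstablished>
    <tcpTimeoutClose>30</tcpTimeoutClose>
    <udpTimeout>60</udpTimeout>
    <icmpTimeout>10</icmpTimeout>
    <icmp6Timeout>10</icmp6Timeout>
    <ipGenericTimeout>120</ipGenericTimeout>
    </globalConfig>"""

    resp = requests.put(
        f"{NSX_MANAGER}/api/4.0/edges/{EDGE_ID}/firewall/config/global",
        data=payload,
        headers={"Content-Type": "application/xml"},
        auth=AUTH,
        verify=False,  # lab use only; verify certificates in production
    )
    resp.raise_for_status()  # the API returns no payload on success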


    Use the show log follow command to look for logs similar to:

    2016-04-18T20:53:31+00:00 edge-0 kernel: nf_ct_tcp: invalid TCP flag combination IN= OUT= SRC=172.16.1.4 DST=192.168.1.4 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=43343 PROTO=TCP SPT=5050 DPT=80 SEQ=0 ACK=1572141176 WINDOW=512 RES=0x00 URG PSH FIN URGP=0
    2016-04-18T20:53:31+00:00 edge-0 kernel: INVALID IN= OUT=vNic_1 SRC=172.16.1.4 DST=192.168.1.4 LEN=40 TOS=0x00 PREC=0x00 TTL=63 ID=43343 PROTO=TCP SPT=5050 DPT=80 WINDOW=512 RES=0x00 URG PSH FIN URGP=0


  4. Check for matching connections in the Edge firewall state table with the show flowtable rule_id command:

    Compare the active connection count and the maximum allowed count with the show flowstats command:

    For example:

    NvShieldEdge> show flowstats
    Total Flow Capacity: 1000000
    Current Statistics :
    entries 76
    searched 31
    found 13985
    new 12657
    invalid 0
    ignore 413
    delete 12567
    delete_list 11846
    insert 11937
    insert_failed 0
    drop 0
    early_drop 0
    icmp_error 0
    expect_new 1
    expect_create 2
    expect_delete 2
    search_restart 0
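
    If you capture show flowstats output periodically (for example, over an SSH session), a short script can flag when the flow table approaches capacity. A minimal sketch, assuming the output format shown above:

    import re

    def flow_utilization(output: str) -> float:
        """Return current entries as a fraction of Total Flow Capacity."""
        capacity = int(re.search(r"Total Flow Capacity:\s*(\d+)", output).group(1))
        entries = int(re.search(r"entries\s+(\d+)", output).group(1))
        return entries / capacity

    # Example with the sample output above: 76 / 1000000
    sample = "Total Flow Capacity: 1000000\nCurrent Statistics :\nentries 76\n"
    print(f"flow table {flow_utilization(sample):.4%} full")  # 0.0076% full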


  5. Check the Edge logs with the show log follow command for any ALG drops. Search for strings similar to tftp_alg, msrpc_alg, or oracle_tns.

Serviceability Enhancements in VMware NSX for vSphere 6.2.3

Starting with NSX for vSphere 6.2.3, the show packet drops command is introduced. This command displays drop counters for the following:

  • Interface
  • Driver
  • L2
  • L3
  • Firewall

To perform this operation:

  1. Log in to the NSX Edge CLI and enter basic mode. For more information, see the NSX Command Line Interface Reference.
  2. Run the show packet drops command.

    For example:

    show packet drops

    vShield Edge Packet Drop Stats:

    Driver Errors
    =============
              TX      TX    TX        RX      RX    RX
    Interface Dropped Error Ring Full Dropped Error Out Of Buf
    vNic_0    0       0     0         0       0     0
    vNic_1    0       0     0         0       0     0
    vNic_2    0       0     0         0       0     2
    vNic_3    0       0     0         0       0     0
    vNic_4    0       0     0         0       0     0
    vNic_5    0       0     0         0       0     0

    Interface Drops
    ===============
    Interface RX Dropped TX Dropped
    vNic_0             4          0
    vNic_1          2710          0
    vNic_2             0          0
    vNic_3             2          0
    vNic_4             2          0
    vNic_5             2          0

    L2 RX Errors
    ============
    Interface length crc frame fifo missed
    vNic_0         0   0     0    0      0
    vNic_1         0   0     0    0      0
    vNic_2         0   0     0    0      0
    vNic_3         0   0     0    0      0
    vNic_4         0   0     0    0      0
    vNic_5         0   0     0    0      0

    L2 TX Errors
    ============
    Interface aborted fifo window heartbeat
    vNic_0          0    0      0         0
    vNic_1          0    0      0         0
    vNic_2          0    0      0         0
    vNic_3          0    0      0         0
    vNic_4          0    0      0         0
    vNic_5          0    0      0         0

    L3 Errors
    =========
    IP:
     ReasmFails : 0
     InHdrErrors : 0
     InDiscards : 0
     FragFails : 0
     InAddrErrors : 0
     OutDiscards : 0
     OutNoRoutes : 0
     ReasmTimeout : 0
    ICMP:
     InTimeExcds : 0
     InErrors : 227
     OutTimeExcds : 0
     OutDestUnreachs : 152
     OutParmProbs : 0
     InSrcQuenchs : 0
     InRedirects : 0
     OutSrcQuenchs : 0
     InDestUnreachs : 151
     OutErrors : 0
     InParmProbs : 0

    Firewall Drop Counters
    ======================

    Ipv4 Rules
    ==========
    Chain - INPUT
    rid pkts bytes target prot opt in out source    destination
    0    119 30517 DROP   all  --   *   * 0.0.0.0/0 0.0.0.0/0    state INVALID
    0      0     0 DROP   all  --   *   * 0.0.0.0/0 0.0.0.0/0
    Chain - POSTROUTING
    rid pkts bytes target prot opt in out source    destination
    0    101 4040  DROP   all   --  *   * 0.0.0.0/0 0.0.0.0/0    state INVALID
    0      0    0  DROP   all   --  *   * 0.0.0.0/0 0.0.0.0/0

    Ipv6 Rules
    ==========
    Chain - INPUT
    rid pkts bytes target prot opt in out source destination
    0      0     0   DROP  all      *   * ::/0   ::/0            state INVALID
    0      0     0   DROP  all      *   * ::/0   ::/0
    Chain - POSTROUTING
    rid pkts bytes target prot opt in out source destination
    0      0     0   DROP  all       *   * ::/0   ::/0           state INVALID
    0      0     0   DROP  all       *   * ::/0   ::/0
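
    Because these are cumulative counters, what matters is whether they increase while the problem reproduces. The following is a minimal sketch that diffs two captures of this output and reports counters that grew; it parses only the "Name : value" style lines (for example, the L3 Errors section), while the tabular sections would need column-aware parsing.

    import re

    def parse_counters(output: str) -> dict:
        """Extract 'Name : value' counters from captured CLI output."""
        counters = {}
        for line in output.splitlines():
            m = re.match(r"\s*([A-Za-z0-9_]+)\s*:\s*(\d+)\s*$", line)
            if m:
                counters[m.group(1)] = int(m.group(2))
        return counters

    def report_increases(before: str, after: str) -> None:
        """Print every counter that incremented between two captures."""
        a, b = parse_counters(before), parse_counters(after)
        for name in sorted(a.keys() & b.keys()):
            if b[name] > a[name]:
                print(f"{name}: +{b[name] - a[name]}")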

Edge Routing Connectivity Issues

To investigate packet drops that may be causing latency issues:

  1. Initiate controlled traffic from a client using the ping destination_IP_address command.
  2. Capture traffic simultaneously on both interfaces, write the output to a file, and export it using SCP.

    For example:

    Capture the traffic on the ingress interface with this command:

    debug packet display interface vNic_0 -n_src_host_1.1.1.1

    Capture the traffic on the egress interface with this command:

    debug packet display interface vNic_1 -n_src_host_1.1.1.1

    For simultaneous packet capture, use the pktcap-uw packet capture utility on the ESXi host. For more information, see Using the pktcap-uw tool in ESXi 5.5 and later (2051814). A sketch for comparing the exported captures follows.
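
    Once both captures are exported (for example, via SCP) as pcap files, you can compare them offline to see which packets entered the ingress interface but never left the egress interface. A minimal sketch, assuming the third-party scapy library is installed and matching on the IP ID field, a heuristic that is adequate for the controlled ping traffic suggested above; file names are placeholders:

    # Requires the third-party scapy package: pip install scapy
    from scapy.all import IP, rdpcap

    def ip_ids(path: str) -> set:
        """Collect (IP ID, src, dst) triples from a pcap file."""
        return {(p[IP].id, p[IP].src, p[IP].dst)
                for p in rdpcap(path) if IP in p}

    ingress = ip_ids("vnic0_ingress.pcap")  # placeholder file names
    egress = ip_ids("vnic1_egress.pcap")

    # Packets seen entering the Edge but never leaving it
    for pkt_id, src, dst in sorted(ingress - egress):
        print(f"dropped? id={pkt_id} {src} -> {dst}")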

    If the packet drops appear more randomly, check for configuration errors related to:
    • IP addresses and routes
    • Firewall rules or NAT rules
    • Asymmetric routing
    • RP filter checks

    1. Check interface IP/Subnets with this command:

      show interface

    2. If there are missing routes at the data plane, check Multipath Source Routing (MSR) for dynamic routes and the NSX Manager (VSM) for static routes by running these commands:
      • show ip route
      • show ip route static
      • show ip route bgp
      • show ip route ospf

    3. Check the Route Table for needed routes by running this command:

      show ip forwarding

    4. If you have multiple paths, check the RP Filter status by running this command:

      show rpfilter

      To check for RPF statistics, run this command:

      show rpfstats

    If the packet drops appear more randomly, also check for resource limitations:

    For CPU or memory usage, run these commands:

    • show system cpu
    • show system memory
    • show system storage
    • show process monitor
    • top

    Note: On ESXi, run the esxtop command and press n to view the network statistics.


High CPU Utilization

If you are experiencing high CPU utilization on the NSX Edge, verify the performance of the appliance by using the esxtop command on the ESXi host. For more information, see Interpreting esxtop Statistics.

A high value for the ksoftirqd process indicates a high incoming packet rate. Check whether logging is enabled on the data path, such as for firewall rules. Run the show log follow command to determine whether a large number of log hits are being recorded.
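
If you have exported the Edge log, a quick tally shows which flows generate the most log hits. The following is a minimal sketch, assuming log lines in the format shown earlier in this article (SRC=/DST=/DPT= fields); the file name is a placeholder:

import re
from collections import Counter

FLOW = re.compile(r"SRC=(\S+) DST=(\S+)")
DPT = re.compile(r"DPT=(\d+)")

hits = Counter()
with open("edge_firewall.log") as fh:  # placeholder: exported Edge log
    for line in fh:
        m = FLOW.search(line)
        if m:
            port = DPT.search(line)
            hits[(m.group(1), m.group(2), port.group(1) if port else "-")] += 1

# Top ten flows by number of log hits
for (src, dst, dpt), count in hits.most_common(10):
    print(f"{src} -> {dst}:{dpt}  {count} hits")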

NSX Manager and Edge Communication Issues

The NSX Manager communicates with the Edge through either VIX or the Message Bus. The channel is chosen by the NSX Manager when the Edge is deployed and never changes.

VIX

  • VIX is used for the vShield/NSX Edge if the ESXi host is not prepared.
  • The NSX Manager first gets host credentials from the vCenter Server to connect to the ESXi host.
  • The NSX Manager then uses the Edge credentials to log in to the Edge appliance.
  • The vmtoolsd process on the Edge handles the VIX communication.

VIX failures occur because of one of the following:

  • The NSX Manager cannot communicate with the vCenter Server.
  • The NSX Manager cannot communicate with the ESXi hosts.
  • There are NSX Manager internal issues.
  • There are Edge internal issues.

VIX Debugging

Check for VIX errors (VIX_E_xxxx) in the NSX Manager logs to narrow down the cause. Look for errors similar to:

Vix Command 1126400 failed, reason com.vmware.vshield.edge.exception.VixException: vShield Edge:10013:Error code 'VIX_E_FILE_NOT_FOUND' was returned by VIX API.:null

Health check failed for edge  edge-13 VM vm-5025 reason: com.vmware.vshield.edge.exception.VixException: vShield Edge:10013:Error code 'VIX_E_VM_NOT_RUNNING' was returned by VIX API.:null


Note: In general, if the same failure occurs for many Edges at the same time, the issue is not on the Edge side.
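
To apply that rule of thumb quickly, you can scan an exported NSX Manager log and group the VIX error codes by Edge ID. A minimal sketch; the log file name is a placeholder:

import re
from collections import defaultdict

VIX = re.compile(r"(VIX_E_[A-Z_]+)")
EDGE = re.compile(r"(edge-\d+)")

errors = defaultdict(set)  # VIX error code -> affected Edge IDs
with open("nsx_manager.log") as fh:  # placeholder: exported NSX Manager log
    for line in fh:
        code, edge = VIX.search(line), EDGE.search(line)
        if code and edge:
            errors[code.group(1)].add(edge.group(1))

# Many Edges sharing one code suggests a Manager/vCenter/host-side issue.
for code, edges in sorted(errors.items()):
    print(f"{code}: {len(edges)} Edge(s): {sorted(edges)}")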

Edge Diagnosis

  • Check if vmtoolsd is running with this command:

    show process list



  • Check if the Edge is in a good state by running this command:

    show eventmgr

    Note: You can also use the show eventmgr command to verify that queries and commands are received and processed.



    If the show eventmgr command is not available, check the Edge logs. For more information, see Collecting diagnostic information for VMware NSX Edge (2079380).

Edge Recovery

  • If vmtoolsd is not running or the Edge is in a bad state, reboot the Edge.
  • A reboot should also be sufficient to recover from a crash; a redeploy should not be required.

    Note: All logging information from the old Edge is lost when a redeploy is performed.

    To debug a kernel crash, you need to obtain:

    1. The vmss (VM suspend) or vmsn (VM snapshot) file for the Edge VM while it is still in the crashed state, together with the vmem file if one exists. These can be used to extract a kernel core dump file that VMware Support can analyze.
    2. The Edge support log, generated right after the crashed Edge has been rebooted (but not redeployed). For more information, see Collecting diagnostic information for VMware NSX Edge (2079380).
    3. A screenshot of the Edge console is also helpful, although it does not usually contain the complete crash report. For additional information, see the Edge Appliance Troubleshooting section of the NSX Troubleshooting Guide.

Message Bus Debugging

Message Bus is used for NSX Edge when ESXi hosts are prepared. When you encounter issues, the NSX Manager logs may contain entries similar to:

GMT ERROR taskScheduler-6 PublishTask:963 - Failed to configure VSE-vm index 0, vm-id vm-117, edge edge-5. Error: RPC request timed out

This issue occurs when:

  • The Edge is in a bad state
  • The Message Bus connection is broken

To diagnose the issue on the Edge:

  • To check RabbitMQ (RMQ) connectivity, run this command:

    show messagebus messages

    For example:

    Message bus is enabled
    cmd conn state : listening
    init_req       : 1
    init_resp      : 1
    init_req_err   : 0
    init_resp_err  : 0

    cmd_req        : 362
    cmd_resp       : 361

    em_req         : 361
    em_resp        : 360
    em_req_err     : 0
    em_resp_invalid: 0
    em_resp_timeout: 0
    em_resp_err    : 0

    cmd_ch_conn    : 1
    cmd_login_fail : 0
    msg_thr_rstart : 0
    -----------------------
    evt conn state : listening
    vse_rx         : 223719
    vse_rx_hc      : 223717
    vse_rx_evt     : 2

    vse_tx_hc      : 223717
    vse_tx_evt     : 2

    evt_rsp        : 2


  • To check vmci connectivity, run this command:

    show messagebus forwarder

    For example:

    Forwarder Command Channel
    vmci_conn          : up
    app_client_conn    : up
    vmci_rx            : 74427
    vmci_tx            : 74446
    vmci_rx_err        : 0
    vmci_tx_err        : 0

    vmci_closed_by_peer: 8 <---- Number of times the connection was closed by the host agent.
                                 If this number keeps increasing and vmci_conn is down, the
                                 host agent cannot connect to the RMQ broker. Look for repeated
                                 "VmciProxy: [daemon.debug] VMCI Socket is closed by peer"
                                 errors in the Edge logs.

    vmci_tx_no_socket  : 0
    app_rx             : 74446
    app_tx             : 74427
    ..
    app_conn_req       : 5
    app_closed_by_peer : 0
    -----------------------
    Forwarder Event Channel
    vmci_conn          : up
    app_client_conn    : up
    vmci_rx            : 22494
    vmci_tx            : 224001
    vmci_rx_err        : 0
    vmci_tx_err        : 0
    vmci_closed_by_peer: 6
    vmci_tx_no_socket  : 0
    app_rx             : 224001
    app_tx             : 22494
    ..
    app_conn_req       : 2
    app_closed_by_peer : 0
    app_tx_no_socket   : 0


  • To monitor vmci connectivity over time, run these commands:

    • show messagebus forwarder
    • show log follow

    Note: If the command output shows any issues, check the Edge logs.

To diagnose the issue on the ESXi host:

  • To check if the ESXi host connects to the RMQ broker, run this command:

    esxcli network ip connection list | grep 5671

    For example:

    esxcli network ip connection list | grep 5671

    tcp         0       0  10.32.43.4:43329  10.32.43.230:5671    ESTABLISHED     35854  newreno  vsfwd         
    tcp         0       0  10.32.43.4:52667  10.32.43.230:5671    ESTABLISHED     35854  newreno  vsfwd         
    tcp         0       0  10.32.43.4:20808  10.32.43.230:5671    ESTABLISHED     35847  newreno  vsfwd         
    tcp         0       0  10.32.43.4:12486  10.32.43.230:5671    ESTABLISHED     35847  newreno  vsfwd
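
    If no sessions to port 5671 appear, a generic TCP reachability test can help separate a broker-side problem from a host-side one. The following is a minimal sketch intended to run from a management host on the same network (not on ESXi itself); the broker address is a placeholder:

    import socket

    BROKER = ("NSX_Manager_IP", 5671)  # placeholder: RMQ broker on NSX Manager

    try:
        with socket.create_connection(BROKER, timeout=5):
            print(f"TCP connect to {BROKER[0]}:{BROKER[1]} succeeded")
    except OSError as exc:
        print(f"TCP connect to {BROKER[0]}:{BROKER[1]} failed: {exc}")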

Note: The preceding links were correct as of April 19, 2016. If you find a link is broken, provide feedback and a VMware employee will update the link.
