Troubleshooting the ESXi Dump Collector service in VMware vSphere 5.x (2003042)

Symptoms

During a purple diagnostic screen outage on a vSphere ESXi 5.0 host, you experience these symptoms:

  • The ESXi Dump Collector fails to receive a coredump.
  • Connectivity to the ESXi Dump Collector service fails.
  • You see entries similar to:

    Starting network coredump from HostIP to DumpCollectorIP.
    Netdump: FAILED: Couldn't attach to dump server at IP DumpCollectorIP.
    Stopping Netdump.

    Dump: nnn: ARP timed out for IP DumpCollectorIP.

Purpose

This article provides steps for troubleshooting the ESXi Dump Collector (netdump) functionality in vSphere 5.0.

Before an outage occurs, ensure that the ESXi host's netdump client and the Dump Collector server are configured correctly, and that the host can establish network connectivity to the collector server on the configured UDP port. Test connectivity on this port while the ESXi host is up and running.

Very little troubleshooting can be done from the ESX/ESXi host netdump client at the time of an outage. If a host has halted with a purple diagnostic screen and was unable to reach the Dump Collector server, the dump fails and cannot be retried.
 
If a dump could not be saved to a diagnostic partition on disk, or captured through the Network Dump Collector, see Using the local debugger to review logs after an ESXi host fails with a purple diagnostic screen (2003067).
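
The pre-outage verification described above can be scripted. The following is a minimal sketch, not a VMware-supplied tool. It assumes ESXi 5.1, where the esxcli system coredump network check command covered in the Resolution section is available, and it recognizes only the two output strings quoted in this article. The function name classify_netdump_check is hypothetical.

```shell
#!/bin/sh
# Classify the text printed by `esxcli system coredump network check`.
# Reads the command output on stdin and prints one word:
#   ok      - the success message quoted in this article was seen
#   failed  - the failure message quoted in this article was seen
#   unknown - neither message was seen (assumption: treat as "needs review")
classify_netdump_check() {
    output=$(cat)
    case "$output" in
        *"Verified the configured netdump server is running"*)
            echo "ok" ;;
        *"Attempt to contact configured netdump server failed"*)
            echo "failed" ;;
        *)
            echo "unknown" ;;
    esac
}

# Usage on a running host (not executed here):
#   esxcli system coredump network check 2>&1 | classify_netdump_check
```

A result other than ok suggests working through the steps in the Resolution section while the host is still up.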

Resolution

To troubleshoot the network Dump Collector (netdump) functionality in vSphere 5.x:

  1. Confirm that the netdump Dump Collector server is started, is listening on the network, and has sufficient space to store received coredumps.

    For more information, see:

  2. Confirm that the host is correctly configured to send coredumps over the network using the netdump protocol. For more information, see Configuring an ESXi 5.0 host to capture a VMkernel coredump from a purple diagnostic screen via the Network Dump Collector (2002955).
  3. In ESXi 5.1, check the functionality of netdump by running the command:

    esxcli system coredump network check

    You see this output when coredump transmission is successful:

    Verified the configured netdump server is running

    You see this output when coredump transmission is unsuccessful:

    Attempt to contact configured netdump server failed: Configured netdump server did not respond in a timely manner

  4. Confirm that the host is able to connect to the remote netdump Dump Collector service on the configured UDP port, and that the Dump Collector service reports the test connection:

    1. Open an ESXi Shell console session to the host. For more information, see Using Tech Support Mode in ESXi 4.1 and ESXi 5.x (1017910).
    2. Refresh the firewall rules so the changes take effect by running the command:

      esxcli network firewall refresh

    3. Determine the VMkernel interface name and destination IP address configured for sending network coredumps by running the command:

      esxcli system coredump network get

      You see output similar to:

      Enabled: True
      Host VNic: vmk0
      Network Server IP: 10.11.12.13
      Network Server Port: 6500

      Note: In ESXi 5.0, VMkernel ports that use virtual switch VLAN tagging may require further configuration. For more information, see Mixed vSphere 5.0 and 5.1 environments behind VLAN require configuration changes (2032821).

    4. Determine the IPv4 Address of the VMkernel interface configured to send network coredumps by running the command:

      esxcli network ip interface ipv4 get --interface-name=vmk0

      You see output similar to:

      Name  IPv4 Address  IPv4 NetMask  IPv4 Broadcast  Address Type  DHCP DNS
      ----  ------------  ------------  --------------  ------------  --------
      vmk0  10.55.66.77   255.0.0.0     10.255.255.255  STATIC        false


    5. Send test traffic from the ESXi host to the Dump Collector service at the IP Address and Port by running the command:

      nc -z -w 1 -s VMkernelIPAddress -u DumpCollectorIPAddress DumpCollectorPortNumber

      For example:

      nc -z -w 1 -s 10.55.66.77 -u 10.11.12.13 6500

      Note: The nc command reports a successful connection regardless of whether the remote Netdump Server receives the traffic.

    6. Review the logs from the receiving Dump Collector service for messages indicating that the connection was established.

      For example, the vCenter Server 5.0 Dump Collector logs report the unknown client connection with a message similar to:

      <YYYY-MM-DD>T<TIME>:SS.nnnZ| netdumper| Bad magic:0xa656761. Expected:0xadeca1bf
      <YYYY-MM-DD>T<TIME>:SS.nnnZ| netdumper| Skipping bad packet.

  5. Review the logs from the receiving Dump Collector service for any errors. For more information, see Location of vSphere ESXi Dump Collector log files (2003277).

    Note: The date and timestamps in the Dump Collector logs and received zdump filenames reflect the time on the server running the Dump Collector, not the time on the ESXi host that supplied the coredump.

    A normal startup of the Dump Collector service logs messages similar to:

    netdumper| Log for vmware-netdumper pid=PidNumber version=VVV build=build-BBBBB option=Release
    netdumper| The process is 32-bit.
    netdumper| Host codepage=UTF-8 encoding=UTF-8
    log FIFO capture : Msg_Reset:
    log FIFO capture : [msg.dictionary.load.openFailed] Cannot open file "/path": No such file or directory
    netdumper| Configured to handle 1024 clients in parallel.
    netdumper| Configuring /path/to/coredump/storage as the directory to store the cores
    netdumper| Configured to use ListeningIPAddress:Port as the IP address:port
    netdumper| Using /var/log/vmware/netdumper/netdumper.log as the logfile.
    netdumper| Configure to daemonize netdumper


    Note: The msg.dictionary.load.openFailed entries refer to several configuration files that do not exist. This is normal.

  6. During an outage, use the console VMkernel log viewer to review log messages on the screen leading up to the outage. This may indicate a reason for network coredump failure. For more information, see Using the local debugger to review logs after an ESXi host fails with a purple diagnostic screen (2003067).

    A normal netdump appears similar to:

    netdumper| Starting network coredump from VMkernelIPAddress to DumpCollectorIPAddress.
    netdumper| Dump: nnnn: Compressed dump took bbbbbb bytes total.
    netdumper| NetDump: Successful.
    netdumper| Stopping Netdump.


  7. If network connectivity from the ESX/ESXi host to the Network Dump Collector server succeeds, but the host is unable to reach the server during a critical fault and purple diagnostic screen, the network connection may be impacted by the outage. In this case, rely on disk-based and serial-port-based methods of capturing diagnostic information. For more information, see Configuring an ESX/ESXi host to capture a VMkernel coredump from a purple diagnostic screen (1000328) and Enabling serial-line logging for an ESX or ESXi host (1003900).
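
The connectivity test in step 4 can be sketched as a few shell helpers. This is an illustrative sketch that assumes the command output has the same layout as the examples in this article. The function names netdump_field, vmk_ipv4, build_nc_test, and saw_test_packet are hypothetical and are not part of esxcli. The helpers only parse text and print the nc command; they do not send any traffic.

```shell
#!/bin/sh
# Extract one field from `esxcli system coredump network get` output (stdin).
# $1 is the label to the left of the colon, for example "Host VNic".
netdump_field() {
    awk -F': ' -v key="$1" '
        {
            label = $1
            sub(/^[ \t]+/, "", label)      # drop leading indentation
        }
        label == key {
            value = $2
            sub(/[ \t\r]+$/, "", value)    # drop trailing whitespace
            print value
            exit
        }
    '
}

# Extract the IPv4 address of an interface from the table printed by
# `esxcli network ip interface ipv4 get` (stdin). $1 is the interface name.
vmk_ipv4() {
    awk -v ifname="$1" '$1 == ifname { print $2; exit }'
}

# Print (but do not run) the nc test command from step 4.
# $1 = saved `get` output, $2 = VMkernel IPv4 address.
build_nc_test() {
    server=$(printf '%s\n' "$1" | netdump_field "Network Server IP")
    port=$(printf '%s\n' "$1" | netdump_field "Network Server Port")
    if [ -z "$server" ] || [ -z "$port" ]; then
        echo "netdump server IP or port not found" >&2
        return 1
    fi
    echo "nc -z -w 1 -s $2 -u $server $port"
}

# Return success if Dump Collector log text on stdin contains the
# "Bad magic" message that this article associates with the test packet.
# Run this wherever the Dump Collector log is readable, not on the ESXi host.
saw_test_packet() {
    grep -qF 'netdumper| Bad magic:'
}

# Usage on a running host (not executed here):
#   cfg=$(esxcli system coredump network get)
#   vnic=$(printf '%s\n' "$cfg" | netdump_field "Host VNic")
#   ip=$(esxcli network ip interface ipv4 get --interface-name="$vnic" | vmk_ipv4 "$vnic")
#   build_nc_test "$cfg" "$ip"
```

Printing the command rather than running it keeps the sketch safe to try, and the printed line can be compared with the worked example in step 4 before it is executed. Because nc reports success regardless of whether the traffic arrives, the log check on the Dump Collector side remains the actual confirmation.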
