Troubleshooting the ESXi Dump Collector service in VMware vSphere 5.x (2003042)
During a purple diagnostic screen outage on a vSphere ESXi 5.0 host, you experience these symptoms:
- The ESXi Dump Collector fails to receive a coredump.
- Connectivity to the ESXi Dump Collector service fails.
- You see entries similar to:
Starting network coredump from HostIP to DumpCollectorIP.
Netdump: FAILED: Couldn't attach to dump server at IP DumpCollectorIP.
Dump: nnn: ARP timed out for IP DumpCollectorIP.
This article provides steps for troubleshooting the ESXi Dump Collector (netdump) functionality in vSphere 5.0.
To troubleshoot the network Dump Collector (netdump) functionality in vSphere 5.x:
- Confirm that the netdump Dump Collector server is started, is listening on the network, and has sufficient space to store received coredumps.
For more information, see:
- Confirm that the host is correctly configured to send coredumps over the network using the netdump protocol. For more information, see Configuring an ESXi 5.0 host to capture a VMkernel coredump from a purple diagnostic screen via the Network Dump Collector (2002955).
- In ESXi 5.1, check the functionality of netdump by running the command:
esxcli system coredump network check
You see this output when coredump transmission is successful:
Verified the configured netdump server is running
You see this output when coredump transmission is unsuccessful:
Attempt to contact configured netdump server failed: Configured netdump server did not respond in a timely manner
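The two outcomes above can be told apart in a script. The following is a minimal sketch, assuming the command prints exactly the messages quoted in this article; classify_netdump_check is a hypothetical helper name, not a VMware tool.

```shell
# Classify the text printed by "esxcli system coredump network check".
# classify_netdump_check is a hypothetical helper, not part of ESXi.
# The matched strings are the two sample outputs quoted in this article.
classify_netdump_check() {
    case "$1" in
        *"Verified the configured netdump server is running"*)
            echo "ok" ;;
        *"Attempt to contact configured netdump server failed"*)
            echo "failed" ;;
        *)
            # Any other text (for example, an unsupported command) is left undecided.
            echo "unknown" ;;
    esac
}

# Possible usage on an ESXi 5.1 host (not executed here):
#   classify_netdump_check "$(esxcli system coredump network check 2>&1)"
```

A result of "failed" or "unknown" suggests continuing with the connectivity steps that follow.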
- Confirm that the host is able to connect to the remote netdump Dump Collector service on the configured UDP port, and that the Dump Collector service reports the test connection:
- Open an ESXi Shell console session to the host. For more information, see Using Tech Support Mode in ESXi 4.1 and ESXi 5.x (1017910).
- Refresh the firewall rules so the changes take effect by running the command:
esxcli network firewall refresh
- Determine the VMkernel interface name and destination IP address configured for sending network coredumps by running the command:
esxcli system coredump network get
You see output similar to:
Host VNic: vmk0
Network Server IP: 10.11.12.13
Network Server Port: 6500
Note: In ESXi 5.0, VMkernel ports that use virtual switch VLAN tagging may require further configuration. For more information, see Mixed vSphere 5.0 and 5.1 environments behind VLAN require configuration changes (2032821).
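To reuse the configured values in later commands, the "Label: value" output can be parsed. This is a rough sketch that assumes the output format shown in the sample above and that awk is available in the shell; netdump_field is a hypothetical helper name.

```shell
# Print the value of one field from "esxcli system coredump network get".
# netdump_field is a hypothetical helper; the field labels are assumed to
# match the sample output in this article ("Host VNic", "Network Server IP",
# "Network Server Port").
#   $1    = field label
#   stdin = command output
netdump_field() {
    awk -F': *' -v key="$1" '
        {
            k = $1
            sub(/^[ \t]+/, "", k)      # drop leading indentation from the label
            if (k == key) { print $2; exit }
        }
    '
}

# Possible usage on the host (not executed here):
#   esxcli system coredump network get | netdump_field "Network Server IP"
```

The sketch splits on the first colon, so it is only suitable for IPv4 addresses like the one in the sample.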
- Determine the IPv4 Address of the VMkernel interface configured to send network coredumps by running the command:
esxcli network ip interface ipv4 get --interface-name=vmk0
You see output similar to:
Name  IPv4 Address  IPv4 NetMask  IPv4 Broadcast  Address Type  DHCP DNS
----  ------------  ------------  --------------  ------------  --------
vmk0  10.55.66.77   255.0.0.0     10.255.255.255  STATIC        false
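The IPv4 address can be pulled out of this table for use as the source address in the next step. A minimal sketch, assuming the column order shown above (interface name first, IPv4 address second); vmk_ipv4 is a hypothetical helper name.

```shell
# Print the IPv4 address of one VMkernel interface from the table printed by
# "esxcli network ip interface ipv4 get".
# vmk_ipv4 is a hypothetical helper; it assumes the address is the second
# whitespace-separated column, as in the sample output in this article.
#   $1    = interface name, for example vmk0
#   stdin = command output
vmk_ipv4() {
    awk -v ifname="$1" '$1 == ifname { print $2; exit }'
}

# Possible usage on the host (not executed here):
#   esxcli network ip interface ipv4 get --interface-name=vmk0 | vmk_ipv4 vmk0
```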
- Send test traffic from the ESXi host to the Dump Collector service at the IP Address and Port by running the command:
nc -z -w 1 -s VMkernelIPAddress -u DumpCollectorIPAddress DumpCollectorPortNumber
For example:
nc -z -w 1 -s 10.55.66.77 -u 10.11.12.13 6500
Note: The nc command reports a successful connection regardless of whether the remote netdump server receives the traffic.
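When the three values come from earlier commands, it can help to assemble and review the test command before running it. The sketch below only builds the command line used in this step; netdump_nc_cmd is a hypothetical helper name, and the numeric port check is an added assumption rather than a requirement stated by VMware.

```shell
# Build the nc test command from the values gathered in the previous steps.
# netdump_nc_cmd is a hypothetical helper; it prints the command instead of
# running it so that it can be reviewed first.
#   $1 = VMkernel IP address, $2 = Dump Collector IP address, $3 = port
netdump_nc_cmd() {
    if [ "$#" -ne 3 ]; then
        echo "usage: netdump_nc_cmd VMkernelIP DumpCollectorIP Port" >&2
        return 2
    fi
    case "$3" in
        ''|*[!0-9]*)
            # Reject empty or non-numeric ports before they reach nc.
            echo "port must be numeric: $3" >&2
            return 2 ;;
    esac
    printf 'nc -z -w 1 -s %s -u %s %s\n' "$1" "$2" "$3"
}

# Example with the sample values from this article:
#   netdump_nc_cmd 10.55.66.77 10.11.12.13 6500
```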
- Review the logs from the receiving Dump Collector service for messages indicating that the connection was established.
For example, the vCenter Server 5.0 Dump Collector logs report the unknown client connection with a message similar to:
- <YYYY-MM-DD>T<TIME>:SS.nnnZ| netdumper| Bad magic:0xa656761. Expected:0xadeca1bf
- <YYYY-MM-DD>T<TIME>:SS.nnnZ| netdumper| Skipping bad packet.
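On a Dump Collector that runs on a system with a POSIX shell, the search for these two messages can be scripted. This is a sketch only: saw_test_packet is a hypothetical helper name, the log path must be supplied by the caller, and the matched strings are taken from the sample messages above.

```shell
# Report whether a Dump Collector log contains the messages that the nc test
# packet is expected to trigger.
# saw_test_packet is a hypothetical helper.
#   $1 = path to the Dump Collector log file
# Returns 0 when both messages are present, non-zero otherwise.
saw_test_packet() {
    grep -qF 'netdumper| Bad magic:' "$1" &&
        grep -qF 'netdumper| Skipping bad packet' "$1"
}

# Possible usage (the path is whatever log file the service is configured to use):
#   saw_test_packet /path/to/netdumper.log && echo "test packet was received"
```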
- Review the logs from the receiving Dump Collector service for any errors. For more information, see Location of vSphere ESXi Dump Collector log files (2003277).
Note: The date and timestamps in the Dump Collector logs and received zdump filenames reflect the time on the server running the Dump Collector, not the time on the ESXi host that supplied the coredump.
A normal startup of the Dump Collector service logs entries similar to:
netdumper| Log for vmware-netdumper pid=PidNumber version=VVV build=build-BBBBB option=Release
netdumper| The process is 32-bit.
netdumper| Host codepage=UTF-8 encoding=UTF-8
log FIFO capture : Msg_Reset:
log FIFO capture : [msg.dictionary.load.openFailed] Cannot open file "/path": No such file or directory
netdumper| Configured to handle 1024 clients in parallel.
netdumper| Configuring /path/to/coredump/storage as the directory to store the cores
netdumper| Configured to use ListeningIPAddress:Port as the IP address:port
netdumper| Using /var/log/vmware/netdumper/netdumper.log as the logfile.
netdumper| Configure to daemonize netdumper
Note: The msg.dictionary.load.openFailed entries refer to several configuration files that do not exist. This is normal.
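Because the msg.dictionary.load.openFailed entries are expected, they can be hidden while looking for real errors. A minimal sketch, assuming the log is read on a system with grep; hide_benign_open_failures is a hypothetical helper name.

```shell
# Print a Dump Collector log without the expected
# msg.dictionary.load.openFailed entries, so that other errors stand out.
# hide_benign_open_failures is a hypothetical helper.
#   stdin = log text
hide_benign_open_failures() {
    grep -v -F 'msg.dictionary.load.openFailed'
}

# Possible usage (the path is whatever log file the service is configured to use):
#   hide_benign_open_failures < /path/to/netdumper.log
```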
- During an outage, use the console VMkernel log viewer to review log messages on the screen leading up to the outage. This may indicate a reason for network coredump failure. For more information, see Using the local debugger to review logs after an ESXi host fails with a purple diagnostic screen (2003067).
A normal netdump appears similar to:
netdumper| Starting network coredump from VMkernelIPAddress to DumpCollectorIPAddress.
netdumper| Dump: nnnn: Compressed dump took bbbbbb bytes total.
netdumper| NetDump: Successful.
netdumper| Stopping Netdump.
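If the messages have been saved to a text file, the outcome can be summarized with a short script. This sketch assumes the success and failure strings quoted in this article ("NetDump: Successful" and "Netdump: FAILED"); netdump_outcome is a hypothetical helper name.

```shell
# Summarize a captured netdump message sequence.
# netdump_outcome is a hypothetical helper; it looks only for the success and
# failure strings quoted in this article.
#   stdin = captured log text
netdump_outcome() {
    netdump_log_text=$(cat)
    case "$netdump_log_text" in
        *"NetDump: Successful"*) echo "success" ;;
        *"Netdump: FAILED"*)     echo "failed" ;;
        *)                       echo "incomplete" ;;   # neither marker found
    esac
}

# Possible usage (captured.log is a hypothetical file name):
#   netdump_outcome < captured.log
```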
- If network connectivity from the ESX/ESXi host to the Network Dump Collector server succeeds, but the host is unable to reach the server during a critical fault and purple diagnostic screen, the network connection may be impacted by the outage. In this case, rely on disk-based and serial-port-based methods of capturing diagnostic information. For more information, see Configuring an ESX/ESXi host to capture a VMkernel coredump from a purple diagnostic screen (1000328) and Enabling serial-line logging for an ESX or ESXi host (1003900).
- Configuring an ESXi/ESX host to capture a VMkernel coredump from a purple diagnostic screen (1000328)
- Collecting diagnostic information from an ESX or ESXi host that experiences a purple diagnostic screen (1004128)
- Configuring an ESXi 5.x host to capture a VMkernel coredump from a purple diagnostic screen via the Network Dump Collector (2002955)
- Using the local debugger to review logs after an ESXi host fails with a purple diagnostic screen (2003067)