Knowledge Base

ESX host fails with Lost Heartbeat purple diagnostic screen mentioning VmkDev: 3102: Can't find serial# (2009232)
Symptoms
- A VMware ESX host fails with a purple diagnostic screen. The screen has this message:
COS Panic: Lost heartbeat @esxsc_panic+0x43/0x4f
- The vmkernel log extracted from the zdump contains messages similar to:
cpu0:4096)VMNIX: WARNING: VmkDev: 3102: Can't find serial# nnnnnnnnn6 from virt handle 2000
cpu0:4096)VMNIX: WARNING: VmkDev: 3102: Can't find serial# nnnnnnnnn7 from virt handle 2000
cpu0:4096)VMNIX: WARNING: VmkDev: 3102: Can't find serial# nnnnnnnnn8 from virt handle 2000
...
cpu0:4096)ALERT: VMNIX: ALERT: HB: 362: Lost heartbeat (comm=ProcessName pid=x t=29 to=30 clt=1).
cpu0:4096)VMNIX: VmkDev: 2747: a/r=2 cmd=0xnn sn=nnnnnnnnn9 dsk=vsaN:N:N reqbuf=nnnnnnn (sg=n)
- The service console logs in /var/log/messages or on the console contain messages similar to:
<3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn1 after 360s
<3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn2 after 360s
<3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn3 after 360s
<3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn4 after 360s
<3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn5 after 360s
- The SCSI command serial numbers logged are greater than 4294967295.
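To check the logged serial numbers against this limit, you can filter the logs with a shell function such as the one below. This is a minimal sketch, not a VMware tool. The function name find_large_serials is hypothetical, and the function assumes that the serial number is the whitespace-separated field immediately after "retrying" or "serial#", as in the sample messages above.

```shell
#!/bin/bash
# Hypothetical helper (not a VMware tool): print every SCSI command serial
# number greater than 4294967295 found in log lines such as
# "still retrying <serial> after 360s" or "Can't find serial# <serial> from ...".
# Reads the files named as arguments, or standard input if none are given.
find_large_serials() {
  awk '{
    for (i = 1; i < NF; i++) {
      # The serial number is the field right after "retrying" or "serial#"
      if (($i == "retrying" || $i == "serial#") && $(i + 1) ~ /^[0-9]+$/ && ($(i + 1) + 0) > 4294967295)
        print $(i + 1)
    }
  }' "$@"
}

# Example: find_large_serials /var/log/messages
```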
Purpose
This article describes a specific issue. If you experience all of the above symptoms, see the sections below. If you experience only some of these symptoms, your issue is not related to this article. Search the Knowledge Base for your symptoms, see the related articles Interpreting an ESX host purple diagnostic screen (1004250) and Understanding a "Lost Heartbeat" purple diagnostic screen (1009525) for more information, or open a support request.
Cause
This issue occurs when SCSI command serial numbers exceed 4294967295. After that point, completed SCSI commands are not returned to the service console.
Resolution
This issue is resolved in ESX 4.1 Update 3. To download ESX 4.1 Update 3, see the VMware Download Center.
For more information, see VMware ESX 4.1, Patch ESX410-201208201-UG: Updates the VMware ESX 4.1 Core and CIM components (2020337).
Note: This issue does not affect ESXi, because ESXi does not have a console operating system.
To work around this issue without upgrading:
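- Monitor the current value of the SCSI command serial number counter and ensure that it does not reach the maximum value, 4294967295.
To check how close the counter is to that value, run this script in the service console:
# Boot disk of the service console (for example, sda): the device mounted on /, without its partition number
bootDev=$(df | grep " /$" | awk '{print $1}' | sed -e 's!/dev/!!' -e 's/[0-9]*$//')
# Fields 4 and 8 of /proc/diskstats are the reads completed and writes completed for that disk
devRead=$(grep " ${bootDev} " /proc/diskstats | awk '{print $4}')
devWrite=$(grep " ${bootDev} " /proc/diskstats | awk '{print $8}')
(( devIO = devRead + devWrite ))
# 4294967295 / 100000 is about 42950, so this gives thousandths of a percent
(( microFull = devIO / 42950 ))
# Zero-pad to at least four digits, then insert a decimal point before the last three
percentFull=$(echo ${microFull} | awk '{printf "%04d",$1}' | sed 's/\(...\)$/.\1/')
echo Percent Full: ${percentFull}%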
The output is the percentage of the maximum value that the counter has reached.
Example: Percent Full: 69.848%. This indicates that the counter is at approximately 2999948756 (69.848% of 4294967295).
- Schedule a reboot of the host before the counter fills completely.
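To help decide when to schedule the reboot, the sketch below converts a counter value into a percentage and the number of commands remaining, and estimates the time left by sampling the counter twice. It is a minimal sketch, not a VMware tool. The function names counter_headroom, io_count, and seconds_until_full are hypothetical. The sketch reads the same /proc/diskstats fields as the script above, and the time estimate assumes that the I/O rate during the sample interval stays constant, which is only a rough approximation on a busy host. The script above divides by 42950 and truncates, while counter_headroom rounds to the nearest thousandth, so the two can differ by about 0.001. For 2999948756, counter_headroom reports 69.848%.

```shell
#!/bin/bash
# Sketch only: estimate headroom on the counter the article's script reads
# (reads completed + writes completed for the boot disk in /proc/diskstats).
# Function names are hypothetical; MAX is the limit named in the article.
MAX=4294967295

# Print the counter as a percentage of MAX (rounded to three decimals)
# and the number of commands left before MAX.
counter_headroom() {
  local count=$1
  # Thousandths of a percent, rounded to nearest: count * 100000 / MAX
  local milli=$(( (count * 100000 + MAX / 2) / MAX ))
  printf 'Percent Full: %d.%03d%%\n' $(( milli / 1000 )) $(( milli % 1000 ))
  echo "Remaining: $(( MAX - count ))"
}

# Reads completed (field 4) plus writes completed (field 8) for device $1.
io_count() {
  local major minor name reads rmerged rsectors rms writes rest
  while read -r major minor name reads rmerged rsectors rms writes rest; do
    if [ "$name" = "$1" ]; then
      echo $(( reads + writes ))
      return 0
    fi
  done < /proc/diskstats
  return 1
}

# Seconds until the counter reaches MAX, given two samples taken
# $3 seconds apart, assuming the rate between them stays constant.
seconds_until_full() {
  local first=$1 second=$2 interval=$3
  local delta=$(( second - first ))
  if (( delta <= 0 )); then
    echo "no growth observed between samples" >&2
    return 1
  fi
  echo $(( (MAX - second) * interval / delta ))
}

# Examples (the second pair of samples is made up for illustration):
#   counter_headroom 2999948756                   -> Percent Full: 69.848% / Remaining: 1295018539
#   seconds_until_full 2999948756 2999958756 10   -> 1295008
# On the host (sda is a placeholder for the boot disk):
#   a=$(io_count sda); sleep 60; b=$(io_count sda)
#   secs=$(seconds_until_full "$a" "$b" 60) && echo "About $(( secs / 86400 )) days left"
```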

