Knowledge Base

ESX host fails with Lost Heartbeat purple diagnostic screen mentioning VmkDev: 3102: Can't find serial# (2009232)

Symptoms

  • A VMware ESX host fails with a purple diagnostic screen. The screen has this message:

    COS Panic: Lost heartbeat @esxsc_panic+0x43/0x4f

  • The vmkernel log extracted from the zdump contains messages similar to:

    cpu0:4096)VMNIX: WARNING: VmkDev: 3102: Can't find serial# nnnnnnnnn6 from virt handle 2000
    cpu0:4096)VMNIX: WARNING: VmkDev: 3102: Can't find serial# nnnnnnnnn7 from virt handle 2000
    cpu0:4096)VMNIX: WARNING: VmkDev: 3102: Can't find serial# nnnnnnnnn8 from virt handle 2000
    ...
    cpu0:4096)ALERT: VMNIX: ALERT: HB: 362: Lost heartbeat (comm=ProcessName pid=x t=29 to=30 clt=1).
    cpu0:4096)VMNIX: VmkDev: 2747: a/r=2 cmd=0xnn sn=nnnnnnnnn9 dsk=vsaN:N:N reqbuf=nnnnnnn (sg=n)


  • The service console logs in /var/log/messages or on the console contain messages similar to:

    <3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn1 after 360s
    <3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn2 after 360s
    <3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn3 after 360s
    <3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn4 after 360s
    <3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn5 after 360s


  • The SCSI command serial numbers logged are greater than 4294967295.
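To confirm the last symptom quickly, the "still retrying" lines can be scanned for serial numbers above 4294967295. The following is a minimal sketch, not a VMware-supplied tool: the function name find_large_serials is hypothetical, and it assumes the log lines follow the "still retrying <serial> after 360s" pattern shown above.

```shell
#!/bin/bash
# Threshold named in the article: serial numbers above this value are a symptom.
MAX_SERIAL=4294967295

# Hypothetical helper: reads log text on stdin and prints every serial number
# from a "still retrying <serial> after ..." line that is greater than MAX_SERIAL.
find_large_serials() {
    awk -v max="${MAX_SERIAL}" '
        /still retrying/ {
            # The serial number is the field that follows the word "retrying".
            for (i = 1; i < NF; i++) {
                if ($i == "retrying" && ($(i + 1) + 0) > max) {
                    print $(i + 1)
                }
            }
        }'
}

# Example use on the service console (log path taken from the article):
#   find_large_serials < /var/log/messages
```

If the function prints nothing, no logged serial number exceeded the threshold in the text it was given.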

Cause

This issue occurs when SCSI command serial numbers exceed 4294967295 (2^32 - 1, the largest value an unsigned 32-bit integer can hold) and completed SCSI commands no longer return values to the service console.

Resolution

This issue is resolved in ESX 4.1 Update 3. For more information, see the ESX 4.1 Update 3 release notes.

To download ESX 4.1 Update 3, see the VMware Download Center.

Note: This issue does not affect ESXi, because ESXi does not have a console operating system.

To work around this issue when you do not want to upgrade:

  1. Monitor the current value of the SCSI command serial number counter, and ensure that it does not reach the maximum value, 4294967295.

    To check how close the counter is to that value, use this script:

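    # Name of the disk that holds the root file system (partition number stripped)
    bootDev=$(df | grep " /$" | awk '{print $1}' | sed -e 's!/dev/!!' -e 's/[0-9]*$//')
    # Reads completed (field 4) and writes completed (field 8) for that disk
    devRead=$(grep " ${bootDev} " /proc/diskstats | awk '{print $4}')
    devWrite=$(grep " ${bootDev} " /proc/diskstats | awk '{print $8}')
    (( devIO = devRead + devWrite ))
    # 4294967295 / 100000 is about 42950, so microFull is in thousandths of a percent
    (( microFull = devIO / 42950 ))
    # Insert a decimal point before the last three digits
    percentFull=$(echo ${microFull} | awk '{printf "%04s",$1}' | sed 's/\(...\)$/.\1/')
    echo "Percent Full: ${percentFull}%"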


    The output is the percentage of the maximum value that the counter has reached.

    For example, Percent Full: 69.848% indicates that the counter is at approximately 2999948756 (69.848% of 4294967295).

  2. Schedule a reboot of the host before the counter fills completely.
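To decide when to schedule the reboot, the counter value can be projected forward. The following is a rough sketch under stated assumptions, not part of the VMware workaround: the function name days_until_full is hypothetical, and it assumes that the I/O count (devIO in the script from step 1) started near zero at boot and keeps growing at the average rate seen so far.

```shell
#!/bin/bash
# Maximum counter value named in the article.
MAX_SERIAL=4294967295

# Hypothetical helper: estimates the whole days left before the counter fills.
#   $1 = current I/O count (devIO from the script in step 1)
#   $2 = host uptime in seconds
days_until_full() {
    local io=$1 uptime_s=$2
    # Without a positive count and uptime there is no rate to extrapolate from.
    if (( io <= 0 || uptime_s <= 0 )); then
        echo "unknown"
        return 1
    fi
    # The counter has already reached the maximum.
    if (( io >= MAX_SERIAL )); then
        echo 0
        return 0
    fi
    # remaining commands / (commands per second) = seconds left; 86400 s per day.
    # Multiplying before dividing keeps precision in 64-bit integer arithmetic.
    echo $(( (MAX_SERIAL - io) * uptime_s / io / 86400 ))
}

# Example use on the service console, after running the script in step 1:
#   days_until_full "${devIO}" "$(cut -d. -f1 /proc/uptime)"
```

With the example value from step 1 (2999948756) and an uptime of 100 days (8640000 seconds), the estimate is 43 days. Treat the result as a planning aid only, because a change in I/O load shifts the date.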

Update History

06/13/2013 - Added link to ESX 4.1 U3 release notes
