ESX host fails with Lost Heartbeat purple diagnostic screen mentioning VmkDev: 3102: Can't find serial# (2009232)

Symptoms

  • A VMware ESX host fails with a purple diagnostic screen. The screen has this message:

    COS Panic: Lost heartbeat @ esxsc_panic+0x43/0x4f

  • The vmkernel log extracted from the zdump contains messages similar to:

    cpu0:4096)VMNIX: WARNING: VmkDev: 3102: Can't find serial# nnnnnnnnn6 from virt handle 2000
    cpu0:4096)VMNIX: WARNING: VmkDev: 3102: Can't find serial# nnnnnnnnn7 from virt handle 2000
    cpu0:4096)VMNIX: WARNING: VmkDev: 3102: Can't find serial# nnnnnnnnn8 from virt handle 2000
    ...
    cpu0:4096)ALERT: VMNIX: ALERT: HB: 362: Lost heartbeat (comm=ProcessName pid=x t=29 to=30 clt=1).
    cpu0:4096)VMNIX: VmkDev: 2747: a/r=2 cmd=0xnn sn=nnnnnnnnn9 dsk=vsaN:N:N reqbuf=nnnnnnn (sg=n)


  • The service console logs in /var/log/messages or on the console contain messages similar to:

    <3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn1 after 360s
    <3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn2 after 360s
    <3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn3 after 360s
    <3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn4 after 360s
    <3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn5 after 360s


  • The SCSI command serial numbers logged are greater than 4294967295.
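The last symptom can be checked against an extracted vmkernel log with a short filter. This is a minimal sketch, not a VMware-supplied tool: the function name find_large_serials is hypothetical, and it assumes the log lines use the "serial# <number>" form shown in the symptoms above.

```shell
#!/bin/bash
# Hypothetical helper: print SCSI command serial numbers above the 32-bit limit.
# Reads log text on stdin and looks for the "serial# <number>" pattern
# shown in the vmkernel log excerpt (an assumption about the log format).
find_large_serials() {
    # awk does the comparison so the result does not depend on shell integer width.
    awk -v max=4294967295 '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "serial#" && $(i + 1) ~ /^[0-9]+$/ && $(i + 1) + 0 > max) {
                    print $(i + 1)
                }
            }
        }'
}
```

To use it, redirect the extracted log into the function, for example find_large_serials < vmkernel-log-file, where the file name is a placeholder. Any number printed is a serial number greater than 4294967295.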

Purpose

This article describes a specific issue. If you experience all of the above symptoms, consult the sections below. If you experience some but not all of these symptoms, your issue is not related to this article. In that case, search the KB for your symptoms, see the related articles Interpreting an ESX host purple diagnostic screen (1004250) and Understanding a "Lost Heartbeat" purple diagnostic screen (1009525) for more information, or open a Support Request.

Cause

This issue occurs when the SCSI command serial numbers exceed 4294967295 (2^32 - 1, the largest value an unsigned 32-bit integer can hold). From that point, completed SCSI commands do not return values to the service console.

Resolution

This issue is resolved in ESX 4.1 Update 3. To download ESX 4.1 Update 3, see the VMware Download Center.

For more information, see VMware ESX 4.1, Patch ESX410-201208201-UG: Updates the VMware ESX 4.1 Core and CIM components (2020337).

Note: This issue does not affect ESXi, because ESXi does not have a console operating system.

To work around this issue when you do not want to upgrade:

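  1. Monitor the value of the current SCSI command serial number counter, and ensure that it does not reach the maximum value, 4294967295.

    To check how close the counter is to that value, use this script. It estimates the counter from the number of reads and writes completed on the boot device, as reported in /proc/diskstats: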

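    # Boot device name without the /dev/ prefix or partition number
    bootDev=$(df | grep " /$" | awk '{print $1}' | sed -e 's!/dev/!!' -e 's/[0-9]*$//')
    # Reads (field 4) and writes (field 8) completed on that device since boot
    devRead=$(grep " ${bootDev} " /proc/diskstats | awk '{print $4}')
    devWrite=$(grep " ${bootDev} " /proc/diskstats | awk '{print $8}')
    (( devIO = devRead + devWrite ))
    # 42950 is about 4294967295 / 100000, so microFull is in thousandths of a percent
    (( microFull = devIO / 42950 ))
    # Zero-pad to at least four digits, then insert the decimal point before the last three
    percentFull=$(echo ${microFull} | awk '{printf "%04d",$1}' | sed 's/\(...\)$/.\1/')
    echo "Percent Full: ${percentFull}%"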


    The output is the percentage of the maximum value that the counter has reached.

    Example: Percent Full: 69.848%. This indicates that the counter is at approximately 2999948756 (69.848% of 4294967295).

  2. Schedule a reboot of the host before the counter fills completely.
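To decide how soon to schedule the reboot, the estimated counter value can be turned into a rough number of days remaining. The sketch below is an illustration only: the function name estimate_days_left is hypothetical, and it assumes that the counter started near zero at the last boot and that the host continues to issue I/O at its average rate since boot.

```shell
#!/bin/bash
# Hypothetical helper: estimate whole days until the counter reaches 4294967295.
#   $1 = current estimated counter value (devIO in the script above)
#   $2 = host uptime in seconds
# Assumption: the counter started near zero at boot and grows at a constant
# average rate. Real workloads vary, so treat the result as a rough guide.
estimate_days_left() {
    awk -v io="$1" -v up="$2" -v max=4294967295 'BEGIN {
        if (io <= 0 || up <= 0) { print "unknown"; exit }
        if (io >= max) { print 0; exit }
        rate = io / up                        # commands per second, averaged since boot
        printf "%d\n", (max - io) / rate / 86400
    }'
}
```

For example, a counter at 2999948756 after 200 days of uptime (17280000 seconds) gives estimate_days_left 2999948756 17280000, which prints 86. On a live host, the uptime in seconds can be taken from the first field of /proc/uptime.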
