"ql_fcoe_delayed_wq" in backtrace of ESXi host that has PSOD

Article ID: 340280

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • ESXi host has many connection resets and All Paths Down (APD) or similar path down scenarios.
  • An ESXi 6.5, 6.7, or 7.0 host experiences a PSOD with references to the FCoE module (qfle3f) in the backtrace.
PSOD: Panic bora/vmkernel/main/dlmalloc.c:4908 - Corruption in DLMALLOC referencing details ql_fcoe_delayed_wq.
  • You see a backtrace similar to:
0x451b9fd9bd50:[0x418037d0ba15]PanicvPanicInt@vmkernel#nover+0x439 stack: 0x4302d004c490, 0x4180380a7558, 0x451b9fd9bdf8, 0x0, 0x100000001
 0x451b9fd9bdf0:[0x418037d0bc48]Panic_NoSave@vmkernel#nover+0x4d stack: 0x451b9fd9be50, 0x451b9fd9be10, 0x43120f780c20, 0x4180380a7539, 0x132c
 0x451b9fd9be50:[0x418037d54363]DLM_free@vmkernel#nover+0x6a8 stack: 0x43120f78acc0, 0x418037d51501, 0x5beea699da51a, 0x418037d15653, 0x0
 0x451b9fd9be70:[0x418037d51500]Heap_Free@vmkernel#nover+0x115 stack: 0x0, 0x43120f78acc0, 0x2f, 0x40000000, 0x0
 0x451b9fd9bec0:[0x418037c3d987]vmk_SpinlockDestroy@vmkernel#nover+0x48 stack: 0x43120f5df000, 0x418038ab09ed, 0x0, 0x418038abcb52, 0x43120f5df000
 0x451b9fd9bee0:[0x418038ab09ec]DeleteFabric@(qfle3f)#<None>+0x29 stack: 0x43120f5df000, 0x43120f5df200, 0x0, 0x418038ab2c00, 0x43120f5f3610
 0x451b9fd9bf40:[0x418038ab0bd9]_ReleaseFabricReference@(qfle3f)#<None>+0x2e stack: 0x43120f786000, 0x43120f786018, 0x1, 0x418038abc27b, 0x418038abc1f8
 0x451b9fd9bf70:[0x418038abc27a]ql_fcoe_do_singlethread_work@(qfle3f)#<None>+0x83 stack: 0x2f, 0x418037d2902f, 0x2f, 0x418038abc1f8, 0x418037d2902a
 0x451b9fd9bf90:[0x418037d2902e]vmkWorldFunc@vmkernel#nover+0x4f stack: 0x418037d2902a, 0x0, 0x451b8a6a3100, 0x451b9fda3000, 0x451b8a6a3100
 0x451b9fd9bfe0:[0x418037f0e322]CpuSched_StartWorld@vmkernel#nover+0x77 stack: 0x0, 0x0, 0x0, 0x0, 0x0
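
To check a saved PSOD log against this signature, a small script can look for the two markers visible in the backtrace above: a DLM_free frame and at least one frame from the qfle3f module. The following is a minimal Python sketch. The function names and the matching heuristic are illustrative assumptions, not a VMware diagnostic tool, and the frame format is inferred only from the sample backtrace in this article.

```python
import re

# Frame format assumed from the sample backtrace in this article:
#   0x<stack addr>:[0x<code addr>]<symbol>@<module>#<version>+0x<offset> stack: ...
FRAME_RE = re.compile(
    r"\[0x[0-9a-fA-F]+\](?P<symbol>[^@\s]+)@(?P<module>[^#\s]+)#"
)


def parse_frames(text):
    """Return (symbol, module) pairs for every backtrace frame found in text."""
    return [
        (m.group("symbol"), m.group("module").strip("()"))
        for m in FRAME_RE.finditer(text)
    ]


def matches_signature(text):
    """Heuristic (assumption): a DLM_free frame plus at least one qfle3f frame."""
    frames = parse_frames(text)
    has_dlm_free = any(symbol == "DLM_free" for symbol, _ in frames)
    has_qfle3f = any(module == "qfle3f" for _, module in frames)
    return has_dlm_free and has_qfle3f
```

A match only suggests that the log resembles this article's backtrace. It does not confirm the root cause.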


Environment

VMware vSphere ESXi 6.5
VMware vSphere ESXi 7.0.0
VMware vSphere ESXi 6.7

Cause

During FCoE FIP discovery, queuing of the discovery timeout handler can fail. As expected, this causes the current iteration of the discovery process to fail.

The issue occurs when, after this failure, a reference to the session object remains. The remaining reference causes an incomplete cleanup of resources.

Resolution

This issue is resolved in the following driver versions:

  • VMware vSphere ESXi 6.7 driver version 2.0.123.0, available at VMware Downloads
  • VMware vSphere ESXi 7.0 driver version 3.0.125.0, available at VMware Downloads

Note: For ESXi 6.5, contact the server hardware vendor.
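
When auditing several hosts, the installed driver version can be compared against the fixed versions listed above. The sketch below is a minimal Python example. It assumes that versions compare numerically field by field, that any later version also contains the fix, and that a version string may carry a build suffix after the dotted number. These are assumptions, and the helper names are hypothetical.

```python
import re

# Fixed driver versions as listed in this article, keyed by ESXi release.
FIXED_VERSIONS = {
    "6.7": (2, 0, 123, 0),
    "7.0": (3, 0, 125, 0),
}


def parse_version(version):
    """Parse the leading dotted-numeric part of a version string into a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"no numeric version in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_fix(esxi_release, driver_version):
    """Return True or False, or None when no fixed version is listed (e.g. 6.5)."""
    fixed = FIXED_VERSIONS.get(esxi_release)
    if fixed is None:
        return None
    # Assumption: versions at or above the listed one include the fix.
    return parse_version(driver_version) >= fixed
```

For example, `has_fix("6.7", "2.0.100.0")` returns `False`, and `has_fix("6.5", ...)` returns `None` because this article refers ESXi 6.5 users to the server hardware vendor.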

Workaround:
To work around this issue:

If you are not using FCoE for storage, disable the qfle3f module:
1. Connect to the ESXi host with an SSH session.
2. Run this command:
esxcli system module set --enabled=false --module=qfle3f
3. Reboot the server for the command to take effect.

If multiple FCoE VLANs are configured, remove the multiple-VLAN configuration on the same fabric.
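
After the reboot, the state of the module can be confirmed from the output of `esxcli system module list`. The following Python sketch parses such a listing. It assumes each data row consists of the module name followed by "Is Loaded" and "Is Enabled" true/false columns; if the layout on a given host differs, the parser needs adjusting. The function name is hypothetical.

```python
def module_state(listing, module):
    """Return (is_loaded, is_enabled) for module, or None if it is not listed.

    Assumes each data row ends with two true/false columns, for example:
        qfle3f    false    false
    """
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == module:
            loaded, enabled = fields[-2].lower(), fields[-1].lower()
            if loaded in ("true", "false") and enabled in ("true", "false"):
                return loaded == "true", enabled == "true"
    return None
```

With the workaround applied, the expected result for `module_state(listing, "qfle3f")` is `(False, False)`: the module is neither loaded nor enabled.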

Additional Information

VMware Skyline Health Diagnostics for vSphere - FAQ

For more information on how to install the driver, refer to Installing async drivers in ESXi 5.x/6.x/7.x using esxcli and offline bundle.