ESXi host with a CD-ROM drive model DU-8A5LH can cause PSOD with vmw_ahci driver backtrace

Article ID: 318727

Products

VMware vSphere ESXi

Issue/Introduction

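This article describes a workaround that prevents a purple diagnostic screen (PSOD) on ESXi hosts equipped with a DU-8A5LH CD-ROM drive.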

Symptoms:
  • An ESXi host using a CD-ROM drive of model DU-8A5LH fails with a PSOD that references vmw_ahci, with backtraces such as:

PanicvPanicInt@vmkernel#nover+0x545 stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Panic_WithBacktrace@vmkernel#nover+0x56 stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Heartbeat_DetectCPULockups@vmkernel#nover+0x4be xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Timer_BHHandler@vmkernel#nover+0xdc stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
BH_DrainAndDisableInterrupts@vmkernel#nover+0x7b stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
IntrCookie_VmkernelInterrupt@vmkernel#nover+0xc6 stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
IDT_IntrHandler@vmkernel#nover+0x9d stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
gate_entry_@vmkernel#nover+0x0 stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Power_ArchSetCState@vmkernel#nover+0x10a stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
CpuSchedIdleLoopInt@vmkernel#nover+0x39b stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
CpuSchedDispatch@vmkernel#nover+0x114a stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
CpuSchedWait@vmkernel#nover+0x27a stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
CpuSchedTimedWaitInt@vmkernel#nover+0xa8 stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
CpuSched_EventQueueWaitShared@vmkernel#nover+0x2c stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
UserThread_QueueWait@(user)#<None>+0x34 stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
LinuxThread_Futex@(user)#<None>+0x273 stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
User_LinuxSyscallHandler@(user)#<None>+0x113 stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
User_LinuxSyscallHandler@vmkernel#nover+0x1d stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
gate_entry_@vmkernel#nover+0x0 stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx


and

Util_FormatTimestampUTC@vmkernel#nover+0x1e stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
LogFormatStringV@vmkernel#nover+0x9c stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
LogWarningWithPcpu@vmkernel#nover+0x40f stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
vmk_vLogNoLevel@vmkernel#nover+0x63 stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
vmk_LogNoLevel@vmkernel#nover+0x3e stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
CompletionBottomHalf@(vmw_ahci)#<None>+0x69b stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
HBAIntrHandler@(vmw_ahci)#<None>+0x84 stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
AHCI_EdgeIntrHandler@(vmw_ahci)#<None>+0x20 stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
IntrCookieBH@vmkernel#nover+0x1e0 stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
BH_Check@vmkernel#nover+0xfe stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
CpuSchedDispatch@vmkernel#nover+0xed4 stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
CpuSchedWait@vmkernel#nover+0x27a stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
CpuSched_NoEvqWait@vmkernel#nover+0x19 stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
vmk_WorldWait@vmkernel#nover+0x65 stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
ExceptionHandlerWorld@(vmw_ahci)#<None>+0x9b stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
vmkWorldFunc@vmkernel#nover+0x4f stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
CpuSched_StartWorld@vmkernel#nover+0x99 stack: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  • vmkernel.log entries such as:

2018-12-14T13:51:11.701Z cpu11:66006)vmw_ahci[00000017]: CompletionBottomHalf:PORT_IRQ_UNK_FIS exception.
2018-12-14T13:51:11.701Z cpu11:66006)vmw_ahci[00000017]: LogExceptionSignal:Port 7, Signal:  --|--|--|--|--|--|IR|--|--|--|--|-- (0x0040) Curr: --|--|--|--|--|--|IR|--|--|--|--|-- (0x0040)
2018-12-14T13:51:11.701Z cpu11:66006)vmw_ahci[00000017]: CompletionBottomHalf:strange irq(s), 0x10
2018-12-14T13:51:11.701Z cpu11:66006)vmw_ahci[00000017]: CompletionBottomHalf:PORT_IRQ_UNK_FIS exception.
2018-12-14T13:51:11.701Z cpu11:66006)vmw_ahci[00000017]: LogExceptionSignal:Port 7, Signal:  --|--|--|--|--|--|IR|--|--|--|--|-- (0x0040) Curr: --|--|--|--|--|--|IR|--|--|--|--|-- (0x0040)
2018-12-14T13:51:11.701Z cpu11:66006)vmw_ahci[00000017]: CompletionBottomHalf:strange irq(s), 0x10
2018-12-14T13:51:11.701Z cpu11:66006)vmw_ahci[00000017]: CompletionBottomHalf:PORT_IRQ_UNK_FIS exception.
2018-12-14T13:51:11.701Z cpu11:66006)vmw_ahci[00000017]: LogExceptionSignal:Port 7, Signal:  --|--|--|--|--|--|IR|--|--|--|--|-- (0x0040) Curr: --|--|--|--|--|--|IR|--|--|--|--|-- (0x0040)
2018-12-14T13:51:11.701Z cpu11:66006)vmw_ahci[00000017]: HBAIntrHandler:new interrupts coming, PxIS = 0x10, no repeat
2018-12-14T13:51:11.701Z cpu11:66006)vmw_ahci[00000017]: CompletionBottomHalf:strange irq(s), 0x10



Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
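To check whether a host shows this pattern, you can count the PORT_IRQ_UNK_FIS lines per second and per vmw_ahci adapter instance. The following Python sketch assumes the vmkernel.log line layout shown in the excerpt above. It is meant to be run on any machine with Python 3, for example against a copy of vmkernel.log taken from a support bundle. The script name and the choice to group by second are illustrative only. This is not a VMware-provided tool.

```python
import re
import sys
from collections import Counter

# Matches a vmw_ahci "unknown FIS" exception line in the layout shown above,
# capturing the timestamp truncated to whole seconds and the adapter instance.
UNK_FIS_RE = re.compile(
    r"^(?P<second>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.\d+Z"
    r".*?vmw_ahci\[(?P<adapter>[0-9A-Fa-f]+)\]: "
    r"CompletionBottomHalf:PORT_IRQ_UNK_FIS exception"
)


def count_unknown_fis(lines):
    """Count PORT_IRQ_UNK_FIS exception lines per (second, adapter) pair."""
    counts = Counter()
    for line in lines:
        match = UNK_FIS_RE.search(line)
        if match:
            counts[(match.group("second"), match.group("adapter"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 count_unk_fis.py <path to a copy of vmkernel.log>
    with open(sys.argv[1], errors="replace") as log:
        for (second, adapter), n in sorted(count_unknown_fis(log).items()):
            print(f"{second} vmw_ahci[{adapter}]: {n} PORT_IRQ_UNK_FIS exception(s)")
```

The script prints one line for each second in which exceptions were logged. Many exceptions for the same adapter within a single second are consistent with the repeated logging described in the Cause section. The script does not set any threshold, so interpreting the counts is left to the reader.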

Environment

VMware vSphere ESXi 7.0.x
VMware vSphere ESXi 6.7
VMware vSphere ESXi 6.5

Cause

On an ESXi host with a CD-ROM drive of model DU-8A5LH, the drive might report an unknown Frame Information Structure (FIS). The vmw_ahci driver does not handle this exception properly and writes repeated PORT_IRQ_UNK_FIS exception messages to the kernel log. The repeated logging prevents the physical CPU from sending its heartbeat, so the host detects a CPU lockup and fails with a PSOD.
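In AHCI, a FIS is a Frame Information Structure. The log excerpts show the hex value 0x10 in "strange irq(s), 0x10" and "PxIS = 0x10". The following Python sketch assumes that these values are the raw contents of the port's Port x Interrupt Status (PxIS) register. The article does not state this, so treat it as an assumption. Under that assumption, you can decode the value using the PxIS bit layout from the AHCI 1.3.1 specification. Bit 4 (0x10) is UFS, the Unknown FIS Interrupt, which corresponds to PORT_IRQ_UNK_FIS. The `decode_pxis` helper is hypothetical and is not part of vmw_ahci.

```python
# PxIS (Port x Interrupt Status) bit names from the AHCI 1.3.1 specification.
# Bits not listed here (8-21 and 25) are reserved.
PXIS_BITS = {
    0: "DHRS (Device to Host Register FIS Interrupt)",
    1: "PSS (PIO Setup FIS Interrupt)",
    2: "DSS (DMA Setup FIS Interrupt)",
    3: "SDBS (Set Device Bits Interrupt)",
    4: "UFS (Unknown FIS Interrupt)",
    5: "DPS (Descriptor Processed)",
    6: "PCS (Port Connect Change Status)",
    7: "DMPS (Device Mechanical Presence Status)",
    22: "PRCS (PhyRdy Change Status)",
    23: "IPMS (Incorrect Port Multiplier Status)",
    24: "OFS (Overflow Status)",
    26: "INFS (Interface Non-fatal Error Status)",
    27: "IFS (Interface Fatal Error Status)",
    28: "HBDS (Host Bus Data Error Status)",
    29: "HBFS (Host Bus Fatal Error Status)",
    30: "TFES (Task File Error Status)",
    31: "CPDS (Cold Port Detect Status)",
}


def decode_pxis(value):
    """Return the names of the PxIS bits set in a 32-bit value, lowest bit first."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(PXIS_BITS.get(bit, f"bit {bit} (reserved)"))
    return names


if __name__ == "__main__":
    # The value reported in the vmkernel.log excerpt.
    print(decode_pxis(0x10))
```

Running the sketch prints `['UFS (Unknown FIS Interrupt)']`. This shows that, under the stated assumption, the only interrupt the port is signalling is the unknown-FIS condition that the driver logs as PORT_IRQ_UNK_FIS.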

Resolution

VMware Engineering is aware of this issue.

Workaround:
  • Disable the vmw_ahci module by running the following command from the ESXi command line:
esxcli system module set -e false -m vmw_ahci
  • Reboot the host for the change to take effect.
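After the reboot, you can confirm that the module is disabled by checking the output of `esxcli system module list`. The following Python sketch parses captured output from that command. It assumes the three-column layout (Name, Is Loaded, Is Enabled) with `true`/`false` values, which you should verify against your ESXi build. The `module_enabled` helper is hypothetical.

```python
def module_enabled(esxcli_output, module):
    """Return the "Is Enabled" value for module as a bool, or None if it is not listed.

    Assumes the Name / Is Loaded / Is Enabled column layout printed by
    `esxcli system module list`, with boolean values "true" or "false".
    """
    for line in esxcli_output.splitlines():
        fields = line.split()
        # Data rows have exactly three whitespace-separated fields.
        if len(fields) == 3 and fields[0] == module:
            return fields[2] == "true"
    return None


if __name__ == "__main__":
    # Illustrative output only; capture real output with `esxcli system module list`.
    captured = """Name                           Is Loaded  Is Enabled
-----------------------------  ---------  ----------
vmkernel                            true        true
vmw_ahci                           false       false
"""
    print(module_enabled(captured, "vmw_ahci"))
```

A result of `False` for vmw_ahci indicates that the workaround is in place. A result of `None` means the module name did not appear in the captured output.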


Additional Information

Impact/Risks:
No Impact