"state in doubt; requested fast path state update" error in ESXi

Article ID: 318844

Products

VMware vSphere ESXi

Issue/Introduction

This article explains why entries containing "state in doubt; requested fast path state update" may appear in the /var/log/vmkernel.log file and what the message means.

Symptoms:
In the /var/log/vmkernel.log file of the ESXi host, you see entries similar to:

<YYYY-MM-DD>T<time> esx12 vmkernel: 116:03:44:19.039 cpu4:4196)<6>qla2xxx 0000:0f:00.0: scsi(6:0:152): Abort command issued -- 1 67a23dcd 2002.
<YYYY-MM-DD>T<time> esx12 vmkernel: 116:03:44:19.039 cpu4:4100)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4100020e0b00) to NMP device "sym.029010111831353837" failed on physical path "vmhba2:C0:T0:L152" H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
<YYYY-MM-DD>T<time> esx12 vmkernel: 116:03:44:19.039 cpu4:4100)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "sym.029010111831353837" state in doubt; requested fast path state update...


Environment

VMware vSphere ESXi 6.5
VMware vSphere ESXi 6.7
VMware vSphere ESXi 6.0
VMware vSphere ESXi 8.0
VMware vSphere ESXi 7.0

Resolution

These messages appear when the Host Bus Adapter (HBA) driver aborts a command that took longer than the 5-second timeout period to complete. An operation can exceed the timeout period for several reasons, including:
  • Array backup operations (LUN backup, replication, etc.)
  • General overload on the array
  • Read/Write Cache on the array (misconfiguration, lack of cache, etc.)
  • Incorrect tiered storage used (SATA over SCSI)
  • Fabric issues (Bad ISL, outdated firmware, bad fabric cable/GBIC)
Note: The preceding list is not exhaustive. A full discussion of the causes of the QLogic abort message is outside the scope of this Knowledge Base article.
 
The following message indicates that the command was aborted for scsi(6:0:152), which translates to LUN 152:
 
<YYYY-MM-DD>T<time> esx12 vmkernel: 116:03:44:19.039 cpu4:4196)<6>qla2xxx 0000:0f:00.0: scsi(6:0:152): Abort command issued -- 1 67a23dcd 2002
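
As a rough illustration of how the LUN can be read from the abort message, the following Python sketch extracts the three numbers inside scsi(...). The function name parse_abort is hypothetical. Reading the three fields as host:target:LUN is an assumption based on the statement above that scsi(6:0:152) translates to LUN 152.

```python
import re

# Pattern for the qla2xxx abort line shown in this article.
# Assumption: the fields in scsi(a:b:c) are host:target:LUN, consistent
# with the article's statement that scsi(6:0:152) translates to LUN 152.
ABORT_RE = re.compile(r"scsi\((\d+):(\d+):(\d+)\): Abort command issued")


def parse_abort(line):
    """Return (host, target, lun) as integers, or None if the line does not match."""
    match = ABORT_RE.search(line)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


sample = (
    "esx12 vmkernel: 116:03:44:19.039 cpu4:4196)<6>qla2xxx 0000:0f:00.0: "
    "scsi(6:0:152): Abort command issued -- 1 67a23dcd 2002"
)

host, target, lun = parse_abort(sample)
print("LUN:", lun)
```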
 
The following entry shows the ESXi mid-layer registering that a command was aborted, specifically command 0x2a, which is a 10-byte WRITE command (WRITE(10)). The mid-layer returns a Host status of 0x2 (H:0x2), which translates to DID_BUS_BUSY. This is a catch-all status produced as a direct result of the QLogic driver aborting the command:
 
<YYYY-MM-DD>T<time> esx12 vmkernel: 116:03:44:19.039 cpu4:4100)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4100020e0b00) to NMP device "sym.029010111831353837" failed on physical path "vmhba2:C0:T0:L152" H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
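
The fields of this entry can be decoded mechanically. The Python sketch below is a minimal illustration, not a VMware tool: the function name decode_nmp_failure and the lookup tables are hypothetical, and the tables are deliberately partial. The host status table contains only the value discussed in this article, and the opcode table lists a few standard SCSI opcodes.

```python
import re

# Matches the nmp_CompleteCommandForPath line format shown in this article.
NMP_RE = re.compile(
    r'Command (0x[0-9a-fA-F]+) \(0x[0-9a-fA-F]+\) to NMP device "([^"]+)" '
    r'failed on physical path "([^"]+)" '
    r'H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+)'
)

# Partial, illustrative lookup tables (not exhaustive).
SCSI_OPCODES = {0x00: "TEST UNIT READY", 0x28: "READ(10)", 0x2A: "WRITE(10)"}
HOST_STATUS = {0x2: "DID_BUS_BUSY"}  # only the value named in this article


def decode_nmp_failure(line):
    """Return a dict describing the failed command, or None if no match."""
    match = NMP_RE.search(line)
    if match is None:
        return None
    opcode, device, path, host, dev, plugin = match.groups()
    return {
        "opcode": SCSI_OPCODES.get(int(opcode, 16), "unknown"),
        "device": device,
        "path": path,
        "host_status": HOST_STATUS.get(int(host, 16), "unknown"),
        "device_status": int(dev, 16),
        "plugin_status": int(plugin, 16),
    }


sample = (
    'cpu4:4100)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4100020e0b00) '
    'to NMP device "sym.029010111831353837" failed on physical path '
    '"vmhba2:C0:T0:L152" H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.'
)

result = decode_nmp_failure(sample)
print(result["opcode"], "on", result["path"], "->", result["host_status"])
```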
 
When a command is aborted, the mid-layer issues a TEST_UNIT_READY (TUR) command down the path on which the command did not complete, to verify that the path is still usable. A TUR command is normally issued every 300 seconds down each path as part of the path evaluation code (Disk.PathEvalTime). In this instance, however, a TUR command is issued immediately because of the failed command:
 
<YYYY-MM-DD>T<time> esx12 vmkernel: 116:03:44:19.039 cpu4:4100)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "sym.029010111831353837" state in doubt; requested fast path state update...
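
To judge how often a device triggers this immediate probe, it can help to count the warnings per NMP device. The following Python sketch is one possible approach under simple assumptions: the function name count_state_in_doubt is hypothetical, and the pattern matches only the message format shown in this article.

```python
import re
from collections import Counter

# Matches the nmp_DeviceRequestFastDeviceProbe warning shown in this article.
DOUBT_RE = re.compile(
    r'NMP device "([^"]+)" state in doubt; requested fast path state update'
)


def count_state_in_doubt(lines):
    """Count "state in doubt" warnings per NMP device name."""
    counts = Counter()
    for line in lines:
        match = DOUBT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample_lines = [
    'cpu4:4100)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device '
    '"sym.029010111831353837" state in doubt; requested fast path state update...',
    "cpu4:4196)<6>qla2xxx 0000:0f:00.0: scsi(6:0:152): Abort command issued",
]

for device, count in count_state_in_doubt(sample_lines).most_common():
    print(device, count)
```

The function accepts any iterable of lines, so a copy of /var/log/vmkernel.log can be passed as an open file object. A device that accumulates many warnings in a short period is a reasonable starting point for investigating the causes listed earlier in this article.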


Additional Information