Issuing a 0x85 SCSI command from a VMware ESXi 6.0 host with the EMC XtremIO storage array may result in a PDL error (2133286)

Symptoms

When a VMware vSphere ESXi 6.0 host requests SMART data from an EMC XtremIO storage array, the array may return a response that triggers a Permanent Device Loss (PDL) condition.
  • In the /var/log/vmkernel.log file on the ESXi host, you see entries similar to:

    2015-07-23T20:34:05.108Z cpu2:33198)WARNING: NMP: nmp_PathDetermineFailure:2872: Cmd (0x85) PDL error (0x5/0x25/0x0) - path vmhba4:C0:T0:L10 device naa.514f0c514ba0000e - triggering path evaluation
    2015-07-23T20:34:05.108Z cpu2:33198)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x85 (0x439e16768f40, 34616) to dev "naa.514f0c514ba0000e" on path "vmhba4:C0:T0:L10" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0. Act:EVAL
    2015-07-23T20:34:05.108Z cpu2:33198)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.514f0c514ba0000e" state in doubt; requested fast path state update...
    2015-07-23T20:34:05.108Z cpu2:33198)ScsiDeviceIO: 2646: Cmd(0x439e16768f40) 0x85, CmdSN 0x385c from world 34616 to dev "naa.514f0c514ba0000e" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.

  • The failing command listed in every one of these errors is:

    Cmd(0x85)

  • You see these exact responses from the storage array to the 0x85 command:

    failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0

  • In ESXi 6.0 Update 2, you may see the preceding messages together with:

    • Widespread IO timeouts and subsequent aborts with the H:0x5 failure code

      For example:

      2016-03-10T20:36:12.203Z cpu2:33199)ScsiDeviceIO: 2646: Cmd(0x439e05768e50) 0x28, CmdSN 0x379d from world 34527 to dev "naa.514f0c514ba0000e" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

      Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
    • Hosts may take a long time to reconnect to vCenter after reboot or hosts may enter a Not Responding state in vCenter Server
    • Storage-related tasks such as HBA rescan may take a very long time to complete
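
The log signature described above can be searched for with a short filter. The following is a minimal sketch, not a VMware-provided tool: the function name find_smart_pdl is hypothetical, and the patterns assume that the log lines follow the formats shown in the examples in this article.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that report a 0x85 command together
# with sense data 0x5 0x25 0x0, in either notation shown in the examples
# (0x5/0x25/0x0 or 0x5 0x25 0x0).
# Usage: find_smart_pdl [logfile]   (defaults to /var/log/vmkernel.log)
find_smart_pdl() {
    grep -E '0x5/0x25/0x0|0x5 0x25 0x0' "${1:-/var/log/vmkernel.log}" |
        grep -E '[ (]0x85[ ,)]'
}
```

Running find_smart_pdl with no argument reads /var/log/vmkernel.log. Lines for other commands, such as the 0x28 example with the H:0x5 failure code, are not printed because they do not carry the 0x5 0x25 0x0 sense data.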

Cause

This issue occurs because an ESXi host sends a request for SMART data to the storage array, and the array responds with an unexpected illegal request error. The response received by the host triggers Permanent Device Loss (PDL) detection, and the kernel performs a path evaluation to determine whether the path in question needs to be failed.

In ESXi 6.0 Update 2, a change to the PDL response behavior can result in this condition blocking additional IO operations, resulting in the aborts and timeouts described in the Symptoms section. For more information, see the General Storage Issues section in the ESXi 6.0 Update 2 Release Notes.
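
To make the status fields in the Symptoms messages easier to read, the following sketch decodes only the values that appear in this article. The function names are hypothetical and the mapping is deliberately incomplete. The meanings are taken from the SCSI standards: device status 0x2 is CHECK CONDITION, sense key 0x5 is ILLEGAL REQUEST, and additional sense code 0x25 with qualifier 0x0 is LOGICAL UNIT NOT SUPPORTED.

```shell
#!/bin/sh
# Hypothetical helpers covering only the codes quoted in this article.

# decode_device_status D   e.g. decode_device_status 0x2
decode_device_status() {
    case "$1" in
        0x0) echo "GOOD" ;;
        0x2) echo "CHECK CONDITION" ;;
        *)   echo "not covered by this sketch" ;;
    esac
}

# decode_sense KEY ASC ASCQ   e.g. decode_sense 0x5 0x25 0x0
decode_sense() {
    case "$1/$2/$3" in
        0x5/0x25/0x0) echo "ILLEGAL REQUEST / LOGICAL UNIT NOT SUPPORTED" ;;
        0x0/0x0/0x0)  echo "NO SENSE" ;;
        *)            echo "not covered by this sketch" ;;
    esac
}
```

Read this way, "D:0x2 ... Valid sense data: 0x5 0x25 0x0" says that the array rejected the 0x85 command as an illegal request and reported the logical unit as not supported, which is the combination the host logs as a PDL error.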

Resolution

This is a known issue affecting VMware ESXi 6.0 with EMC XtremIO storage arrays.

This is a firmware issue on the storage array. Contact the array vendor to determine whether a fixed firmware version is available.

To work around these issues, use one of these options:

Note: VMware recommends that you apply one of these workarounds prior to upgrading your ESXi hosts to ESXi 6.0 Update 2.

Option 1
 
Disable the SMART daemon (smartd). However, this also stops local capture of SMART data for internal drives.

Note: VMware recommends against disabling smartd if possible.

To stop and disable smartd on an ESXi host:
  1. Connect to the ESXi host through SSH or a local console session using root credentials. For more information, see Using ESXi Shell in ESXi 5.x and 6.0 (2004746).
  2. Stop the smartd service using this command:

    /etc/init.d/smartd stop

  3. Disable the service using this command:

    chkconfig smartd off
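
After completing these steps, you may want to confirm that the service will not start at the next boot. The following is a minimal sketch under stated assumptions: the function name smartd_boot_state is hypothetical, and it assumes that chkconfig --list smartd prints a line whose first field is the service name and whose last field is on or off. Verify the output format on your build before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print "on" or "off" for the smartd boot-time setting.
# Assumes `chkconfig --list smartd` prints a line such as "smartd   off".
# Prints nothing if no such line is found.
smartd_boot_state() {
    chkconfig --list smartd 2>/dev/null | awk '$1 == "smartd" { print $NF }'
}
```

A result of off indicates that step 3 took effect. If the array firmware is later updated so that the workaround is no longer needed, running chkconfig smartd on followed by /etc/init.d/smartd start should reverse the two commands above.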
 
Option 2
 
Depending on the array type used in the environment, there may be a firmware update available from the manufacturer that prevents the PDL sense code from being returned in response to the SMART command.

VMware recommends that you engage with your array vendor to determine if there is a firmware update that can be applied to prevent this behavior.
 
Notes:
  • While there may be other applicable storage platforms, this issue is known to be present on certain firmware versions of the EMC XtremIO storage array. This issue is known to be resolved in the 4.0.2-80 (or later) firmware version for the XtremIO storage array. For more information on this issue and the steps to resolve it with XtremIO, see EMC KB 467750.

    Note: The preceding link was correct as of June 29, 2016. If you find the link is broken, provide feedback and a VMware employee will update the link.

  • As with all storage platforms, contact your array vendor for a final assessment of any given behavior or fixes in a firmware release.

Additional Information

For more information on Permanent Device Loss, its detection and causes, see:
For more information about SCSI sense code errors, see Understanding SCSI device/target NMP errors/conditions in ESX/ESXi 4.x and ESXi 5.x/6.0 (1030381).