Storage PDL responses may not trigger path failover in vSphere 6.0 (2144657)

Symptoms

  • ESXi 6.0 may not fail over to an alternate, available path after encountering a Permanent Device Loss (PDL) condition on the active path.
  • The lack of failover to the alternate path results in aborted I/O. This can cause LUN availability issues, which in turn cause virtual machine failures.
  • The PDL condition may be encountered on a subset of (but not all) paths during non-disruptive upgrade events on certain storage platforms.
  • In the /var/log/vmkernel.log file on the ESXi host, you see path-evaluation activity followed by I/O failures reported with SCSI host status H:0x8:

    cpu6:32909)WARNING: NMP: nmp_PathDetermineFailure:2961: Cmd (0x2a) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T5:L2 device naa.514f0c5ec2000008 - triggering path evaluation
    cpu6:32909)WARNING: NMP: nmp_PathDetermineFailure:2961: Cmd (0x2a) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T5:L2 device naa.514f0c5ec2000008 - triggering path evaluation
    cpu6:32909)WARNING: NMP: nmp_PathDetermineFailure:2961: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T5:L2 device naa.514f0c5ec2000008 - triggering path evaluation
    cpu6:32909)WARNING: NMP: nmp_PathDetermineFailure:2961: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T5:L2 device naa.514f0c5ec2000008 - triggering path evaluation
    [ ... ]
    cpu28:36724)NMP: nmp_ThrottleLogForDevice:3286: Cmd 0x2a (0x43a61055c5c0, 36134) to dev "naa.514f0c5ec2000008" on path "vmhba1:C0:T6:L2" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL
    cpu14:36168)NMP: nmp_ThrottleLogForDevice:3286: Cmd 0x89 (0x439e11581700, 32806) to dev "naa.514f0c5ec2000008" on path "vmhba2:C0:T7:L2" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

  • The LUN and datastore do not return to availability after this event until the original path is available again or the ESXi host is rebooted.
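
The log signature described above can be checked for with a short script. The following is a minimal sketch, not a VMware tool: it assumes the log lines match the format of the excerpt in this article (other builds may word them differently), and the function name scan_vmkernel_log is hypothetical. It reports devices that logged a PDL error on one path and then logged H:0x8 failures.

```python
import re
from collections import defaultdict

# Patterns modeled on the sample vmkernel.log lines in this article.
# Assumption: the message layout is the same on the host being checked.
PDL_RE = re.compile(
    r"nmp_PathDetermineFailure:\d+: Cmd \(0x[0-9a-f]+\) PDL error "
    r"\(0x[0-9a-f]+/0x[0-9a-f]+/0x[0-9a-f]+\) - path (\S+) device (\S+)"
)
FAIL_RE = re.compile(
    r'nmp_ThrottleLogForDevice:\d+: Cmd 0x[0-9a-f]+ .*? to dev "([^"]+)" '
    r'on path "([^"]+)" Failed: H:(0x[0-9a-f]+)'
)


def scan_vmkernel_log(lines):
    """Return devices that logged a PDL error and later H:0x8 failures.

    The result maps each device ID to the paths that reported PDL and the
    paths on which commands subsequently failed with host status 0x8.
    """
    pdl_paths = defaultdict(set)
    failed_paths = defaultdict(set)
    for line in lines:
        m = PDL_RE.search(line)
        if m:
            path, device = m.groups()
            pdl_paths[device].add(path)
            continue
        m = FAIL_RE.search(line)
        if m:
            device, path, host_status = m.groups()
            # Only count H:0x8 failures that follow a PDL on the same device.
            if int(host_status, 16) == 0x8 and device in pdl_paths:
                failed_paths[device].add(path)
    return {
        dev: {"pdl_paths": pdl_paths[dev], "failed_paths": failed_paths[dev]}
        for dev in failed_paths
    }
```

Passing an open file handle for a copy of vmkernel.log to scan_vmkernel_log yields one entry per affected device. An empty result only means this particular signature was not found, not that the host is unaffected.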

Cause

An inadvertent change in PDL multipathing behavior in ESXi 6.0 results in alternate working paths for a LUN not being selected for failover when a PDL condition is detected. On encountering a PDL condition on the active path, the ESXi host initiates a health check against the remaining paths but does not fail over, even if another path is responsive and healthy.
The correct response is to fail over to one of the healthy working paths. As a result, the host can no longer issue I/O to the affected LUNs until the ESXi host is rebooted.
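
The expected behavior can be illustrated with a small model. This is a hypothetical sketch for explanation only and is not the ESXi NMP implementation: the Path class, its healthy flag, and next_path_after_pdl are invented names. It shows the decision the host should make once the health check has run, which is to pick any other path that responded as healthy.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Path:
    name: str       # runtime path name, for example "vmhba2:C0:T5:L2"
    healthy: bool   # hypothetical flag: outcome of the path health check


def next_path_after_pdl(active: Path, paths: Iterable[Path]) -> Optional[Path]:
    """Return a healthy alternate path to fail over to, or None.

    Models the correct response to PDL on the active path: skip the
    active path and choose the first remaining path that is healthy.
    """
    for candidate in paths:
        if candidate.name != active.name and candidate.healthy:
            return candidate
    return None  # no healthy alternate: the device is unavailable on all paths
```

In this model, the affected ESXi 6.0 builds behave as if the function always returned None, even when a healthy alternate path exists.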

Resolution

This is a known issue affecting ESXi 6.0.

This behavior is corrected in VMware vSphere 6.0 Update 2 and later, available at VMware Downloads.

To work around this issue, restart the ESXi host once the LUN and datastore are accessible again after the maintenance activity.

Note: If this issue is leading to a critical condition in your environment, or if you think you are likely to encounter this issue due to pending upgrades/updates to the storage infrastructure, file a support request with VMware Support and note this KB article ID (2144657) in the problem description. For more information, see How to file a Support Request in My VMware (2006985).

Additional Information

For more information about PDL, see Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x and 6.x (2004684).
