Storage PDL responses may not trigger path failover in vSphere 6.0


Article ID: 317977


Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Upgrading to ESXi 6.0 Update 2 resolves this issue.

Symptoms:
    ESXi 6.0 may not fail over to an alternate available path after encountering a Permanent Device Loss (PDL) condition on the active path.

    Note: For additional symptoms and log entries, see the Additional Information section.

    Environment

    VMware vSphere ESXi 6.0

    Cause

    An inadvertent change in PDL multipathing behavior in ESXi 6.0 results in alternative working paths for a LUN not being checked if a PDL condition/error is detected. When encountering a PDL condition on the active path, the ESXi host initiates a health check against the remaining paths but does not fail over if another path is responsive/healthy.
     
    The correct response is to fail over to one of the healthy working paths. Because this does not happen, the host is no longer able to issue I/O to these LUNs until the ESXi host is rebooted.
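    The difference between the expected and the affected behavior can be illustrated with a toy model. This is only a sketch and not ESXi code: the `Path` class and both function names are hypothetical, and the health check is reduced to a boolean flag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    """Hypothetical stand-in for a storage path to a LUN."""
    name: str
    healthy: bool  # outcome of an assumed per-path health check


def select_path_expected(active: Path, paths: List[Path]) -> Optional[Path]:
    """Expected behavior: after PDL on the active path, fail over to a healthy path."""
    for candidate in paths:
        if candidate is not active and candidate.healthy:
            return candidate
    return None  # no healthy alternative: the device is unreachable on every path


def select_path_affected(active: Path, paths: List[Path]) -> Optional[Path]:
    """Behavior described in this article: paths are checked, but no failover occurs."""
    # The health check runs against the remaining paths, but its result is not acted on.
    _results = [p.healthy for p in paths if p is not active]
    return None
```

    In this model, a LUN with one path in PDL and one healthy path stays reachable under the expected behavior, while the affected behavior leaves it without a usable path.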

    Resolution

    This issue is resolved in ESXi 6.0 Update 2 and later, available at VMware Downloads.
     
    To work around this issue if you do not want to upgrade, restart the ESXi host.


    Additional Information

    You experience these additional symptoms:
    • In the /var/log/vmkernel.log file on the ESXi host, you see path-evaluation activity followed by I/O failures with SCSI host status code H:0x8:

      cpu6:32909)WARNING: NMP: nmp_PathDetermineFailure:2961: Cmd (0x2a) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T5:L2 device naa.514f0c5ec2000008 - triggering path evaluation
      cpu6:32909)WARNING: NMP: nmp_PathDetermineFailure:2961: Cmd (0x2a) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T5:L2 device naa.514f0c5ec2000008 - triggering path evaluation
      cpu6:32909)WARNING: NMP: nmp_PathDetermineFailure:2961: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T5:L2 device naa.514f0c5ec2000008 - triggering path evaluation
      cpu6:32909)WARNING: NMP: nmp_PathDetermineFailure:2961: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T5:L2 device naa.514f0c5ec2000008 - triggering path evaluation
      [ ... ]
      cpu28:36724)NMP: nmp_ThrottleLogForDevice:3286: Cmd 0x2a (0x43a61055c5c0, 36134) to dev "naa.514f0c5ec2000008" on path "vmhba1:C0:T6:L2" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL
      cpu14:36168)NMP: nmp_ThrottleLogForDevice:3286: Cmd 0x89 (0x439e11581700, 32806) to dev "naa.514f0c5ec2000008" on path "vmhba2:C0:T7:L2" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL


      Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
       
    • The lack of failover to the alternate path results in failed I/O. This can cause LUN availability issues, which in turn cause virtual machine failures.
    • The PDL condition may be encountered on a subset (but not all) of the paths during non-disruptive upgrade events on certain storage platforms.
    • The LUN and datastore do not become available again after this event until the original path is available again or the ESXi host is rebooted.
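    The log pattern described in the first symptom can be searched for with a short script. The following is a minimal sketch, assuming the log lines follow the format of the excerpt above; the function name `scan_vmkernel_log` and the regular expressions are illustrative and are not part of any VMware tool.

```python
import re
from collections import defaultdict

# Matches the NMP warning that reports a PDL error on a path (format taken from the excerpt).
PDL_RE = re.compile(
    r"PDL error \((?P<sense>0x[0-9a-fA-F]+/0x[0-9a-fA-F]+/0x[0-9a-fA-F]+)\) - "
    r"path (?P<path>vmhba\d+:C\d+:T\d+:L\d+) device (?P<device>\S+)"
)
# Matches the throttled NMP failure line that carries host status H:0x8.
H8_RE = re.compile(
    r'to dev "(?P<device>[^"]+)" on path "(?P<path>[^"]+)" Failed: H:0x8 '
)


def scan_vmkernel_log(lines):
    """Return devices that logged a PDL error and, later, an H:0x8 failure.

    The result maps each device ID to the set of paths seen in each kind of line.
    """
    pdl_paths = defaultdict(set)
    h8_paths = defaultdict(set)
    for line in lines:
        match = PDL_RE.search(line)
        if match:
            pdl_paths[match["device"]].add(match["path"])
            continue
        match = H8_RE.search(line)
        # Only count H:0x8 failures that follow a PDL error on the same device.
        if match and match["device"] in pdl_paths:
            h8_paths[match["device"]].add(match["path"])
    return {
        device: {"pdl_paths": pdl_paths[device], "h8_paths": h8_paths[device]}
        for device in h8_paths
    }
```

    A typical use would be `scan_vmkernel_log(open("/var/log/vmkernel.log"))` on a copy of the log. A non-empty result only indicates that the log matches the pattern shown in this article; it does not by itself confirm the issue.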
     

    SCSI events that can trigger ESX server to fail a LUN over to another path
    Understanding SCSI host-side NMP errors/conditions in ESX/ESXi 4.x, ESXi 5.x, and 6.x
    Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x and 6.x
    How to file a Support Request in Customer Connect
    ストレージ PDL 応答が vSphere 6.0 でパス フェイルオーバーをトリガしないことがある
    在 vSphere 6.0 中存储 PDL 响应可能不会触发路径故障切换