
NFS connectivity issues on NetApp NFS filers on ESXi 5.x/6.x (2016122)


Symptoms

When using NFS datastores on some NetApp NFS filer models on an ESXi/ESX host, you experience these symptoms:
  • The NFS datastores appear to be unavailable (grayed out) in vCenter Server, or when accessed through the vSphere Client
  • The NFS shares reappear after a few minutes
  • Virtual machines located on the NFS datastore are in a hung/paused state when the NFS datastore is unavailable
  • This issue is most often seen after a host upgrade to ESXi 5.x or the addition of an ESXi 5.x host to the environment
  • In the /var/log/vmkernel.log file on the ESXi host, you see entries similar to:

    • NFSLock: 515: Stop accessing fd 0xc21eba0 4
      NFS: 283: Lost connection to the server 192.168.100.1 mount point /vol/datastore01, mounted as bf7ce3db-42c081a2-0000-000000000000 ("datastore01")
      NFSLock: 477: Start accessing fd 0xc21eba0 again
      NFS: 292: Restored connection to the server 192.168.100.1 mount point /vol/datastore01, mounted as bf7ce3db-42c081a2-0000-000000000000 ("datastore01")
    • <YYYY-MM-DD>T<TIME>Z cpu2:8194)StorageApdHandler: 277: APD Timer killed for ident [b63367a0-e78ee62a]
      <YYYY-MM-DD>T<TIME>Z cpu2:8194)StorageApdHandler: 402: Device or filesystem with identifier [b63367a0-e78ee62a] has exited the All Paths Down state.
      <YYYY-MM-DD>T<TIME>Z cpu2:8194)StorageApdHandler: 902: APD Exit for ident [b63367a0-e78ee62a]!
      <YYYY-MM-DD>T<TIME>Z cpu16:8208)NFSLock: 570: Start accessing fd 0x4100108487f8 again
      <YYYY-MM-DD>T<TIME>Z cpu2:8194)WARNING: NFS: 322: Lost connection to the server 10.20.90.2 mount point /vol/nfs_snapmirror_test, mounted as bd5763b1-19271ed7-0000-000000000000 ("AFO_SNAPMIRROR_TEST")
      <YYYY-MM-DD>T<TIME>Z cpu2:8194)WARNING: NFS: 322: Lost connection to the server 10.20.90.2 mount point /vol/nfs_vmware_isos_vol01, mounted as 654dc625-6010e4e6-0000-000000000000 ("NFS_SATA_ISOS_VOL01")

  • In the /var/log/vobd.log file on the ESXi host, you see entries similar to:

    <YYYY-MM-DD>T<TIME>Z: [vmfsCorrelator] 6084893035396us: [esx.problem.vmfs.nfs.server.disconnect] 192.168.100.1 /vol/datastore01 bf7ce3db-42c081a2-0000-000000000000 volume-name:datastore01
    <YYYY-MM-DD>T<TIME>Z: [vmfsCorrelator] 6085187880809us: [esx.problem.vmfs.nfs.server.restored] 192.168.100.1 /vol/datastore01 bf7ce3db-42c081a2-0000-000000000000 volume-name:datastore01


  • When examining a packet trace from the VMkernel port used for NFS, zero-window TCP segments originating from the NFS filer may be seen in Wireshark:

    No Time Source Destination Protocol Length Info
    784095 325.356980 10.1.1.35 10.1.1.26 RPC 574 [TCP ZeroWindow] Continuation
    792130 325.452001 10.1.1.35 10.1.1.26 TCP 1514 [TCP ZeroWindow] [TCP segment of a reassembled PDU]


  • Hosts in the environment may disconnect from vCenter Server.

Note: The preceding log excerpts are only examples. Date, time, and other environment-specific values vary depending on your environment.
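To check whether a host is hitting these disconnect/reconnect cycles, the messages shown above can be matched with a simple grep. The sketch below runs the filter against an inline sample for illustration; on a live ESXi host, run the commented grep against /var/log/vmkernel.log instead.

```shell
# Pattern matching the disconnect/reconnect messages shown in the excerpts above.
PATTERN='Lost connection to the server|Restored connection to the server|StorageApdHandler'

# Inline sample standing in for /var/log/vmkernel.log on a live host.
SAMPLE='NFSLock: 515: Stop accessing fd 0xc21eba0 4
WARNING: NFS: 322: Lost connection to the server 192.168.100.1 mount point /vol/datastore01
NFS: 292: Restored connection to the server 192.168.100.1 mount point /vol/datastore01'

# On an ESXi host, use: grep -E "$PATTERN" /var/log/vmkernel.log
echo "$SAMPLE" | grep -cE "$PATTERN"   # prints 2 (two of the three sample lines match)
```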

Purpose

This article provides steps to work around this issue if you are unable to unmount an NFS datastore in ESXi 5.x/6.x.

Resolution

A separate NFS connectivity issue in vSphere 5.5 Update 1 presents similar symptoms. For more information, see Intermittent NFS APDs on VMware ESXi 5.5 U1 (2076392).

This article discusses an issue in which certain versions of NetApp Data ONTAP reduce the TCP window size to 0 under high-load conditions. For more information, see NetApp Bug ID 654196.

Note: You must be registered on the NetApp Support website to view this document.

Note: The preceding links were correct as of September 29, 2015. If you find a link is broken, provide feedback and a VMware employee will update the link.

Workaround 1

To work around this issue and prevent it from occurring, reduce the NFS.MaxQueueDepth advanced parameter to a much lower value. This reduces or eliminates the disconnections.

Alternatively, if your hosts are sufficiently licensed, use the Storage I/O Control feature to work around the issue. This feature requires an Enterprise Plus license on all ESXi hosts.

When Storage I/O Control is enabled, it dynamically sets the value of MaxQueueDepth, circumventing the issue.

For more information, see the VMware documentation on Storage I/O Control.

Workaround 2

To set the NFS.MaxQueueDepth advanced parameter using the vSphere Client:
  1. Click the host in the Hosts and Clusters view.
  2. Click the Configuration tab. Then under Software, click Advanced Settings.
  3. Click NFS, then scroll down to NFS.MaxQueueDepth.
  4. Change the value to 64.
  5. Click OK.
  6. Reboot the host for the change to take effect.
To set the NFS.MaxQueueDepth advanced parameter using the vSphere 5.1 Web Client:
  1. Click the Hosts and Clusters tab.
  2. Click the ESXi host you want to modify.
  3. Click Manage > Settings > Advanced System Settings.
  4. Select the variable NFS.MaxQueueDepth.
  5. Change the value to 64 and click OK.
  6. Reboot the host for the change to take effect.
To set the NFS.MaxQueueDepth advanced parameter on the command line:
  1. Connect to the host using SSH. For more information, see Using ESXi Shell in ESXi 5.x and 6.0 (2004746).
  2. Run the command:

    # esxcfg-advcfg -s 64 /NFS/MaxQueueDepth

  3. Reboot the host for the change to take effect.
  4. After the host reboots, run this command to confirm the change:

    # esxcfg-advcfg -g /NFS/MaxQueueDepth
    Value of MaxQueueDepth is 64
Note: VMware suggests a value of 64. If this is not sufficient to stop the disconnects, halve the value again (for example, to 32, then 16) until the disconnects cease.
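The halving sequence in the note above can be sketched as a small shell loop. This sketch only prints the esxcfg-advcfg command you would run at each step (each followed by a host reboot and a period of observation); it does not change any settings itself.

```shell
# Print the suggested tuning sequence: start at 64 and halve the
# queue depth until the disconnects stop, rebooting after each change.
depth=64
while [ "$depth" -ge 16 ]; do
    echo "esxcfg-advcfg -s $depth /NFS/MaxQueueDepth"
    depth=$((depth / 2))
done
```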


Tags

Unable to unmount NFS datastore, nfs.maxqueuedepth, nfs storage luns disconnecting, NFS Datastores get disconnected intermittently & reconnects back

Update History

02/14/2013 - Added information on resolution in NetApp Data ONTAP 8.0.5 and 7.3.7P1D2.
04/05/2013 - Updated NetApp Data ONTAP version to 7.3.7P2.
05/29/2013 - Updated the NetApp Data ONTAP versions 8.1.2P4, 8.1.3RC1, and 8.2RC1 under the Resolution section.
05/16/2014 - Added link to NetApp Bug ID 654196 along with a note under Resolution.
03/30/2015 - Added ESXi 6.0 to Products.
