- This issue is believed to affect any version of NFS.
- The performance of VMs decrease on NFS datastores within a couple of days after a host reboot.
- IO Meter tests show increasing latencies and a significant drop of IOPS after a week, and gets worse after 2 to 3 weeks.
- A host reboot fixes the issue temporarily.
- The ESXi host may also crash with a PSOD similar to the below
2019-10-23T02:00:01.778Z cpu30:65681)@BlueScreen: Spin count exceeded - possible deadlock with PCPU 5
2019-10-23T02:00:01.778Zcpu30:65681)Code start: 0x418017a00000 VMK uptime: 50:01:19:48.273
2019-10-23T02:00:01.778Zcpu30:65681)Saved backtrace from: pcpu 5 SpinLock spin out NMI
2019-10-23T02:00:01.778Zcpu30:65681)0x43928bd9b3e0:[0x4180189c6da1]NFSSchedQFree@(nfsclient)#+0x81 stack:0x4309a6be3188
2019-10-23T02:00:01.778Z cpu30:65681)0x43928bd9b400:[0x4180189c7fac]NFSSched_FreeMPE@(nfsclient)#+0x7c stack: 0x43928bd9b4a0
2019-10-23T02:00:01.778Z cpu30:65681)0x43928bd9b430:[0x4180189c375c]NFSVolumeFreeMPE@(nfsclient)#+0x24 stack: 0x430400000000
2019-10-23T02:00:01.778Z cpu30:65681)0x43928bd9b440:[0x4180189c494b]NFSVolumeUmountInt@(nfsclient)#+0x1e7 stack: 0x4309a6bd7000
2019-10-23T02:00:01.778Z cpu30:65681)0x43928bd9b570:[0x4180189c56b0]NFSVolume_Umount@(nfsclient)#+0xcc stack: 0x48
2019-10-23T02:00:01.778Zcpu30:65681)0x43928bd9b5a0:[0x4180189c1c76]NFSVsi_UmountSet@(nfsclient)#+0xba stack: 0x1
2019-10-23T02:00:01.778Z cpu30:65681)0x43928bd9b5d0:[0x418017a02034]VSI_SetInfo@vmkernel#nover+0x408 stack: 0x43928bd9b6b0
2019-10-23T02:00:01.778Z cpu30:65681)0x43928bd9b650:[0x41801811cf64]UWVMKSyscallUnpackVSI_Set@(user)#+0x308 stack: 0x0
2019-10-23T02:00:01.778Zcpu30:65681)0x43928bd9bef0:[0x4180181103e0]User_UWVMKSyscallHandler@(user)#+0xa4 stack: 0x0
2019-10-23T02:00:01.778Zcpu30:65681)0x43928bd9bf20:[0x418017b0fd05]User_UWVMKSyscallHandler@vmkernel#nover+0x1d stack: 0x0
2019-10-23T02:00:01.778Z cpu30:65681)0x43928bd9bf30:[0x418017b3d067]gate_entry_@vmkernel#nover+0x0 stack: 0x0
2019-10-23T02:00:01.778Zcpu30:65681)base fs=0x0 gs=0x418047800000 Kgs=0x0
2019-10-23T02:00:01.778Z cpu5:47364475)NMI: 663: NMI IPI: We Halt. RIPOFF(base):RBP:CS [0xfc6da1(0x418017a00000):0x1c19b75:0x4010] (Src 0x4, CPU5)
2019-10-23T02:00:01.778Zcpu30:65681)NMI: 689: NMI IPI: RIPOFF(base):RBP:CS [0xf2b13(0x418017a00000):0xeb2d5bd1:0x4010] (Src 0x1, CPU30)
- Before the PSOD you will see similar messages as below:
2019-10-23T02:00:01.778Z cpu15:34361800)WARNING: NFS: 5603: Failed to convert sgArr to NFSIoInfo: Out of memory.
- ESXi host may crash with a PSOD while collecting logs from vSphere Client with similar backtrace as below:
2019-10-23T02:00:01.778Z cpu11:28559871)0x451aac69b550:[0x418028f775e8]NFSVolume_GetMPEWorldList@(nfsclient)#<None>+0x8d stack: 0x43186847f630
2019-10-23T02:00:01.778Z cpu11:28559871)0x451aac69b5b0: [0x418028f72a5a]NFSVsi_MPEWorldList@(nfsclient)#<None>+0x2f stack: 0x0
2019-10-23T02:00:01.778Z cpu11:28559871)0x451aac69b5d0:[0x4180278027be]VSI_GetListInfo@vmkernel#nover+0x2c3 stack: 0x451aac69b690
2019-10-23T02:00:01.778Z cpu11:28559871)0x451aac69b650:[0x418027fb6935]UW64VMKSyscallUnpackVSI_GetList@(user)#<None>+0x21e stack: 0x451aac69b690
2019-10-23T02:00:01.778Z cpu11:28559871)0x451aac69bed0:[0x418027f46660]User_UWVMK64SyscallHandler@(user)#<None>+0x249 stack: 0x0
2019-10-23T02:00:01.778Z cpu11:28559871)0x451aac69bf30:[0x418027961f78]SyscallUWVMK64@vmkernel#nover+0x90 stack: 0x0
2019-10-23T02:00:01.778Z cpu11:28559871)0x451aac69bf38:[0x418027963066]gate_entry@vmkernel#nover+0x67 stack: 0x0
^[[7m2019-10-23T02:00:02.778Z cpu36:2098074)WARNING: Heartbeat: 760: PCPU 52 didn't have a heartbeat for 7 seconds; *may* be locked up.^[[0m
^[[31;1m2019-10-23T02:00:02.778Z cpu52:28559919)ALERT: NMI: 696: NMI IPI: RIPOFF(base):RBP:CS [0x1118eb(0x418027800000):0x451ac841b470:0xfc8] (Src 0x1, CPU52)^[[0m
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841b420:[0x4180279118ea]MCSWaitForUnblock@vmkernel#nover+0xdb stack: 0x1ae00000000
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841b470:[0x418027912009]MCS_LockWait@vmkernel#nover+0xae stack: 0x0
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841b540:[0x41802791252a]MCSLockWithFlagsWork@vmkernel#nover+0x23 stack: 0x0
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841b550:[0x418028f775b4]NFSVolume_GetMPEWorldList@(nfsclient)#<None>+0x59 stack: 0x451ac841b560
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841b5b0:[0x418028f72a5a]NFSVsi_MPEWorldList@(nfsclient)#<None>+0x2f stack: 0x0
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841b5d0:[0x4180278027be]VSI_GetListInfo@vmkernel#nover+0x2c3 stack: 0x451ac841b690
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841b650:[0x418027fb6935]UW64VMKSyscallUnpackVSI_GetList@(user)#<None>+0x21e stack: 0x451ac841b690
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841bed0:[0x418027f46660]User_UWVMK64SyscallHandler@(user)#<None>+0x249 stack: 0x0
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841bf30:[0x418027961f78]SyscallUWVMK64@vmkernel#nover+0x90 stack: 0x0
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841bf38:[0x418027963066]gate_entry@vmkernel#nover+0x67 stack: 0x0
^[[7m2019-10-23T02:00:04.329Z cpu6:2103178)WARNING: Lock: 1701: (held by 38: Spin count exceeded 1 time(s) - possible deadlock.^[[0m
2019-10-23T02:00:04.329Z cpu6:2103178)0x451ac621af18:[0x41802782e3e5]Lock_CheckSpinCount@vmkernel#nover+0x261 stack: 0x43086b5924
- NFS heap and NFS slabs run out of memory with similar log messages to:
2018-12-21T09:36:27.671Z cpu37:67667)WARNING: NFSLock: 311: out of memory
2018-12-21T09:36:27.671Z cpu37:67667)NFSLock: 1779: Failed to create lock info for lock file .lck-ca1b000000000000 : Out of memory
2018-12-21T09:36:27.703Z cpu37:67667)WARNING: NFS: 2189: Unable to allocate fileData
2018-12-21T09:36:27.703Z cpu37:67667)NFS: 2333: Failed to get object (0x43914299b386) 40 28f0e377 a2c9ae 7ed2 23243005 20 d87e0000 26e3af68 >272c752 d0ee3700000040 0 0 0 :Out of memory
2018-12-21T09:36:27.717Z cpu37:67667)WARNING: NFS: 2079: no memory for result
Note: The preceding log excerpts are only examples .Date,time and environmental variables may vary depending on your environment.