vSAN cluster summary tab shows health alarm "vSAN object health", "Home Object", "After 1 additional host failure" or "Host with connectivity issues"

Article ID: 326403

Updated On:

Products

VMware vSAN

Issue/Introduction

Symptoms:
  • You may see the health alarms listed in the title on the vSAN cluster Summary tab after upgrading from vSAN 6.6 to 6.7.
  • All VMs and hosts appear to be working normally.
  • In the vSAN health check tab, you may see the same errors.
  • The logs show the following patterns:
On ESXi hosts, /var/log/vsanmgmt.log shows errors such as:
 
2019-04-27T13:50:53Z VSANMGMTSVC: WARNING vsanperfsvc[Thread-2] [VsanHealthSystemImpl::_QueryPhysicalDiskHealthSummary] entry = {'healthReason': 0, 'healthFlags': 0, 'timestamp': 127419771773} 
 
2019-04-27T13:50:53Z VSANMGMTSVC: ERROR vsanperfsvc[Thread-2] [VsanHealthSystemImpl::_QueryPhysicalDiskHealthSummary] Failed to get disk encryption info
Traceback (most recent call last):
  File "/build/mts/release/bora-12775454/bora/build/vsan/release/vsanhealth/usr/lib/vmware/vsan/perfsvc/VsanHealthSystemImpl.py", line 1813, in _QueryPhysicalDiskHealthSummary
ValueError: Failed to open device /vmfs/devices/disks/naa.5002538a488c0a60

2019-04-30T21:33:07.993Z error hostd[5297249] [Originator@6876 sub=vmomi.soapStub[58]] Resetting stub adapter for server <cs p:0000001210900cb0, TCP:localhost.localdomain:9095> : service state request failed: N7Vmacore15SystemExceptionE(Connection reset by peer: The connection is terminated by the remote end with a reset packet. Usually, this is a sign of a network problem,  timeout, or service overload.)

2019-04-27T08:01:27Z VSANMGMTSVC: ERROR vsanperfsvc[906d9cca-68c2-11e9] [VsanEsxHclUtil::__init__] Failed to run tool storcli: Exception 'RunCommandError' occured running command '['/opt/lsi/storcli/storcli', '++group=host/vim/tmp', '/call', 'show', 'J']'

On ESXi hosts, /var/log/hostd.log shows:

2019-05-02T08:29:05.352Z warning hostd[5297256] [Originator@6876 sub=Default] Failed to connect socket; <io_obj p:0x000000120dea5a70, h:120, <TCP '127.0.0.1 : 42447'>, <TCP '127.0.0.1 : 9095'>>, e: 111(Connection refused)

On ESXi hosts, /var/log/syslog.log shows the following message, repeated many times in succession:

2019-05-02T09:56:35Z Unknown: out of memory [7124098]

The hostd log entries may suggest a network problem, but the root cause might not be networking. Double-check that the NIC drivers are listed in the VMware Compatibility Guide (HCL): https://www.vmware.com/resources/compatibility/search.php?deviceCategory=io
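
A minimal sketch for collecting the driver names to look up, assuming the output of `esxcli network nic list` starts with a header line and a dashed separator and that its columns begin with Name, PCI Device and Driver. The function name `list_nic_drivers` is hypothetical; verify the column layout on your ESXi build before relying on it.

```shell
#!/bin/sh
# Sketch: print "<vmnic> <driver>" for each NIC so the drivers can be
# checked against the VMware Compatibility Guide.
# Assumption: input is `esxcli network nic list` output whose first two
# lines are a header and a dashed separator, and whose first three
# columns are Name, PCI Device (a single token such as 0000:01:00.0)
# and Driver. The function name is hypothetical.

list_nic_drivers() {
  # Skip header and separator; ignore short/blank lines.
  awk 'NR > 2 && NF >= 3 { print $1, $3 }'
}
```

For example, run `esxcli network nic list | list_nic_drivers`. The driver version for a given NIC can then be read from `esxcli network nic get -n vmnic0` before looking it up in the compatibility guide.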

You might also see the following in /var/log/hostd.log on the ESXi host:

2019-05-02T08:29:05.588Z info hostd[5297215] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 2764 : vSAN virtual NIC has been added.
 
2019-05-02T08:28:56.188Z cpu40:7368912)WARNING: UserSocketInet: 2266: python: waiters list not empty!
2019-05-02T08:28:56.188Z cpu40:7368912)WARNING: UserSocketInet: 2266: python: waiters list not empty!
2019-05-02T08:29:00.349Z cpu70:7368912)WARNING: CMMDS: CMMDSArenaMemUnmapFromUser:194: Failed to unmap MPNs from world 7368917: Not found
2019-05-02T08:29:05.486Z cpu0:2099591)CMMDS: CMMDSVSIUpdateNetworkCbk:2836: RECONFIGURE of interface vmk2 with cmmds (Success).
2019-05-02T08:29:05.487Z cpu21:2099591)CMMDS: CMMDSUtil_PrintArenaEntry:41: [1035794]:Inserting (actDir:0):u:6217c35b-b6b7-53db-922e-6805ca7f6d1a o:5b9f2b83-6de1-4786-630f-6805ca7f6d1a r:23 t:NET_INTERFACE
2019-05-02T08:29:05.487Z cpu21:2099591)CMMDS: CMMDSUtil_PrintArenaEntry:41: [1035795]:Removing (actDir:0):u:6217c35b-b6b7-53db-922e-6805ca7f6d1a o:5b9f2b83-6de1-4786-630f-6805ca7f6d1a r:22 t:NET_INTERFACE
2019-05-02T08:29:09.198Z cpu42:2100048)WARNING: LSOM: LSOMVsiGetVirstoInstanceStats:800: Throttled: Attempt to get Virsto stats on unsupported disk52942248-4166-09b8-34ac-e5d4c1a8291b
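
To check a host for the patterns above, a small shell function can count matching lines in a log file. This is a sketch only: the function name `scan_storcli_symptoms` is hypothetical, and the search strings are copied from the log excerpts in this article, so other builds may word these messages differently.

```shell
#!/bin/sh
# Sketch: count lines matching each symptom pattern from this article.
# Patterns are copied from the log excerpts above and matched as fixed
# strings (grep -F). The function name is hypothetical.

scan_storcli_symptoms() {
  log_file="$1"
  for pattern in \
      'Failed to run tool storcli' \
      'Failed to get disk encryption info' \
      'out of memory' \
      "127.0.0.1 : 9095'>>, e: 111(Connection refused)"; do
    # grep -c prints the number of matching lines; "|| true" keeps a
    # no-match (exit 1) or missing file (exit 2) from aborting under set -e.
    count=$(grep -cF -- "$pattern" "$log_file" 2>/dev/null || true)
    # Output: "<count><TAB><pattern>"; an empty count (unreadable file) becomes 0.
    printf '%s\t%s\n' "${count:-0}" "$pattern"
  done
}
```

Run it against each log, for example `scan_storcli_symptoms /var/log/vsanmgmt.log` and `scan_storcli_symptoms /var/log/syslog.log`. Each output line is a match count followed by the pattern; a missing or unreadable file also reports 0.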


Environment

VMware vSAN 6.x

Cause

A specific version of storcli (vmware-storcli-007.0209.0000.0000) causes the "out of memory" condition and disrupts other services on the host.
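
Before removing anything, you can check whether a host actually has this storcli version installed. The sketch below filters `esxcli software vib list` output for a VIB whose name contains "storcli" and whose line contains version 007.0209.0000.0000. It assumes the VIB name is the first column of that output; the function and variable names are hypothetical.

```shell
#!/bin/sh
# Sketch: detect the affected storcli version in `esxcli software vib list`
# output read from stdin. Assumption: column 1 is the VIB name.
# Names below are hypothetical.

AFFECTED_VERSION='007.0209.0000.0000'

storcli_vib_affected() {
  # Print every line whose name mentions storcli and that contains the
  # affected version anywhere (name or Version column).
  # Exit 0 if at least one line matched, 1 otherwise.
  awk -v ver="$AFFECTED_VERSION" '
    tolower($1) ~ /storcli/ && index($0, ver) > 0 { print; found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

`esxcli software vib list | storcli_vib_affected` prints the matching line and returns 0 if the version is present, or prints nothing and returns 1 otherwise. Matching on the whole line covers both a name that embeds the version (as in the name quoted above) and a version listed only in its own column.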

Resolution

Remove the storcli VIB from each affected host:
 
# esxcli software vib remove -n vmware-storcli-007.0209.0000.0000

The removal command may fail. If it does, put the host into maintenance mode with the "Ensure accessibility" option, reboot it, and run the command again after the host has fully booted.
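
The resolution steps can be scripted roughly as follows. This is a sketch under assumptions: `DRY_RUN`, `run` and `remove_storcli_vib` are hypothetical names, the VIB name is taken verbatim from the command above, and `esxcli system maintenanceMode set --vsanmode ensureObjectAccessibility` followed by `esxcli system shutdown reboot` stand in for the "maintenance mode with ensure accessibility, then reboot" step. Run it with `DRY_RUN=1` first to review the commands.

```shell
#!/bin/sh
# Sketch of the resolution steps. DRY_RUN=1 prints commands instead of
# running them. All function/variable names here are hypothetical.

VIB_NAME='vmware-storcli-007.0209.0000.0000'

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"      # dry run: show the command only
  else
    "$@"           # execute the command as given
  fi
}

remove_storcli_vib() {
  # Step 1: try removing the VIB directly.
  if run esxcli software vib remove -n "$VIB_NAME"; then
    return 0
  fi
  # Step 2: removal failed. Enter maintenance mode with
  # "Ensure accessibility", then reboot (the reboot requires maintenance
  # mode). Call remove_storcli_vib again once the host has fully booted.
  run esxcli system maintenanceMode set --enable true --vsanmode ensureObjectAccessibility
  run esxcli system shutdown reboot --reason "storcli VIB removal retry"
}
```

After the host comes back up, run `remove_storcli_vib` again so the removal is retried, as described above.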

Workaround:
Restarting the vsanmgmtd service can clear the error temporarily, but it may return a few hours later.

# /etc/init.d/vsanmgmtd restart