vSAN Disk group disks showing 'disks in use' as 0

Article ID: 315509

Updated On:

Products

VMware vSAN

Issue/Introduction

This article helps identify an issue in which the vSAN health test results point to a memory leak.
 


Symptoms:

Unable to place a vSAN ESXi host into maintenance mode

The vSAN disk group shows 'disks in use' as 0 under vCenter > Cluster > Configure > vSAN > Disk Management

The Health plugin (vCenter > Cluster > Monitor > vSAN > Health) reports different issues each time the health test is run. For example:

  1. Network - Hosts with connectivity issues
  2. Network - Network latency check
  3. Host configured with different Environment variables

The following errors appear in vsanmgmt.log:
# less vsanmgmt.log |grep -i MEMORY
2019-05-21T03:48:53Z VSANMGMTSVC: ERROR vsanperfsvc[Thread-2] [VsanHealthDaemon::doVsanHealthCheck] Get exception in invoking QueryCheckLimits : [Errno 12] Cannot allocate memory
2019-05-21T03:53:53Z VSANMGMTSVC: ERROR vsanperfsvc[Thread-2] [VsanHealthDaemon::doVsanHealthCheck] Get exception in invoking QueryCheckLimits : [Errno 12] Cannot allocate memory
2019-05-21T03:58:53Z VSANMGMTSVC: ERROR vsanperfsvc[Thread-2] [VsanHealthDaemon::doVsanHealthCheck] Get exception in invoking QueryCheckLimits : [Errno 12] Cannot allocate memory
2019-05-21T04:03:53Z VSANMGMTSVC: ERROR vsanperfsvc[Thread-2] [VsanHealthDaemon::doVsanHealthCheck] Get exception in invoking QueryCheckLimits : [Errno 12] Cannot allocate memory
2019-05-21T04:08:53Z VSANMGMTSVC: ERROR vsanperfsvc[Thread-2] [VsanHealthDaemon::doVsanHealthCheck] Get exception in invoking QueryCheckLimits : [Errno 12] Cannot allocate memory
2019-05-21T04:09:11Z VSANMGMTSVC: INFO vsanperfsvc[MainThread] [statsdaemon::_logDaemonMemoryStats] Daemon memory stats: eMin=97.280MB, eMinPeak=103.396MB, rMinPeak=103.396MB
 
#grep -r "memory stats" var/run/log/vsanmgmt* | grep PRESSURE
No result
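
To gauge how often the allocation failure recurs, the matching log lines can be counted instead of paged through. The following is a minimal sketch, assuming a POSIX-style shell; count_enomem is a hypothetical helper name, not part of any VMware tooling, and the caller supplies the log file path.

```shell
# count_enomem LOGFILE
# Hypothetical helper: prints how many lines in LOGFILE contain the
# "Cannot allocate memory" error text shown in the log excerpt above.
count_enomem() {
    # -c prints the number of matching lines; -F matches a fixed string.
    # grep exits non-zero when nothing matches, so mask that status.
    grep -c -F 'Cannot allocate memory' "$1" || true
}

# Example (path is an assumption; adjust to where the log is stored):
#   count_enomem /var/run/log/vsanmgmt.log
```

A count that keeps growing at the health check interval suggests the service is still failing to allocate memory.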

Restarting the vSAN management service (vsanmgmtd) fixes the issue only temporarily (for 10 to 180 minutes), after which the issue reappears on the vSAN Health Test page.

On all the hosts in the cluster run:
/etc/init.d/vsanmgmtd restart

If restarting "vsanmgmtd" does not resolve the issue, restart the following services as well and then check whether the issue is resolved:
/etc/init.d/vsanvpd restart
/etc/init.d/vpxa restart
/etc/init.d/hostd restart

From vCenter Appliance SSH:

service-control --stop --all
service-control --start --all


Environment

VMware vSAN 6.x

Cause

The cause is a known issue in the controller driver "lsi_msgpt3" version 15.00.00.00-1OEM, which handles the ioctl command issued by sas3flash.

#/opt/lsi/bin/sas3flash -list
(the command hangs)


#ps -cPTgjstz | grep sas3flash
101712 101712 sas3flash 101712 18713105 18713105 18713106 U WAIT LOCK 0-23 0.5025 /opt/lsi/bin/sas3flash -list
18714213 18714213 sas3flash 18714213 18713105 18713105 18713106 U WAIT LOCK 0-23 0.5225


#/opt/lsi/bin/sas3flash -list
.
.
Many sas3flash processes waiting.
 
#ps | grep sas3flash | wc -l
29
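
The process count above can be turned into a simple check that flags a backlog. This is a sketch only: the function names are hypothetical, and the default threshold of 5 is an assumption chosen for illustration, not a documented limit.

```shell
# count_sas3flash
# Reads ps output on stdin and prints the number of lines mentioning sas3flash.
# The [s] bracket keeps the pattern from matching grep's own command line.
count_sas3flash() {
    grep -c '[s]as3flash' || true
}

# sas3flash_backlog [THRESHOLD]
# Reads ps output on stdin, prints the count, and returns 0 (true) when the
# count is greater than THRESHOLD (assumed default: 5).
sas3flash_backlog() {
    _threshold=${1:-5}
    _count=$(count_sas3flash)
    echo "sas3flash processes: $_count"
    [ "$_count" -gt "$_threshold" ]
}

# Example:
#   ps | sas3flash_backlog 5 && echo "sas3flash processes are piling up"
```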

 
The large number of sas3flash processes stuck in a WAIT state causes the vsanmgmt service to crash, and the health plugin then displays inconsistent results.

The sas3flash plugin issues the "list" command to the driver at regular intervals. When the driver does not respond, the query stays in memory. These outstanding queries accumulate and eventually cause the vsanmgmt service to crash, as the ps output above shows. To verify that the utility responds, run the following command and confirm that it returns output without hanging:
 
#/opt/lsi/bin/sas3flash -list
Avago Technologies SAS3 Flash Utility
Version 12.00.00.00 (2015.11.19)
Copyright 2008-2015 Avago Technologies. All rights reserved.

        Adapter Selected is a Avago SAS: SAS3008(C0)

        Controller Number              : 0
        Controller                     : SAS3008(C0)
        PCI Address                    : 00:03:00:00
        SAS Address                    : 5001636-0-026d-fe28
        NVDATA Version (Default)       : 09.00.1a.12
        NVDATA Version (Persistent)    : 09.00.1a.12
        Firmware Product ID            : 0x2721 (IR)
        Firmware Version               : 09.00.00.00
        NVDATA Vendor                  : Quanta
        NVDATA Product ID              : SAS3008
        BIOS Version                   : 08.21.00.00
        UEFI BSD Version               : 10.00.00.00
        FCODE Version                  : N/A
        Board Name                     : Quanta Mezz
        Board Assembly                 : 35S2BMA0000
        Board Tracer Number            : N/A

        Finished Processing Commands Successfully.
        Exiting SAS3Flash.
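
Because the command can hang on an affected host, it may be safer to run the check with a time limit so that the shell session is not blocked. The following is a minimal sketch in plain POSIX shell; run_with_timeout is a hypothetical helper, and the exit code 124 for a timeout is a convention borrowed from the GNU timeout utility. A process blocked inside the driver may not react to the kill signal, so the helper only guarantees that control returns to the caller.

```shell
# run_with_timeout SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND in the background and polls once per
# second. Returns COMMAND's exit status, or 124 if it is still running
# after SECONDS.
run_with_timeout() {
    _limit=$1
    shift
    "$@" &
    _pid=$!
    _waited=0
    while kill -0 "$_pid" 2>/dev/null; do
        if [ "$_waited" -ge "$_limit" ]; then
            # Try to stop the command, but do not wait for it: a process
            # stuck in the driver may never exit.
            kill -9 "$_pid" 2>/dev/null || true
            return 124
        fi
        sleep 1
        _waited=$((_waited + 1))
    done
    # The command has finished; collect and return its exit status.
    wait "$_pid"
}

# Example (30 seconds is an arbitrary limit chosen for illustration):
#   run_with_timeout 30 /opt/lsi/bin/sas3flash -list || echo "sas3flash failed or hung"
```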

Resolution



Workaround:
Restarting services on the hosts fixes the issue only for a short time (10 to 180 minutes).

Additional Information

Impact/Risks:
No impact on production workloads, but the issue disrupts some management tasks, such as placing a host into maintenance mode, and causes vSAN health tests to fail.