vSAN-Health Service fails to start

Article ID: 318854

Products

VMware Live Recovery
VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

Symptoms:
Submitting a backup job through the vCenter Server Appliance VAMI fails with the error:
Invalid vCenter Server Status: All required services are not up! Stopped services: 'vsan-health'.

Configuring a VM for vSphere Replication fails with the error:
Operation Failed
A generic error occurred in the vSphere Replication Management Server. Exception details: 'Unexpected status code: 503'.
Operation ID: 3be4d590-8170-40e8-b4ae-14bf8258e3de


Starting the vsan-health service manually fails with output similar to the following:

service-control --start vmware-vsan-health
Operation not cancellable. Please wait for it to finish...
Performing start operation on service vsan-health...
Error executing start on service vsan-health. Details {
    "problemId": null,
    "detail": [
        {
            "args": [
                "vsan-health"
            ],
            "id": "install.ciscommon.service.failstart",
            "translatable": "An error occurred while starting service '%(0)s'",
            "localized": "An error occurred while starting service 'vsan-health'"
        }
    ],
    "resolution": null,
    "componentKey": null
}
Service-control failed. Error: {
    "problemId": null,
    "detail": [
        {
            "args": [
                "vsan-health"
            ],
            "id": "install.ciscommon.service.failstart",
            "translatable": "An error occurred while starting service '%(0)s'",
            "localized": "An error occurred while starting service 'vsan-health'"
        }
    ],
    "resolution": null,
    "componentKey": null
}


The log file /var/log/vmware/vsan-health/vmware-vsan-health-runtime.log.stderr reports an error similar to the following:
Starting service process with pid: 53357.
Traceback (most recent call last):
  File "/usr/lib/vmware-vpx/vsan-health/VsanVcMgmtd.py", line 9, in <module>
    os.initgroups(entry.pw_name, entry.pw_gid)
PermissionError: [Errno 1] Operation not permitted


Each start attempt logs a new PID in vmware-vsan-health-runtime.log.stderr, but the service remains stopped.
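
The log check described above can be scripted. The following is a minimal sketch, not an official tool: the function name check_vsan_health_log is hypothetical, and the default log path is an assumption that may differ on your appliance (pass the actual file as the first argument if so). It only reads the log and reports whether the PermissionError signature is present.

```shell
#!/bin/sh
# Sketch: look for the failure signature in a vsan-health stderr log.
# The default path below is an assumption; override it with the first argument.
check_vsan_health_log() {
    log="${1:-/var/log/vmware/vsan-health/vmware-vsan-health-runtime.log.stderr}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log"
        return 2
    fi
    # Each start attempt writes one "Starting service process" line with a new PID.
    attempts=$(grep -c 'Starting service process with pid' "$log" || true)
    # -F matches the text literally, so the square brackets need no escaping.
    if grep -qF 'PermissionError: [Errno 1] Operation not permitted' "$log"; then
        echo "signature found; start attempts logged: $attempts"
        return 0
    fi
    echo "signature not found"
    return 1
}
```

Calling check_vsan_health_log with no argument checks the assumed default path. A return code of 0 means the signature from this article is present in the log.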

Environment

VMware vCenter Server Appliance 6.7.x
VMware vSphere Replication 8.x

Cause

  • The vsan-health service, like other vCenter Server services, is expected to start under the root account.
  • The issue is most likely to occur when the service was disabled at the time of a successful upgrade to 6.7 U3.
  • In that situation, the service property and JSON files are not updated with the required attribute.

Resolution

To resolve this issue, follow the steps below:
  1. Take a snapshot of the vCenter Server Appliance.
  2. Open an SSH session to the vCenter Server Appliance.
  3. Go to /etc/vmware/vmware-vmon/svcCfgfiles/
cd /etc/vmware/vmware-vmon/svcCfgfiles/
  4. List the files in this location, including hidden files:
ls -la
  5. Copy .state_vsan-health.json to a location outside the svcCfgfiles folder:
cp .state_vsan-health.json ../.state_vsan-health.json
  6. Delete /etc/vmware/vmware-vmon/svcCfgfiles/.state_vsan-health.json
rm /etc/vmware/vmware-vmon/svcCfgfiles/.state_vsan-health.json
Note: Skip steps 5 and 6 if .state_vsan-health.json is not present in this location.
  7. Navigate to /usr/lib/vmware-vmon
cd /usr/lib/vmware-vmon
  8. Run the command:
vmon-cli -U vsan-health -R root
  9. Start the service:
service-control --start vmware-vsan-health
 
Note: Remember to delete (consolidate) the snapshot taken in step 1 once the fix is verified.
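
The manual steps above can be collected into one script. The sketch below makes several assumptions: it runs as root on the appliance after the snapshot has been taken, and SVC_DIR, DRY_RUN, and the function names are hypothetical helpers introduced only for illustration. The vmon-cli and service-control commands are copied verbatim from the steps. By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch of the resolution steps. SVC_DIR and DRY_RUN are hypothetical knobs
# for this sketch, not product settings. DRY_RUN defaults to 1 (print only).
SVC_DIR="${SVC_DIR:-/etc/vmware/vmware-vmon/svcCfgfiles}"
STATE_FILE=".state_vsan-health.json"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Copy the state file to the parent directory, then delete the original.
# Does nothing if the file is absent, matching the note in the steps.
move_state_file_aside() {
    if [ -f "$SVC_DIR/$STATE_FILE" ]; then
        run cp "$SVC_DIR/$STATE_FILE" "$SVC_DIR/../$STATE_FILE" &&
        run rm "$SVC_DIR/$STATE_FILE"
    else
        echo "$STATE_FILE not found; skipping backup and delete"
    fi
}

fix_vsan_health() {
    move_state_file_aside || return 1
    # Subshell keeps the directory change local; commands are from the steps above.
    ( run cd /usr/lib/vmware-vmon && run vmon-cli -U vsan-health -R root ) || return 1
    run service-control --start vmware-vsan-health
}
```

Run fix_vsan_health once with the default DRY_RUN=1 to review the commands, then set DRY_RUN=0 to execute them. The backup copy is left in the parent directory so it can be restored if needed.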


Additional Information

Impact/Risks:
  • VAMI backups fail.
  • VMs cannot be configured for vSphere Replication.