VMware Live Recovery, VMware vCenter Server, VMware vSphere ESXi
Issue/Introduction
Symptoms:
Submitting a backup job through the vCenter Server Appliance Management Interface (VAMI) fails with the error: Invalid vCenter Server Status: All required services are not up! Stopped services: 'vsan-health'.
Configuring a VM for vSphere Replication fails with the error: Operation Failed A generic error occurred in the vSphere Replication Management Server. Exception details: 'Unexpected status code: 503'. Operation ID: 3be4d590-8170-40e8-b4ae-14bf8258e3de
Starting the vSAN Health service manually also fails, with output similar to the following:
service-control --start vmware-vsan-health
Operation not cancellable. Please wait for it to finish...
Performing start operation on service vsan-health...
Error executing start on service vsan-health. Details {
    "problemId": null,
    "detail": [
        {
            "args": [
                "vsan-health"
            ],
            "id": "install.ciscommon.service.failstart",
            "translatable": "An error occurred while starting service '%(0)s'",
            "localized": "An error occurred while starting service 'vsan-health'"
        }
    ],
    "resolution": null,
    "componentKey": null
}
Service-control failed. Error: {
    "problemId": null,
    "detail": [
        {
            "args": [
                "vsan-health"
            ],
            "id": "install.ciscommon.service.failstart",
            "translatable": "An error occurred while starting service '%(0)s'",
            "localized": "An error occurred while starting service 'vsan-health'"
        }
    ],
    "resolution": null,
    "componentKey": null
}
The /var/log/vmware/vsan-health/vmware-vsan-health-runtime.log.stderr log reports an error similar to the following:
Starting service process with pid: 53357.
Traceback (most recent call last):
  File "/usr/lib/vmware-vpx/vsan-health/VsanVcMgmtd.py", line 9, in <module>
    os.initgroups(entry.pw_name, entry.pw_gid)
PermissionError: [Errno 1] Operation not permitted
Each attempt to start the service logs a new PID in vmware-vsan-health-runtime.log.stderr, and the service remains stopped.
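To confirm that the log matches this symptom, the number of failed start attempts can be counted. The following is a minimal sketch, not part of the official procedure: the function name count_initgroups_failures is hypothetical, and the default log path is an assumption based on the log file named above.

```shell
#!/bin/sh
# Hypothetical helper: count how many times the initgroups PermissionError
# appears in the vsan-health stderr log. Each failed start attempt is
# expected to add one such line.
# The default path below is an assumption; pass the actual path as $1.
count_initgroups_failures() {
    log_file="${1:-/var/log/vmware/vsan-health/vmware-vsan-health-runtime.log.stderr}"
    # -F treats the pattern as a fixed string, -c prints the match count.
    grep -cF 'PermissionError: [Errno 1] Operation not permitted' "$log_file"
}

# Example usage on the appliance:
#   count_initgroups_failures
```

A count that increases after every start attempt is consistent with the behavior described above.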
Environment
VMware vCenter Server Appliance 6.7.x
VMware vSphere Replication 8.x
Cause
The vsan-health service, like the other vCenter Server services, starts under the root account.
This issue is most likely to occur when the service was disabled at the time of a successful upgrade to 6.7 U3.
In that case, the service property and JSON files are not updated with the required attribute.
Resolution
To resolve this issue, follow these steps:
Take a snapshot of the vCenter Server Appliance.
Open an SSH session to the vCenter Server Appliance.
Go to /etc/vmware/vmware-vmon/svcCfgfiles/:
cd /etc/vmware/vmware-vmon/svcCfgfiles/
Use the following command to list the hidden files in this location:
ls -la
Make a copy of .state_vsan-health.json outside the svcCfgfiles folder:
cp .state_vsan-health.json ../.state_vsan-health.json
Note: Skip steps 4 and 5 if .state_vsan-health.json is not present in this location.
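The existence check and the copy can be combined so that the backup is skipped cleanly when the file is absent. This is a minimal sketch under stated assumptions: the function name backup_vsan_health_state is hypothetical, and it only wraps the cp command shown above.

```shell
#!/bin/sh
# Hypothetical helper: copy the hidden state file to the parent directory
# only if it exists, mirroring the manual cp step and its note.
# $1 is the svcCfgfiles directory; the default is the path used in this article.
backup_vsan_health_state() {
    cfg_dir="${1:-/etc/vmware/vmware-vmon/svcCfgfiles}"
    state_file="$cfg_dir/.state_vsan-health.json"
    if [ -f "$state_file" ]; then
        # -p preserves ownership, mode, and timestamps on the copy.
        cp -p "$state_file" "$cfg_dir/../.state_vsan-health.json" && echo "backed up"
    else
        echo "not found"
    fi
}

# Example usage on the appliance:
#   backup_vsan_health_state
```

If the function prints "not found", the state file does not exist and the copy is not needed.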