vCenter Server upgrade to 7.0 U3f fails because WCP service is not starting

Article ID: 311942

Updated On:

Products

VMware vCenter Server

Issue/Introduction

Symptoms:
1. vCenter Server is configured with a vSphere with Tanzu Supervisor Cluster.
2. vSphere with Tanzu was enabled before vCenter Server was upgraded to 7.0 U3d.
3. vCenter Server was upgraded from a version earlier than 7.0 U3d to 7.0 U3d, and then to 7.0 U3f.
4. The vCenter Server upgrade to 7.0 U3f fails at 80% with the error message "Exception occurred in postInstallHook" in the VMware Appliance Management Interface (VAMI).
5. The vCenter Server upgrade to 7.0 U3f is unsuccessful because the WCP service does not start after the upgrade.
  • WCP status can be checked with the "service-control --status wcp" command from a vCenter Server SSH session.
  • The following error appears in /var/log/vmware/wcp/wcpsvc.log on vCenter Server:
Failed to init kubeLifecycle schema: ERROR: column "instance_id" of relation "cluster_db_configs" contains null values
6. Additional vCenter Server services may be in a stopped state because of dependencies and RPM upgrade ordering during the vCenter Server upgrade.

Note: This issue does not apply in the following scenarios:
  • vSphere with Tanzu was not configured on vCenter Server before 7.0 U3d. For example, vCenter Server was installed on an older version, but the Supervisor Cluster was first enabled on 7.0 U3d.
  • vCenter Server instances that do not have vSphere with Tanzu enabled.
  • Any other vCenter Server upgrade path.
  • VMware Cloud Foundation (VCF) environments that did not follow the vCenter Server upgrade path described in point 3.
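
Symptoms 5 and 6 can be checked from the shell. The following is a minimal sketch, not VMware tooling: the helper names wcp_log_has_null_instance_id and list_stopped_services are hypothetical, and the parser assumes that "service-control --status --all" prints section headers such as "Running:" and "Stopped:" followed by whitespace-separated service names, which may differ between builds.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical, not part of any VMware CLI.

# Log path taken from the article; pass a different path as $1 to override.
WCP_LOG_DEFAULT=/var/log/vmware/wcp/wcpsvc.log

# Returns 0 if the log contains the NULL instance_id schema error,
# 1 if it does not, and 2 if the log file cannot be read.
wcp_log_has_null_instance_id() {
    log_file="${1:-$WCP_LOG_DEFAULT}"
    [ -r "$log_file" ] || return 2
    grep -q 'column "instance_id" of relation "cluster_db_configs" contains null values' "$log_file"
}

# Reads service status output on stdin and prints the names listed under
# "Stopped:", one per line.
# ASSUMPTION: the output uses header lines ending in ":" (for example
# "Running:" and "Stopped:") followed by lines of service names.
list_stopped_services() {
    awk '
        /^[A-Za-z]+:[[:space:]]*$/ { in_stopped = ($1 == "Stopped:"); next }
        in_stopped { for (i = 1; i <= NF; i++) print $i }
    '
}
```

On an affected appliance, the helpers might be used as "wcp_log_has_null_instance_id && echo affected" and "service-control --status --all | list_stopped_services".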



Environment

VMware vCenter Server 7.0.3

Cause

Upgrading vCenter Server also upgrades the WCP service (wcpsvc) that runs on it. The WCP service upgrade fails because the cluster_db_configs table contains rows whose instance_id is NULL, while the schema for the table requires instance_id to be non-null. Because of this mismatch, the WCP service fails to start. To fix the problem, each NULL instance_id must be set to a random UUID.

Resolution

This issue is resolved in vCenter Server 7.0 U3h and later versions.
For details, refer to the VMware vCenter Server 7.0 Update 3h Release Notes.

Workaround:
This workaround can be applied either on vCenter Server 7.0 U3f in its failed state, or on vCenter Server 7.0 U3d before starting the upgrade to 7.0 U3f.

1. Connect to vCenter Server SSH session with root credentials.

2. Connect to WCP database using the below command:
PGPASSFILE=/etc/vmware/wcp/.pgpass /opt/vmware/vpostgres/current/bin/psql -d VCDB -U wcpuser -h localhost

3. Run the following query to list the entries whose instance_id is NULL:
SELECT cluster, instance_id FROM cluster_db_configs WHERE instance_id IS NULL;

4. Set instance_id in cluster_db_configs to a random UUID wherever it is NULL:
UPDATE cluster_db_configs SET instance_id=gen_random_uuid() WHERE instance_id IS NULL;

5. Once the database entries have been fixed, restart the WCP service and any other service that did not start after the upgrade. Check the status of all services first:
service-control --status --all
Then restart either all services or only WCP (alternatively, use --stop followed by --start instead of --restart):
service-control --restart --all
or
service-control --restart wcp

6. Re-run steps 2 and 3 to verify that no rows have a NULL instance_id (the query should return zero rows). vCenter Server should now be up and running.

7. If the workaround was applied on vCenter Server 7.0 U3d, proceed with the upgrade to vCenter Server 7.0 U3f. If it was applied on vCenter Server 7.0 U3f, return to the VMware Appliance Management Interface (VAMI) or the CLI installer and resume the upgrade.
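
The check in steps 2, 3, and 6 can also be run non-interactively. The following is a minimal sketch under stated assumptions: the psql path, .pgpass file, database, user, and host are taken from the steps above, while the helper names count_null_instance_ids and instance_ids_are_fixed and the PSQL_BIN override are hypothetical and exist only for illustration. It uses the standard psql options -t (tuples only), -A (unaligned output), and -c (run a single command), and only reads from the database.

```shell
#!/bin/sh
# Sketch only: helper names and the PSQL_BIN override are hypothetical.
# Connection details (psql path, .pgpass, VCDB, wcpuser, localhost) come
# from the workaround steps in this article.
PSQL_BIN="${PSQL_BIN:-/opt/vmware/vpostgres/current/bin/psql}"

# Prints the number of cluster_db_configs rows whose instance_id is NULL.
count_null_instance_ids() {
    PGPASSFILE=/etc/vmware/wcp/.pgpass "$PSQL_BIN" -d VCDB -U wcpuser -h localhost \
        -t -A -c "SELECT count(*) FROM cluster_db_configs WHERE instance_id IS NULL;"
}

# Returns 0 when no NULL rows remain, 1 when some remain,
# and 2 when the query itself fails.
instance_ids_are_fixed() {
    remaining=$(count_null_instance_ids) || return 2
    [ "$remaining" = "0" ]
}
```

Running "instance_ids_are_fixed && echo clean" before step 4 would be expected to print nothing on an affected system, and to print "clean" once the UPDATE has been applied.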