"vSphere DRS functionality was impacted due to unhealthy state vSphere Cluster Services", vCLS virtual machines are not getting deployed after VCSA upgrade to 7.0

Article ID: 318191

Updated On:

Products

VMware vCenter Server

Issue/Introduction

Symptoms:
  • After upgrading vCenter Server Appliance (VCSA) to 7.0 Update 1 or later, vSphere Cluster Services (vCLS) virtual machines are not deployed.
  • The vSphere Client shows the warning "vSphere DRS functionality was impacted due to unhealthy state vSphere Cluster Services caused by the unavailability of vSphere Cluster Service VMs. vSphere Cluster Service VMs are required to maintain the health of vSphere DRS"
  • You may see the following errors or warnings:
      • Can’t provision VM for Cluster Agent due to lack of suitable datastore
      • Couldn’t acquire token due to: Signature validation failed
  • You may see entries similar to the following in /var/log/vmware/eam/eam.log:
2020-11-17T10:57:19.312Z |  INFO | cluster-agent-1 | AgentBase.java | 229 | [checkGoal:ClusterAgent(ID: 'Agent:5a001f25-7de6-483a-8c0f-1eb5199515dd:null')] task in progress.
2020-11-17T10:57:19.312Z |  INFO | cluster-agent-1 | VcEventManager.java | 792 | [EventIndex: 141046] Posting event.
2020-11-17T10:57:19.312Z | ERROR | cluster-agent-1 | AuditedJob.java | 106 | JOB FAILED: [#1878814658] DeployVmJob(ClusterAgent(ID: 'Agent:5a001f25-7de6-483a-8c0f-1eb5199515dd:null'))
com.vmware.eam.job.DeployVmJob$DeployVmJobFailure: Can't provision VM for ClusterAgent(ID: 'Agent:5a001f25-7de6-483a-8c0f-1eb5199515dd:null') due to lack of suitable datastore.

2020-11-17T10:57:49.103Z |  INFO | sts-0 | Workflow.java | 121 | [CreateSAMLToken:577f77cb515aed10] FAILED
com.vmware.eam.sso.exception.TokenNotAcquired: Couldn't acquire token due to: Signature validation failed
Caused by: com.vmware.vim.sso.client.exception.MalformedTokenException: Signature validation failed


Note: The preceding log excerpts are only examples. Dates, times, and other values will vary depending on your environment.
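To check whether a given eam.log contains the two failure signatures shown in the excerpts above, a minimal shell sketch follows. The function name check_eam_log is hypothetical and is not part of any VMware tooling; it only runs grep against the log path named in this article, and the counts it prints are a convenience, not a diagnosis.

```shell
#!/bin/bash
# Hypothetical helper (not shipped by VMware): count the two failure
# signatures quoted in this article in an EAM log file.
# Usage: check_eam_log [path-to-eam.log]
check_eam_log() {
    local log="${1:-/var/log/vmware/eam/eam.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    local datastore sts
    # grep -c prints the number of matching lines; -F matches a fixed string.
    # "|| true" keeps a zero count from aborting callers that use "set -e".
    datastore=$(grep -c -F "due to lack of suitable datastore" "$log" || true)
    sts=$(grep -c -F "Signature validation failed" "$log" || true)
    echo "datastore_errors=$datastore"
    echo "sts_signature_errors=$sts"
    # Succeed only when both signatures are present in the log.
    [ "$datastore" -gt 0 ] && [ "$sts" -gt 0 ]
}
```

Calling check_eam_log with no argument reads /var/log/vmware/eam/eam.log. A zero exit status means both signatures were found, which matches the symptoms described here; it does not by itself confirm the cause.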


Environment

VMware vCenter Server 7.0.x

Cause

As part of the vCLS deployment workflow, the EAM service identifies a suitable datastore on which to place the vCLS VMs. This workflow fails because the EAM service is unable to validate the STS certificate in the token.

Resolution

This issue is resolved in VMware vCenter Server 7.0 Update 3, available at VMware Downloads.

Workaround:
To work around this issue, reset the STS certificate using the attached fixsts script, following the procedure below.

Note:
  • Run this script only once per SSO domain; it updates the STS certificates for all vCenter Servers in the SSO domain.
Follow these steps:
  1. Download the attached fixsts.sh script from this article and upload it to the /tmp folder on the impacted vCenter Server.
  2. If the SCP client's connection to the vCenter Server is rejected, run chsh -s /bin/bash from an SSH session to the vCenter Server. Refer to Error when uploading files to vCenter Server Appliance using WinSCP for more information.
  3. Connect to the vCenter Server with an SSH session, if you did not already do so in Step 2.
  4. Navigate to the /tmp directory (cd /tmp).
  5. Run chmod +x fixsts.sh to make the file executable.
  6. Run ./fixsts.sh.
  7. Restart services on all vCenter Servers in your SSO domain using the following commands:
    service-control --stop --all
    service-control --start --all
  8. The vCLS VM(s) should now deploy successfully.
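Steps 4 through 7 can be collected into a small wrapper, sketched below. The function names and the APPLY variable are hypothetical and are not part of the attached script. By default the sketch only prints the commands from this article; it executes them only when APPLY=1 is set. It assumes that fixsts.sh has already been uploaded to /tmp and that offline snapshots of all vCenter Servers in the SSO domain have been taken.

```shell
#!/bin/bash
# Hypothetical wrapper around the commands listed in this article.
# Prints each command; runs it only when APPLY=1 is set in the environment.
run_step() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# Run once per SSO domain, on the vCenter Server where fixsts.sh was uploaded.
reset_sts_certificate() {
    run_step cd /tmp || return 1
    run_step chmod +x fixsts.sh || return 1
    run_step ./fixsts.sh
}

# Run on every vCenter Server in the SSO domain after the script completes.
restart_vcenter_services() {
    run_step service-control --stop --all || return 1
    run_step service-control --start --all
}
```

Running reset_sts_certificate and restart_vcenter_services without APPLY set gives a dry run that lists the commands in order. Keeping the two functions separate reflects the note above: the script runs once per SSO domain, while the service restart applies to every vCenter Server in that domain.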


Additional Information

"Signing certificate is not valid" error in VCSA 6.5.x/6.7.x and vCenter Server 7.0.x
For more information on vCLS, see vSphere Cluster Services (vCLS) in vSphere 7.0 Update 1
For more information on STS certificates, see Security Token Service STS

Impact/Risks:

Warning:

This script interacts with the VMDIR database. Before running the script, take offline snapshots of all vCenter Servers in the SSO domain concurrently. Failing to do so may result in an unrecoverable error that requires redeploying vCenter Server.

Once the script completes, restart services on all vCenter Servers in the SSO domain. The script fix therefore requires an outage for all vCenter Servers in the SSO domain.

Attachments

fixsts