"503 Service Unavailable (Failed to connect to endpoint: _serverNamespace = /websso", vCenter Server stops responding due to Secure Token Service (vmware-stsd) OutOfMemoryError

Article ID: 344908

Products

VMware vCenter Server

Issue/Introduction

Symptoms:
  • You are running vCenter Server 6.7, and the vSphere Client (vsphere-ui) intermittently becomes unresponsive with the following error message:
503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http16LocalServiceSpecE:0x00007f36c81e2f50] _serverNamespace = /websso action = Allow _port = 7080)
  • Multiple Java heap dumps (.hprof files) are present under /var/log/vmware/sso.
Example:
[ /var/log/vmware/sso ]# ls *.hprof
java_pid1774.hprof  java_pid1798.hprof  java_pid1803.hprof
  • You are using smart cards to log in to vCenter Server, and certificate revocation lists (CRLs) are checked during smart card login.
  • The log file /var/log/vmware/sso/utils/vmware-stsd.err shows entries similar to the following:
INFO: Server startup in 20715 ms
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOf(Arrays.java:3236)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
        at java.io.OutputStream.write(OutputStream.java:75)
        at sun.security.util.DerValue.encode(DerValue.java:434)
        at sun.security.util.DerValue.toByteArray(DerValue.java:867)
        at sun.security.x509.X509CRLEntryImpl.parse(X509CRLEntryImpl.java:456)
        at sun.security.x509.X509CRLEntryImpl.<init>(X509CRLEntryImpl.java:133)
        at sun.security.x509.X509CRLImpl.parse(X509CRLImpl.java:1160)
        at sun.security.x509.X509CRLImpl.<init>(X509CRLImpl.java:146)
        at sun.security.provider.X509Factory.engineGenerateCRL(X509Factory.java:390)
        at java.security.cert.CertificateFactory.generateCRL(CertificateFactory.java:497)
        at com.vmware.identity.idm.server.clientcert.IdmCrlCache.downloadCrl(IdmCrlCache.java:168)
        at com.vmware.identity.idm.server.clientcert.IdmCrlCache.refresh(IdmCrlCache.java:101)
        at com.vmware.identity.idm.server.clientcert.TenantCrlCache.refreshCrl(TenantCrlCache.java:50)
        at com.vmware.identity.idm.server.IdentityManager.refreshTenantCrlCache(IdentityManager.java:7322)
        at com.vmware.identity.idm.server.IdentityManager.access$400(IdentityManager.java:217)
        at com.vmware.identity.idm.server.IdentityManager$IdmCrlCachePeriodicChecker.run(IdentityManager.java:282)
java.lang.OutOfMemoryError: Java heap space
  • The log file /var/log/vmware/sso/vmware-identity-sts-default.log shows entries similar to the following:
[2020-03-15T11:33:00.848+03:00 Thread-9 IdmCrlCachePeriodicChecker ERROR com.vmware.identity.idm.server.IdentityManager] IdmCrlCachePeriodicChecker refreshTenantCrl failed : GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at sun.security.util.DerInputBuffer.dup(DerInputBuffer.java:66) ~[?:1.8.0_221]
        at sun.security.util.DerValue.<init>(DerValue.java:284) ~[?:1.8.0_221]
        at sun.security.util.DerInputStream.getDerValue(DerInputStream.java:451) ~[?:1.8.0_221]
        at sun.security.x509.X509CRLEntryImpl.parse(X509CRLEntryImpl.java:459) ~[?:1.8.0_221]
        at sun.security.x509.X509CRLEntryImpl.<init>(X509CRLEntryImpl.java:133) ~[?:1.8.0_221]
        at sun.security.x509.X509CRLImpl.parse(X509CRLImpl.java:1160) ~[?:1.8.0_221]
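The symptom checks above can be gathered in one pass with a small Bash function. This is a sketch, not a VMware tool: check_stsd_oom is a hypothetical name, and it assumes the default log locations named in this article. The optional argument lets you point it at a copied log directory instead of /var/log/vmware/sso.

```shell
#!/usr/bin/env bash
# Sketch: count the vmware-stsd OutOfMemoryError symptoms described above.
# Assumption: logs use the default layout under /var/log/vmware/sso;
# check_stsd_oom is a hypothetical helper name.

check_stsd_oom() {
  local sso_dir="${1:-/var/log/vmware/sso}"
  local hprof_count oom_count crl_count

  # Java heap dumps (*.hprof) written when the STS JVM ran out of memory.
  hprof_count=$(find "$sso_dir" -maxdepth 1 -type f -name '*.hprof' 2>/dev/null | wc -l) || true

  # OutOfMemoryError lines in the STS error log (empty if the file is missing).
  oom_count=$(grep -c 'java.lang.OutOfMemoryError' "$sso_dir/utils/vmware-stsd.err" 2>/dev/null) || true

  # Failed periodic CRL refreshes in the STS log.
  crl_count=$(grep -c 'IdmCrlCachePeriodicChecker refreshTenantCrl failed' \
    "$sso_dir/vmware-identity-sts-default.log" 2>/dev/null) || true

  echo "heap_dumps=${hprof_count// /}"
  echo "oom_errors=${oom_count:-0}"
  echo "crl_refresh_failures=${crl_count:-0}"
}
```

For example, run check_stsd_oom in the appliance Bash shell. Non-zero heap_dumps and oom_errors counts together with CRL refresh failures match the pattern described in this article.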
 


Environment

VMware vCenter Server 6.7.x
VMware vCenter Server 7.0.x

Cause

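This issue occurs when vCenter Server must process an extremely large CRL. As the stack traces above show, the periodic CRL cache refresh in vmware-stsd (IdmCrlCachePeriodicChecker) downloads the CRL and parses its entries, and with a very large CRL this exhausts the Java heap of the service.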
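To judge whether a CRL is large, you can inspect it outside vCenter Server. The following sketch assumes the CRL has already been downloaded to a local file and that the openssl command-line tool is available; count_crl_entries and crl_summary are hypothetical helper names. It relies on `openssl crl -noout -text` printing one "Serial Number:" line per revoked certificate.

```shell
#!/usr/bin/env bash
# Sketch: estimate the size of a CRL before vmware-stsd has to parse it.
# Assumptions: the CRL is already saved to a local file and the openssl
# CLI is installed; both function names are hypothetical.

# Count revoked entries in `openssl crl -noout -text` output read from stdin.
# The text form lists each revoked certificate with a "Serial Number:" line.
count_crl_entries() {
  grep -c 'Serial Number:' || true
}

# Print the size in bytes and the revoked-entry count of a DER or PEM CRL.
crl_summary() {
  local crl_file="$1" form="DER"

  # PEM-encoded CRLs carry a "-----BEGIN X509 CRL-----" header.
  if grep -q 'BEGIN X509 CRL' "$crl_file"; then
    form="PEM"
  fi

  echo "bytes=$(wc -c < "$crl_file" | tr -d ' ')"
  echo "entries=$(openssl crl -inform "$form" -in "$crl_file" -noout -text | count_crl_entries)"
}
```

For example, crl_summary /tmp/ca.crl prints the file size and the number of revoked entries. This article does not give a size at which vmware-stsd runs out of heap, so treat the numbers as a point of comparison rather than a pass/fail test.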

Resolution

This is a known issue affecting VMware vCenter Server 6.7 and 7.0.


Workaround:
To work around this issue, either increase the heap size of vmware-stsd or change the certificate revocation checking method from CRL to OCSP.

Heap Size
To increase the heap size of the vmware-stsd service, follow the steps below. These steps apply only to vCenter Server 6.7 Update 3 and later builds; update vCenter Server to 6.7 Update 3 or later before proceeding.
  • Connect to the vCenter Server Appliance (VCSA) using SSH.
  • Switch to the Bash shell:
Connected to service

    * List APIs: "help api list"
    * List Plugins: "help pi list"
    * Launch BASH: "shell"

Command> shell
Shell access is granted to root
root@vcsa [ ~ ]#
  • Increase the heap size to 1024 MB. Note that you might have to increase the memory (RAM) of vCenter Server before increasing the heap size; see the Additional Information section of this article for details.
cloudvm-ram-size -C 1024 vmware-stsd
  • Restart the vmware-stsd service:
service-control --stop vmware-stsd && service-control --start vmware-stsd
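The heap-size steps above can also be wrapped in a single function. This is a sketch, not a VMware-provided script: set_stsd_heap, run, and the DRY_RUN variable are hypothetical names, and the function calls only the two commands used in this article (cloudvm-ram-size and service-control). With DRY_RUN=1 it prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the heap-size workaround steps from this article.
# Assumptions: run as root in the VCSA Bash shell on 6.7 Update 3 or later.
# run, set_stsd_heap and DRY_RUN are hypothetical names for illustration.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Set the vmware-stsd heap size (in MB, default 1024) and restart the service.
set_stsd_heap() {
  local size_mb="${1:-1024}"

  # Accept only a positive whole number of megabytes.
  if ! [[ "$size_mb" =~ ^[1-9][0-9]*$ ]]; then
    echo "invalid heap size: $size_mb" >&2
    return 1
  fi

  # Same commands as the manual steps; stop at the first failure.
  run cloudvm-ram-size -C "$size_mb" vmware-stsd || return 1
  run service-control --stop vmware-stsd || return 1
  run service-control --start vmware-stsd
}
```

For example, DRY_RUN=1 set_stsd_heap 1024 shows what would run, and set_stsd_heap 1024 applies the change. Stopping at the first failing step mirrors the && chaining of the original restart command.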
 
Certificate Revocation Checking
As an alternative workaround, change the certificate revocation checking method to OCSP. Your PKI infrastructure must support this method of certificate revocation for it to work properly. You can update the certificate revocation settings in the vSphere Client:
  1. From the Home menu, select Administration.
  2. Under Single Sign On, click Configuration.
  3. Click Smart Card Authentication.
  4. Under Smart card authentication settings, click Certificate revocation and click Edit.
  5. Change the Revocation check to Use OCSP only and provide the OCSP responder location.
For more information on configuring certificate revocation, see Set Revocation Policies for Smart Card Authentication in the VMware documentation.

Additional Information

The cloudvm-ram-size utility of VC/PSC divides the total available memory among the individual processes that make up VC/PSC. Because the total memory is fixed, using cloudvm-ram-size to increase the memory of one process automatically decreases the memory of all other processes.

It is therefore important to also increase the total memory (RAM) of VC/PSC. Increasing the total memory of the VM that hosts VC/PSC first, and then increasing the memory footprint of a single process, avoids starving the other processes of memory and the instability that would cause.

Follow these best practices when increasing the heap size of a process:
  • SSH into the VC/PSC as the root user.
  • Run "cloudvm-ram-size -l". This lists the current division of memory among the services. Take note of these allocations.
  • Increase the heap size by running "cloudvm-ram-size -C 1024 vmware-stsd".
  • Run "cloudvm-ram-size -l" again and compare the result with the original to make sure no other service has been adjusted down.
  • If other services were adjusted down too much (more than 5%), increase the total available memory (RAM) of vCenter Server.
  • Restart the VM and run "cloudvm-ram-size -C 1024 vmware-stsd" again so that the utility recalculates memory allocations based on the new total memory.
  • Run "cloudvm-ram-size -l" again and compare the distributions to the originals.
The goal is to make sure that cloudvm-ram-size has enough total memory to work with, so that it can adjust vmware-stsd up without compromising other services or adjusting anything else down.
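The before/after comparison in these best practices can be scripted. The sketch below assumes you saved the two listings to files, for example with cloudvm-ram-size -l > /root/ram-before.txt before the change and cloudvm-ram-size -l > /root/ram-after.txt after it. The exact layout of the cloudvm-ram-size -l output is an assumption here: the function treats each line as a service name followed by a size in MB and ignores lines that do not fit that shape. flag_reduced_services is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: flag services whose allocation dropped by more than a threshold
# (5% by default, per the best practices above).
# Assumption: each saved listing has one "<name> <MB>" pair per line;
# lines whose second column is not a whole number are ignored. Verify this
# against your actual cloudvm-ram-size -l output before relying on it.

flag_reduced_services() {
  local before="$1" after="$2" threshold="${3:-5}"
  awk -v t="$threshold" '
    # First file: remember the original allocation of each service.
    FILENAME == ARGV[1] { if ($2 ~ /^[0-9]+$/) orig[$1] = $2; next }
    # Second file: report services that shrank by more than t percent.
    ($2 ~ /^[0-9]+$/) && ($1 in orig) && orig[$1] > 0 {
      drop = (orig[$1] - $2) * 100 / orig[$1]
      if (drop > t + 0) printf "%s %d -> %d (-%.1f%%)\n", $1, orig[$1], $2, drop
    }
  ' "$before" "$after"
}
```

For example, flag_reduced_services /root/ram-before.txt /root/ram-after.txt lists each service whose allocation dropped by more than 5%; any output suggests adding RAM and rerunning cloudvm-ram-size as described above. An optional third argument changes the threshold.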