vmware-vapi-endpoint fails to start or crashes after upgrading to vCenter Server 6.5 Update 2

Article ID: 342879

Products

VMware vCenter Server

Issue/Introduction

Symptoms:
  • After upgrading to vCenter Server 6.5 Update 2, the vmware-vapi-endpoint fails to start or crashes.
  • In the endpoint.log file, you see entries similar to:
# less /var/log/vmware/vapi/endpoint/endpoint.log

Caused by: javax.net.ssl.SSLHandshakeException: com.vmware.vim.vmomi.client.exception.VlsiCertificateException: Server certificate chain is not trusted and thumbprint verification is not configured
        at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
        at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1964)
        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:328)
        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:322)
        at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1614)
        at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
        at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1052)
        at sun.security.ssl.Handshaker.process_record(Handshaker.java:987)
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1072)
        at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
        at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413)
        at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1397)
        at com.vmware.vim.vmomi.client.http.impl.ThumbprintTrustManager$HostnameVerifier.verify(ThumbprintTrustManager.java:420)
        ... 45 more
Caused by: com.vmware.vim.vmomi.client.exception.VlsiCertificateException: Server certificate chain is not trusted and thumbprint verification is not configured
        at com.vmware.vim.vmomi.client.http.impl.ThumbprintTrustManager.checkServerTrusted(ThumbprintTrustManager.java:206)
        at sun.security.ssl.AbstractTrustManagerWrapper.checkServerTrusted(SSLContextImpl.java:985)
        at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1596)
        ... 53 more
Caused by: com.vmware.identity.vecs.VecsGenericException: Native platform error [code: 87][Enum of entries on store 'TRUSTED_ROOT_CRLS' failed. [Server: __localhost__, User: __localuser__]]
        at com.vmware.identity.vecs.VecsEntryEnumeration.BAIL_ON_ERROR(VecsEntryEnumeration.java:108)
        at com.vmware.identity.vecs.VecsEntryEnumeration.enumEntries(VecsEntryEnumeration.java:139)
        at com.vmware.identity.vecs.VecsEntryEnumeration.fetchMoreEntries(VecsEntryEnumeration.java:122)
        at com.vmware.identity.vecs.VecsEntryEnumeration.<init>(VecsEntryEnumeration.java:36)
        at com.vmware.identity.vecs.VMwareEndpointCertificateStore.enumerateEntries(VMwareEndpointCertificateStore.java:369)
        at com.vmware.provider.VecsCertStoreEngine.engineGetCRLs(VecsCertStoreEngine.java:77)
        at java.security.cert.CertStore.getCRLs(CertStore.java:181)
        at com.vmware.vim.vmomi.client.http.impl.ThumbprintTrustManager.checkForRevocation(ThumbprintTrustManager.java:246)
        at com.vmware.vim.vmomi.client.http.impl.ThumbprintTrustManager.checkServerTrusted(ThumbprintTrustManager.java:158)
        ... 55 more
2018-10-15T15:45:59.685+02:00 | INFO  | state-manager1            | HealthStatusCollectorImpl      | HEALTH ORANGE Failed to retrieve SSO settings from component manager.
2018-10-15T15:45:59.685+02:00 | ERROR | state-manager1            | DefaultStateManager            | Could not initialize endpoint runtime state.
com.vmware.vapi.endpoint.config.ConfigurationException: Failed to retrieve SSO settings.
        at com.vmware.vapi.endpoint.cis.SsoSettingsBuilder.buildInitial(SsoSettingsBuilder.java:63)
        at com.vmware.vapi.state.impl.DefaultStateManager.build(DefaultStateManager.java:354)
        at com.vmware.vapi.state.impl.DefaultStateManager$1.doInitialConfig(DefaultStateManager.java:168)
        at com.vmware.vapi.state.impl.DefaultStateManager$1.run(DefaultStateManager.java:151)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
  • In the vmafdd-syslog file, you see the same certificates being pushed to VECS repeatedly. You can verify this by running the following command:

# grep "Added cert to VECS DB" /var/log/vmware/vmafdd/vmafdd-syslog.log

18-10-13T11:27:24.090346+02:00 notice vmafdd  t@140015749916416: Added cert to VECS DB: 7ec611450f6b70edd15c936358731ce2a1030038
18-10-13T11:28:24.085596+02:00 notice vmafdd  t@140015749916416: Added cert to VECS DB: 7ec611450f6b70edd15c936358731ce2a1030038
18-10-13T11:29:24.089158+02:00 notice vmafdd  t@140015749916416: Added cert to VECS DB: 7ec611450f6b70edd15c936358731ce2a1030038
18-10-13T11:30:24.041227+02:00 notice vmafdd  t@140015749916416: Added cert to VECS DB: 7ec611450f6b70edd15c936358731ce2a1030038
18-10-13T11:31:24.084083+02:00 notice vmafdd  t@140015749916416: Added cert to VECS DB: 7ec611450f6b70edd15c936358731ce2a1030038
18-10-13T11:32:24.095645+02:00 notice vmafdd  t@140015749916416: Added cert to VECS DB: 7ec611450f6b70edd15c936358731ce2a1030038
18-10-13T11:33:24.087458+02:00 notice vmafdd  t@140015749916416: Added cert to VECS DB: 7ec611450f6b70edd15c936358731ce2a1030038
18-10-13T11:34:24.318936+02:00 notice vmafdd  t@140015749916416: Added cert to VECS DB: e8366b69a8bf3724a3c2446a3e1cc8cb3eaf44e4
18-10-13T11:35:24.091393+02:00 notice vmafdd  t@140015749916416: Added cert to VECS DB: e8366b69a8bf3724a3c2446a3e1cc8cb3eaf44e4
18-10-13T11:36:24.108070+02:00 notice vmafdd  t@140015749916416: Added cert to VECS DB: e8366b69a8bf3724a3c2446a3e1cc8cb3eaf44e4
18-10-13T11:37:24.082253+02:00 notice vmafdd  t@140015749916416: Added cert to VECS DB: e8366b69a8bf3724a3c2446a3e1cc8cb3eaf44e4
18-10-13T11:38:24.098974+02:00 notice vmafdd  t@140015749916416: Added cert to VECS DB: e8366b69a8bf3724a3c2446a3e1cc8cb3eaf44e4
18-10-13T11:39:24.084759+02:00 notice vmafdd  t@140015749916416: Added cert to VECS DB: e8366b69a8bf3724a3c2446a3e1cc8cb3eaf44e4
18-10-13T11:40:24.086880+02:00 notice vmafdd  t@140015749916416: Added cert to VECS DB: e8366b69a8bf3724a3c2446a3e1cc8cb3eaf44e4
18-10-13T11:41:24.092401+02:00 notice vmafdd  t@140015749916416: Added cert to VECS DB: e8366b69a8bf3724a3c2446a3e1cc8cb3eaf44e4
18-10-13T11:42:24.099424+02:00 notice vmafdd  t@140015749916416: Added cert to VECS DB: e8366b69a8bf3724a3c2446a3e1cc8cb3eaf44e4

 
Note: The TRUSTED_ROOT_CRLS store fills with spurious entries, and the number of entries grows indefinitely over time. Run the following command to see the current count and to monitor its growth:

# /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store TRUSTED_ROOT_CRLS --text | wc -l
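
To watch the count grow over time instead of re-running the command manually, one option (a sketch only, assuming the watch utility is present on the appliance) is:

# watch -n 60 "/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store TRUSTED_ROOT_CRLS --text | wc -l"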

Environment

VMware vCenter Server 6.7.x
VMware vCenter Server Appliance 6.5.x
VMware vCenter Server 7.0.x

Cause

This issue is caused by one or more corrupt CRL files in /etc/ssl/certs. To verify that you have corrupt entries, complete the following steps.
  • SSH to the vCenter Server Appliance.
  • Navigate to /etc/ssl/certs and run the following command to return the "Authority Key Identifier" for each CRL. If you see a failure, you may have a corrupt entry.
# for i in `grep -l "BEGIN X509 CRL" *`;do openssl crl -inform PEM -text -noout -in $i | grep -A 1 " Authority Key Identifier";done
 
Expected output example:
 
X509v3 Authority Key Identifier:
keyid:DF:35:D5:0F:B8:82:A3:5E:02:CA:CA:34:04:16:0F:90:92:EA:B6:5C
X509v3 Authority Key Identifier:
keyid:6F:F3:16:F6:4C:3D:D7:02:D9:EA:2B:C4:7A:3A:14:5F:D5:9F:A6:27
X509v3 Authority Key Identifier:
keyid:73:58:C4:3D:55:7C:1A:83:7C:5C:63:68:BA:B8:9D:1E:1E:DA:E0:80
X509v3 Authority Key Identifier:
keyid:0D:E7:CA:07:1C:CA:01:DC:57:EE:30:6E:FC:FB:55:86:39:96:D0:

 
  • Run the following command to check for corruption relating to the CA certificates. This should return the "Subject Key Identifier" for each certificate; if you see a failure, you may have a corrupt entry. (A per-file variant of both checks is shown after the example output below.)
# for i in `grep -l "BEGIN CERTIFICATE" *`;do openssl x509 -in $i -noout -text | grep -A 1 "Subject Key Identifier";done

Expected output example:
 
X509v3 Subject Key Identifier:
6A:72:26:7A:D0:1E:EF:7D:E7:3B:69:51:D4:6C:8D:9F:90:12:66:AB
X509v3 Subject Key Identifier:
6A:72:26:7A:D0:1E:EF:7D:E7:3B:69:51:D4:6C:8D:9F:90:12:66:AB
X509v3 Subject Key Identifier:
C7:A0:49:75:16:61:84:DB:31:4B:84:D2:F1:37:40:90:EF:4E:DC:F7
X509v3 Subject Key Identifier:
C7:A0:49:75:16:61:84:DB:31:4B:84:D2:F1:37:40:90:EF:4E:DC:F7
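
The loops above only indicate that a failure occurred; they do not name the offending file. The following variants (a sketch, not part of the original procedure) print each file name before checking it, so the corrupt CRL or certificate can be located:

# for i in `grep -l "BEGIN X509 CRL" *`;do echo "Checking CRL: $i"; openssl crl -inform PEM -noout -in $i || echo "Possible corrupt CRL: $i";done
# for i in `grep -l "BEGIN CERTIFICATE" *`;do echo "Checking certificate: $i"; openssl x509 -noout -in $i || echo "Possible corrupt certificate: $i";done

Any file flagged by these checks is a candidate for the cleanup described in the Resolution section below.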

Resolution

To resolve this issue, delete any corrupt files in /etc/ssl/certs and remove all entries from the CRL store so that VMDIR pushes fresh certificates down to VECS. This in turn allows the VAPI service to start successfully.

Ensure you have a valid backup or snapshot of the vCenter Server before proceeding. See Overview of Backup and Restore options in vCenter Server 6.x (2149237).
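
For reference, the two operations that the script below automates can also be performed manually. For a single CRL entry, the equivalent commands look like this (the <alias> value is a placeholder taken from the entry list output):

# /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store TRUSTED_ROOT_CRLS --text | grep Alias
# echo "Y" | /usr/lib/vmware-vmafd/bin/vecs-cli entry delete --store TRUSTED_ROOT_CRLS --alias <alias>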

A script has been written to automate this process. 
  1. SSH to the vCenter Server Appliance. 
  2. Change to the /tmp directory. 
  3. Create a file for the script, for example: # vi crl-fix.sh
  4. Copy and paste the following into the file:
#!/bin/bash
# Work in the system certificate directory and set the existing files aside.
cd /etc/ssl/certs
mkdir /tmp/pems
mkdir /tmp/OLD-CRLS-CAs
mv *.pem /tmp/pems && mv *.* /tmp/OLD-CRLS-CAs
# Collect the aliases of all entries in the TRUSTED_ROOT_CRLS store and delete each one.
h=$(/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store TRUSTED_ROOT_CRLS --text | grep Alias | cut -d : -f 2)
for hh in $h;do echo "Y" | /usr/lib/vmware-vmafd/bin/vecs-cli entry delete --store TRUSTED_ROOT_CRLS --alias $hh;done
# Restore the PEM files and recreate the hash-style .0 symlinks expected in /etc/ssl/certs.
mv /tmp/pems/* .
for l in `ls *.pem`;do ln -s $l ${l%.pem}.0;done
# Restart vmafdd so that fresh certificates are pushed to VECS.
service-control --stop vmafdd && service-control --start vmafdd
  5. Save the file and change its permissions to make the script executable. 
# chmod +x crl-fix.sh
  6. Run the script using the following syntax. 
# ./crl-fix.sh
  7. Reboot the vCenter Server Appliance.
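
After the reboot, you can confirm the fix by checking that the TRUSTED_ROOT_CRLS store repopulates with valid entries and that the vAPI endpoint service starts. These verification commands are a sketch and not part of the original procedure:

# /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store TRUSTED_ROOT_CRLS --text | grep Alias
# service-control --status vmware-vapi-endpoint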