vSAN KMS Health Check intermittently fails with SSL Handshake Timeout error: "QLC_ERR_TIMEOUT_EXPIRED" on vSphere 6.5

Article ID: 326414

Products

VMware vSAN

Issue/Introduction

Symptoms:
  • The vSAN Health Check shows a Warning state for the test "vCenter and all hosts are connected to Key Management Servers" in which ESXi Hosts report Warning/Failures for KMS Connectivity. However, the vCenter Server does not report issues.
  • Entries similar to the following appear in vsanmgmt.log. To find them, run:
# grep "QueryEncryptionHealth.CheckKmsStatus" /var/run/log/vsanmgmt.log
vsanmgmt.log:2018-11-13T18:15:43Z VSANMGMTSVC: INFO vsanperfsvc[2232d7da-e770-11e8] [VsanHealthEncUtil::GenerateEncryptionHealthSummary] QueryEncryptionHealth.CheckKmsStatus start

vsanmgmt.log:2018-11-13T18:15:44Z VSANMGMTSVC: INFO vsanperfsvc[2232d7da-e770-11e8] [VsanHealthEncUtil::GenerateEncryptionHealthSummary] QueryEncryptionHealth.CheckKmsStatus finish
vsanmgmt.log:2018-11-13T18:15:44Z VSANMGMTSVC: WARNING vsanperfsvc[2232d7da-e770-11e8] [VsanHealthUtil::log]   QueryEncryptionHealth.CheckKmsStatus: 1.21s
  • Errors similar to the following appear in vsansystem.log. To find them, run:
# grep -B 12 QLC_ERR_TIMEOUT_EXPIRED vsansystem.log
2019-02-05T09:43:13.450Z info vsansystem[9ADA6F0700] [Originator@6876 sub=Libs] VsanUtil: Get kms client key and cert, old:0
2019-02-05T09:43:13.450Z info vsansystem[9ADA6F0700] [Originator@6876 sub=Libs] VsanUtil: Create client context for server kmip
{69804} configure_backend_platform() - Configuring dynamic Linux backends
{69804} open_shared_lib() - Loaded crypto from /lib64/libcrypto.so.1.0.2
{69804} open_shared_lib() - Loaded ssl from /lib64/libssl.so.1.0.2
{69804} open_shared_lib() - Loaded qlopenssl from /usr/lib/vmware/vsan/lib64/libqlopenssl.so
{69804} try_openssl_backend() - Configured OpenSSL backend
{69804} peer_verify_cb() - Pending 3248
{69804} peer_verify_cb() - Verified certificate
{69804} connect_loop() - Connect timeout (1000000) expired
{69804} setup_ssl() - Failed to connect SSL
2019-02-05T09:43:14.596Z error vsansystem[9ADA6F0700] [Originator@6876 sub=Libs] VsanUtil: Failed to connect to key server, Err:QLC_ERR_TIMEOUT_EXPIRED

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
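
The time between the "Create client context" line and the failure line in the vsansystem.log excerpt above can be checked with a short script. The following is a minimal Python sketch, not a VMware tool: the function names `parse_ts` and `handshake_seconds` and the heuristic of pairing these two lines are assumptions made for illustration.

```python
import re
from datetime import datetime

# Matches the ISO-8601 timestamp at the start of a vsansystem.log line.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z")

def parse_ts(line):
    """Return the line's timestamp as a datetime, or None if it has none."""
    m = TS_RE.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f") if m else None

def handshake_seconds(lines):
    """Yield seconds between each 'Create client context' line and the
    next QLC_ERR_TIMEOUT_EXPIRED line (hypothetical pairing heuristic)."""
    start = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # library debug lines such as "{69804} ..." carry no timestamp
        if "Create client context" in line:
            start = ts
        elif "QLC_ERR_TIMEOUT_EXPIRED" in line and start is not None:
            yield (ts - start).total_seconds()
            start = None

# Lines copied from the excerpt above.
sample = [
    "2019-02-05T09:43:13.450Z info vsansystem[9ADA6F0700] [Originator@6876 sub=Libs] VsanUtil: Create client context for server kmip",
    "{69804} connect_loop() - Connect timeout (1000000) expired",
    "2019-02-05T09:43:14.596Z error vsansystem[9ADA6F0700] [Originator@6876 sub=Libs] VsanUtil: Failed to connect to key server, Err:QLC_ERR_TIMEOUT_EXPIRED",
]
durations = list(handshake_seconds(sample))
```

For the excerpt above, the single duration is about 1.146 seconds (09:43:13.450 to 09:43:14.596), slightly over the 1-second limit.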


Environment

VMware vSAN 6.x
VMware vSAN 6.6.x

Cause

In vSAN 6.5 environments, this error can occur when the round-trip latency between the vSAN hosts and the KMS server exceeds 1 second. The hosts use a KMS timeout of 1 second, which is stricter than the timeout that vCenter Server uses. This explains why the Health Check failure is reported only in the Host KMS Status tab of the vSAN Health Check.
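
To judge how close the environment is to this limit, the TCP connect plus SSL handshake time to the KMS can be measured from a machine on the same network as the hosts. The following is a minimal sketch using Python's standard `socket` and `ssl` modules, and it rests on several assumptions. The script name `kms_timing.py` and the host name `kms.example.com` are placeholders. Port 5696 is the registered KMIP port, but your KMS may listen elsewhere. Certificate verification is disabled because only timing is of interest. A KMS that requires a client certificate may abort the handshake, so treat the result as an approximation of what the ESXi hosts experience.

```python
import socket
import ssl
import sys
import time

HOST_TIMEOUT_S = 1.0     # host-side KMS timeout in vSphere 6.5, per this article
RELAXED_TIMEOUT_S = 5.0  # timeout in vSphere 6.7, per this article

def classify(elapsed_s):
    """Compare a measured handshake time against the two timeouts."""
    if elapsed_s > RELAXED_TIMEOUT_S:
        return "exceeds both timeouts"
    if elapsed_s > HOST_TIMEOUT_S:
        return "exceeds the 1-second timeout only"
    return "within the 1-second timeout"

def time_handshake(host, port=5696, limit_s=10.0):
    """Return seconds taken for TCP connect plus SSL/TLS handshake."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # timing only; do not use for trust decisions
    ctx.verify_mode = ssl.CERT_NONE
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=limit_s) as raw:
        # wrap_socket performs the handshake before returning
        with ctx.wrap_socket(raw, server_hostname=host):
            return time.monotonic() - start

if __name__ == "__main__":
    if len(sys.argv) > 1:            # e.g. python3 kms_timing.py kms.example.com
        elapsed = time_handshake(sys.argv[1])
        print(f"{elapsed:.3f}s: {classify(elapsed)}")
```

Repeating the measurement several times is advisable, since the failure described here is intermittent.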

This issue can also be caused by large SSL certificates (larger than 2048 bits) on the KMS server, which can lengthen the SSL handshake between the ESXi hosts and the KMS.

Resolution

This issue is resolved in vSphere 6.7, available at VMware Downloads.

Note: In vSphere 6.7, this KMS timeout is increased from 1 second to 5 seconds so that the vSAN Health Check is more tolerant and more closely resembles the KMS check that vCenter Server uses.

Workaround:
To work around this issue in cases where the SSL certificate size contributes to the delay between the ESXi hosts and the KMS server, VMware recommends reducing the size of the certificate to 2048 bits.
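
One way to check the current size is to read the public key length from the text form of the KMS certificate. The following is a minimal Python sketch under stated assumptions: it treats "certificate size" as the public key size in bits, it relies on the `openssl` command-line tool being installed, and the helper names `key_bits`, `cert_text`, and `exceeds_recommendation` are hypothetical. The path passed to `cert_text` is whatever PEM file you have exported from the KMS.

```python
import re
import subprocess

RECOMMENDED_BITS = 2048  # size recommended in this article

def key_bits(x509_text):
    """Extract the public key size from `openssl x509 -noout -text` output."""
    m = re.search(r"Public[- ]Key: \((\d+) bit\)", x509_text)
    return int(m.group(1)) if m else None

def cert_text(pem_path):
    """Run the openssl CLI on a PEM certificate file and return its text form."""
    return subprocess.run(
        ["openssl", "x509", "-in", pem_path, "-noout", "-text"],
        check=True, capture_output=True, text=True,
    ).stdout

def exceeds_recommendation(x509_text):
    """True if the key size is known and larger than the recommended size."""
    bits = key_bits(x509_text)
    return bits is not None and bits > RECOMMENDED_BITS

# Illustrative fragment of openssl output, not taken from a real KMS.
sample = (
    "        Subject Public Key Info:\n"
    "            Public Key Algorithm: rsaEncryption\n"
    "                Public-Key: (4096 bit)\n"
)
```

With a real certificate, `exceeds_recommendation(cert_text(path))` reports whether the key is larger than the recommended 2048 bits.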

Additional Information

Impact/Risks:
Provided that vCenter Server reports no loss of connectivity to the KMS server and that network health between the ESXi hosts and the KMS has been validated, this issue can be considered cosmetic in vSphere 6.5 environments and can safely be attributed to the strict 1-second network timeout.