"install.vmafd.vmdir_vdcpromo_error_21" error during VC convergence, cross domain repoint and fresh deployment

Article ID: 344902



Products

VMware vCenter Server

Issue/Introduction

Symptoms:
  • You observe a failure while performing one of the following operations on vCenter Server Appliance 6.7 U2c or 6.7 U3:
    • Upgrading from either version to vCenter Server 7.0
    • Converging a vCenter Server (fails at 42%)
    • Deploying a new vCenter Server by joining it to an existing vSphere domain (ELM)
    • Cross Domain Repoint of vCenter Server to an existing vSphere Domain
  • In the /var/log/firstboot/vmafd-firstboot.py_<PID>_stderr.log file, you see entries similar to:
2019-09-09T21:10:03.961Z  Initializing Directory server instance ...
Vdcpromo failed. Error[9126]
Could not connect to the local service VMware Directory Service.
Verify VMware Directory Service is running.
2019-09-09T21:10:03.961Z  <class 'cis.baseCISException.BaseInstallException'>
2019-09-09T21:10:03.964Z  Exception: Traceback (most recent call last):
  File "/usr/lib/vmware-vmafd/firstboot/vmafd-firstboot.py", line 177, in main
    controller.firstboot()
  File "/usr/lib/vmware-vmafd/firstboot/vmafd-firstboot.py", line 53, in firstboot
    self.init()
  File "/usr/lib/vmware-vmafd/firstboot/vmafd-firstboot.py", line 59, in init
    service.init()
  File "/usr/lib/vmware-vmafd/firstboot/identityinstall/vmdirInstall.py", line 404, in init
    self.setup_domain()
  File "/usr/lib/vmware-vmafd/firstboot/identityinstall/vmdirInstall.py", line 259, in setup_domain
            "translatable": "Could not connect to the local service VMware Directory Service. Verify VMware Directory Service is running.",
            "id": "install.vmafd.vmdir_vdcpromo_error_21",
            "localized": "Could not connect to the local service VMware Directory Service. Verify VMware Directory Service is running."
  • In the /var/log/firstboot/vmafd-firstboot.py_<PID>_stdout.log file, you see entries similar to:
2019-09-09T21:04:32.587Z  Starting the First boot for VMDIR
2019-09-09T21:04:32.590Z  Setting up as a secondary domain Controller
2019-09-09T21:04:32.591Z  Running command: ['/usr/lib/vmware-vmafd/bin/vdcpromo', '-u', 'Administrator', '-s', '<domain name>', '-h', '<destination vCenter / PSC>', '-H', '<source vCenter / PSC>']
2019-09-09T21:10:03.961Z  VMAFD Boot failed
  • In the /var/log/vmware/vmdird/vmdird-syslog.log file, you see entries similar to:
2019-09-09T21:04:54.244357+00:00 info vmdird  t@140243357394688: VmDirSrvInitializeHost success: (<Domain Name>)(Administrator)(<domain name>)(ldap://<Replication Partner FQDN>)
2019-09-09T21:04:54.245695+00:00 info vmdird  t@140243617404672: _VmDirGetRemoteDBUsingRPC: Connected to the replication partner (Replication Partner FQDN).
2019-09-09T21:04:54.247188+00:00 info vmdird  t@140243617404672: _VmDirGetRemoteDBUsingRPC: copying remote file /storage/db/vmware-vmdir/data.mdb with data size 128 MB with Map size 20480 MB ...
2019-09-09T21:09:58.472168+00:00 err vmdird  t@140243617404672: VmDirReadDatabaseFile failed. Error[382312502]
2019-09-09T21:09:58.472873+00:00 err vmdird  t@140243617404672: _VmDirGetRemoteDBFileUsingRPC: RpcVmDirReadDatabaseFile() failed on remote file /storage/db/vmware-vmdir/data.mdb with error: 382312502
2019-09-09T21:09:58.638164+00:00 err vmdird  t@140243617404672: VmDirFirstReplicationCycle: _VmDirGetRemoteDBUsingRPC() call failed with error: 1
2019-09-09T21:09:58.638522+00:00 warning vmdird  t@140243617404672: vdirReplicationThrFun: VmDirReplURIToHostname or VmDirFirstReplicationCycle failed, error (-402647496).
2019-09-09T21:09:58.638907+00:00 err vmdird  t@140243617404672: Vmdird force exiting ...
2019-09-09T21:09:58.639466+00:00 info vmdird  t@140243617404672: VmDir State (5)
2019-09-09T21:09:58.639903+00:00 err vmdird  t@140243617404672: vdirReplicationThrFun: Replication has failed with unrecoverable error.


Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
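
The log checks above can be sketched as a small Bash helper. This is a minimal sketch, not a VMware-provided tool: the function names `has_signature` and `scan_vmdir_promo_logs` are hypothetical, and the two search strings are copied from the example excerpts, so they may need adjusting if your log lines differ.

```shell
#!/bin/bash
# Sketch: look for the failure signatures quoted in this article.
# Function names are hypothetical; the search strings come from the excerpts above.

has_signature() {
    # $1 = log file, $2 = fixed string to search for (no regex interpretation)
    grep -qF -- "$2" "$1"
}

scan_vmdir_promo_logs() {
    # $1 = directory holding the firstboot logs
    # $2 = path to the vmdird syslog file
    local firstboot_dir=$1 vmdird_log=$2 hits=0 f

    for f in "$firstboot_dir"/vmafd-firstboot.py_*_stderr.log; do
        [ -e "$f" ] || continue   # glob matched nothing
        if has_signature "$f" "install.vmafd.vmdir_vdcpromo_error_21"; then
            echo "match: $f (vdcpromo_error_21)"
            hits=$((hits + 1))
        fi
    done

    if [ -e "$vmdird_log" ] &&
       has_signature "$vmdird_log" "RpcVmDirReadDatabaseFile() failed"; then
        echo "match: $vmdird_log (database copy failed)"
        hits=$((hits + 1))
    fi

    # Return success only if at least one signature was found.
    [ "$hits" -gt 0 ]
}
```

On the failed appliance, a call such as `scan_vmdir_promo_logs /var/log/firstboot /var/log/vmware/vmdird/vmdird-syslog.log` would print one line per matching file. A match only indicates that the symptoms resemble this article; it does not confirm the root cause.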


Environment

VMware vCenter Server Appliance 6.7.x

Cause

The VMAFD firstboot process copies the VMware Directory Service database (data.mdb) from the replication partner. The copy operation gets stuck on the source Platform Services Controller or replication partner vCenter Server Appliance while it is sending the data, which causes VMAFD firstboot to fail. The underlying reason for the hang during data transfer is a networking bug in the TCP/IP stack of the Linux kernel version used in vCenter Server Appliance 6.7 U2c and 6.7 U3.

Resolution

This issue is resolved in vCenter Server 6.7 U3a, available at VMware Downloads.

When upgrading from vCenter Server 6.7 U2c or 6.7 U3 to vCenter Server 7.0, apply the workaround below first.


Workaround:
To work around this issue, disable TSO (TCP Segmentation Offload) and GSO (Generic Segmentation Offload) on the Ethernet adapter of the source Platform Services Controller or replication partner vCenter Server Appliance before performing the Convergence / Fresh Deployment / Cross Domain Repoint.

To disable TSO and GSO:
  1. Connect to the source PSC or Replication Partner vCenter Appliance using SSH.
  2. Change shell to Bash.
    For example:
    Command> shell
     
  3. Execute these commands:
    ethtool -K eth0 tso off
    ethtool -K eth0 gso off
Note: TSO and GSO are automatically re-enabled when the appliance reboots.
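
To confirm that the two settings took effect, the current state can be read back with `ethtool -k eth0` (lowercase `-k` lists features). The sketch below parses that output. The function names are hypothetical, and the feature labels `tcp-segmentation-offload` and `generic-segmentation-offload` are the ones ethtool commonly prints; check them against the output on your appliance.

```shell
#!/bin/bash
# Sketch: parse `ethtool -k <iface>` output to check TSO/GSO state.
# Function names are hypothetical helpers, not part of ethtool.

offload_state() {
    # Reads `ethtool -k` output on stdin.
    # $1 = feature label exactly as printed, e.g. tcp-segmentation-offload
    # Prints "on" or "off"; a trailing "[fixed]" marker is dropped.
    awk -v feature="$1" -F': ' '$1 == feature { split($2, s, " "); print s[1] }'
}

offloads_disabled() {
    # $1 = captured `ethtool -k` output
    # Succeeds only if both TSO and GSO report "off".
    local features=$1
    [ "$(printf '%s\n' "$features" | offload_state tcp-segmentation-offload)" = "off" ] &&
    [ "$(printf '%s\n' "$features" | offload_state generic-segmentation-offload)" = "off" ]
}
```

For example, `offloads_disabled "$(ethtool -k eth0)" && echo "TSO and GSO are off"` could be run on the source node before starting the operation.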

To re-enable TSO and GSO after the Convergence / Fresh Deployment / Cross Domain Repoint has completed:
  1. Execute these commands:
    ethtool -K eth0 tso on
    ethtool -K eth0 gso on
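
Because the disable and enable steps differ only in the final argument, they can be wrapped in one helper. This is a sketch under assumptions: `set_segmentation_offload` and the `DRY_RUN` switch are hypothetical names introduced here, and the interface is passed in rather than hard-coded to `eth0`.

```shell
#!/bin/bash
# Sketch: toggle TSO and GSO together with the same ethtool -K calls
# shown in this article. Helper name and DRY_RUN switch are hypothetical.

set_segmentation_offload() {
    # $1 = interface (eth0 in this article), $2 = on|off
    local iface=$1 state=$2 feature

    case "$state" in
        on|off) ;;
        *) echo "usage: set_segmentation_offload <iface> on|off" >&2
           return 2 ;;
    esac

    for feature in tso gso; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Print the command instead of running it.
            echo "ethtool -K $iface $feature $state"
        else
            ethtool -K "$iface" "$feature" "$state" || return 1
        fi
    done
}
```

Setting `DRY_RUN=1` before calling `set_segmentation_offload eth0 off` prints the two commands without changing the adapter, which is a cheap way to review them first.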

 
To identify the VCSA on which to apply the workaround:
  • Scenario-1 [Convergence]: Converging vCenter Server Appliance instances with an external Platform Services Controller into vCenter Server Appliance instances with an embedded Platform Services Controller

VC1 -> PSC1
Converging VC1 pointing to PSC1 - Apply the workaround on PSC1 
  • Scenario-2 [Convergence]: Converging vCenter Server Appliance instances with an external Platform Services Controller into vCenter Server Appliance instances with an embedded Platform Services Controller connected in Embedded Linked Mode

PSC1 <-> Embedded VC2
  ^
  |
VC1
Converging VC1 pointing to Embedded VC2 as Replication Partner where VC2 is an already Converged node - Apply the workaround on Embedded VC2
  • Scenario-3 [Fresh Install]: Adding a new vCenter Server Appliance with an embedded Platform Services Controller connected in Embedded Linked Mode

VC2 Embedded[Installing] <-> VC1 Embedded  
VC2 getting installed pointing to Embedded VC1 - Apply the workaround on VC1 
  • Scenario-4 [Embedded Domain Repoint]: Repointing a vCenter Server Appliance with an embedded Platform Services Controller to an existing vSphere domain

Embedded VC1[vsphere.local] <-> Embedded VC2[vmc.local]
VC1 is being repointed to VC2 - Apply the workaround on VC2
 
Note: If it is difficult to determine which vCenter Server Appliance needs the workaround, you may apply it on all vCenter Server Appliances that are part of Enhanced Linked Mode and re-enable TSO and GSO when the activity is complete. Rebooting a vCenter Server Appliance also re-enables GSO and TSO automatically.
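
When the workaround is applied to every node in Enhanced Linked Mode, it helps to generate the per-node commands from one list so that no node is missed when re-enabling. The sketch below only prints the commands. The helper name and the node names in the usage line are hypothetical, and running the printed lines directly assumes root's login shell on each node is Bash rather than the appliance shell.

```shell
#!/bin/bash
# Sketch: print (not run) the per-node commands for a list of ELM members.
# Helper name is hypothetical; eth0 matches the interface used in this article.

print_offload_commands() {
    # $1 = on|off, remaining arguments = node FQDNs
    local state=$1 node
    shift
    for node in "$@"; do
        echo "ssh root@$node 'ethtool -K eth0 tso $state && ethtool -K eth0 gso $state'"
    done
}
```

For example, `print_offload_commands off vc1.example.com vc2.example.com` lists the disable commands, and the same call with `on` lists the commands to restore the defaults afterwards.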


Additional Information

'install.vmafd.vmdir_vdcpromo_error_23' error while deploying, upgrading or installation of vCenter Server vmafd firstboot fails (59538)

VMware vSphere 7.0 Release Notes