Errors with Xenon/Container Service in vRealize Automation 7.3 HA environment (2150912)

Symptoms

  • In a clustered vRA environment, the container service might stop responding when Docker hosts are removed after the Xenon service has been running for several days.
  • The /var/log/vmware/xenon log file consumes all disk space and contains log entries similar to:

    [validateStageTransitionAndState][Moving from STARTED(REQUEST_FAILED) to STARTED(REQUEST_FAILED).]
    [lambda$synchronizeChildrenInQueryPage$5][Synchronization failed for service {service-resource} with status code 404, message Service https://{address}:8494/{service-resource} returned error 404 for {method}. id {opId} message Service not found: http://127.0.0.1/{service-resource}]
    [checkAndCompleteOperation][(Original id: {opId}) Replication request to https://{address}:8494/{service-resource}-{method} failed with 500, Service https://{address}:8494/{service-resource} returned error 500 for {method}. id {opId} message queue limit exceeded]
    [lambda$handleServiceNotFoundOnReplica$5][Service {service-resource} not found on replica. Retrying replication request ..


  • When one or more nodes are restarted, you might see inconsistencies such as:

    • Inconsistent data is collected, and some of the Docker host containers might not be discovered.
    • Inconsistent data is displayed, depending on which node the UI (internally) requests the data from.
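
To gauge whether a node shows the log symptoms above, a quick check along these lines may help. This is a minimal sketch: the function names are hypothetical, the search patterns are taken from the log excerpts in this article, and the file layout under /var/log/vmware/xenon is an assumption.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical; the patterns come from the
# log excerpts quoted in this article.
# The log location follows the path named above; whether it is a single
# file or a directory of files is an assumption, so both are handled.
XENON_LOG_DIR="${XENON_LOG_DIR:-/var/log/vmware/xenon}"

# Print the number of lines in file $1 that match any quoted error signature.
count_xenon_symptoms() {
    grep -c -E 'queue limit exceeded|Synchronization failed for service|not found on replica' "$1"
}

# Print disk usage for the log location, then the match count per log file.
report_xenon_symptoms() {
    du -sh "$XENON_LOG_DIR" 2>/dev/null
    df -h "$XENON_LOG_DIR" 2>/dev/null
    find "$XENON_LOG_DIR" -type f 2>/dev/null | while read -r f; do
        printf '%s: %s matching lines\n' "$f" "$(count_xenon_symptoms "$f")"
    done
}
```

Calling report_xenon_symptoms on each node gives a rough picture of how much space the logs occupy and how often the quoted errors appear. It does not replace the symptom descriptions above.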


Cause

This issue is caused by problems in the setup of the Xenon cluster and in the clustering implementation itself.

Resolution

To resolve this issue, apply the patch 2150912_patch.zip attached to this KB article. The patch script automatically creates a backup of all container-related data; no manual backup is required.
 
 
To apply the patch:
  1. Download the 2150912_patch.zip file and copy it to each active vRA appliance.

    Note: This does not include virtual appliances used for Code Stream or vRO.

  2. Extract the 2150912_patch.zip file to get the patch.sh script.

  3. Copy the patch.sh script to a working directory on each vRealize Automation node.

  4. Add execute permissions to the script:

    1. Set the owner of the file to root by running this command:

      chown -R root <patch file with directory path>

    2. Change the file permissions to 744 by running this command:

      chmod 744 <patch file with directory path>

      NOTE: Replace <patch file with directory path> with the full path to the patch.sh script.

  5. Execute bash patch.sh sequentially on each node.

    NOTE: Do not execute the script in parallel on all nodes.

  6. If the output of the patch execution reports:

    Node will not start. Available node detected but it is not responsive yet. Try again later.

    Execute the patch on the other node(s). After the patch execution has succeeded on those nodes, start the Xenon service manually on this node:


    service xenon-service start
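
Steps 4 to 6 can be sketched for a single node as follows. The helper names are hypothetical; the commands and the message text are the ones given in the steps above. Treat this as an illustration of the sequence, not as part of the patch.

```shell
#!/bin/bash
# Sketch of steps 4 to 6 for ONE node. Run it on one node at a time, never in parallel.
# Helper names are hypothetical; commands and message text come from the steps above.

# Step 4: set the owner to root and the permissions to 744 on the script at path $1.
prepare_patch() {
    chown -R root "$1" && chmod 744 "$1"
}

# Step 6: succeed if the patch output in $1 contains the "Node will not start" message.
needs_manual_start() {
    printf '%s\n' "$1" | grep -q 'Node will not start'
}

# Step 5: run the patch script at path $1, echo its output, and print a
# reminder when the Xenon service has to be started manually later.
apply_patch() {
    local script="$1" output
    prepare_patch "$script" || return 1
    output="$(bash "$script" 2>&1)"
    printf '%s\n' "$output"
    if needs_manual_start "$output"; then
        echo "Patch the other node(s) first, then run: service xenon-service start"
    fi
}
```

For example, apply_patch <patch file with directory path> would run the patch on the current node and print the reminder only if the message from step 6 appears in the output.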

Rollback steps:

To roll back, restore the /etc/xenon directory from the backup archive created automatically by the script.
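
A restore could look like the sketch below. This article does not state the archive name or format, so the sketch assumes a gzip-compressed tar file that stores the directory as a relative path such as etc/xenon. The function name is hypothetical. Inspect the archive first and adjust if its layout differs.

```shell
#!/bin/bash
# Sketch only. Assumptions not confirmed by this article: the backup is a
# gzip-compressed tar archive, and it stores the directory as a relative
# path (for example etc/xenon). The function name is hypothetical.

# restore_from_backup ARCHIVE [TARGET_ROOT]
# Extracts ARCHIVE beneath TARGET_ROOT (default: /).
restore_from_backup() {
    local archive="$1" root="${2:-/}"
    if [ ! -f "$archive" ]; then
        echo "Backup archive not found: $archive" >&2
        return 1
    fi
    # Confirm the archive can be read before anything is overwritten.
    tar -tzf "$archive" > /dev/null || return 1
    tar -xzf "$archive" -C "$root"
}
```

According to the patch steps under Additional Information, the archive is created in the /tmp directory. Listing its contents with tar -tzf before restoring shows whether the paths match the assumption above.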


Additional Information

These fixes are scheduled for a future release. You do not need to reapply the patch on future versions.

Steps that are executed by this patch script:
  1. A backup archive of all container-related data is created in the /tmp directory.
  2. The necessary files are extracted to a temporary folder, and an installer script is invoked.
  3. The Xenon service instance is stopped, the necessary files are copied, and Xenon is started again.
  4. The temporary folder is deleted from the system. 

Tags

vRealize Automation, vRA, Containers, Xenon, Xenon Service, Docker, HA, High-Availability
