Cluster fails to initialize during upgrade in vRealize Operations Manager 6.1 and later (2146902)

Symptoms

  • vRealize Operations Manager cluster fails to initialize during upgrade.
  • In the /storage/vcops/log/pakManager/vcopsPakManager.root.post_apply_adapter.log file, you see entries similar to:

    2016-07-25 19:36:36,885 [13960] - DEBUG - vcopsPakManager.checkResponse:2675 -
    Response body:
    {"taskId":"74a26686-8cba-4071-8893-a6d857dfab1e","description":"Distributed
    Task Execution for
    ADAPTER_INSTALL","taskState":"ERROR","createdTime":1469499637357,"lastUpdateTime":1469500595924,"errorMessages":[],"links":[{"href":"/suite-api/api/tasks/74a26686-8cba-4071-8893-a6d857dfab1e","rel":"SELF"}]}
    2016-07-25 19:36:36,889 [13960] - DEBUG -
    vcopsPakManager.waitForTaskToComplete:2708 - Task state: ERROR for URL:
    https://localhost/suite-api/api/tasks/74a26686-8cba-4071-8893-a6d857dfab1e
    2016-07-25 19:36:36,889 [13960] - ERROR -
    vcopsPakManager.waitForTaskToComplete:2723 - job:
    https://localhost/suite-api/api/tasks/74a26686-8cba-4071-8893-a6d857dfab1e
    failed with state: ERROR after: 959.0 seconds
    2016-07-25 19:36:36,991 [13960] - ERROR - vcopsPakManager.HandleError:552 -
    Exiting with exit code: 1, message: The adapter file:
    /storage/db/pakRepoLocal/VMwarevSphere-604140256/extracted/vSphereSolutionPak.zip
    failed to install, exiting--exiting

  • In the /storage/vcops/log/cassandra/system.log file, you see entries similar to:

    WARN  [GossipTasks:1] 2016-07-25 19:35:38,704 Gossiper.java:714 - Gossip stage
    has 2 pending tasks; skipping status check (no nodes will be marked down)
    INFO  [Service Thread] 2016-07-25 19:37:40,989 GCInspector.java:252 - ParNew GC
    in 625ms.  CMS Old Gen: 526205664 -> 541839016; Par Eden Space: 171704320 -> 0;

  • In the /storage/vcops/log/analytics-uuid.log file, you see entries similar to:

    2016-07-25 19:36:28,565 ERROR [DistTaskDistributedTaskInstallUninstallAdapters]
    com.integrien.alive.controller.DistributedTaskInstallUninstallAdapters.installAdapters
    - DistributedTaskInstallUninstallAdapters finished with Fail:
    DistributedTaskInstallUninstallAdapters failed: GlobalDataPersistenceException:
    Unable to perform batch action    Msg: Unable to getSafe for future:
    com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout
    during write query at consistency QUORUM (2 replica were required but only 1 acknowledged the write)   
    Error code: DB_EXCEPTION        Kv Exception msg:
    Unable to getSafe for future:
    com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout
    during write query at consistency QUORUM (2 replica were required but only 1 acknowledged the write)

    Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
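
    To confirm the symptoms quickly, the log files can be searched for the signature strings shown above. The following is a minimal sketch only: the function name check_symptom is hypothetical, and the search strings are taken from the example excerpts, so they may need adjusting for your environment.

```shell
#!/bin/sh
# Sketch: report whether a fixed string from the excerpts above appears in a log file.
# check_symptom is a hypothetical helper, not part of vRealize Operations Manager.
check_symptom() {
    # $1 = log file to search, $2 = fixed string to look for
    if grep -qF -- "$2" "$1" 2>/dev/null; then
        echo "FOUND: '$2' in $1"
        return 0
    fi
    echo "not found: '$2' in $1"
    return 1
}

# Example usage on a node (paths are the ones named in this article):
# check_symptom /storage/vcops/log/pakManager/vcopsPakManager.root.post_apply_adapter.log "failed to install"
# check_symptom /storage/vcops/log/cassandra/system.log "skipping status check"
# for f in /storage/vcops/log/analytics-*.log; do check_symptom "$f" "WriteTimeoutException"; done
```

    A match in all three logs suggests, but does not prove, that the node is affected by this issue.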

Resolution

This is a known issue affecting vRealize Operations Manager 6.3.0. This issue is resolved in vRealize Operations Manager 6.4.0, available at VMware Downloads.
 
To work around the issue, reset the cluster upgrade state and run the initialize cluster script manually.

To reset the cluster upgrade state and run the initialize cluster script manually:

  1. Log in to the vRealize Operations master node as root through SSH or console.
  2. Make a note of the pak file name in the /storage/db/casa/pak/to_install/ directory.
  3. Navigate to the /storage/db/pakRepoLocal directory.
  4. Change to the directory of the pak file found in step 2.

    For example:

    cd /storage/db/pakRepoLocal/ep-ops-os-and-availability-104126536

  5. Open the pakID.results file using a text editor.

    Note:  Replace pakID with the name of the pak file found in step 2.

  6. Change the result of each action from 1 to an empty value.

    For example:

    Change "apply_adapter_exit_code": "1", to "apply_adapter_exit_code": "",

    Then set stage_adapter_exit_code to 0.

    For example:

    "stage_adapter_exit_code": "0",

  7. Save and close the file.
  8. Repeat steps 3-7 for all pak files noted in step 2.
  9. Repeat steps 1-8 on all nodes in the cluster.
  10. Run this command to restart the CaSA service:

    service vmware-casa restart

  11. Initialize the cluster by running this command:

    $VMWARE_PYTHON_BIN /usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsClusterManager.py init-cluster

  12. Complete the apply_adapter_pre_script, apply_adapter, and apply_adapter_post_script actions manually using one of these two methods:

    Skip running a pak action:

    Note: Only use this method for paks such as the vRealize Operations Manager Application upgrade.

    1. Open the /storage/db/pakRepoLocal/pakID/pakID.results file using a text editor.

      Note: Replace pakID with the name of the application upgrade pak file.

      For example:

      /storage/db/pakRepoLocal/vRealizeOperationsManagerEnterprise-6304276417/vRealizeOperationsManagerEnterprise-6304276417.results

    2. Change the 'action_exit_code' value to 0.

      For example:

      "action_exit_code": "0",

    3. Save and close the file.


    Run a pak action:

    1. Run this command, replacing pakID with the pak file name noted in step 2 and action_name with the name of the action:

      $VMWARE_PYTHON_BIN $ALIVE_BASE/../vmware-vcopssuite/utilities/pakManager/bin/vcopsPakManager.py --pak pakID --action action_name

      For example:

      $VMWARE_PYTHON_BIN $ALIVE_BASE/../vmware-vcopssuite/utilities/pakManager/bin/vcopsPakManager.py --pak ep-ops-os-and-availability-104126536 --action apply_adapter_pre_script

    2. Wait for the action to complete by checking the status using the command:

      $VMWARE_PYTHON_BIN $ALIVE_BASE/../vmware-vcopssuite/utilities/pakManager/bin/vcopsPakManager.py --pak pakID --action query_pak_status

      Note: The action shows a status of Completed successfully when complete.

    3. Repeat steps 1-2 for each of the apply_adapter_pre_script, apply_adapter, and apply_adapter_post_script actions.

  13. Run the cleanup action using the command from the Run a pak action method, with cleanup as the action name.

    Example:

    $VMWARE_PYTHON_BIN $ALIVE_BASE/../vmware-vcopssuite/utilities/pakManager/bin/vcopsPakManager.py --pak pakID --action cleanup

  14. Repeat steps 11-12 on all nodes in the cluster.

    Note: These steps may fail if the environmental issues persist.
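
The manual edit of the pakID.results file in steps 5-7 can be sketched as a small shell function. This is an illustration under stated assumptions, not a supported tool: the function name reset_pak_results is hypothetical, and the sketch assumes each result appears in the file as a "name_exit_code": "value" pair, as in the examples above. It keeps a backup copy with a .bak suffix before changing anything.

```shell
#!/bin/sh
# Sketch: clear failed exit codes in a pak results file (steps 5-7 of this article).
# Assumption: results are stored as "<action>_exit_code": "<value>" pairs.
# reset_pak_results is a hypothetical helper name.
reset_pak_results() {
    results_file=$1
    if [ ! -f "$results_file" ]; then
        echo "no such file: $results_file" >&2
        return 1
    fi
    # Keep a backup so the original state can be restored if needed.
    cp -p "$results_file" "$results_file.bak" || return 1
    # 1st expression: change any *_exit_code value of "1" to an empty value.
    # 2nd expression: force stage_adapter_exit_code to "0".
    sed -e 's/\("[A-Za-z_]*_exit_code"[[:space:]]*:[[:space:]]*\)"1"/\1""/g' \
        -e 's/\("stage_adapter_exit_code"[[:space:]]*:[[:space:]]*\)"[^"]*"/\1"0"/' \
        "$results_file.bak" > "$results_file"
}

# Example usage, with pakID replaced by the pak name noted in step 2:
# reset_pak_results /storage/db/pakRepoLocal/pakID/pakID.results
```

Review the edited file against the .bak copy before restarting the CaSA service, because the real file may contain entries that this pattern does not anticipate.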
