Using the 'lsdoctor' Tool

Article ID: 320837

Products

VMware vCenter Server

Issue/Introduction

Lookup Service Doctor (lsdoctor) is a tool used to address issues with data stored in the PSC database, as well as data local to a vCenter (regardless of whether the PSC is external or embedded).  The tool can be used to detect and correct problems that could cause failures in topology changes (converge, repoint, etc.), upgrades, or failures incurred as a result of maintenance (e.g. incorrectly applying new SSL certificates).  This article will outline its functions and use.

Please note: external PSCs (Platform Services Controllers) were only available in vCenter 6.7 and earlier.  vCenter 6.x reached end of general support on October 15, 2022.  Please see this post for more information.

Environment

VMware vCenter Server 6.5.x
VMware vCenter Server 6.7.x
VMware vCenter Server 7.0.x
VMware vCenter Server 8.0.x

Resolution

Installation

To use lsdoctor, you must download the ZIP file attached to this article.  Then, use the file transfer utility of your choice (WinSCP, for example) to copy the ZIP file to the node on which you wish to run it.
NOTE:  If you have trouble connecting to a vCenter appliance using WinSCP, please see "Error when uploading files to vCenter Server Appliance using WinSCP".

Once the tool is copied to the system, unzip the file:
Windows:
Right-click the file and click “Extract All…”
VCSA:
Change your directory to the location of the file, and run the following command:
unzip lsdoctor.zip

NOTE:  When running the tool, be sure you are currently in the “lsdoctor-main” directory.
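
For anyone scripting the extraction step, the following is a minimal Python sketch of the same idea. It assumes the archive unpacks into a "lsdoctor-main" directory containing lsdoctor.py, as the note above implies. The function name `extract_lsdoctor` is hypothetical and is not part of the tool.

```python
import zipfile
from pathlib import Path


def extract_lsdoctor(zip_path, dest_dir):
    """Extract the lsdoctor archive and return the directory holding lsdoctor.py."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # The article says to run the tool from "lsdoctor-main"; confirm the script
    # is really there instead of assuming the archive layout.
    tool_dir = dest / "lsdoctor-main"
    if not (tool_dir / "lsdoctor.py").is_file():
        raise FileNotFoundError(f"lsdoctor.py not found under {tool_dir}")
    return tool_dir
```

The returned path is the directory to change into before running the tool. If the downloaded archive has a different name or layout, the check fails with an error rather than silently continuing.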
 

Launching the Tool

First, ensure you are in the lsdoctor-main directory from a command line (SSH session for VCSA).
To run lsdoctor, use the following command:

VCSA
python lsdoctor.py --help


 
 
Functions

-l, --lscheck

This option checks for common issues in the lookup service.  It does not make any changes to the environment.  It reports issues found on any node in the SSO domain.  See the output for findings and the path to the JSON report.


Instructions:

  1. Copy and extract lsdoctor to the filesystem of any node
  2. Run “python lsdoctor.py -l”
  3. Provide the password for your SSO administrator account
  4. Review output for issues found.  Each node will be represented by its SSO site followed by its hostname or PNID

Follow up actions needed:

  • Follow output instructions if applicable.

-p, --pscHaUnconfigure

This option is used when removing a PSC HA configuration (multiple external PSCs behind a load balancer).  To use this option, follow these steps:


Instructions:

  1. For each PSC behind the load balancer:
    1. Copy and extract lsdoctor to the filesystem
    2. Run “python lsdoctor.py -p”
    3. Verify that you have taken the appropriate snapshots
    4. Provide the password for your SSO administrator account
    5. Once the script completes, move to the next node behind the load balancer.

Follow up actions needed:

  • Once lsdoctor has run on all nodes behind the LB (all PSCs in the SSO site), restart services on all of the PSCs on which you have run the script
  • Repoint your vCenter server(s) to the node of your choice.  For more information, see https://kb.vmware.com/s/article/2113917
  • Re-register any external solutions that were previously pointed to the load balancer VIP (SRM, vSphere Replication, NSX-V, etc. – See product documentation for instructions)

-s, --stalefix

This option cleans up any stale configurations left over from a system upgraded from 5.x.


Instructions:

  1. Copy and extract lsdoctor to the filesystem of the affected node
  2. Run “python lsdoctor.py -s”
  3. Verify that you have taken the appropriate snapshots
  4. Provide the password for your SSO administrator account

Follow up actions needed:

  • Once the script completes, restart all services
  • Re-register any external solutions that were previously pointed to the affected node (SRM, vSphere Replication, NSX-V, etc. – See product documentation for instructions)
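
The article does not name the command used to restart services. On a VCSA this is commonly done with `service-control --stop --all` followed by `service-control --start --all`. Treat that command as an assumption here and confirm it for your version. The sketch below wraps the two calls; the helper name `restart_all_services` is hypothetical.

```python
import subprocess


def restart_all_services(run=subprocess.run):
    """Stop and then start every vCenter service on the local appliance."""
    for action in ("--stop", "--start"):
        # check=True raises CalledProcessError if service-control fails,
        # so a failed stop is never followed by a blind start.
        run(["service-control", action, "--all"], check=True)
```

The `run` parameter exists only so the function can be exercised without touching a real appliance. Called with no arguments, it executes the commands on the node where it runs.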

-t, --trustfix

This option corrects SSL trust mismatch issues in the lookup service.  The lookup service registrations may have an SSL trust value that doesn’t match the MACHINE_SSL_CERT on port 443 of the node.  This can be caused by a failure during certificate replacement, among other failures.


Instructions:

  1. Copy and extract lsdoctor to the filesystem of any node in the same SSO site as the affected node(s)
  2. Run “python lsdoctor.py -t”
  3. Verify that you have taken the appropriate snapshots
  4. Provide the password for your SSO administrator account

Follow up actions needed:

  • Once the script completes, restart all services on all nodes in the SSO site
  • Re-register any external solutions that were previously pointed to the affected node(s) (SRM, vSphere Replication, NSX-V, etc.)
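
To see which certificate a node actually presents on port 443, a short Python sketch such as the one below can compute its SHA-1 fingerprint for manual comparison with the MACHINE_SSL_CERT. The article does not describe how the lookup service stores its trust value, so this only inspects the served certificate. The function names are hypothetical and not part of lsdoctor.

```python
import hashlib
import ssl


def sha1_fingerprint(der_bytes):
    """Return the colon-separated SHA-1 fingerprint of DER-encoded bytes."""
    digest = hashlib.sha1(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def presented_fingerprint(host, port=443):
    """Fetch the certificate a node presents on `port` and fingerprint it."""
    # Without ca_certs, get_server_certificate does not validate the chain.
    # That is acceptable here because the goal is only to inspect what is served.
    pem = ssl.get_server_certificate((host, port))
    return sha1_fingerprint(ssl.PEM_cert_to_DER_cert(pem))
```

Running `presented_fingerprint` against each node's FQDN before and after the trust fix gives a quick record of which certificate each node was serving.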

-u, --solutionusers

This option is used to recreate solution users for a node.  There are many reasons a solution user may be missing or inconsistent, but this script will delete any existing entries and recreate them from scratch.


Instructions:

  1. Copy and extract lsdoctor to the filesystem of the affected node
  2. Run “python lsdoctor.py -u”
  3. Verify that you have taken the appropriate snapshots
  4. Provide the password for your SSO administrator account
  5. Once the script completes, restart all services on this node

Follow up actions needed:

  • None

-r, --rebuild

This option is used when rebuilding one or all service registrations for a given node.  This action performs the most significant changes, and extra care should be taken to ensure a safe rollback is possible (see Impact/Risks section).

Important: If running "lsdoctor -r" on vCenter Server 6.7 U3q or above (does not include vCenter 7.x or 8.x), extra steps need to be taken to recreate the SSO endpoints that are used by some external solutions. See the note at the end of this section for directions.
 

Instructions:

  1. Copy and extract lsdoctor to the filesystem of any node in the same SSO site as the affected node(s)
  2. Run “python lsdoctor.py -r”
  3. Verify that you have taken the appropriate snapshots
  4. Provide the password for your SSO administrator account
  5. You will be presented with a menu:
NOTE:  Version, Deployment type, and hostname references will be unique to your environment.


Option 1:

Option 1 is only used when lsdoctor does not have the appropriate files for your build of vCenter.  This will generate the file for your build (see output for path to the template).  You can then copy the template generated from a healthy system to an unhealthy system of the same build for use with the other options.  This should rarely be needed.


Option 2:

Option 2 will take a backup of the existing lookup service registrations for this node, remove them, and register new ones based on a template.  Most templates are provided in the templates directory, so you should only need to select one manually if no template exists for your build of vCenter.  In that case you will be prompted to select one, and you should choose the template that most closely matches your system.  Alternatively, you can use Option 1 on a healthy system of the same build and copy the template to the affected system.


Option 3:

Option 3 will take a backup of all lookup service registrations for this node, load the template for your current build, and provide a service select menu.  Select a service type you wish to replace, and the tool will unregister all services for this node with the selected service type and register a fresh one based on the detected template.  As with Option 2, you may be prompted to select a template if one is not automatically provided.


Option 4:

Option 4 allows you to restore the lookup service registrations from an automatic backup taken by performing option 2 or option 3.  You will be prompted to select a backup that is marked by timestamp (oldest to newest).  It will then remove all current service registrations and register the ones from the backup.


Follow up actions needed:

  • Once the script completes, restart all services
  • Re-register any external solutions that were previously pointed to this node (SRM, vSphere Replication, NSX-V, etc.)

Using the rebuild option on vCenter Server 6.7 U3q or above (not including vCenter 7.x or 8.x) requires the following two commands to be run after it completes.  This will recreate the legacy SSO endpoints.  Before running them, replace VCSA_FQDN with the vCenter's FQDN and ensure that the machine_ssl certificate exists at /root/machine_ssl.cer.
/usr/java/jre-vmware/bin/java -cp /opt/vmware/lib64/*:/usr/lib/vmware-sso/commonlib/*:/usr/lib/vmware/common-jars/log4j-core-2.12.4.jar:/usr/lib/vmware/common-jars/log4j-api-2.12.4.jar:/usr/lib/vmware/common-jars/log4j-slf4j-impl-2.12.4.jar:/usr/lib/vmware/common-jars/jcl-over-slf4j-1.7.30.jar:.:* -Dlog4j.configuration=initls-log4j.properties -Dvmware.log.dir=/var/log/vmware/sso/ -XX:ErrorFile=/var/log/vmware/sso/hs_err_stsinstaller_pid%p.log -XX:HeapDumpPath=/var/log/vmware/sso/ com.vmware.vim.lookup.tools.InitializeLookupService --cert-path /root/machine_ssl.cer --host-name VCSA_FQDN --http-port 443

/usr/java/jre-vmware/bin/java -cp /opt/vmware/lib64/*:/usr/lib/vmware-sso/commonlib/*:/usr/lib/vmware/common-jars/log4j-core-2.12.4.jar:/usr/lib/vmware/common-jars/log4j-api-2.12.4.jar:/usr/lib/vmware/common-jars/log4j-slf4j-impl-2.12.4.jar:/usr/lib/vmware/common-jars/jcl-over-slf4j-1.7.30.jar:.:* -Dlog4j.configuration=initls-log4j.properties -Dvmware.log.dir=/var/log/vmware/sso/ -XX:ErrorFile=/var/log/vmware/sso/hs_err_stsinstaller_pid%p.log -XX:HeapDumpPath=/var/log/vmware/sso/ com.vmware.vim.lookup.tools.InitializeLookupService --cert-path /root/machine_ssl.cer --host-name VCSA_FQDN --http-port 443 --legacy
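
Because these commands are long and easy to paste unedited, a small pre-flight check can catch the two prerequisites named above before anything runs. The Python sketch below is illustrative only. The helper `preflight_problems` is hypothetical, and it checks nothing beyond the placeholder and the certificate path.

```python
import os

PLACEHOLDER = "VCSA_FQDN"


def preflight_problems(command, cert_path="/root/machine_ssl.cer"):
    """Return a list of reasons the command should not be run yet."""
    problems = []
    if PLACEHOLDER in command:
        problems.append(f"{PLACEHOLDER} has not been replaced with the vCenter FQDN")
    if cert_path not in command:
        problems.append(f"command does not reference {cert_path}")
    if not os.path.isfile(cert_path):
        problems.append(f"certificate file {cert_path} does not exist")
    return problems
```

An empty list means both prerequisites appear to be met. It does not validate the certificate contents or the FQDN itself.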


Additional Information

VMware Skyline Health Diagnostics for vSphere - FAQ


Glossary

Node -- Refers to a vCenter Server with embedded Platform Services Controller, an external Platform Services Controller, or a vCenter Server pointed to an external Platform Services Controller.

SSO domain -- Each vCenter Server (or PSC if external) is associated with a vCenter Single Sign-On domain. The domain name defaults to vsphere.local, but you can change it during installation of the first Platform Services Controller. The domain determines the local authentication space.  All nodes in Enhanced Linked Mode are part of the same SSO domain.

SSO site -- You can split an SSO domain into multiple sites and assign each embedded vCenter server or external Platform Services Controller and external vCenter Server instance to a site. Sites are logical constructs that group nodes together, but usually correspond to geographic location.

 



Impact/Risks:

WARNING

Before using lsdoctor to make any changes, ensure you have taken proper snapshots of your SSO domain. This means that you must shut down all VCs or PSCs that are in the SSO domain at the same time, then snapshot them, and power them on again.  If you need to revert to one of these snapshots, shut all the nodes down, and revert all nodes to the snapshot. Failure to perform these steps will lead to replication problems across the PSC databases.


LIMITATIONS

Currently, lsdoctor supports vCenter Server 6.7 and above (VCSA). When new builds of vCenter are released, lsdoctor must be updated asynchronously.  This means that lsdoctor support for the latest version of vCenter may be updated sometime after a new build is released.
  

Attachments

lsdoctor-240201