Component metadata health check fails with invalid state error (2145347)

Symptoms

  • The vSAN Health check plug-in reports the Component metadata health test as Failed.
  • The test lists each affected component UUID with a component state of Invalid state.
  • At the cluster level in vCenter Server > Monitor > Virtual SAN > Health, you see an error similar to:

Purpose

This article explains how to fix components in Invalid state that cause Component metadata health alarms.

Cause

This issue is caused by a defect in the vSAN Local Log Structured Object Management (LSOM) component that leaves components with corrupted metadata.

Note: This issue will not typically impact the availability of any VM objects where the “host failures to tolerate” policy option is greater than zero.

Resolution

This issue is resolved in ESXi 6.0, patch ESXi600-201611001, available at VMware Patch Downloads.

For more information on downloading the patch, see How to download patches in MyVMware (1021623).

Important:
  • If you do not have any invalid state components at this time, upgrading to the patch is sufficient.
  • If your vSAN cluster is already reporting invalid state errors, you must evacuate and destroy the disk or disk group that contains the corrupted component(s) to recover the space consumed by the component(s) and to clear the alarm.

Patch ESXi600-201611001 for ESXi 6.0 addresses the majority of failure scenarios. However, in rare cases you may need to remove the disk(s) manually. If so, follow the manual identification and destruction procedure in the Additional Information section, which also provides a script to help automate the identification step.


Additional Information

Manual process to identify and destroy the corrupted components:
  1. Ensure that you are on the latest on-disk format version for your ESXi build. For more information on the on-disk format versions, see:

  2. Identify the disk or disk group where the corrupt component is located using the attached script.

    Note: The script is for identification purposes only; it does not modify any data.

    1. Unzip the archive 2145347_IdentifyInvalidComponentsVsan.zip to the temp directory of your ESXi host.
    2. Run the IdentifyInvalidComponentsVsan.sh script. 
    3. Copy the output of the script to a notepad and then skip to step 4.

  3. Alternatively, if you cannot use the script, use these steps:

    1. Note the component UUID listed in the Component metadata health check error.
    2. Connect to RVC and navigate to the /localhost/<Your Datacenter>/computers folder.
    3. Run the command:

      vsan.cmmds_find 0 -u <component uuid>

      You will see an output similar to:

      > vsan.cmmds_find 0 -u dc3ae056-0c5d-1568-8299-a0369f56ddc0

      +---------+-----------------------------------------------------------+
      | Health  | Content                                                   |
      +---------+-----------------------------------------------------------+
      | Healthy | {"diskUuid"=>"52e5ec68-00f5-04d6-a776-f28238309453",      |
      |         |  "compositeUuid"=>"92559d56-1240-e692-08f3-a0369f56ddc0", |
      |         |  "capacityUsed"=>167772160,                               |
      |         |  "physCapacityUsed"=>167772160,                           |
      |         |  "dedupUniquenessMetric"=>0,                              |
      |         |  "formatVersion"=>1}                                      |
      +---------+-----------------------------------------------------------+

    4. Note the diskUuid from the preceding output.
    5. Run the command: 

      vsan.cmmds_find 0 -t DISK -u <disk uuid>

      You will see an output similar to:

      > vsan.cmmds_find 0 -t DISK -u 52e5ec68-00f5-04d6-a776-f28238309453

      +---------+-------------------------------------------------------+
      | Health  | Content                                               |
      +---------+-------------------------------------------------------+
      | Healthy | {"capacity"=>145303273472,                            |
      |         |  "iops"=>100,                                         |
      |         |  "iopsWritePenalty"=>10000000,                        |
      |         |  "throughput"=>200000000,                             |
      |         |  "throughputWritePenalty"=>0,                         |
      |         |  "latency"=>3400000,                                  |
      |         |  "latencyDeviation"=>0,                               |
      |         |  "reliabilityBase"=>10,                               |
      |         |  "reliabilityExponent"=>15,                           |
      |         |  "mtbf"=>1600000,                                     |
      |         |  "l2CacheCapacity"=>0,                                |
      |         |  "l1CacheCapacity"=>16777216,                         |
      |         |  "isSsd"=>0,                                          |
      |         |  "ssdUuid"=>"52bbb266-3a4e-f93a-9a2c-9a91c066a31e",   |
      |         |  "volumeName"=>"NA",                                  |
      |         |  "formatVersion"=>"3",                                |
      |         |  "devName"=>"naa.600508b1001c5c0b1ac1fac2ff96c2b2:2", |
      |         |  "ssdCapacity"=>0,                                    |
      |         |  "rdtMuxGroup"=>80011761497760,                       |
      |         |  "maxComponents"=>47661,                              |
      |         |  "logicalCapacity"=>0,                                |
      |         |  "physDiskCapacity"=>0,                               |
      |         |  "dedupScope"=>0}                                     |
      +---------+-------------------------------------------------------+


    6. Note the NAA ID in the devName field. In this example, it is naa.600508b1001c5c0b1ac1fac2ff96c2b2, which identifies the disk that hosts the corrupt component.
    7. Repeat steps 1 through 6 for each component in Invalid state to identify every affected disk.

  4. After identifying the disk(s) that own the corrupt components, use the vSphere Web Client to safely destroy the disk(s).

    1. Open the vSphere Web Client.
    2. Navigate to vCenter Server > Datacenter > Cluster > Manage > Virtual SAN > Disk Management.
    3. Select the disk that owns the corrupt components.

      Note: If there are multiple disks that own corrupt components in the same disk group, it is easier to destroy the entire disk group if there is adequate free capacity in the vSAN cluster.

    4. Destroy the disk or disk group.

      Warning: Do not use the No data migration option.

      VMware recommends using the Full data migration option to ensure data integrity and availability. If the vSAN cluster does not have enough free capacity to accommodate full migration, use the Ensure accessibility option.

      For more information, see the instructions in the Remove Disk Groups or Devices from Virtual SAN section in the Administering VMware Virtual SAN guide.

  5. Add the disk(s) back to the vSAN configuration. For more information, see the Add Devices to the Disk Group section in the Administering VMware Virtual SAN guide.
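
When many components are in Invalid state, copying UUIDs by hand from the RVC output in step 3 is error-prone. The following is a minimal Python sketch, not a VMware tool, that pulls the diskUuid and the NAA ID out of pasted vsan.cmmds_find output and groups the affected devices so you can see when several of them share a disk group. The function names (parse_content, naa_id, group_by_disk_group) are hypothetical, the parsing relies only on the "key"=>value layout shown in the sample output above, and treating ssdUuid as the disk group identifier is an assumption you should confirm in Disk Management before destroying anything.

```python
import re
from collections import defaultdict

# Matches one "key"=>value pair as printed in the Content column.
# The value is either a quoted string or an integer.
PAIR = re.compile(r'"(\w+)"=>(?:"([^"]*)"|(-?\d+))')


def parse_content(rvc_output):
    """Collect every key/value pair found in pasted RVC output into a dict."""
    fields = {}
    for key, text, number in PAIR.findall(rvc_output):
        fields[key] = int(number) if number else text
    return fields


def naa_id(dev_name):
    """Drop the partition suffix: 'naa.6005...:2' becomes 'naa.6005...'."""
    return dev_name.split(":", 1)[0]


def group_by_disk_group(disk_entries):
    """Group affected devices by ssdUuid.

    Assumption: disks that report the same ssdUuid belong to the same
    disk group. Verify this in the vSphere Web Client.
    """
    groups = defaultdict(list)
    for entry in disk_entries:
        groups[entry["ssdUuid"]].append(naa_id(entry["devName"]))
    return dict(groups)


# Excerpts of the sample output shown in step 3 of this article.
COMPONENT_OUTPUT = '''
| Healthy | {"diskUuid"=>"52e5ec68-00f5-04d6-a776-f28238309453",      |
|         |  "compositeUuid"=>"92559d56-1240-e692-08f3-a0369f56ddc0", |
|         |  "capacityUsed"=>167772160,                               |
'''

DISK_OUTPUT = '''
| Healthy | {"capacity"=>145303273472,                            |
|         |  "isSsd"=>0,                                          |
|         |  "ssdUuid"=>"52bbb266-3a4e-f93a-9a2c-9a91c066a31e",   |
|         |  "devName"=>"naa.600508b1001c5c0b1ac1fac2ff96c2b2:2", |
'''

component = parse_content(COMPONENT_OUTPUT)
disk = parse_content(DISK_OUTPUT)

print("diskUuid to look up:", component["diskUuid"])
print("device to destroy:  ", naa_id(disk["devName"]))
print("by disk group:      ", group_by_disk_group([disk]))
```

If a single ssdUuid maps to more than one device, that is the situation described in the note in step 4, where destroying the entire disk group may be easier than destroying each disk individually.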

For information on using RVC, see:

Tags

Component metadata health

Update History

ESXi 6.0-P04 resolved this issue.
