
Collecting and applying raw metadata dumps on VMFS volumes using dd (1020645)


Purpose

This article provides steps for collecting VMFS metadata dumps and applying repaired or corrected metadata grafts to a VMFS datastore.

Warning: Perform these steps only when requested by VMware Technical Support, and only after unmounting the datastore from all other hosts in the cluster. There must be no I/O against the volume while its metadata is being fixed by overlay.



Resolution

Collecting VMFS metadata

To collect VMFS metadata:

  1. Log into the VMware ESX host through SSH or the local console. For more information, see Connecting to an ESX host using an SSH client (1019852).
  2. Identify the LUN containing the corrupted VMFS volume.
  3. Make note of the Universally Unique Identifier (UUID) for the respective VMFS datastore using the command:

    [root@esxhost ~]# ls -l /vmfs/volumes


    The output appears similar to:

    total 1040
    drwxr-xr-t 1 root root 2380 Jul 27 12:28 4b96afb0-b2474ede-fc0b-001aa004abc2
    drwxr-xr-t 1 root root 2380 Jul 27 12:28 4a2e936d-0a1823ae-23cf-000d6084dcb0
    drwxr-xr-t 1 root root 2380 Jul 27 12:28 4a3fb589-edfb575c-7011-000d6084dcb0
    lrwxr-xr-x 1 root root 35 Aug 6 11:31 VMFS-Storage1 -> 4b96afb0-b2474ede-fc0b-001aa004abc2
    lrwxr-xr-x 1 root root 35 Aug 6 11:31 VMFS-Storage2 -> 4a2e936d-0a1823ae-23cf-000d6084dcb0
    lrwxr-xr-x 1 root root 35 Aug 6 11:31 VMFS-Storage3 -> 4a3fb589-edfb575c-7011-000d6084dcb0
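
    Note: A single datastore label can also be resolved to its UUID by reading its symlink directly; a minimal sketch, assuming the VMFS-Storage1 label from the example output above:

```shell
# Resolve a datastore label (example name from the output above) to its UUID.
# The symlink under /vmfs/volumes points at the UUID-named directory.
readlink /vmfs/volumes/VMFS-Storage1   # prints the UUID the symlink points to
```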

  4. Locate the device identifier associated with the UUID.

    Note: Make a note of the device path in the output, as it is required later.

    • For ESX/ESXi 3.5 and earlier, run this command:

      [root@esxhost ~]# esxcfg-vmhbadevs -m

      For ESX, the output appears similar to:

      vmhba1:0:0:1 /dev/sda1 4b96afb0-b2474ede-fc0b-001aa004abc2
      vmhba1:0:1:1 /dev/sdb1 4a2e936d-0a1823ae-23cf-000d6084dcb0
      vmhba1:0:2:1 /dev/sdc1 4a3fb589-edfb575c-7011-000d6084dcb0


      Note: In the output above, the /dev/sda1 console device contains 4b96afb0-b2474ede-fc0b-001aa004abc2, or VMFS-Storage1.

      For ESXi, the output appears similar to:

      vmhba1:0:0:1 /vmfs/devices/disks/vmhba1:0:0:1 4b96afb0-b2474ede-fc0b-001aa004abc2
      vmhba1:0:1:1 /vmfs/devices/disks/vmhba1:0:1:1 4a2e936d-0a1823ae-23cf-000d6084dcb0
      vmhba1:0:2:1 /vmfs/devices/disks/vmhba1:0:2:1 4a3fb589-edfb575c-7011-000d6084dcb0


      Note: In the output above, the /vmfs/devices/disks/vmhba1:0:0:1 console device contains 4b96afb0-b2474ede-fc0b-001aa004abc2, or VMFS-Storage1.
       
    • For ESX/ESXi 4.0 and above, run this command:

      [root@esxhost ~]# esxcfg-scsidevs -m

      For ESX, the output appears similar to:

      naa.6001c230d8abfe000ff72d38073cdf11:1 /dev/sda1 4b96afb0-b2474ede-fc0b-001aa004abc2 0 VMFS-Storage1
      naa.6001c230d8abfe000ff76c51486715db:1 /dev/sdb1 4a2e936d-0a1823ae-23cf-000d6084dcb0 0 VMFS-Storage2
      naa.6001c230d8abfe000ff76c198ddbc13e:1 /dev/sdc1 4a3fb589-edfb575c-7011-000d6084dcb0 0 VMFS-Storage3


      Note: In the output above, the /dev/sda1 console device contains 4b96afb0-b2474ede-fc0b-001aa004abc2, or VMFS-Storage1.

      For ESXi, the output appears similar to:

      naa.6001c230d8abfe000ff72d38073cdf11:1 /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0:1 4b96afb0-b2474ede-fc0b-001aa004abc2 0 VMFS-Storage1
      naa.6001c230d8abfe000ff76c51486715db:1 /vmfs/devices/disks/mpx.vmhba1:C0:T0:L1:1 4a2e936d-0a1823ae-23cf-000d6084dcb0 0 VMFS-Storage2
      naa.6001c230d8abfe000ff76c198ddbc13e:1 /vmfs/devices/disks/mpx.vmhba1:C0:T0:L2:1 4a3fb589-edfb575c-7011-000d6084dcb0 0 VMFS-Storage3


      Note: In the output above, the /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0:1 console device contains 4b96afb0-b2474ede-fc0b-001aa004abc2, or VMFS-Storage1.

      Note: If the datastore resides on an SSD, the output of the esxcfg-scsidevs -m command is slightly different. For example, the output appears similar to:

      t10.ATA_____OCZ2DAGILITY3______OCZ2D07E7HGR1N91AG5R2:3 /vmfs/devices/disks/t10.ATA_____OCZ2DAGILITY3______OCZ2D07E7HGR1N91AG5R2:3 4f69e9e4-e68d5622-de77-001517e8fae9 0 datastore1
  5. Verify that you have sufficient space for the dump to be completed. Run this command:

    Note: This example uses the /tmp directory (located in local storage on the ESX host).

    [root@esxhost ~]# df -h /tmp

    The output appears similar to:

    Filesystem Size Used Avail Use% Mounted on
    /dev/sde5 4.9G 2.7G 2.0G 57% /

    Note: ESXi hosts without large amounts of free /tmp space must use another directory or datastore. When gathering a 1.2 GB or 1.5 GB chunk from the corrupted volume, replace /tmp with a path on a healthy VMFS volume. For example:

    # dd if=/vmfs/devices/disks/naa.ID of=/vmfs/volumes/datastore/naaID.bin bs=1M count=1500

  6. Dump the first 20 MB, 30 MB, 1.2 GB, or 1.5 GB (for VMFS-5) of the device identified in step 4.

    Note: The amount of data that needs to be dumped depends on the type of corruption encountered or suspected.

    • If LVM header or partition table corruption is suspected, dump 20 MB of the volume.
    • If VMFS header or heartbeat slot corruption is suspected, 30 MB is required.
    • If file descriptor corruption is suspected, 1.2 GB is required (1.5 GB for VMFS-5 on ESXi 5.x).
    • If the corruption or problem area is not yet understood, dump 1.2 GB of information (1.5 GB for VMFS-5 on ESXi 5.x).

      For example, on ESX/ESXi run the appropriate command to collect 1.2 GB:

      On ESX 3.5:
          ~ # dd if=/dev/sda of=/tmp/sda.bin bs=1M count=1200 conv=notrunc

      On ESXi 3.5:
          ~ # dd if=/vmfs/devices/disks/vmhba1:0:0 of=/tmp/vmhba100.bin bs=1M count=1200 conv=notrunc

      On ESX 4.x:
          ~ # dd if=/dev/sda of=/tmp/vmhba100.bin bs=1M count=1200 conv=notrunc

      On ESXi 4.x/5.x:
          ~ # dd if=/vmfs/devices/disks/mpx.vmhba1:C0:T0:L0 of=/tmp/vmhba1000.bin bs=1M count=1200 conv=notrunc

      Note: Change the value for count to change the amount of data collected.

      Note: You can also use the NAA ID for the if= value in ESX/ESXi 4.x & 5.x:

      # dd if=/vmfs/devices/disks/naa.6001c230d8abfe000ff72d38073cdf11 of=/tmp/vmhba1000.bin bs=1M count=1200 conv=notrunc

      The output appears similar to:

      1200+0 records in
      1200+0 records out


      Note: The 1, or partition number, from /dev/sda1 is omitted in the interest of having dd collect data from the beginning of the disk device, as opposed to the beginning of the specified partition. This allows for the collection of the device's partition table, as well as partition contents.
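
      Since the dump size is simply bs multiplied by count, you can sanity-check the expected file size before running dd; a quick arithmetic check using the counts from the examples above:

```shell
# bs=1M is 1 MiB (1048576 bytes); multiplying by count gives the dump size
echo $((1200 * 1048576))   # 1.2 GB dump -> 1258291200 bytes
echo $((1500 * 1048576))   # 1.5 GB (VMFS-5) dump -> 1572864000 bytes
```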

    • The if (input file) value corresponds to the troubled volume's disk identifier, such as /dev/sda in the examples above.
    • The of (output file) value specifies where the dumped data is written. In this example, it is a new file in /tmp named sda.bin.
    • The bs (block size) value defines how much data is read and written at a time.
    • The count value indicates how many blocks to copy. Multiplying the two together determines how large the dump will be.

    Warning: Take extra care specifying the of parameter, as dd overwrites the specified file or device without warning.

  7. Create an MD5 checksum of the dump file so its integrity can be verified. For related information, see Using MD5sum to verify the integrity of copied files (1003259).

    # md5sum /tmp/vmhba1000.bin > /tmp/vmhba1000.bin.md5

  8. Compress the collected dump and md5sum files for submission to VMware Technical Support. Run this command:

    # tar czvf /tmp/sda.tgz /tmp/vmhba1000.bin /tmp/vmhba1000.bin.md5

    Note: This command compresses the collected dump and md5sum files in /tmp into the archive sda.tgz in /tmp.

  9. Submit the resulting archive file to VMware Technical Support.

    Note: Retain the archive files for backup purposes until the Support Request has been resolved.
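
Before submitting, the checksum and archive can be verified locally; a minimal sketch, assuming the example file names used above (run it on a Linux workstation if the host's busybox md5sum lacks the -c option):

```shell
cd /tmp
md5sum -c vmhba1000.bin.md5   # reports "vmhba1000.bin: OK" if the dump is intact
tar tzf sda.tgz               # lists the files stored in the archive
```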

Applying repaired VMFS metadata

Steps for performing this process are not supplied in this article. For more information, contact VMware Technical Support to correctly graft repaired dump data to your existing datastore or LUN.

Note: If possible, take backups or snapshots of the LUN before a graft is applied. You can prepare these while awaiting a response from VMware.

Warning: While VMware Technical Support provides best-effort assistance with repairs, VMware cannot guarantee successful repair of VMFS datastores. There may be further data corruption outside of critical VMFS metadata regions that cannot be corrected. Corruption issues have frequently been root-caused to exceptional conditions at the SAN, requiring involvement of the SAN or hardware vendor to identify and apply preventative measures. Restoration from backups or use of data recovery services is required if repairs are not successful.
