Collecting and applying raw metadata dumps on VMFS volumes using DD (Data Description)

Article ID: 314377

Products

VMware vSphere ESXi

Issue/Introduction

This article provides steps for collecting VMFS metadata dumps and applying repaired or corrected metadata grafts to a VMFS datastore.

Warning: Perform these steps only when requested by VMware Technical Support, and only after unmounting the datastore from all other hosts in the cluster. There must be no I/O against the volume while the metadata is being repaired by overlay.

Environment

VMware ESXi 3.5.x Installable
VMware ESX 4.0.x
VMware ESXi 4.0.x Embedded
VMware vSphere ESXi 6.5
VMware vSphere ESXi 5.5
VMware vSphere ESXi 5.1
VMware ESXi 4.1.x Installable
VMware ESX Server 3.0.x
VMware ESX Server 3.5.x
VMware ESXi 4.1.x Embedded
VMware ESXi 4.0.x Installable
VMware ESXi 3.5.x Embedded
VMware ESX 4.1.x
VMware vSphere ESXi 6.0
VMware vSphere ESXi 5.0

Resolution

Collecting VMFS metadata

To collect VMFS metadata:

Log into the VMware ESX host by SSH or the local console. For more information, see Connecting to an ESX host using a SSH client (1019852).
Identify the LUN containing the corrupted VMFS volume.
Make note of the Universally Unique Identifier (UUID) for the respective VMFS datastore using the command:

[root@esxhost ~]# ls -l /vmfs/volumes

The output appears similar to:

total 1040
drwxr-xr-t 1 root root 2380 Jul 27 12:28 4b96afb0-b2474ede-fc0b-001aa004abc2
drwxr-xr-t 1 root root 2380 Jul 27 12:28 4a2e936d-0a1823ae-23cf-000d6084dcb0
drwxr-xr-t 1 root root 2380 Jul 27 12:28 4a3fb589-edfb575c-7011-000d6084dcb0
lrwxr-xr-x 1 root root 35 Aug 6 11:31 VMFS-Storage1 -> 4b96afb0-b2474ede-fc0b-001aa004abc2
lrwxr-xr-x 1 root root 35 Aug 6 11:31 VMFS-Storage2 -> 4a2e936d-0a1823ae-23cf-000d6084dcb0
lrwxr-xr-x 1 root root 35 Aug 6 11:31 VMFS-Storage3 -> 4a3fb589-edfb575c-7011-000d6084dcb0
Locate the device identifier associated with the UUID.

Note: Make a note of the console device path (the second column of the output), as this is required later.
- For ESX/ESXi 3.5 and earlier, run this command:
  
  [root@esxhost ~]# esxcfg-vmhbadevs -m
  
  For ESX, the output appears similar to:
  
  vmhba1:0:0:1 /dev/sda1 4b96afb0-b2474ede-fc0b-001aa004abc2
  vmhba1:0:1:1 /dev/sdb1 4a2e936d-0a1823ae-23cf-000d6084dcb0
  vmhba1:0:2:1 /dev/sdc1 4a3fb589-edfb575c-7011-000d6084dcb0
  
  Note: In this output, the /dev/sda1 console device contains 4b96afb0-b2474ede-fc0b-001aa004abc2, or VMFS-Storage1.
  
  For ESXi, the output appears similar to:
  
  vmhba1:0:0:1 /vmfs/devices/disks/vmhba1:0:0:1 4b96afb0-b2474ede-fc0b-001aa004abc2
  vmhba1:0:1:1 /vmfs/devices/disks/vmhba1:0:1:1 4a2e936d-0a1823ae-23cf-000d6084dcb0
  vmhba1:0:2:1 /vmfs/devices/disks/vmhba1:0:2:1 4a3fb589-edfb575c-7011-000d6084dcb0
  
  Note: In this output, the /vmfs/devices/disks/vmhba1:0:0:1 console device contains 4b96afb0-b2474ede-fc0b-001aa004abc2, or VMFS-Storage1.
- For ESX/ESXi 4.0 and above, run this command:
  
  [root@esxhost ~]# esxcfg-scsidevs -m
  
  For ESX, the output appears similar to:
  
  naa.6001c230d8abfe000ff72d38073cdf11:1 /dev/sda1 4b96afb0-b2474ede-fc0b-001aa004abc2 0 VMFS-Storage1
  naa.6001c230d8abfe000ff76c51486715db:1 /dev/sdb1 4a2e936d-0a1823ae-23cf-000d6084dcb0 0 VMFS-Storage2
  naa.6001c230d8abfe000ff76c198ddbc13e:1 /dev/sdc1 4a3fb589-edfb575c-7011-000d6084dcb0 0 VMFS-Storage
  
  Note: In this output, the /dev/sda1 console device contains 4b96afb0-b2474ede-fc0b-001aa004abc2, or VMFS-Storage1.
  
  For ESXi, the output appears similar to:
  
  naa.6001c230d8abfe000ff72d38073cdf11:1 /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0:1 4b96afb0-b2474ede-fc0b-001aa004abc2 0 VMFS-Storage1
  naa.6001c230d8abfe000ff76c51486715db:1 /vmfs/devices/disks/mpx.vmhba1:C0:T0:L1:1 4a2e936d-0a1823ae-23cf-000d6084dcb0 0 VMFS-Storage2
  naa.6001c230d8abfe000ff76c198ddbc13e:1 /vmfs/devices/disks/mpx.vmhba1:C0:T0:L2:1 4a3fb589-edfb575c-7011-000d6084dcb0 0 VMFS-Storage
  
  Note: In this output, the /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0:1 console device contains 4b96afb0-b2474ede-fc0b-001aa004abc2, or VMFS-Storage1.
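
The lookup in this step can also be scripted. The following is a minimal sketch, not a VMware-supplied tool. The helper name device_for_uuid is hypothetical, and the sketch assumes the mapping output has the console device in the second column and the UUID in the third, as in the sample output above.

```shell
#!/bin/sh
# device_for_uuid (hypothetical helper): read "esxcfg-scsidevs -m" or
# "esxcfg-vmhbadevs -m" style output on stdin and print the console device
# (column 2) of every line whose UUID (column 3) matches the argument.
# Assumes the column layout shown in the sample output in this article.
# Returns non-zero when no line matches.
device_for_uuid() {
    awk -v uuid="$1" '
        $3 == uuid { print $2; found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Example usage on a host (not executed here):
#   esxcfg-scsidevs -m | device_for_uuid 4b96afb0-b2474ede-fc0b-001aa004abc2
```

The helper only filters text, so it can be tried against saved command output before it is run on a host.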


Note: When gathering a 1.2 GB or 1.5 GB dump, replace the /tmp output path in the examples with a path on a VMFS volume other than the corrupted one. For example:

# dd if=/vmfs/devices/disks/naa.ID of=/vmfs/volumes/datastore/naaID.bin bs=1M count=1500

Note: ESXi hosts without enough free space in /tmp must write the dump to another directory or datastore.
Dump the first 20 MB, 30 MB, 1.2 GB, or 1.5 GB (for VMFS-5) of the device identified in step 4.

Note: The amount of data that needs to be dumped depends on the type of corruption encountered or suspected.
- If corruption of the LVM header or partition table is suspected, dump 20 MB of the volume.
- If corruption of the VMFS header or the heartbeat slots is suspected, 30 MB is required.
- If the File Descriptors are suspected, 1.2 GB is required (this is 1.5 GB for VMFS-5 on ESXi 5.x).
- If the corruption or problem area is not understood yet, dump 1.2 GB of information (this is 1.5 GB for VMFS-5 on ESXi 5.x).
  
  For example, on ESX/ESXi, run the appropriate command to collect 1.2 GB:
  
  On ESX 3.5:
  ~ # dd if=/dev/sda of=/tmp/sda.bin bs=1M count=1200 conv=notrunc
  
  On ESXi 3.5:
  ~ # dd if=/vmfs/devices/disks/vmhba1:0:0 of=/tmp/vmhba100.bin bs=1M count=1200 conv=notrunc
  
  On ESX 4.x:
  ~ # dd if=/dev/sda of=/tmp/vmhba100.bin bs=1M count=1200 conv=notrunc
  
  On ESXi 4.x/5.x:
  ~ # dd if=/vmfs/devices/disks/mpx.vmhba1:C0:T0:L0 of=/tmp/vmhba1000.bin bs=1M count=1200 conv=notrunc
  
  Note: Change the value for count to change the amount of data collected.
  
  Note: You can also use the NAA ID for the if= value in ESX/ESXi 4.x & 5.x:
  
  # dd if=/vmfs/devices/disks/naa.6001c230d8abfe000ff72d38073cdf11 of=/tmp/vmhba1000.bin bs=1M count=1200 conv=notrunc
  
  The output appears similar to:
  
  1200+0 records in
  1200+0 records out
  
  Note: The 1, or partition number, from /dev/sda1 is omitted in the interest of having dd collect data from the beginning of the disk device, as opposed to the beginning of the specified partition. This allows for the collection of the device's partition table, as well as partition contents.
- The if (Input File) value corresponds to the troubled volume's disk identifier, such as /dev/sda in the above examples.
- The of (Output File) value specifies where the dumped data is written. In the ESX 3.5 example, it is a not yet existing file in /tmp named sda.bin.
- The bs (Block Size) value defines how much data is read and written per block.
- The count value indicates how many blocks to collect. Multiplying bs by count gives the size of the dump.
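
The size guidance and the partition-number note above can be expressed as two small helpers. This is a sketch under stated assumptions rather than part of the supported procedure. The names dump_count_mb and whole_disk are hypothetical, the category keywords (lvm, header, fd, unknown) are invented for illustration, and bs=1M is taken to mean 1048576 bytes.

```shell
#!/bin/sh
# dump_count_mb (hypothetical helper): map the suspected problem area to the
# dd "count" value, in 1 MB blocks, using the sizes listed in this article.
#   $1 = lvm | header | fd | unknown   (keywords invented for this sketch)
#   $2 = VMFS major version (3 or 5)
dump_count_mb() {
    case $1 in
        lvm)        echo 20 ;;
        header)     echo 30 ;;
        fd|unknown) if [ "$2" = 5 ]; then echo 1500; else echo 1200; fi ;;
        *)          return 1 ;;
    esac
}

# dump_bytes: dump size in bytes = bs * count, assuming bs=1M is 1048576.
dump_bytes() {
    echo $(( 1048576 * $1 ))
}

# whole_disk (hypothetical helper): strip the partition suffix from a
# partition path so that dd starts at the beginning of the device.
# Pass a partition path only: a whole-disk path under /vmfs/devices/disks/
# that contains colons would be shortened incorrectly.
whole_disk() {
    case $1 in
        /vmfs/devices/disks/*) printf '%s\n' "${1%:*}" ;;
        /dev/*)                printf '%s\n' "$1" | sed 's/[0-9]*$//' ;;
        *)                     return 1 ;;
    esac
}
```

For example, whole_disk /dev/sda1 prints /dev/sda, and dump_bytes 1200 prints 1258291200.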
Warning: Take extra care when specifying the of parameter, as dd overwrites whatever file or device it names.
Create an MD5 checksum of the dump file so that its integrity can be verified after it is copied. For more information, see Using MD5sum to verify the integrity of copied files (1003259).
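
One way to reduce the risk described in this warning is to wrap dd so that it refuses to write to a path that already exists. The following is an illustrative sketch only. The helper name dump_first_mb is hypothetical, and the existence check is an added safeguard that is not part of the documented procedure.

```shell
#!/bin/sh
# dump_first_mb (hypothetical helper): copy the first COUNT blocks of 1 MB
# from SRC to OUT. It refuses to run if OUT already exists, so a mistyped
# output path cannot overwrite an existing file or device node.
#   $1 = source (if=), $2 = output (of=), $3 = number of 1 MB blocks
dump_first_mb() {
    src=$1 out=$2 count=$3
    if [ -e "$out" ]; then
        echo "refusing to overwrite existing path: $out" >&2
        return 1
    fi
    dd if="$src" of="$out" bs=1M count="$count"
}

# Example usage on a host (not executed here):
#   dump_first_mb /dev/sda /tmp/sda.bin 1200
```

The sketch can be exercised safely against a scratch file before it is pointed at a real device.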

# md5sum /tmp/vmhba1000.bin > /tmp/vmhba1000.bin.md5
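
The recorded checksum can later be compared against the file with md5sum -c. The sketch below assumes an md5sum that supports the -c option. The helper names write_md5 and check_md5 are hypothetical.

```shell
#!/bin/sh
# write_md5 (hypothetical helper): record the MD5 checksum of a file in a
# sibling file with the ".md5" suffix, as in the command above.
write_md5() {
    md5sum "$1" > "$1.md5"
}

# check_md5 (hypothetical helper): re-read the file named in the ".md5" file
# and compare it with the recorded checksum. Returns non-zero on a mismatch.
check_md5() {
    md5sum -c "$1.md5" >/dev/null 2>&1
}

# Example usage (not executed here):
#   write_md5 /tmp/vmhba1000.bin
#   check_md5 /tmp/vmhba1000.bin && echo "checksum matches"
```

Because the .md5 file stores the path that was given to md5sum, the check must be run where that path still resolves to the dump file.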
Compress the collected dump and md5sum files for submission to VMware Technical Support. Run this command:

# tar czvf /tmp/sda.tgz /tmp/vmhba1000.bin /tmp/vmhba1000.bin.md5

Note: This command compresses the collected dump file and its MD5 checksum file in /tmp into the archive sda.tgz in /tmp.
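
Before submitting the archive, its contents can be listed to confirm that both files are present. This is a minimal sketch: the helper name pack_dump is hypothetical, and the use of -C to store relative paths is a design choice of the sketch, not something the procedure above requires.

```shell
#!/bin/sh
# pack_dump (hypothetical helper): archive the named files from directory DIR
# into ARCHIVE using relative paths, then list the archive members so the
# contents can be confirmed before upload.
#   $1 = archive path, $2 = directory holding the files, $3... = file names
pack_dump() {
    archive=$1 dir=$2
    shift 2
    tar czf "$archive" -C "$dir" "$@" && tar tzf "$archive"
}

# Example usage (not executed here):
#   pack_dump /tmp/sda.tgz /tmp vmhba1000.bin vmhba1000.bin.md5
```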
Submit the resulting archive file to VMware Technical Support.

Note: Retain the archive files for backup purposes until the Support Request has been resolved.

Applying repaired VMFS metadata

Steps for performing this process are not supplied in this article. For more information, contact VMware Technical Support to correctly graft repaired dump data to your existing datastore or LUN.

Note: Backups or snapshots of the LUN, if possible, are also highly recommended prior to a graft's application, so you may take steps to ready this process while awaiting a response from VMware.

Note: This is not applicable to VMFS-6 due to architectural changes.

Warning: While VMware Technical Support will provide best-effort assistance with repairs, VMware cannot guarantee successful repairs of VMFS datastores. It is possible that there is further data corruption outside of critical VMFS metadata regions that cannot be corrected. Corruption issues have frequently been root-caused to exceptional conditions taking place at the SAN, requiring involvement of the SAN or hardware vendor to identify and apply preventative measures. Restoration from backups or leveraging data recovery services are required if repairs are not successful.

Additional Information

For more information, see:

Using MD5sum to verify the integrity of copied files
Uploading diagnostic information for VMware
Connecting to an ESX host using an SSH client
How to file a Support Request in Customer Connect