Collecting VMFS metadata
To collect VMFS metadata:
- Log in to the VMware ESX host by SSH or the local console. For more information, see Connecting to an ESX host using a SSH client (1019852).
- Identify the LUN containing the corrupted VMFS volume.
- Make note of the Universally Unique Identifier (UUID) for the respective VMFS datastore using the command:
[root@esxhost ~]# ls -l /vmfs/volumes
The output appears similar to:
total 1040
drwxr-xr-t 1 root root 2380 Jul 27 12:28 4b96afb0-b2474ede-fc0b-001aa004abc2
drwxr-xr-t 1 root root 2380 Jul 27 12:28 4a2e936d-0a1823ae-23cf-000d6084dcb0
drwxr-xr-t 1 root root 2380 Jul 27 12:28 4a3fb589-edfb575c-7011-000d6084dcb0
lrwxr-xr-x 1 root root 35 Aug 6 11:31 VMFS-Storage1 -> 4b96afb0-b2474ede-fc0b-001aa004abc2
lrwxr-xr-x 1 root root 35 Aug 6 11:31 VMFS-Storage2 -> 4a2e936d-0a1823ae-23cf-000d6084dcb0
lrwxr-xr-x 1 root root 35 Aug 6 11:31 VMFS-Storage3 -> 4a3fb589-edfb575c-7011-000d6084dcb0
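The datastore labels in this listing are symbolic links to the UUID directories, so the UUID for a label can also be read from the link itself. The following is a minimal sketch, assuming readlink and basename are available in the shell on the host. The function name uuid_for_label and its optional second argument are illustrative and not part of any VMware tool.

```shell
# Hypothetical helper: print the UUID that a datastore label points to.
# $1 = datastore label (for example VMFS-Storage1)
# $2 = directory holding the links (assumed default: /vmfs/volumes)
uuid_for_label() {
    label=$1
    volumes_dir=${2:-/vmfs/volumes}
    # readlink prints the link target; fail if the label is not a symlink
    target=$(readlink "$volumes_dir/$label") || return 1
    # the target may be relative or absolute, so keep only the last component
    basename "$target"
}
```

With the listing above, uuid_for_label VMFS-Storage1 would print 4b96afb0-b2474ede-fc0b-001aa004abc2.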
- Locate the device identifier associated with the UUID.
Note: Make a note of the device path shown in the second column of the output, as this is required later.
- For ESX/ESXi 3.5 and earlier, run this command:
[root@esxhost ~]# esxcfg-vmhbadevs -m
For ESX, the output appears similar to:
vmhba1:0:0:1 /dev/sda1 4b96afb0-b2474ede-fc0b-001aa004abc2
vmhba1:0:1:1 /dev/sdb1 4a2e936d-0a1823ae-23cf-000d6084dcb0
vmhba1:0:2:1 /dev/sdc1 4a3fb589-edfb575c-7011-000d6084dcb0
Note: In the output above, the /dev/sda1 console device contains 4b96afb0-b2474ede-fc0b-001aa004abc2 or VMFS-Storage1.
For ESXi, the output appears similar to:
vmhba1:0:0:1 /vmfs/devices/disks/vmhba1:0:0:1 4b96afb0-b2474ede-fc0b-001aa004abc2
vmhba1:0:1:1 /vmfs/devices/disks/vmhba1:0:1:1 4a2e936d-0a1823ae-23cf-000d6084dcb0
vmhba1:0:2:1 /vmfs/devices/disks/vmhba1:0:2:1 4a3fb589-edfb575c-7011-000d6084dcb0
Note: In the output above, the /vmfs/devices/disks/vmhba1:0:0:1 console device contains 4b96afb0-b2474ede-fc0b-001aa004abc2 or VMFS-Storage1.
- For ESX/ESXi 4.0 and above, run this command:
[root@esxhost ~]# esxcfg-scsidevs -m
For ESX, the output appears similar to:
naa.6001c230d8abfe000ff72d38073cdf11:1 /dev/sda1 4b96afb0-b2474ede-fc0b-001aa004abc2 0 VMFS-Storage1
naa.6001c230d8abfe000ff76c51486715db:1 /dev/sdb1 4a2e936d-0a1823ae-23cf-000d6084dcb0 0 VMFS-Storage2
naa.6001c230d8abfe000ff76c198ddbc13e:1 /dev/sdc1 4a3fb589-edfb575c-7011-000d6084dcb0 0 VMFS-Storage3
Note: In the output above, the /dev/sda1 console device contains 4b96afb0-b2474ede-fc0b-001aa004abc2 or VMFS-Storage1.
For ESXi, the output appears similar to:
naa.6001c230d8abfe000ff72d38073cdf11:1 /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0:1 4b96afb0-b2474ede-fc0b-001aa004abc2 0 VMFS-Storage1
naa.6001c230d8abfe000ff76c51486715db:1 /vmfs/devices/disks/mpx.vmhba1:C0:T0:L1:1 4a2e936d-0a1823ae-23cf-000d6084dcb0 0 VMFS-Storage2
naa.6001c230d8abfe000ff76c198ddbc13e:1 /vmfs/devices/disks/mpx.vmhba1:C0:T0:L2:1 4a3fb589-edfb575c-7011-000d6084dcb0 0 VMFS-Storage3
Note: In the output above, the /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0:1 console device contains 4b96afb0-b2474ede-fc0b-001aa004abc2 or VMFS-Storage1.
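The lookup in this step can also be scripted. The following is a minimal sketch, assuming awk is available on the host and that the output keeps the column order shown in the examples (device path in column 2, UUID in column 3). The function name device_for_uuid is illustrative, and the sample text is copied from the ESXi example output rather than taken from a live host.

```shell
# Hypothetical helper: read "esxcfg-scsidevs -m" style output on stdin and
# print the console device (column 2) whose VMFS UUID (column 3) matches $1.
device_for_uuid() {
    awk -v uuid="$1" '$3 == uuid { print $2 }'
}

# Sample input copied from the ESXi example output, not from a live host.
sample_output='naa.6001c230d8abfe000ff72d38073cdf11:1 /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0:1 4b96afb0-b2474ede-fc0b-001aa004abc2 0 VMFS-Storage1
naa.6001c230d8abfe000ff76c51486715db:1 /vmfs/devices/disks/mpx.vmhba1:C0:T0:L1:1 4a2e936d-0a1823ae-23cf-000d6084dcb0 0 VMFS-Storage2'

# Prints the device path for the first UUID in the sample.
printf '%s\n' "$sample_output" | device_for_uuid 4b96afb0-b2474ede-fc0b-001aa004abc2
```

On a host, the same function would be fed from the real command, for example esxcfg-scsidevs -m | device_for_uuid 4b96afb0-b2474ede-fc0b-001aa004abc2. The esxcfg-vmhbadevs -m output shown for ESX/ESXi 3.5 uses the same positions for the device path and UUID, so the sketch should apply there as well.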
Note: When you have to gather a 1.2 GB or 1.5 GB chunk from the corrupted volume, replace /tmp in the examples below with a path on one of the VMFS volumes. For example:
# dd if=/vmfs/devices/disks/naa.ID of=/vmfs/volumes/datastore/naaID.bin bs=1M count=1500
Note: VMware ESXi hosts without a large amount of free space in /tmp need to use another directory or datastore.
- Dump the first 20 MB, 30 MB, 1.2 GB, or 1.5 GB (for VMFS-5) of the device identified in step 4.
Note: The amount of data that needs to be dumped depends on the type of corruption encountered or suspected.
- If corruption of the LVM header or partition table is suspected, dump 20 MB of the volume.
- If corruption of the VMFS header or the HeartBeat slots is suspected, dump 30 MB.
- If corruption of the File Descriptors is suspected, dump 1.2 GB (1.5 GB for VMFS-5 on ESXi 5.x).
- If the corruption or problem area is not understood yet, dump 1.2 GB (1.5 GB for VMFS-5 on ESXi 5.x).
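The size guidance above maps directly onto the count value used with bs=1M in the dd commands. The following is a minimal sketch of that mapping. The function name dump_count_mb and the area keywords (lvm, partition, header, heartbeat, fd, unknown) are made up for illustration and are not VMware terminology.

```shell
# Hypothetical helper: print the dd "count" value (in 1 MB blocks) for a
# suspected corruption area. The area keywords are invented for this sketch.
# $1 = lvm | partition | header | heartbeat | fd | unknown
# $2 = VMFS major version (optional; 5 selects the 1.5 GB size)
dump_count_mb() {
    case "$1" in
        lvm|partition)    echo 20 ;;      # LVM header or partition table
        header|heartbeat) echo 30 ;;      # VMFS header or HeartBeat slots
        fd|unknown)                       # File Descriptors, or not yet understood
            if [ "${2:-3}" = 5 ]; then echo 1500; else echo 1200; fi ;;
        *) echo "unknown area: $1" >&2; return 1 ;;
    esac
}
```

For example, dump_count_mb fd 5 prints 1500, which matches the count=1500 value used for a 1.5 GB dump.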
For example, on ESX/ESXi run the appropriate command to collect 1.2 GB:
On ESX 3.5:
~ # dd if=/dev/sda of=/tmp/sda.bin bs=1M count=1200 conv=notrunc
On ESXi 3.5:
~ # dd if=/vmfs/devices/disks/vmhba1:0:0 of=/tmp/vmhba100.bin bs=1M count=1200 conv=notrunc
On ESX 4.x:
~ # dd if=/dev/sda of=/tmp/vmhba100.bin bs=1M count=1200 conv=notrunc
On ESXi 4.x/5.x:
~ # dd if=/vmfs/devices/disks/mpx.vmhba1:C0:T0:L0 of=/tmp/vmhba1000.bin bs=1M count=1200 conv=notrunc
Note: Change the value for count to change the amount of data collected.
Note: You can also use the NAA ID for the if= value in ESX/ESXi 4.x and 5.x:
# dd if=/vmfs/devices/disks/naa.6001c230d8abfe000ff72d38073cdf11 of=/tmp/vmhba1000.bin bs=1M count=1200 conv=notrunc
The output appears similar to:
1200+0 records in
1200+0 records out
Note: The 1, or partition number, from /dev/sda1 is omitted in the interest of having dd collect data from the beginning of the disk device, as opposed to the beginning of the specified partition. This allows for the collection of the device's partition table, as well as partition contents.
- The if (Input File) value corresponds with the troubled volume's disk identifier, such as /dev/sda in the above examples.
- The of (Output File) value specifies where the dumped data is written. In this example, it is a file named sda.bin in /tmp that does not exist yet.
- The bs (Block Size) value defines how much data to collect in each block.
- The count value indicates how many blocks to collect. Multiplying the two together gives the size of the dump.
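Two details from the notes above lend themselves to a short sketch: dropping the partition number to get the whole-disk path for if=, and multiplying bs by count to get the expected dump size. The function names whole_device and dump_size_bytes are illustrative. The sketch assumes the input path ends in a partition number, as in the esxcfg-vmhbadevs -m and esxcfg-scsidevs -m output, and that dd treats 1M as 1048576 bytes, as GNU dd does.

```shell
# Hypothetical helper: turn a partition path into the whole-disk path for if=.
# Assumes $1 really ends in a partition number; a path that is already a
# whole-disk path would be truncated incorrectly.
whole_device() {
    case "$1" in
        /vmfs/devices/disks/*) printf '%s\n' "${1%:*}" ;;          # drop ":<partition>"
        /dev/*) printf '%s\n' "$1" | sed 's/[0-9][0-9]*$//' ;;    # drop trailing digits
        *) return 1 ;;
    esac
}

# Expected dump size in bytes for bs=1M (assumed to be 1048576 bytes per block).
# $1 = count value passed to dd
dump_size_bytes() {
    echo $(( $1 * 1048576 ))
}
```

For example, whole_device /dev/sda1 prints /dev/sda, and dump_size_bytes 1200 prints 1258291200, which is the size the output file should have after a complete count=1200 dump.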
Warning: Take extra care when specifying the of parameter, as dd overwrites the file or device that it names.
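One way to reduce the risk described in this warning is to check the output path before running dd. The following is a minimal sketch, not a VMware-supplied safeguard. The function name safe_output_path is illustrative, and treating anything under /dev or /vmfs/devices as a device path is an assumption based on the paths used in this article.

```shell
# Hypothetical guard: succeed only if $1 looks safe to use as the of= target,
# meaning it is not under a device directory and does not exist yet.
safe_output_path() {
    case "$1" in
        /dev/*|/vmfs/devices/*)
            echo "refusing device path: $1" >&2
            return 1 ;;
    esac
    if [ -e "$1" ]; then
        echo "refusing to overwrite existing file: $1" >&2
        return 1
    fi
    return 0
}
```

The guard can be chained in front of the dump so that dd only runs when the check passes, for example safe_output_path /tmp/sda.bin && dd if=/dev/sda of=/tmp/sda.bin bs=1M count=1200 conv=notrunc.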
- Create an MD5 checksum of the dump file so that its integrity can be verified later. For more information, see Using MD5sum to verify the integrity of copied files (1003259).
# md5sum /tmp/vmhba1000.bin > /tmp/vmhba1000.bin.md5
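The recorded checksum can be used later to confirm that the dump has not changed, for example after it has been copied. The following is a minimal sketch, assuming the md5sum on the system doing the check supports the -c option, as GNU md5sum does. The function name verify_md5 is illustrative.

```shell
# Hypothetical helper: re-check a file against its recorded MD5 checksum.
# $1 = path to the .md5 file created with "md5sum FILE > FILE.md5"
# The .md5 file stores the file path exactly as it was given to md5sum,
# so the check must run where that path still resolves to the dump file.
verify_md5() {
    md5sum -c "$1"
}
```

For the example above, verify_md5 /tmp/vmhba1000.bin.md5 returns a zero exit status when the dump still matches its checksum and a non-zero status when it does not.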
- Compress the collected dump and md5sum files for submission to VMware Technical Support. Run this command:
# tar czvf /tmp/sda.tgz /tmp/vmhba1000.bin /tmp/vmhba1000.bin.md5
Note: This command compresses the collected dump and md5sum files in /tmp to the archive sda.tgz in /tmp.
- Submit the resulting archive file to VMware Technical Support.
Note: Retain the archive files for backup purposes until the Support Request has been resolved.
Applying repaired VMFS metadata
Steps for performing this process are not supplied in this article. For more information, contact VMware Technical Support to correctly graft repaired dump data to your existing datastore or LUN.
Note: Backups or snapshots of the LUN, if possible, are also highly recommended prior to a graft's application, so you may take steps to ready this process while awaiting a response from VMware.
Note: This is not applicable to VMFS6 due to architectural changes.
Warning: While VMware Technical Support will provide best-effort assistance with repairs, VMware cannot guarantee successful repairs of VMFS datastores. It is possible that there is further data corruption outside of critical VMFS metadata regions that cannot be corrected. Corruption issues have frequently been root-caused to exceptional conditions taking place at the SAN, requiring involvement of the SAN or hardware vendor to identify and apply preventative measures. Restoration from backups or the use of data recovery services is required if repairs are not successful.