To identify the specific device that caused the failure:
1. Log in to the applicable ESXi host via SSH or KVM/physical console.
2. List vSAN disks using this command:
# esxcli vsan storage list
3. A failed disk appears with output similar to the following:
Unknown:
Device: Unknown
Display Name: Unknown
Is SSD: false
VSAN UUID: 52703402-bfd2-9261-c40c-16d93dce226a
VSAN Disk Group UUID:
VSAN Disk Group Name:
Used by this host: false
In CMMDS: false
On-disk format version: -1
Deduplication: false
Compression: false
Checksum:
Checksum OK: false
Is Capacity Tier: false
Encryption Metadata Checksum OK: true
Encryption: false
DiskKeyLoaded: false
Is Mounted: false
Creation Time: Unknown
4. You can also run vdq -iH to list the disk mappings on the host and find the failed disk. If a disk is listed by its vSAN UUID instead of its disk identifier, vSAN has failed the disk out, as shown below:
[root@esx01:~] vdq -iH
Mappings:
DiskMapping[0]:
SSD: naa.58ce38ee2031fec5
MD: naa.58ce38ee2019a7f9
MD: naa.58ce38ee201bbbd1
MD: naa.58ce38ee201b02a5
MD: naa.58ce38ee201b9d69
MD: naa.58ce38ee2019aaf5
MD: naa.58ce38ee2019a7e5
MD: 52703402-bfd2-9261-c40c-16d93dce226a
5. To identify the device name of the failed disk, provided the failure is recent enough to still be in the log, run the following command:
grep 52703402-bfd2-9261-c40c-16d93dce226a /var/log/vmkernel.log
You should see output similar to the following:
2021-01-09T05:45:41.638Z cpu0:7053521)LSOM: LSOMLogDiskEvent:7509: Disk Event permanent error propagated for MD 52703402-bfd2-9261-c40c-16d93dce226a (naa.58ce38ee2063aad9:2)
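Steps 4 and 5 can be combined into two small filters. This is a minimal sketch, not a VMware-provided tool: the function names failed_md_uuids and device_for_uuid are hypothetical, and it assumes a POSIX shell with awk, sed, and sort available and output in the same layout as the samples above. It only looks at MD entries, because those are the entries shown in this article.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical, not part of ESXi.

# Read `vdq -iH` output on stdin and print every MD entry that is listed
# as a vSAN UUID (five hyphen-separated hex groups) rather than as a
# device identifier such as naa.*.
failed_md_uuids() {
    awk '$1 == "MD:" && $2 ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/ { print $2 }'
}

# Read vmkernel log lines on stdin and print the device named in
# parentheses after the given UUID, without the ":<n>" suffix.
# Assumes log lines shaped like the LSOM sample in step 5.
device_for_uuid() {
    uuid="$1"
    sed -n "s/.*${uuid} (\([^:)]*\).*/\1/p" | sort -u
}

# On the host, the filters would be fed from the commands in steps 4 and 5:
#   vdq -iH | failed_md_uuids
#   grep "$uuid" /var/log/vmkernel.log | device_for_uuid "$uuid"
```

If the failure has already rotated out of /var/log/vmkernel.log, the second filter prints nothing, which matches the "recent enough" caveat in step 5.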
Note: Because the Disk Group is effectively lost, first remove the Disk Group with the "No data migration" option, then replace the failed disk and re-create the Disk Group.