vSAN deduplication and compression hosts upgraded to vSAN 8.0 Update 1 may experience recurring PSOD under certain conditions

Article ID: 317858

Products

VMware vSAN

Issue/Introduction

This article describes a known vSAN issue that can cause an ESXi host to fail with a purple screen of death (PSOD).

Symptoms:

vSAN hosts with deduplication and compression might fail after upgrade to ESXi 8.0 Update 1, with a backtrace message as follows:

Panic Details: Crash at 2023-05-01T11:31:12.957Z on CPU 51 running world 2019239 - VSAN_0x450080017780_PLOG. VMK Uptime:0:09:54:11.108

Panic Message: @BlueScreen: Failed at bora/modules/vmkernel/plog/dedup/dedup_util.h:231 -- VMK_ASSERT(blkNum < DDPGetEndBlkNumber(&ddCtx->devices[devIndex].sb, blkType, blkNumCheck, 1))
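
To compare a saved panic message against the signature above, a small shell filter can search for the two identifying strings. This is a minimal sketch, not part of the attached detection tool: the function name matches_dedup_assert is hypothetical, and it assumes the panic text has already been copied into a file or is piped to standard input. The source line number (231) is deliberately not matched.

```shell
#!/bin/sh
# Sketch: succeed if the text on stdin contains the assert signature
# quoted in this article. The function name is hypothetical; the two
# search strings are taken from the panic message above.
matches_dedup_assert() {
    # Buffer stdin so it can be searched twice.
    text=$(cat)
    printf '%s\n' "$text" | grep -F -q 'dedup_util.h' &&
        printf '%s\n' "$text" | grep -F -q 'VMK_ASSERT(blkNum < DDPGetEndBlkNumber('
}
```

Example use, with a hypothetical file name: matches_dedup_assert < saved-panic.txt && echo "signature matches"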


Environment

VMware vSAN 8.0.x

Cause

This issue occurs only on vSAN all-flash hosts with deduplication and compression enabled and only when the following conditions are met:
  1. You have an existing disk group created on a vSAN version earlier than vSAN 7.0 Update 3.
  2. You have added new capacity disks to such a disk group.
  3. You upgrade the vSAN host to ESXi 8.0 Update 1.
The issue does not impact vSAN hosts with compression-only or checksum-only disk groups.
This PSOD may occur repeatedly but typically will not prevent a normal reboot of the host.
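
The conditions above are cumulative: a host is exposed only when every one of them holds. The following sketch expresses that rule as a shell function. It is illustrative only; is_exposed and its four yes/no arguments are hypothetical, the answers must be supplied by the operator, and the function does not query the host or replace the attached detection tool.

```shell
#!/bin/sh
# Sketch: combine the conditions from this article into one yes/no answer.
# All arguments are hypothetical operator-supplied answers ("yes" or "no"):
#   $1  deduplication and compression enabled on an all-flash disk group
#   $2  disk group created on a vSAN version earlier than 7.0 Update 3
#   $3  new capacity disks added to that disk group
#   $4  host upgraded (or about to be upgraded) to ESXi 8.0 Update 1
is_exposed() {
    for answer in "$1" "$2" "$3" "$4"; do
        # Any answer other than "yes" (including a missing one) means not exposed.
        [ "$answer" = "yes" ] || return 1
    done
    return 0
}
```

Example use: is_exposed yes yes no yes || echo "not exposed according to the listed conditions"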

Issue Detection Tool
Attached to this KB is a script that helps determine whether a vSAN cluster is exposed to this issue.
Customers planning to upgrade their vSAN clusters to vSAN 8.0 Update 1 should run this script before upgrading to determine their exposure.

Two files are available for download in this KB:
  1. README-Issue-Detection-Tool-kb-92458.pdf
  2. Issue-detection-tool-kb-92458.zip

Resolution

The issue is fixed in the ESXi 8.0 Update 1a release.

Workaround:
On the affected host:
  1. Reboot the affected host that is in the PSOD state.
  2. After the host boots successfully, disable the advanced configuration option "lsomDedupMetadataScanEnabled" on the host via SSH with the following command:
esxcfg-advcfg -s 0 /LSOM/lsomDedupMetadataScanEnabled
  3. Run "auto-backup.sh" to save the configuration change:
/sbin/auto-backup.sh
  4. Reboot the host so the setting change is applied.

After the workaround is applied on the affected host, please execute steps 2 through 4 on all remaining hosts in the vSAN cluster.

If the host experiences the same PSOD during reboot, please open a Support Request with VMware Global Support and select vSAN as the product.
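
The configuration steps of the workaround (setting the option and saving the change) can be wrapped in a small function that also reads the value back before saving. This is a sketch under stated assumptions, not a VMware-provided script: the function name and the ADVCFG and BACKUP override variables are hypothetical and exist only to allow a dry run, and the check assumes that "esxcfg-advcfg -g" prints a line ending in the current value. The reboot is left to the operator.

```shell
#!/bin/sh
# Sketch: set the option to 0, verify it, then save the configuration.
# ADVCFG and BACKUP are hypothetical override variables for dry runs;
# by default the commands from this article are used.
OPTION=/LSOM/lsomDedupMetadataScanEnabled

disable_dedup_metadata_scan() {
    advcfg=${ADVCFG:-esxcfg-advcfg}
    backup=${BACKUP:-/sbin/auto-backup.sh}

    # Disable the advanced configuration option.
    "$advcfg" -s 0 "$OPTION" || return 1

    # Read the value back. Assumption: the output line ends with the value.
    current=$("$advcfg" -g "$OPTION") || return 1
    case "$current" in
        *" 0") ;;
        *) echo "unexpected value: $current" >&2; return 1 ;;
    esac

    # Save the configuration change so it survives the reboot.
    "$backup"
}
```

On a host, calling disable_dedup_metadata_scan and then rebooting corresponds to the workaround steps that follow the initial reboot.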

Additional Information

Impact/Risks:
Host PSOD will result in VMs running on the impacted host crashing and being restarted by vSphere HA (if enabled).

Multiple concurrent PSODs can result in data being inaccessible until the hosts have been restarted and rejoined the vSAN cluster.

Attachments

Issue-detection-tool-kb-92458
README-Issue-Detection-Tool-kb-92458