Destage process can result in poor performance in vSAN deduplication environments (2149066)

Symptoms

You might experience the following symptoms when running vSAN 6.2 on ESXi 6.0 Patch Release ESXi600-201611001, or vSAN 6.5 GA, with the vSAN deduplication feature enabled.

If you encounter this issue, you observe one or more of these symptoms:
  • High congestion: The vSAN Health Service or vCenter alarms may report high congestion on one or more nodes in the vSAN cluster.
  • Slow resync: Resync data, if any, may proceed very slowly for an extended period of time or appear to stall.
  • Hosts may enter a Not responding state in vCenter Server.
  • Some virtual machines exhibit poor I/O performance.
  • vSAN Observer, if available, shows very high ZeroDrained activity in the full SSD graphs on the vSAN Disks (Deep Dive) screen.

    If you have encountered this issue and Observer is available, the PLOG elev bytes MB graph shows consistently high ZeroDrained activity (or a high plateau of it), with effectively no activity of any other kind.

    For more information about vSAN Observer, see How to use and interpret performance statistics collected using vSAN Observer (2064240).

  • Deleted space is not returned to the disk group in a timely manner when deduplication is in use.

Purpose

vSAN 6.2 deployments running on ESXi 6.0, Patch ESXi600-201611001 are exposed to an issue that can cause I/O slowdown due to accumulation of zeroes in the cache tier.

Cause

In vSAN 6.2 P04 (ESXi 6.0, Patch ESXi600-201611001), zeroed data can accumulate in the cache tier of the vSAN disk group due to delayed writes. Typically, this is associated with accumulated delete activity over time. When a component is deleted, it is zeroed. These zeroes compress well and so do not count against data totals in the cache tier device and will not trigger a data destage operation. When a destage operation does occur (most likely due to resync activity or rapid increase in space usage in the cache tier), these accumulated zeroes must be processed. Processing the zero data can crowd out the processing of non-zero data, potentially resulting in increased congestion levels and I/O slowdown. This can have negative consequences on VM I/O handling and host management.

Resolution

This issue is resolved in ESXi 6.0 Update 3, available at VMware Downloads.

This is a known issue in vSAN 6.5 and is resolved in ESXi 6.5.0 (vSAN 6.6), available at VMware Downloads.

Workaround details

If this issue has already been encountered and production is impacted, you must wait for the destage process to complete. After completion, this problem will self-resolve.

If this issue has been encountered and additional maintenance is planned or expected, you can prevent a recurrence by running the destage elevator process manually before the maintenance.

Note: Running the destage elevator process manually is not known to have any impact on production I/O. If you determine that performance is being affected, stop the elevator process immediately by executing step 4, and plan a suitable time to rerun this action plan during a low-I/O period (for example, outside of normal business hours).

To run the destage elevator process manually:
  1. Log in to each host using SSH or KVM/physical console.
  2. On each host, start the destage elevator process manually:
    # esxcfg-advcfg -s 1 /LSOM/plogRunElevator

  3. Wait 60 minutes for the destage elevator process to run.
  4. On each host, stop the destage elevator process manually:
    # esxcfg-advcfg -s 0 /LSOM/plogRunElevator

  5. Perform maintenance as required.
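
Steps 2 through 4 can be wrapped in a small script so that the elevator is always stopped again, even if the SSH session is interrupted during the wait. The following is a minimal sketch, not a VMware-supplied tool. It uses only the two esxcfg-advcfg commands shown above; the function names and the ADVCFG and WAIT_SECONDS variables are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch only: run the destage elevator on ONE host for a fixed window,
# then stop it. Function and variable names are illustrative assumptions.

ADVCFG="${ADVCFG:-esxcfg-advcfg}"      # overridable so the logic can be dry-run
OPTION="/LSOM/plogRunElevator"         # advanced option from steps 2 and 4
WAIT_SECONDS="${WAIT_SECONDS:-3600}"   # 60 minutes, per step 3

# Step 2: start the destage elevator.
start_elevator() { "$ADVCFG" -s 1 "$OPTION"; }

# Step 4: stop the destage elevator.
stop_elevator() { "$ADVCFG" -s 0 "$OPTION"; }

run_elevator_window() {
    start_elevator || return 1
    # If the wait is interrupted (Ctrl-C, dropped session), still stop the elevator.
    trap 'stop_elevator; exit 130' INT TERM HUP
    sleep "$WAIT_SECONDS"
    trap - INT TERM HUP
    stop_elevator
}
```

To use it, save the functions to a file on each host, source the file, and call run_elevator_window. If performance is affected during the wait, interrupting the script (or running the step 4 command directly) stops the elevator.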
