vSAN is experiencing congestion in one or more disk groups (2150012)

Symptoms

In the vSAN performance diagnostics, you see this error:

vSAN is experiencing congestion in one or more disk groups.

Resolution

This error indicates that one or more disk groups are experiencing congestion. The affected disk groups are listed, along with the type of congestion each one is reporting.

Note: Regardless of the type of congestion, temporary congestion is usually acceptable and is not detrimental to system performance.

However, sustained congestion warrants attention and resolution. Pay particular attention if:

(i) The congestion is present across all disk groups.
In this case, it is likely that the vSAN cluster backend is unable to handle the IO workload. If possible, tune the benchmark by turning off some VMs or by reducing the number of outstanding IOs or threads in each VM. Alternatively, you may need to resize the vSAN configuration in terms of the number of disk groups and the number of SSDs/HDDs in each disk group.

(ii) One or two disk groups have higher congestion than the others.
When congestion on one disk group is far higher than on the other disk groups in the system, it indicates an imbalance in write IO activity across the disk groups. If this happens consistently, try increasing the number of disk stripes in the storage policy, or try a proactive rebalance.
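
The two cases above can be sketched as a small triage routine. This is only an illustration, not part of vSAN: the function name, the input format (congestion readings per disk group, collected by whatever means you have), and the threshold of 50 are all assumptions. It uses the median of each disk group's readings so that short, temporary spikes are ignored.

```python
from statistics import median


def classify_congestion(samples, threshold=50):
    """Classify sustained congestion across disk groups.

    samples: dict mapping a disk group name to a list of congestion
             readings taken over time (hypothetical input format).
    threshold: hypothetical level above which a disk group is treated
               as congested; tune it for your environment.
    Returns (verdict, congested_disk_groups).
    """
    # The median per disk group filters out short-lived spikes.
    sustained = {dg: median(vals) for dg, vals in samples.items()}
    congested = sorted(dg for dg, level in sustained.items() if level >= threshold)

    if not congested:
        return "ok", []
    if len(congested) == len(sustained):
        # Case (i): the backend as a whole cannot keep up with the workload.
        return "cluster-wide", congested
    # Case (ii): only some disk groups are congested, suggesting imbalance.
    return "imbalanced", congested


readings = {
    "dg1": [120, 130, 125],
    "dg2": [5, 0, 10],
    "dg3": [0, 0, 5],
}
print(classify_congestion(readings))  # ('imbalanced', ['dg1'])
```

A "cluster-wide" verdict points toward reducing the workload or resizing the configuration, while "imbalanced" points toward more disk stripes or a proactive rebalance.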

There are 6 different types of congestion that may be raised.

This table lists the types of congestion and the specific remedy for each:

| Sl. No. | Congestion | Remedy |
|---|---|---|
| 1 | SSD Congestion | In both hybrid and all-flash vSAN clusters, data is first written to the write cache (also known as the write buffer). A process known as de-staging moves the data from the write buffer to the capacity devices (the hard disks in a hybrid cluster). The write cache absorbs a high write rate, ensuring that write performance is not limited by the slower capacity devices. However, when the incoming write rate is much higher than the drain rate (the rate at which de-staging moves data out of the write buffer), the write cache can no longer sustain the incoming writes. SSD Congestion is reported when the write cache is near full utilization and immediate de-staging is required to continue IO activity. |
| 2 | Log Congestion | This congestion indicates that the outstanding write logs consume significant space. A large volume of small, unaligned 4K writes can lead to log congestion. |
| 3 | Memory Congestion | This congestion indicates high memory usage by the vSAN Local Log Structured Object Management (LSOM) layer. |
| 4 | Slab Congestion | This congestion indicates high slab memory usage by the LSOM layer. |
| 5 | Component Congestion (Comp-Congestion) | This congestion indicates that there is a large volume of outstanding commit operations to some components. Typically, a heavy volume of writes to a few VM disks causes this congestion. |
| 6 | IOPS Congestion | This congestion attempts to balance the IOs on different components of a congested disk group. It usually does not affect aggregate performance and can be ignored. |
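
The relationship between incoming write rate and drain rate described for SSD Congestion can be illustrated with simple arithmetic. The sketch below is a simplified model that assumes constant rates; the function name and the numbers are hypothetical and are not taken from vSAN. It estimates how long the free space in a write buffer lasts when writes arrive faster than de-staging can drain them.

```python
def time_to_fill_seconds(free_buffer_mb, incoming_mb_per_s, drain_mb_per_s):
    """Estimate seconds until the write buffer is full.

    Assumes constant incoming and drain rates (a simplification).
    Returns None when de-staging keeps up with the incoming writes,
    because the buffer then never fills.
    """
    net_fill_rate = incoming_mb_per_s - drain_mb_per_s
    if net_fill_rate <= 0:
        return None
    return free_buffer_mb / net_fill_rate


# Hypothetical numbers: 100 GB free (102400 MB), 700 MB/s incoming,
# 300 MB/s drained by de-staging.
print(time_to_fill_seconds(102400, 700, 300))  # 256.0
```

With these example numbers the buffer would fill in a little over four minutes, which is why a sustained gap between the two rates leads to congestion while a short burst does not.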
