vSAN Health Service - Physical Disk Health – Congestion

Article ID: 326891

Products

VMware vSAN

Issue/Introduction

This article explains the Physical Disk Health - Congestion check in the vSAN Health Service and describes why it might report an error.

Environment

VMware vSAN 6.0.x
VMware vSAN 8.0.x
VMware vSAN 7.0.x

Resolution

Q: What does the Physical Disk Health - Congestion check do?

Congestion in vSAN happens when the lower layers fail to keep up with the I/O rate of the higher layers. If this health status is not green (OK), vSAN is still using the disk, but performance is reduced, potentially severely. This shows up as low throughput/IOPS and high latencies for vSAN objects using this disk group. In these cases, congestion applies to all objects on the disk group.

Q: What does it mean when it is in an error state?

Typical reasons for congestion are faulty or incorrectly sized hardware, misbehaving storage controller firmware, bad controller drivers, a low queue depth on the controller, or problems in the software. For example, if the flash cache device is not sized correctly, virtual machines performing a lot of write operations could fill up the write buffers on the flash cache device. In hybrid configurations, these buffers need to be destaged to magnetic disks. Because destaging then occurs very frequently, vSAN might use congestion to slow down the writes from the virtual machine so that destaging can keep up.
 
One common scenario is a high read cache miss rate, which can also lead to congestion and slow down virtual machine read I/O.
 
High congestion could be the root cause of virtual machine storage performance degradation, operation failures, or even ESXi hosts going unresponsive.
 
Q: How does one troubleshoot and fix the error state?

Under high load, when vSAN is operating at its maximum performance, a low amount of congestion (typically a value under 200) is expected and is not a cause for concern. However, any congestion value above 0 combined with low throughput/IOPS indicates an issue. This health check is green (OK) for congestion values below 200, yellow (warning) for values between 200 and 220, and red (alert) for values above 220. The maximum value for congestion is 255.

Note: For versions earlier than vSAN 6.7 U1, the threshold values remain 32 (yellow) and 64 (red).
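
The current thresholds described above can be expressed as a small lookup. The following is a minimal sketch, not part of the vSAN Health Service: the function names and the `throughput_is_low` flag are hypothetical, and the sketch assumes that the boundary values 200 and 220 both fall in the yellow range. It does not cover the thresholds for versions earlier than 6.7 U1.

```python
MAX_CONGESTION = 255  # maximum congestion value stated in this article


def congestion_status(value: int) -> str:
    """Map a congestion value (0-255) to the health check color.

    Assumption: 200 and 220 are both treated as yellow, because the
    article says green is "below 200" and red is "above 220".
    """
    if not 0 <= value <= MAX_CONGESTION:
        raise ValueError(
            f"congestion must be between 0 and {MAX_CONGESTION}, got {value}"
        )
    if value < 200:
        return "green"
    if value <= 220:
        return "yellow"
    return "red"


def needs_investigation(value: int, throughput_is_low: bool) -> bool:
    """Return True if the congestion value warrants a closer look.

    `throughput_is_low` is a hypothetical flag supplied by the caller;
    the article does not define a numeric cutoff for low throughput/IOPS.
    """
    if congestion_status(value) != "green":
        return True
    # Any congestion above 0 combined with low throughput/IOPS is a concern.
    return value > 0 and throughput_is_low


if __name__ == "__main__":
    for sample in (0, 150, 200, 220, 221, 255):
        print(sample, congestion_status(sample))
```

A value such as 150 is reported as green, but `needs_investigation(150, throughput_is_low=True)` still returns `True`, which reflects the guidance that low congestion is only harmless when throughput is otherwise healthy.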
 
For more information on congestion, see vSAN memory or SSD congestion reached threshold limit (2071384). VMware recommends engaging VMware Support on congestion-related issues to ensure that the root cause is identified. For more information, see How to file a Support Request in Customer Connect (2006985).


Additional Information

For more information on collecting VMware vSAN logs, see Collecting vSAN support logs and uploading to VMware (2072796).

Also, see:
vSAN Health Service - Cluster Health - vSAN Health Service up-to-date
vSAN Health Service - Cluster Health - Advanced vSAN configuration in sync
vSAN Health Service - Network Health - Hosts disconnected from vCenter Server
vSAN Health Service - Network Health - Unexpected vSAN cluster members
vSAN Health Service - Network Health - vSAN Cluster Partition
vSAN Health Service - Network Health – Hosts with vSAN disabled
vSAN Health Service - Network Health - All hosts have a vSAN vmknic configured
vSAN Health Service - Network Health - All hosts have matching subnets
vSAN Health Service - Network Health - All hosts have matching multicast settings
vSAN Health Service - Network Health - Hosts small ping test (connectivity check) and Hosts large ping test (MTU check)
vSAN Health Service - Network Health - Hosts with connectivity issues
vSAN Health Service - Network Health – Multicast assessment based on other checks
vSAN Health Service - Data Health – vSAN Object Health
vSAN Health Service - Physical Disk Health - Metadata Health
vSAN Health Service - Physical Disk Health - Overall Disk Health
vSAN Health Service - Limits Health – Current Cluster Situation
vSAN Health Service - Limits Health – After one additional host failure
vSAN Health Service - Physical Disk Health - Disk Capacity
vSAN Health Service – Physical Disk Health – Software State Health
vSAN Health Service – Physical Disk Health – Component Metadata Health
vSAN Health Service - Physical Disk Health – Memory pools
vSAN Health Service - vSAN HCL Health - Controller Release Support
vSAN Health Service – vSAN HCL Health – Controller Driver
vSAN Health Service - vSAN HCL Health – vSAN HCL DB up-to-date
vSAN Health Service - vSAN HCL Health – SCSI Controller on vSAN HCL
vSAN Health Service - Cluster Health – CLOMD liveness check
vSAN Health Service - Cluster Health - vSAN Health service installation
vSAN Health Check Information
vSAN Health Service - Network Health - Active Multicast connectivity check
vSAN Health Service - Physical Disk Health - Congestion (Japanese version)