Symptoms:
Several hardware failure modes and a few vSAN workflows can trigger a resync or repair, which vSAN performs to keep VMs accessible.
Typical scenarios and workflows are:
• One or more node or disk failures
• Node or disk evacuation
• VM storage policy reconfiguration
• Cluster rebalancing when disks are more than 80% full
• Upgrade scenarios such as a disk format upgrade or enabling deduplication and compression
vSAN uses a congestion algorithm that delays resync traffic first and delays VM I/O only if that is not enough. However, VM I/O might still be impacted in the following cases:
- If VM I/O is low compared to resync traffic, VM I/O could be starved by the resync traffic and incur delay.
- If both VM I/O and resync traffic are high, the congestion algorithm first throttles resyncs, but this might not be enough to improve destaging at LSOM. At that point, additional build-up of VM I/O could trigger congestion for VM traffic, which increases latency in the VMs.
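The second case can be illustrated with a toy model. The sketch below is not vSAN's actual congestion algorithm. The function name `apply_congestion`, the single `destage_capacity` figure, and the I/O units are all assumptions made for illustration. It only shows the ordering described above: resync traffic is cut first, and VM I/O is cut only when cutting resync to zero still leaves more load than the disk tier can destage.

```python
def apply_congestion(vm_io, resync_io, destage_capacity):
    """Toy model (hypothetical, not vSAN code): fit offered load into
    destage_capacity by throttling resync first, then VM I/O.

    Returns (admitted_vm_io, admitted_resync_io) in the same units as the inputs.
    """
    # Load that exceeds what the capacity tier can absorb.
    excess = max(0, vm_io + resync_io - destage_capacity)
    # Resync traffic absorbs as much of the excess as it can.
    resync_cut = min(resync_io, excess)
    # Whatever excess remains has to come out of VM I/O.
    vm_cut = excess - resync_cut
    return vm_io - vm_cut, resync_io - resync_cut


# No congestion: total load fits, nothing is delayed.
print(apply_congestion(300, 500, 1000))   # (300, 500)
# Moderate congestion: only resync is throttled.
print(apply_congestion(600, 700, 1000))   # (600, 400)
# Heavy congestion: resync is fully throttled and VM I/O is delayed as well.
print(apply_congestion(1200, 700, 1000))  # (1000, 0)
```

In the last call, stopping resync entirely is still not enough, so VM I/O is reduced too. This corresponds to the latency increase that VMs see in the second case.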