Changing the default repair delay time for a host failure in vSAN

Article ID: 327031

Products

VMware vSAN

Issue/Introduction

This article provides the steps to change the repair delay time in VMware vSAN (formerly known as Virtual SAN). The repair delay is the amount of time vSAN waits, after a failure is detected on an ESXi host that is part of a vSAN cluster, before repairing a disk component.

Environment

VMware vSAN 6.1.x
VMware vSAN 6.2.x
VMware vSAN 6.5.x
VMware vSAN 6.6.x
VMware vSAN 6.7.x
VMware vSAN 7.0.x
VMware vSAN 8.0.x

Cause

This VMware vSAN advanced setting specifies the amount of time vSAN waits before rebuilding a disk object after a host is either in a failed state or in Maintenance Mode. By default, the repair delay value is set to 60 minutes; this means that in the event of a host failure, vSAN waits 60 minutes before rebuilding any disk objects located on that particular host. This is because vSAN is not certain if the failure is transient or permanent.

Note: If a failure in a physical hardware component is detected, such as a Solid State Disk (SSD) or Magnetic Disk (MD), vSAN immediately responds by rebuilding a disk object.

Resolution

Note: The steps below are still valid for vSAN 6.x but an immediate repair can be triggered using the Repair objects immediately button in the vSAN Health plug-in, if desired.

To change the default repair delay time, modify the ESXi advanced option VSAN.ClomRepairDelay.

Note: The default of 60 minutes is designed to cover a wide range of configurations. Setting this option too aggressively can cause unnecessary resync operations. When changing this advanced option, consider these factors:
  • Installation of ESXi updates (if performing updates)
  • ESXi host boot time (Including Power On Self Test)
  • SSD Log recovery for vSAN

Note: The maximum value that can be set for the CLOM repair delay is 4294967295 minutes.
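
The factors above can be turned into a rough estimate before a value is chosen. The sketch below is illustrative only: the helper names (is_valid_repair_delay, suggest_repair_delay) and the rule of summing the expected durations are assumptions, not a VMware-documented procedure. Only the 60-minute default and the 4294967295-minute maximum come from this article; the lower bound of 0 and the choice never to suggest less than the default are conservative assumptions made for the example.

```shell
#!/bin/sh
# Illustrative helpers; names and the sizing rule are assumptions, not VMware tooling.

MAX_REPAIR_DELAY=4294967295   # maximum from this article, in minutes
DEFAULT_REPAIR_DELAY=60       # default from this article, in minutes

# Succeeds (exit 0) if $1 is a whole number of minutes no larger than the maximum.
is_valid_repair_delay() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    esac
    # Reject overly long digit strings before comparing them numerically.
    [ "${#1}" -le 10 ] || return 1
    [ "$1" -le "$MAX_REPAIR_DELAY" ]
}

# Prints the larger of the 60-minute default and the summed estimate.
# Arguments (all in minutes): ESXi update time, boot time including POST,
# SSD log recovery time.
suggest_repair_delay() {
    total=$(( $1 + $2 + $3 ))
    if [ "$total" -lt "$DEFAULT_REPAIR_DELAY" ]; then
        total=$DEFAULT_REPAIR_DELAY
    fi
    echo "$total"
}
```

For example, with hypothetical timings of 45 minutes for updates, 20 minutes for boot, and 15 minutes for log recovery, suggest_repair_delay 45 20 15 prints 80. The timings must be measured in the actual environment; the numbers here are placeholders.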

For vSAN 6.7 U1 and above

vSAN 6.7 U1 introduces the option to change the "Clom Repair" value from vCenter. If vCenter is available, navigate to Cluster object > Configure > vSAN > Services > Advanced Options.



Change the Object Repair Timer option to the desired value.


Note: After making the change through the UI, wait at least 180 seconds before proceeding with any maintenance, disk group, disk, or reboot related activity.

Unlike in previous versions of vSAN, the clomd service does not need to be restarted from the command line when the change is made through the GUI.
 

For vSAN 6.7 GA and below

To change the repair delay time, run these steps on each ESXi host in the vSAN cluster:

  1. Open an SSH session to each ESXi host. For more information, see Using ESXi Shell in ESXi 5.x (2004746).
  2. Run this esxcli command to change the repair delay time:

    esxcli system settings advanced set -o /VSAN/ClomRepairDelay -i <value in minutes>

    Alternatively, you can use this esxcfg command:

    esxcfg-advcfg --set <Value in minutes> /VSAN/ClomRepairDelay

    Note: Setting the value of ClomRepairDelay very low can cause unnecessary copying of components in the event of a host reboot or temporary network outage that causes an ESXi host to become network partitioned.
     
  3. Restart the Cluster Level Object Manager (CLOM) service clomd to apply the changes by running this command:

    /etc/init.d/clomd restart

    Note: Restarting the clomd service briefly interrupts CLOM operations. The length of the outage should be less than one second. However, if a virtual machine is being provisioned at the time the clomd service is restarted, that provisioning task may fail.
     
  4. Apply steps 1 to 3 to each ESXi host in the vSAN cluster.
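
Steps 2 and 3 above can be combined into a small script that is run on each host. This is a minimal sketch rather than a supported tool: the function names (run, apply_repair_delay) and the DRY_RUN variable are hypothetical, and only the esxcli and clomd commands come from the steps above. With DRY_RUN=1 the commands are printed instead of executed, which allows a review before a host is touched.

```shell
#!/bin/sh
# Sketch only: run, apply_repair_delay, and DRY_RUN are hypothetical names.

# Executes the given command, or only prints it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sets /VSAN/ClomRepairDelay to $1 minutes (step 2), then restarts clomd (step 3).
apply_repair_delay() {
    minutes=$1
    case "$minutes" in
        ''|*[!0-9]*)
            echo "usage: apply_repair_delay <minutes>" >&2
            return 1
            ;;
    esac
    # Stop here if the setting could not be changed, so clomd is not restarted needlessly.
    run esxcli system settings advanced set -o /VSAN/ClomRepairDelay -i "$minutes" || return 1
    run /etc/init.d/clomd restart
}
```

Calling apply_repair_delay 90 with DRY_RUN=1 set prints the two commands without running them. Without DRY_RUN, the function has to be executed in an SSH session on the ESXi host itself, and the note about provisioning tasks failing during a clomd restart still applies.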

To change the repair delay time using the VMware vSphere Web Client, run these steps on each ESXi host in the vSAN cluster:

  1. Log in with admin credentials to the VMware vCenter Server using the vSphere Web Client.
  2. Select the vSAN Cluster and highlight the ESXi host > Manage > Settings.
  3. Select Advanced System Settings > VSAN.ClomRepairDelay.
  4. Click Edit.
  5. Modify VSAN.ClomRepairDelay value in minutes as required.
  6. Restart the Cluster Level Object Manager (CLOM) service clomd to apply the changes by running this command:

    /etc/init.d/clomd restart

    Note: Restarting the clomd service briefly interrupts CLOM operations. The length of the outage should be less than one second. However, if a virtual machine is being provisioned at the time the clomd service is restarted, that provisioning task may fail.
     
  7. Apply steps 1-6 to each ESXi host in the vSAN cluster.

Future Considerations

After the failure has been resolved, the setting should be reset to the default of 60 minutes. Please use the above steps as a reference.
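
To confirm that a host is back at the default, the current value can be read and compared with 60. The sketch below assumes that esxcli system settings advanced list -o /VSAN/ClomRepairDelay prints an "Int Value:" line, as it does for integer advanced options. The function names (current_repair_delay, is_default_repair_delay) are hypothetical.

```shell
#!/bin/sh
# Sketch only: current_repair_delay and is_default_repair_delay are hypothetical names.

# Reads `esxcli system settings advanced list` output on stdin and prints the
# number from the "Int Value:" line. The "Default Int Value:" line is skipped
# because the pattern is anchored to the start of the line.
current_repair_delay() {
    sed -n 's/^[[:space:]]*Int Value:[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p'
}

# Succeeds (exit 0) if the value read from stdin equals the 60-minute default.
is_default_repair_delay() {
    [ "$(current_repair_delay)" = "60" ]
}

# Intended use on an ESXi host (not executed here):
#   esxcli system settings advanced list -o /VSAN/ClomRepairDelay | current_repair_delay
#   esxcli system settings advanced list -o /VSAN/ClomRepairDelay | is_default_repair_delay \
#       || echo "repair delay is not at the default of 60 minutes"
```

Because the setting is per host, the check needs to be repeated on every ESXi host in the vSAN cluster, in the same way as the change itself.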

 


Additional Information