Advanced Configuration options for VMware High Availability in vSphere 5.0 and 5.1 (2033250)

Purpose

In the majority of environments, VMware High Availability (HA) default settings do not need to be changed. However, depending on your specific environment you may need to modify some HA options. This article describes the different configuration options available and how to apply them.

Resolution

Note: Not all configuration variables work in all versions of vCenter Server. Variables introduced in a release remain available in later versions.

Applying a VMware HA customization

From VMware vSphere Web Client

  1. Log in to VMware vSphere Web Client.
  2. Click Home > vCenter > Clusters.
  3. Under Object, click the cluster you want to modify.
  4. Click Manage.
  5. Click vSphere HA.
  6. Click Edit.
  7. Click Advanced Options.
  8. Click Add and fill in the Option and Value fields as appropriate (see below).
  9. Deselect Turn ON vSphere HA.
  10. Click OK.
  11. Wait for HA to unconfigure, then click Edit and select Turn ON vSphere HA.
  12. Click OK and wait for the cluster to reconfigure.

From VMware vSphere Client

  1. Log in to vCenter Server with vSphere Client as an administrator.
  2. Right-click the Cluster in the Inventory and select Edit Settings.
  3. Click VMware HA.
  4. Click the Advanced Options button.
  5. Fill in the Option and Value fields as appropriate (see below).
  6. Click OK.
  7. Click OK again.
  8. Wait for the Reconfigure Cluster task to complete and then right-click the Cluster again from the Inventory.
  9. Click Properties.
  10. Disable VMware HA and wait for the Reconfigure Cluster task(s) to complete.
  11. Right-click the cluster and Enable VMware HA to have the settings take effect.

    Note: The Reconfiguration column in the tables below indicates whether HA must be reconfigured on the hosts for an option to take effect.

There are three types of HA advanced options and each is set in a different way.

  • vCenter Server options (VC) -- these options are configured at the vCenter Server level and apply to all HA clusters unless overridden by cluster-specific options in cases where such options exist. If the vCenter Server options are configured using the vCenter Server options manager, a vCenter Server restart may not be required -- see the specific options for details. But if these options are configured by adding the option string to the vpxd.cfg file (as a child of the config/vpxd/das tag), a restart is required.
  • cluster options (cluster) -- these options are configured for an individual cluster and if they impact the behavior of the HA Agent (FDM), they apply to all instances of FDM in that cluster. These options are configured by using the HA cluster-level advanced options mechanism, either via the UI or the API. Options with names starting with "das.config." can also be applied using the "fdm options" mechanism below, but this is not recommended because the options should be equally applied to all FDM instances.
  • fdm options (fdm) -- these options are configured for an individual FDM instance on a host. They are configured by adding the option to the /etc/opt/vmware/fdm/fdm.cfg file of the host as a child of the config/fdm tag. Options set in this way are lost when FDM is uninstalled (e.g., if the host is removed from vCenter Server and then re-added) or if the host is managed by Auto Deploy and is rebooted.
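
As an illustration of where a file-based option sits, the following Python sketch builds the nested XML for an option placed under a given parent tag path. It is a minimal sketch, not a VMware tool: the function name option_fragment is hypothetical, and the child element names used in the usage lines (the option name with its vpxd.das. or fdm. prefix dropped) are an assumption, since this article only states which parent tag the option belongs under. The values 180 and 1000 are arbitrary examples.

```python
import xml.etree.ElementTree as ET


def option_fragment(parent_path: str, element_name: str, value) -> str:
    """Return XML that nests <element_name>value</element_name> under parent_path.

    parent_path is a slash-separated tag path such as "config/vpxd/das".
    element_name is an assumed child tag name (hypothetical, see lead-in).
    """
    tags = parent_path.strip("/").split("/")
    root = ET.Element(tags[0])
    node = root
    # Create each intermediate tag as a child of the previous one.
    for tag in tags[1:]:
        node = ET.SubElement(node, tag)
    ET.SubElement(node, element_name).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # vCenter Server option added to vpxd.cfg (requires a vCenter Server restart).
    print(option_fragment("config/vpxd/das", "electionWaitTimeSec", 180))
    # Per-host FDM option added to /etc/opt/vmware/fdm/fdm.cfg.
    print(option_fragment("config/fdm", "nodeGoodness", 1000))
```

In practice the new element would be merged into the existing configuration file rather than written out as a standalone document; the sketch only shows the nesting.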

Common Options

Version | Name | Description | Reconfiguration | Type of Option
5.0 | das.allowNetworkX | Allows you to specify the management networks used by HA, where X is a number between 0 and 9. For example, if you set a value to "Management Network", only the networks associated with port groups having this name will be used. Ensure that all hosts are configured with the named port group and the networks are compatible. | Yes. Reconfigure HA on all hosts to have the specification take effect. | Cluster
5.0 | das.ignoreRedundantNetWarning | HA will report a config issue on a host if the host is not configured with redundant networks for the networks used by HA. HA only uses management networks. Valid values are true/false. Set to true to suppress the config issue. False is assumed if the option is not set. | Yes. Reconfigure HA on a host to have the config issue for that host cleared. | Cluster
5.0 | das.heartbeatDsPerHost | By default, HA chooses 2 heartbeat datastores for each host in an HA cluster. This option can be used to increase the number to a value in the range of 2 to 5 inclusive. | Yes. Reconfigure HA on all hosts in the cluster. | Cluster
5.0 | das.ignoreInsufficientHbDatastore | HA will report a host config issue if it was not able to select the required number of datastores for a host given by das.heartbeatDsPerHost. Set this option to true to suppress this warning, and false to enable it. A value of false is assumed if the option is not set. | Yes. Reconfigure HA on all hosts in the cluster. | Cluster
5.0 | das.includeFTcomplianceChecks | Whether to check the cluster for compliance with Fault Tolerance as part of the cluster profile compliance check. Set this option to false if you don't plan to use FT in the cluster. A value of true enables the checks. If unset, a value of true is assumed. | No. | Cluster
5.0 | das.vmMemoryMinMB | Value in MB to use for the memory reservation of a virtual machine if no non-zero memory reservation is set by a user. 0 is assumed if the option is not set. | No. | Cluster
5.0 | das.vmCpuMinMHz | Value in MHz to use for the CPU reservation of a virtual machine if no non-zero CPU reservation is set by a user. 32 is assumed if the option is not set. | No. | Cluster
5.0 | das.slotCpuInMHz | Maximum value in MHz to use for the CPU component of the slot size. No limit is imposed if the option is not set. In 5.1, the CPU component of the slot size can be exactly specified in the UI and the API (see the vim.cluster.slotPolicy object). Note that this option and the UI/API behave differently: this option sets a maximum while the UI/API sets the exact value. If a slot policy is defined and this option is specified, the value specified by this option is ignored. | No. | Cluster
5.0 | das.slotMemInMB | Maximum value in MB to use for the memory component of the slot size. No limit is imposed if the option is not set. In 5.1, the memory component of the slot size can be exactly specified in the UI and the API (see the vim.cluster.slotPolicy object). Note that this option and the UI/API behave differently: this option sets a maximum while the UI/API sets the exact value. If a slot policy is defined and this option is specified, the value specified by this option is ignored. | No. | Cluster
5.0 | das.maxvmrestartcount | The maximum number of times an FDM master will try to restart a virtual machine before giving up. Five attempts will be made if this option is unset. This limit only applies if the time since the first restart attempt was made is less than das.maxvmrestartperiod. Note that FT secondary virtual machine restarts are governed by the separate parameter das.maxftvmrestartcount. | No. | Cluster
5.0 | das.maxvmrestartperiod | The maximum amount of time (in seconds) during which an FDM master will attempt to restart a virtual machine after the first restart attempt failed. The time is measured from when the FDM master first tried to restart the virtual machine. This time limit takes precedence over das.maxvmrestartcount. No time limit is imposed if this option is unset. | No. | Cluster
5.0 | das.maxftvmrestartcount | The maximum number of times an FDM master will try to start a secondary virtual machine for an FT virtual machine pair before giving up. Five attempts will be made if this option is unset. | No. | Cluster
5.0 U1 | das.maskCleanShutdownEnabled | When a virtual machine powers off and its home datastore is not accessible, HA cannot determine whether the virtual machine should be restarted, so it must make a decision. If this option is set to false, the responding FDM master will assume the virtual machine should not be restarted, while if this option is set to true, the responding FDM master will assume the virtual machine should be restarted. If the option is unset in 5.0U1, a value of false is assumed, whereas in 5.1+, a value of true is assumed. | No. | Cluster
5.0 | das.isolationAddressX | IP addresses an FDM agent uses to check for isolation when no agent network traffic is observed on the network used by HA, where X = 0-9. By default, HA uses the default management-network gateway as an isolation address, plus those specified by this advanced option as additional addresses to check. We recommend adding an isolation address for each management network used by HA. | No. | Cluster
5.0 | das.useDefaultIsolationAddress | Whether the default isolation address (gateway of the management network) should be used when determining if a host is network isolated. Valid values are true/false. By default, the management network default gateway is used. If the default gateway is a non-pingable address, set das.isolationAddressX to a pingable address and disable the usage of the default gateway by setting this option to false. | No. | Cluster
5.1 | das.config.fdm.isolationPolicyDelaySec | The number of seconds an FDM agent waits before executing the isolation policy once it has determined that the host is isolated. The minimum value is 30. If set to a value less than 30, the delay will be 30 seconds. | No. | Cluster
5.0 | das.isolationShutdownTimeout | The number of seconds an FDM waits for a virtual machine to power off after initiating a guest shutdown before the FDM issues a power off. If the option is unset, 300s is used. | No. | Cluster
5.0 | das.iostatsInterval | If an FDM detects that a sufficient number of VMware Tools heartbeats are missing to trigger a virtual machine's configured virtual machine/App monitoring policy, the FDM checks whether any I/O has been issued during the last das.iostatsInterval seconds, and will only reset the virtual machine if no I/O occurred in this interval. Values of 0 or greater are valid. 120s is assumed if the option is unset. | No. | Cluster
5.0 | das.maxFtVmsPerHost | Specifies the number of Fault Tolerance virtual machines that can be run on a host at one time. If unset, a value of 4 is used. A value of -1 or 0 disables the limit. The limit is enforced by vCenter Server when executing user-initiated power-ons and vMotions, and by DRS when doing initial placement and load balancing. To maximize uptime, HA does not enforce this limit. DRS does not correct any violations of this limit. | No. | Cluster
5.0 | das.config.log.maxFileNum | Controls the number of FDM log-file rotations retained by the FDM file-based logger. By default, the file-based logger is used only by the FDM when running on ESX versions earlier than ESX 5.0. To change the number of log-file rotations maintained for a pre-ESX 5.0 host, set this option to the desired number of log files. For ESX 5.0+ hosts, the FDM logs to syslog by default, so you need to use the syslog configuration mechanism to change the amount of retained logging history. However, it is possible to enable the file-based logger for ESX 5.0+ hosts as well. To do so, set this option to a valid value. If you are using vSphere 5.0U1+, you must also set the option das.config.log.outputToFiles to true. For all ESX versions, setting the option das.config.log.maxFileNum to 1 will disable the log-file rotations. The location of log files can be changed using the option das.config.log.directory. | Yes. | Cluster
5.0 | das.config.log.maxFileSize | Controls the size of each log file written out by the FDM file-based logger. Files are 1 MB in size unless this option is specified. This option is used in conjunction with das.config.log.maxFileNum to control the log history. | Yes. | Cluster
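
The interaction between das.maxvmrestartcount and das.maxvmrestartperiod described in the table above can be sketched in Python as follows. This is one reading of the two descriptions, written for illustration only; it is not the FDM implementation, and the function and parameter names are hypothetical.

```python
from typing import Optional

# Documented default: five attempts when das.maxvmrestartcount is unset.
DEFAULT_MAX_RESTART_COUNT = 5


def should_retry_restart(
    attempts_made: int,
    seconds_since_first_attempt: float,
    max_restart_count: Optional[int] = None,
    max_restart_period: Optional[float] = None,
) -> bool:
    """Decide whether another restart attempt is allowed.

    max_restart_count stands in for das.maxvmrestartcount and
    max_restart_period for das.maxvmrestartperiod; None means "unset".
    """
    # The time limit takes precedence: once the period has elapsed,
    # no further attempts are made regardless of the attempt count.
    if (
        max_restart_period is not None
        and seconds_since_first_attempt >= max_restart_period
    ):
        return False

    # Within the period (or with no period set), the count limit applies.
    limit = (
        DEFAULT_MAX_RESTART_COUNT
        if max_restart_count is None
        else max_restart_count
    )
    return attempts_made < limit


if __name__ == "__main__":
    # Two attempts made, 100 s elapsed, period of 60 s: the period has expired.
    print(should_retry_restart(2, 100, max_restart_period=60))
```

With both options unset, the sketch allows up to five attempts and imposes no time limit, matching the defaults stated in the table.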

Less Common Options

Version | Name | Description | Reconfiguration | Type of Option
5.0 | vpxd.das.aamMemoryLimit | Memory limit in MB for the resource pool used by HA (the aam resource pool). If unspecified, 100 MB is used. Value applies to all clusters in the vCenter Server inventory. | Yes, HA must be reconfigured on all hosts for which the change is required. | VC
5.0 | vpxd.das.electionWaitTimeSec | How long, in seconds, vCenter Server waits after sending the host list to a new host to learn the outcome of the election. A timeout exception is thrown if the host is not a master or connected slave by the timeout. If not specified, a value of 120 seconds is used. | No. Applied the next time an FDM is configured. | VC
5.0 | fdm.nodeGoodness | When a master election is held, the FDMs exchange a goodness value, and the FDM with the largest goodness value is elected master. Ties are broken using the host IDs assigned by vCenter Server. This parameter can be used to override the computed goodness value for a given FDM. To force a specific host to be elected master each time an election is held and the host is active, set this option to a large positive value. This option should not be specified at the cluster level. | No. The new goodness value will be used in the next election. | fdm
5.0 | vpxd.das.sendProtectListIntervalSec | Minimum time (in seconds) between consecutive calls by vCenter Server to the HA master agent it is in contact with to request that it protect a new virtual machine. If not specified, 60s is used. This option also controls how frequently vCenter Server sends the master updates to the virtual machine-to-host compatibility information for powered-on virtual machines when their compatibility with hosts changes. | Yes, vCenter Server needs to be restarted after setting this option. | VC
5.0 | vpxd.das.slotMemMinMB | vCenter Server-wide default value in MB to use for memory reservation if no memory reservation is specified for a virtual machine. Setting the cluster option das.vmMemoryMinMB for a cluster will override this value for that cluster. If this option is not set, a value of zero is assumed unless overridden by das.vmMemoryMinMB. | No. The value will be taken into account the next time admission control is done. | VC
5.0 | vpxd.das.slotCpuMinMHz | vCenter Server-wide default value in MHz to use for CPU reservation if no CPU reservation is specified for a virtual machine. Setting the cluster option das.vmCpuMinMHz for a cluster will override this value for that cluster. If this option is not set, a value of 32 is assumed unless overridden by das.vmCpuMinMHz. | No. The value will be taken into account the next time admission control is done. | VC
5.0 | das.config.fdm.hostTimeout | Controls the time in seconds a master FDM waits for a slave FDM to respond to a heartbeat before declaring the slave host not connected and initiating the workflow to determine whether the host is dead, isolated, or partitioned. If not specified, 10s is used. | Yes. Reconfigure HA on all hosts. | Cluster
5.0 | fdm.deadIcmpPingInterval | ICMP pings are used to determine whether a slave host is network accessible when the FDM on that host is not connected to the master. This option controls the interval (expressed in seconds) between pings. If not specified, 10s is used. | In 5.0, after making a change, HA must be reconfigured on all hosts in the cluster. In 5.1+, no. | Cluster
5.0 | das.config.fdm.icmpPingTimeout | Defines the time in seconds an FDM waits for an ICMP ping reply before assuming the host being pinged is not network accessible. If not specified, 5s is used. | In 5.0, after making a change, HA must be reconfigured on all hosts in the cluster. In 5.1+, no. | Cluster
5.0 | vpxd.das.heartbeatPanicMaxTimeout | This option affects how long it takes for a host that has suffered a PSOD to release file locks and hence allow HA to restart virtual machines that were running on it. If not specified, 60s is used. HA sets the host Misc.HeartbeatPanicTimeout advanced option to the value of this HA option. The HA option is in seconds. | Yes, after setting the option, HA needs to be reconfigured on all hosts in all HA clusters. | VC
5.0 | das.config.fdm.policy.unknownStateMonitorPeriod | Defines the number of seconds the HA master agent waits after it detects that a virtual machine has failed before it attempts to restart the virtual machine. If not specified, 10s is used. | No. | Cluster
5.0 | das.perHostConcurrentFailoversLimit | The number of concurrent failovers a given FDM will have in progress at one time. Setting a larger value will allow more virtual machines to be restarted concurrently but will also increase the average latency to power each on, since a greater number adds more stress on the hosts and storage. The default value is 32. This value was determined empirically to provide the minimum overall latency. | No. | Cluster
5.0 | das.config.fdm.ft.cleanupTimeout | When a vSphere Fault Tolerance virtual machine is powered on by vCenter Server, vCenter Server informs the HA master agent that it is doing so. This option controls how many seconds the HA master agent waits for the power on of the secondary virtual machine to succeed. If the power on takes longer than this time (most likely because vCenter Server has lost contact with the host or has failed), the master agent will attempt to power on the secondary virtual machine. If the option is not specified, 900s is used. | No. | Cluster
5.0 | das.config.fdm.storageVmotionCleanupTimeout | When a Storage vMotion is done in an HA-enabled cluster using pre-5.0 hosts and the home datastore of the virtual machine is being moved, HA may interpret the completion of the Storage vMotion as a failure, and may attempt to restart the source virtual machine. To avoid this issue, the HA master agent waits the specified number of seconds for a Storage vMotion to complete or fail. When the Storage vMotion completes or the timer expires, the master will assess whether a failure occurred. If the option is not specified, 900s is used for the timeout. | No. | Cluster
5.0 U1 | das.config.log.outputToFiles | Enables the FDM file-based logger for 5.0+ hosts. 5.0+ hosts log to the ESX syslog, so file-based logging is not enabled by default. This option has no effect on pre-5.0 hosts. To enable the file-based logger, set das.config.log.outputToFiles to true and das.config.log.maxFileNum to a number greater than 2. To disable file-based logging, set this option to false. | Yes. | Cluster
5.0 | das.config.log.directory | Sets the directory used by the FDM file-based logger. If not specified, files are written into /var/log/vmware/fdm. See the option das.config.log.maxFileNum for more information. | Yes. | Cluster
5.0 | das.config.fdm.stateLogInterval | Frequency in seconds at which an FDM logs a summary of the cluster state. If not specified, 600s (10 min) is used. | In 5.0, yes, HA must be reconfigured on all hosts. In 5.1+, no. | Cluster
5.0 | das.config.fdm.event.maxMasterEvents | Defines the maximum number of events cached by the master. If not specified, 1000 are cached. | In 5.0, yes, HA must be reconfigured on all hosts. In 5.1+, no. | Cluster
5.0 | das.config.fdm.event.maxSlaveEvents | Defines the maximum number of events cached by a slave. If not specified, 600 are cached. | In 5.0, yes, HA must be reconfigured on all hosts. In 5.1+, no. | Cluster
5.0 | vpxd.das.reportNoMasterSec | A vCenter Server parameter that determines how long to wait in seconds before issuing a cluster config issue to report that vCenter Server was unable to locate the HA master agent for the corresponding cluster. If not specified, 120s is used. | Yes, vCenter Server needs to be restarted. | VC
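
The override order for the default virtual machine reservations (the cluster option first, then the vCenter Server-wide option, then the built-in default) can be sketched in Python as follows. This is an illustrative sketch under the assumption that options are held as simple key/value mappings; the function name and the "memory_mb"/"cpu_mhz" labels are hypothetical, and the cluster option names are those listed in the Common Options table.

```python
from typing import Mapping

# Built-in defaults stated in this article: 0 MB and 32 MHz.
BUILT_IN_DEFAULT = {"memory_mb": 0, "cpu_mhz": 32}

# vCenter Server-wide options (type VC).
VC_OPTION = {
    "memory_mb": "vpxd.das.slotMemMinMB",
    "cpu_mhz": "vpxd.das.slotCpuMinMHz",
}

# Cluster-level options that override the vCenter Server-wide values.
CLUSTER_OPTION = {
    "memory_mb": "das.vmMemoryMinMB",
    "cpu_mhz": "das.vmCpuMinMHz",
}


def default_reservation(
    resource: str,
    vc_options: Mapping[str, str],
    cluster_options: Mapping[str, str],
) -> int:
    """Return the reservation assumed for a VM that has none set by a user."""
    # Check the most specific scope first, then fall back step by step.
    lookups = (
        (cluster_options, CLUSTER_OPTION[resource]),
        (vc_options, VC_OPTION[resource]),
    )
    for options, key in lookups:
        if key in options:
            return int(options[key])
    return BUILT_IN_DEFAULT[resource]


if __name__ == "__main__":
    vc = {"vpxd.das.slotMemMinMB": "256"}
    cluster = {"das.vmMemoryMinMB": "512"}
    print(default_reservation("memory_mb", vc, cluster))
    print(default_reservation("cpu_mhz", vc, cluster))
```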
 
