VCP-6 Objective 9.1 – Configure Advanced vSphere HA Features

For this objective I used the following resources:

Objective 9.1 – Configure Advanced vSphere HA Features

Knowledge

Explain Advanced vSphere HA Settings

Enable/Disable Advanced vSphere HA Settings

Since these two topics go hand in hand, I am going to cover them together. VMware vSphere lets you add to or change the default behavior of the cluster HA settings. While the default settings may be appropriate for the majority of environments, your specific implementation may require changes. vSphere HA must be enabled on the cluster before you can make any of these changes.

Listing of Advanced Options

| Option | Description |
| --- | --- |
| das.isolationaddress[…] | Sets the address to ping to determine if a host is isolated from the network. This address is pinged only when heartbeats are not received from any other host in the cluster. If not specified, the default gateway of the management network is used. This default gateway has to be a reliable address that is available, so that the host can determine if it is isolated from the network. You can specify multiple isolation addresses (up to 10) for the cluster: das.isolationaddressX, where X = 0-9. Typically you should specify one per management network. Specifying too many addresses makes isolation detection take too long. |
| das.usedefaultisolationaddress | By default, vSphere HA uses the default gateway of the console network as an isolation address. This option specifies whether or not this default is used (true\|false). |
| das.isolationshutdowntimeout | The period of time the system waits for a virtual machine to shut down before powering it off. This only applies if the host’s isolation response is Shut down VM. Default value is 300 seconds. |
| das.slotmeminmb | Defines the maximum bound on the memory slot size. If this option is used, the slot size is the smaller of this value or the maximum memory reservation plus memory overhead of any powered-on virtual machine in the cluster. |
| das.slotcpuinmhz | Defines the maximum bound on the CPU slot size. If this option is used, the slot size is the smaller of this value or the maximum CPU reservation of any powered-on virtual machine in the cluster. |
| das.vmmemoryminmb | Defines the default memory resource value assigned to a virtual machine if its memory reservation is not specified or zero. This is used for the Host Failures Cluster Tolerates admission control policy. If no value is specified, the default is 0 MB. |
| das.vmcpuminmhz | Defines the default CPU resource value assigned to a virtual machine if its CPU reservation is not specified or zero. This is used for the Host Failures Cluster Tolerates admission control policy. If no value is specified, the default is 32 MHz. |
| das.iostatsinterval | Changes the default I/O stats interval for VM Monitoring sensitivity. The default is 120 (seconds). Can be set to any value greater than or equal to 0. Setting to 0 disables the check. NOTE: Values of less than 50 are not recommended since smaller values can result in vSphere HA unexpectedly resetting a virtual machine. |
| das.ignoreinsufficienthbdatastore | Disables configuration issues created if the host does not have sufficient heartbeat datastores for vSphere HA. Default value is false. |
| das.heartbeatdsperhost | Changes the number of heartbeat datastores required. Valid values can range from 2-5 and the default is 2. |
| fdm.isolationpolicydelaysec | The number of seconds the system waits before executing the isolation policy once it is determined that a host is isolated. The minimum value is 30. If set to a value less than 30, the delay will be 30 seconds. |
| das.respectvmvmantiaffinityrules | Determines if vSphere HA enforces VM-VM anti-affinity rules. Default value is “false”, whereby the rules are not enforced. Can also be set to “true” and rules are enforced (even if vSphere DRS is not enabled). In this case, vSphere HA does not fail over a virtual machine if doing so violates a rule, but it issues an event reporting there are insufficient resources to perform the failover. See vSphere Resource Management for more information on anti-affinity rules. |
| das.maxresets | The maximum number of reset attempts made by VMCP. If a reset operation on a virtual machine affected by an APD situation fails, VMCP retries the reset this many times before giving up. |
| das.maxterminates | The maximum number of retries made by VMCP for virtual machine termination. |
| das.terminateretryintervalsec | If VMCP fails to terminate a virtual machine, this is the number of seconds the system waits before it retries a terminate attempt. |
| das.config.fdm.reportfailoverfailevent | When set to 1, enables generation of a detailed per-VM event when an attempt by vSphere HA to restart a virtual machine is unsuccessful. Default value is 0. In versions earlier than vSphere 6.0, this event is generated by default. |
| vpxd.das.completemetadataupdateintervalsec | The period of time (seconds) after a VM-Host affinity rule is set during which vSphere HA can restart a VM in a DRS-disabled cluster, overriding the rule. Default value is 300 seconds. |
| das.config.fdm.memreservationmb | By default vSphere HA agents run with a configured memory limit of 250 MB. A host might not allow this reservation if it runs out of reservable capacity. You can use this advanced option to lower the memory limit to avoid this issue. Only integers greater than 100, which is the minimum value, can be specified. Conversely, to prevent problems during master agent elections in a large cluster (containing 6,000 to 8,000 VMs) you should raise this limit to 325 MB. NOTE: Once this limit is changed, for all hosts in the cluster you must run the Reconfigure HA task. Also, when a new host is added to the cluster or an existing host is rebooted, this task should be performed on those hosts in order to update this memory setting. |
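
To make the slot-related options more concrete, here is a minimal Python sketch of how das.vmcpuminmhz, das.vmmemoryminmb, das.slotcpuinmhz and das.slotmeminmb interact according to the descriptions above. The `PoweredOnVM` class and the `slot_size` function are hypothetical names used only for illustration and are not part of any VMware API. The sketch also assumes that the memory overhead is added on top of the das.vmmemoryminmb default when a VM has no memory reservation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class PoweredOnVM:
    """Hypothetical record of the values that feed the slot calculation."""
    cpu_reservation_mhz: int  # 0 means no CPU reservation is set
    mem_reservation_mb: int   # 0 means no memory reservation is set
    mem_overhead_mb: int


def slot_size(
    vms: Iterable[PoweredOnVM],
    vmcpuminmhz: int = 32,               # das.vmcpuminmhz default
    vmmemoryminmb: int = 0,              # das.vmmemoryminmb default
    slotcpuinmhz: Optional[int] = None,  # das.slotcpuinmhz, unset unless added
    slotmeminmb: Optional[int] = None,   # das.slotmeminmb, unset unless added
) -> Tuple[int, int]:
    """Return (cpu_mhz, mem_mb) for one slot."""
    vms = list(vms)
    # A VM without a reservation falls back to the das.vm*min* default.
    cpu = max(
        (vm.cpu_reservation_mhz or vmcpuminmhz for vm in vms),
        default=vmcpuminmhz,
    )
    # Assumption: overhead is added to the default as well as to a reservation.
    mem = max(
        ((vm.mem_reservation_mb or vmmemoryminmb) + vm.mem_overhead_mb for vm in vms),
        default=vmmemoryminmb,
    )
    # The das.slot* options only act as an upper bound on the slot size.
    if slotcpuinmhz is not None:
        cpu = min(cpu, slotcpuinmhz)
    if slotmeminmb is not None:
        mem = min(mem, slotmeminmb)
    return cpu, mem
```

The point of the sketch is that a single VM with a large reservation drives the slot size for the whole cluster, and the two das.slot* options are the way to cap that effect.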

 

Configuring Advanced Options

    • Log into the vSphere Web Client with administrative privileges
    • From the Home screen in the vSphere Web Client, select Hosts and Clusters in the right hand navigation
    • In the left hand pane expand your Datacenter object and select the vSphere Cluster
    • Right click on the vSphere Cluster and select Settings
    • In the right hand pane under Services select vSphere HA
    • Click the Edit button on the right
    • In the Edit Cluster Settings window expand Advanced Options
    • Click Add and type the name of the advanced option in the text box
    • Set the value of the option in the text box in the Value column
    • Repeat the previous two steps for any additional options you would like to add. Click OK when completed.

The screenshot below displays the Edit Cluster Settings window. As an example, I have added the advanced options das.slotmeminmb and das.slotcpuinmhz (values are 512 and 500 respectively):

[Screenshot: AdvancedSettings-Cluster]
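
Because the Advanced Options dialog accepts free-form names and values, it can help to sanity-check them against the limits in the option listing before typing them in. The following is a minimal Python sketch under that assumption. The `check_ha_options` function is a hypothetical helper, not a VMware tool, and it only covers the handful of limits stated in the listing above.

```python
import re
from typing import Dict, List

# das.isolationaddressX, where X = 0-9 according to the option listing.
ISOLATION_ADDRESS = re.compile(r"das\.isolationaddress(\d+)")

# Options from the listing whose values this sketch treats as integers.
INTEGER_OPTIONS = {
    "das.heartbeatdsperhost",
    "fdm.isolationpolicydelaysec",
    "das.iostatsinterval",
    "das.slotmeminmb",
    "das.slotcpuinmhz",
}


def check_ha_options(options: Dict[str, str]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for name, value in options.items():
        match = ISOLATION_ADDRESS.fullmatch(name)
        if match:
            if not 0 <= int(match.group(1)) <= 9:
                problems.append(f"{name}: index must be between 0 and 9")
            continue
        if name not in INTEGER_OPTIONS:
            continue  # nothing in the listing to check against
        try:
            number = int(value)
        except ValueError:
            problems.append(f"{name}: expected an integer, got {value!r}")
            continue
        if name == "das.heartbeatdsperhost" and not 2 <= number <= 5:
            problems.append(f"{name}: valid values range from 2 to 5")
        elif name == "fdm.isolationpolicydelaysec" and number < 30:
            problems.append(f"{name}: values below 30 are treated as 30")
        elif name == "das.iostatsinterval":
            if number < 0:
                problems.append(f"{name}: must be 0 or greater")
            elif 0 < number < 50:
                problems.append(f"{name}: values below 50 are not recommended")
    return problems
```

For the two options in the example above, `check_ha_options({"das.slotmeminmb": "512", "das.slotcpuinmhz": "500"})` returns an empty list.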

For a complete listing of the available vSphere HA settings, have a look at VMware KB Article 2033250 – Advanced Configuration Options for VMware High Availability in vSphere 5.x. (Note: at the time of this blog post there is no equivalent VMware KB article for vSphere 6.x.)

Explain How vSphere HA Interprets Heartbeats

vSphere HA uses the concept of master and slave hosts to build out an HA cluster. These hosts communicate with each other using network heartbeats, which are exchanged every second, and the master host is responsible for detecting the failure of slave hosts. If the master host stops receiving heartbeats from a slave host, it checks whether the slave host is still exchanging heartbeats with a datastore (datastore heartbeating) and whether the slave host's management IP responds to ICMP ping requests. Only if all of these checks fail is the slave host considered to have failed, and its virtual machines are restarted.
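
The decision the master host makes can be summarized in a few lines of Python. This is only a sketch of the logic described in the paragraph above, not how the HA agent is actually implemented. The `HostState` enum and the `classify_slave` function are hypothetical names, and the middle state is my own label for a host that missed network heartbeats but passed at least one of the other checks.

```python
from enum import Enum


class HostState(Enum):
    LIVE = "live"
    # Hypothetical label: no network heartbeat, but another check succeeded.
    UNREACHABLE_BUT_ALIVE = "unreachable but alive"
    FAILED = "failed"


def classify_slave(
    network_heartbeat: bool,
    datastore_heartbeat: bool,
    ping_reply: bool,
) -> HostState:
    """Classify a slave host from the master host's point of view."""
    if network_heartbeat:
        return HostState.LIVE
    # Network heartbeats stopped: fall back to the secondary checks.
    if datastore_heartbeat or ping_reply:
        return HostState.UNREACHABLE_BUT_ALIVE
    # Every check failed, so the host is declared failed and its VMs are restarted.
    return HostState.FAILED
```

The key takeaway for the exam is that a missed network heartbeat on its own does not trigger a restart; the host is only declared failed once the datastore heartbeat and the ping check have failed too.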

Identify Virtual Machine Override Priorities

Each virtual machine in a vSphere HA cluster is assigned the cluster default settings for VM Restart Priority, Host Isolation Response, VM Component Protection, and VM Monitoring. You can specify specific behavior for each virtual machine by changing these defaults. If the virtual machine leaves the cluster, these settings are lost.

  • Log into the vSphere Web Client with administrative privileges
  • From the Home screen in the vSphere Web Client, select Hosts and Clusters in the right hand navigation
  • In the left hand pane expand your Datacenter object and select the vSphere Cluster
  • Right click on the vSphere Cluster and select Settings
  • In the right hand pane under Configuration select VM Overrides
  • Click the Add button on the right
  • Use the + button to launch the Select a VM popup. Select the virtual machine or machines to which to apply the overrides
  • Change the virtual machine settings for VM restart priority, Response for Host Isolation, etc.
  • Click OK when completed
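
Conceptually, a VM override is just a per-VM set of values layered on top of the cluster defaults. The short Python sketch below illustrates that idea. The setting names, the default values and both function names are made up for illustration and do not correspond to real vSphere API properties.

```python
from typing import Dict

Settings = Dict[str, str]

# Hypothetical cluster-wide defaults for the four overridable settings.
CLUSTER_DEFAULTS: Settings = {
    "vm_restart_priority": "Medium",
    "host_isolation_response": "Leave powered on",
    "vm_component_protection": "Cluster default",
    "vm_monitoring": "Disabled",
}


def effective_settings(
    defaults: Settings,
    overrides: Dict[str, Settings],
    vm_name: str,
) -> Settings:
    """Cluster defaults, with any per-VM overrides applied on top."""
    merged = dict(defaults)
    merged.update(overrides.get(vm_name, {}))
    return merged


def remove_from_cluster(overrides: Dict[str, Settings], vm_name: str) -> None:
    """When a VM leaves the cluster its overrides are lost."""
    overrides.pop(vm_name, None)


# Example: override two settings for a VM named VCP.
overrides = {
    "VCP": {
        "vm_restart_priority": "High",
        "host_isolation_response": "Shut down",
    }
}
print(effective_settings(CLUSTER_DEFAULTS, overrides, "VCP"))
```

Any setting that is not overridden for a VM keeps following the cluster default, so later changes to the cluster default still apply to that VM.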

In the example screenshot below, I selected the virtual machine VCP and made changes to both the VM restart priority and Response for Host Isolation options:

[Screenshot: VM-Overrides]

Identify Virtual Machine Component Protection (VMCP) Settings

VMCP provides protection against datastore accessibility failures that can affect a virtual machine running on a host in a vSphere HA cluster. When a datastore accessibility failure occurs, the affected host can no longer access the storage path for a specific datastore. You can determine the response that vSphere HA will make to such a failure, ranging from the creation of event alarms to virtual machine restarts on other hosts.

There are two types of datastore accessibility failure:

  • PDL – PDL (Permanent Device Loss) is an unrecoverable loss of accessibility that occurs when a storage device reports the datastore is no longer accessible by the host. This condition cannot be reverted without powering off virtual machines
  • APD – APD (All Paths Down) represents a transient or unknown accessibility loss or any other unidentified delay in I/O processing. This type of accessibility issue is recoverable

Configuring VMCP – vSphere Cluster Settings

  • Log into the vSphere Web Client with administrative privileges
  • From the Home screen in the vSphere Web Client, select Hosts and Clusters in the right hand navigation
  • In the left hand pane expand your Datacenter object and select the vSphere Cluster
  • Right click on the vSphere Cluster and select Settings
  • In the right hand pane under Services select vSphere HA
  • Click the Edit button on the right
  • Select the box for Protect Against Storage Connectivity Loss

[Screenshot: VMCP-Cluster]

  • PDL Failures – A virtual machine is automatically failed over to a new host unless you have configured VMCP only to Issue Events
  • APD Events – The response to APD is more complex and accordingly the configuration is more fine-grained. After the user-configured Delay for VM failover for APD period has elapsed, the action taken depends on the policy you selected. An event will be issued and the virtual machine is restarted conservatively or aggressively. The conservative approach does not terminate the virtual machine if the success of the failover is unknown, for example in a network partition. The aggressive approach does terminate the virtual machine under these conditions. Neither approach terminates the virtual machine if there are insufficient resources in the cluster for the failover to succeed. If APD recovers before the user-configured Delay for VM failover for APD period has elapsed, you can choose to reset the affected virtual machines, which recovers the guest applications that were impacted by the IO failures.
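
The PDL and APD rules above are easier to remember when written out as a decision function. The Python sketch below is my own summary of the two bullets, not VMware's implementation. All enum and function names are hypothetical, and the APD part only models the conservative and aggressive restart policies.

```python
from enum import Enum


class ApdPolicy(Enum):
    RESTART_CONSERVATIVE = "conservative"
    RESTART_AGGRESSIVE = "aggressive"


class Action(Enum):
    WAIT = "wait for the APD delay to elapse"
    EVENT_ONLY = "issue an event, leave the VM running"
    TERMINATE_AND_RESTART = "issue an event, terminate and restart the VM"


def pdl_action(issue_events_only: bool) -> Action:
    """PDL: fail the VM over unless VMCP is set to only issue events."""
    return Action.EVENT_ONLY if issue_events_only else Action.TERMINATE_AND_RESTART


def apd_action(
    policy: ApdPolicy,
    delay_elapsed: bool,
    failover_success_known: bool,
    sufficient_resources: bool,
) -> Action:
    """APD: the outcome depends on the delay, the policy and cluster state."""
    if not delay_elapsed:
        return Action.WAIT
    # Neither policy terminates the VM when the cluster lacks resources.
    if not sufficient_resources:
        return Action.EVENT_ONLY
    # Conservative: do not terminate if failover success is unknown,
    # for example in a network partition.
    if policy is ApdPolicy.RESTART_CONSERVATIVE and not failover_success_known:
        return Action.EVENT_ONLY
    return Action.TERMINATE_AND_RESTART
```

The only difference between the two APD policies in this sketch is the case where the success of the failover is unknown: conservative leaves the VM alone, aggressive terminates it anyway.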

Thanks for reading and happy studying!

-Jason