What is VMware HA?
As per VMware's definition,
VMware® High Availability (HA) provides easy to use, cost effective high availability for applications running in virtual machines. In the event of server failure, affected virtual machines are automatically restarted on other production servers with spare capacity.
What is AAM in HA?
AAM is Legato's Automated Availability Manager. Up to and including vSphere 4.1, VMware HA was built on Legato's Automated Availability Manager (AAM) software, re-engineered to work with VMs. VMware's vCenter agent (vpxa) interfaces with the VMware HA agent, which acts as an intermediary to the AAM software. From vSphere 5.0, HA uses an agent called “FDM” (Fault Domain Manager) instead.
What are the prerequisites for HA to work?
1. Shared storage for the VMs running in the HA cluster
2. Essentials Plus, Standard, Advanced, Enterprise or Enterprise Plus licensing
3. A cluster created with VMware HA enabled
4. Management network redundancy, to avoid frequent isolation responses during temporary network issues (preferred, not a requirement)
What is the maximum number of primary HA hosts in vSphere 4.1?
The maximum number of primary HA hosts is 5. A VMware HA cluster chooses the first 5 hosts that join the cluster as primary nodes, and all other hosts are automatically selected as secondary nodes.
How to see the list of Primary nodes in HA cluster?
View the log file named “aam_config_util_listnodes.log” under /var/log/vmware/aam using the command below:
cat /var/log/vmware/aam/aam_config_util_listnodes.log
What are the commands to restart, start or stop the HA agent on an ESX host?
service vmware-aam restart
service vmware-aam stop
service vmware-aam start
Where are the HA related logs located for troubleshooting?
/var/log/vmware/aam
What are the basic troubleshooting steps if the HA agent install fails on hosts in an HA cluster?
The steps below are taken from my blog post, Troubleshooting HA.
1. Check for network issues.
2. Check that DNS is configured properly.
3. Check the VMware HA agent status on the ESX host using the command below:
service vmware-aam status
4. Check that the networks are properly configured and named exactly as on the other hosts in the cluster. Otherwise, you will get errors while installing or reconfiguring the HA agent.
5. Check that the HA related ports are open in the firewall to allow communication:
Incoming port: TCP/UDP 8042-8045
Outgoing port: TCP/UDP 2050-2250
6. First try to restart, stop or start the VMware HA agent on the affected host using the commands below. In addition, you can also try restarting vpxa and the management agent on the host.
service vmware-aam restart
service vmware-aam stop
service vmware-aam start
7. Right-click the affected host and click “Reconfigure for VMware HA” to re-install the HA agent on that particular host.
8. Remove the affected host from the cluster. Removing an ESX host from the cluster is not allowed until that host is put into maintenance mode.
9. An alternative to step 3 is to go to the cluster settings, uncheck VMware HA to turn off HA in that cluster, and then re-enable VMware HA to get the agent installed.
10. For further troubleshooting, review the HA logs under the /var/log/vmware/aam directory.
What is the maximum number of hosts per HA cluster?
The maximum number of hosts in an HA cluster is 32.
What is Host Isolation?
VMware HA has a mechanism to detect that a host is isolated from the rest of the hosts in the cluster. When an ESX host loses its ability to exchange heartbeats over the management network with the other hosts in the HA cluster, that ESX host is considered isolated.
How is host isolation detected?
In an HA cluster, ESX hosts use heartbeats to communicate with the other hosts in the cluster. By default, a heartbeat is sent every 1 second. If an ESX host in the cluster has not received a heartbeat from any other host in the cluster for 13 seconds, the host suspects that it is isolated and pings the configured isolation address (the default gateway by default). If the ping fails, VMware HA executes the host isolation response.
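The detection sequence described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not how the HA agent is implemented: the function name `is_isolated` and the `ping` callback are hypothetical, and the 13 second threshold is simply the value quoted in this post.

```python
from typing import Callable, Sequence

HEARTBEAT_INTERVAL_S = 1   # heartbeat period quoted in this post
ISOLATION_TIMEOUT_S = 13   # silence threshold quoted in this post


def is_isolated(seconds_since_last_heartbeat: float,
                isolation_addresses: Sequence[str],
                ping: Callable[[str], bool]) -> bool:
    """Hypothetical helper: True when the isolation response should run."""
    # Heartbeats are still arriving within the threshold: not isolated.
    if seconds_since_last_heartbeat < ISOLATION_TIMEOUT_S:
        return False
    # Heartbeats stopped: the host is isolated only if no isolation
    # address answers the ping.
    return not any(ping(addr) for addr in isolation_addresses)


# Example with a fake ping that never gets an answer.
print(is_isolated(5, ["192.168.0.1"], lambda addr: False))   # False
print(is_isolated(13, ["192.168.0.1"], lambda addr: False))  # True
print(is_isolated(20, ["192.168.0.1"], lambda addr: True))   # False
```

Passing the ping as a callback keeps the sketch runnable without touching the network.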
What are the different types of isolation response available in HA?
Power off – All the VMs are powered off when HA detects that network isolation has occurred.
Shut down – All VMs running on that host are shut down with the help of VMware Tools when HA detects that network isolation has occurred. If the shutdown via VMware Tools does not complete within 5 minutes, a power off operation is executed on the VM. This behavior can be changed with the help of HA advanced options. Please refer to my post on HA Advanced configuration.
Leave powered on – The VMs remain powered on, with their state unchanged, when HA detects that network isolation has occurred.
How to add an additional isolation address for redundancy?
By default, VMware HA pings the default gateway as the isolation address if it stops receiving heartbeats. We can add additional values if we are using redundant service consoles that belong to different subnets. For example, we can add the default gateway of SC1 as the first value and the gateway of SC2 as the additional one using the values below:
1. Right-click your HA cluster
2. Go to the advanced options of HA
3. Add the line “das.isolationaddress1 = 192.168.0.1”
4. Add the line “das.isolationaddress2 = 192.168.1.1” as the additional isolation address
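If you keep the gateways of your service consoles in a list, the option lines can be generated rather than typed by hand. The snippet below is a small sketch of that idea. The helper name `isolation_address_options` is made up for illustration, and the result still has to be entered in the HA advanced options yourself.

```python
def isolation_address_options(gateways):
    """Hypothetical helper: map an ordered list of gateway IPs to
    das.isolationaddressN option names, numbered from 1."""
    return {f"das.isolationaddress{i}": ip
            for i, ip in enumerate(gateways, start=1)}


# SC1 gateway first, SC2 gateway second, as in the example above.
options = isolation_address_options(["192.168.0.1", "192.168.1.1"])
for key, value in options.items():
    print(f"{key} = {value}")
```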
To know more, refer to my post on Advanced HA Options.
What is HA Admission control?
As per “VMware Availability Guide”,
vCenter Server uses admission control to ensure that sufficient resources are available in a cluster to provide failover protection and to ensure that virtual machine resource reservations are respected.
What are the 2 types of settings available for admission control?
Enable: Do not power on VMs that violate availability constraints
Disable: Power on VMs that violate availability constraints
What are the different types of Admission control policy available with VMware HA?
There are 3 different types of admission control policy available:
Host failures cluster tolerates
Percentage of cluster resources reserved as failover spare capacity
Specify a failover host
How does the Host Failures cluster tolerates admission control policy work?
Select the maximum number of host failures that you can afford, or for which you want to guarantee failover. Prior to vSphere 4.1, the minimum is 1 and the maximum is 4.
In the Host Failures cluster tolerates admission control policy, we can define the specific number of hosts that can fail in the cluster, and the policy ensures that sufficient resources remain to fail over all the virtual machines from those failed hosts to the other hosts in the cluster. VMware High Availability (HA) uses a mechanism called slots to calculate both the available and required resources in the cluster for failing over virtual machines from a failed host to other hosts in the cluster.
What is SLOT?
As per VMware’s definition,
“A slot is a logical representation of the memory and CPU resources that satisfy the requirements for any powered-on virtual machine in the cluster.”
If you have configured reservations at the VM level, they influence the HA slot calculation. The highest memory reservation and the highest CPU reservation among the VMs in your cluster determine the slot size for the cluster.
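To make the slot idea concrete, here is a rough Python sketch of the calculation as I understand it: take the highest CPU and memory reservations as the slot size, work out how many slots each host can hold, then see how many of the largest hosts can fail while the remaining slots still cover every powered-on VM. The `VM` and `Host` classes and the function names are hypothetical, memory overhead is ignored, and every VM is assumed to have a non-zero reservation, so treat it as an illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VM:
    cpu_reservation_mhz: int
    mem_reservation_mb: int


@dataclass
class Host:
    cpu_mhz: int
    mem_mb: int


def slot_size(vms: List[VM]) -> Tuple[int, int]:
    """Slot size = highest CPU reservation and highest memory reservation."""
    return (max(vm.cpu_reservation_mhz for vm in vms),
            max(vm.mem_reservation_mb for vm in vms))


def slots_per_host(host: Host, cpu_slot: int, mem_slot: int) -> int:
    """A host holds as many slots as its scarcer resource allows."""
    return min(host.cpu_mhz // cpu_slot, host.mem_mb // mem_slot)


def failover_capacity(hosts: List[Host], vms: List[VM]) -> int:
    """How many hosts (largest first) can fail while the remaining hosts
    still offer one slot per powered-on VM."""
    cpu_slot, mem_slot = slot_size(vms)
    slots = sorted((slots_per_host(h, cpu_slot, mem_slot) for h in hosts),
                   reverse=True)
    failures = 0
    while failures < len(slots) and sum(slots[failures + 1:]) >= len(vms):
        failures += 1
    return failures


vms = [VM(2000, 1024), VM(2000, 1024), VM(1000, 2048),
       VM(1000, 1024), VM(1000, 1024)]
hosts = [Host(9000, 9216), Host(9000, 6144), Host(6000, 6144)]

print(slot_size(vms))                 # (2000, 2048)
print(failover_capacity(hosts, vms))  # 1
```

In this example the hosts hold 4, 3 and 3 slots. Losing the largest host leaves 6 slots for 5 VMs, but losing two hosts leaves only 3, so the cluster can tolerate one host failure.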
How are the HA slots calculated?
I have written a post about how the HA slots are calculated.
How to Check the HA Slot information from vSphere Client?
Click on the cluster's Summary tab and click on “Advanced Runtime Info” to see the detailed HA slots information.
What is the use of Host Monitoring status in an HA cluster?
Let’s take an example: you are performing a network maintenance activity on the switches that connect one of the ESX hosts in your HA cluster.
What will happen if the switch connected to the ESX host in the HA cluster is down?
The host will not receive heartbeats, and the ping to the isolation address will also fail. So the host will consider itself isolated, and HA will initiate the restart of the virtual machines from that host on other hosts in the cluster. You do not want this situation during a scheduled maintenance window.
To avoid this when performing a scheduled activity that may cause an ESX host to become isolated, clear the “Enable Host Monitoring” check box until you are done with the network maintenance activity.
How to Manually define the HA Slot size?
By default, the HA slot size is determined by the highest CPU and memory reservations of the virtual machines. If no reservation is specified at the VM level, a default slot size of 256 MHz for CPU and 0 MB + memory overhead for RAM is used. We can control the HA slot size manually by using the following values.
There are 4 options related to slot size that we can configure in the HA advanced options:
das.slotMemInMB – Maximum bound value for the HA memory slot size
das.slotCpuInMHz – Maximum bound value for the HA CPU slot size
das.vmMemoryMinMB – Minimum bound value for the HA memory slot size
das.vmCpuMinMHz – Minimum bound value for the HA CPU slot size
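The effect of these four options can be pictured as clamping the calculated slot size between a lower and an upper bound. The Python sketch below follows the description given here (minimum bounds first, then maximum bounds) and is an assumption about the behavior, not VMware's actual code. The function name `effective_slot_size` and the `mem_overhead_mb` parameter are hypothetical.

```python
def effective_slot_size(max_cpu_reservation_mhz, max_mem_reservation_mb,
                        options=None, mem_overhead_mb=0):
    """Hypothetical helper: apply the slot size advanced options.

    options is a dict keyed by the advanced option names.
    """
    options = options or {}
    # Lower bounds: 256 MHz and 0 MB are the defaults quoted in this post.
    cpu = max(max_cpu_reservation_mhz, options.get("das.vmCpuMinMHz", 256))
    mem = max(max_mem_reservation_mb, options.get("das.vmMemoryMinMB", 0))
    mem += mem_overhead_mb
    # Upper bounds apply only when the option is set.
    if "das.slotCpuInMHz" in options:
        cpu = min(cpu, options["das.slotCpuInMHz"])
    if "das.slotMemInMB" in options:
        mem = min(mem, options["das.slotMemInMB"])
    return cpu, mem


# No reservations, no options: the defaults apply.
print(effective_slot_size(0, 0))  # (256, 0)

# One large reservation would inflate the slot size; the maximum
# bounds cap it.
print(effective_slot_size(4000, 8192,
                          {"das.slotCpuInMHz": 1000,
                           "das.slotMemInMB": 2048}))  # (1000, 2048)
```

Capping the slot size is mainly useful when a single VM with a large reservation would otherwise shrink the number of slots available to the whole cluster.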
For more HA related advanced options, please refer to my blog post.
How does the “Percentage of cluster resources reserved as failover spare capacity” admission control policy work?
In the Percentage of cluster resources reserved as failover spare capacity admission control policy, we can define the specific percentage of total cluster resources that is reserved for failover. In contrast to the “Host Failures cluster tolerates” admission control policy, it does not use slots. Instead, this policy works as follows:
1. It calculates the total resource requirement for all powered-on virtual machines in the cluster, and also the total host resources available for virtual machines.
2. It calculates the current CPU and memory failover capacity for the cluster.
3. It checks whether the current CPU or memory failover capacity for the cluster is less than the configured failover capacity (e.g. 25%).
4. If so, admission control will not allow powering on a virtual machine that violates the availability constraints.
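The steps above can be sketched in Python. Failover capacity is taken here as the share of total host resources not claimed by powered-on VMs, which is my reading of the policy. The function names and the example figures are made up for illustration.

```python
def failover_capacity_pct(total_host_resources, total_vm_requirements):
    """Percentage of host resources left after the VM requirements."""
    return 100 * (total_host_resources - total_vm_requirements) / total_host_resources


def can_power_on(host_cpu_mhz, host_mem_mb, vm_cpu_mhz, vm_mem_mb,
                 new_vm_cpu_mhz, new_vm_mem_mb, configured_pct=25):
    """Hypothetical check: would powering on the new VM keep both CPU and
    memory failover capacity at or above the configured percentage?"""
    cpu_after = failover_capacity_pct(host_cpu_mhz, vm_cpu_mhz + new_vm_cpu_mhz)
    mem_after = failover_capacity_pct(host_mem_mb, vm_mem_mb + new_vm_mem_mb)
    return cpu_after >= configured_pct and mem_after >= configured_pct


# Cluster totals: 20000 MHz / 20000 MB of host resources,
# 10000 MHz / 12000 MB already required by powered-on VMs.
print(failover_capacity_pct(20000, 10000))  # 50.0 (CPU)
print(failover_capacity_pct(20000, 12000))  # 40.0 (memory)

# A 2000 MHz / 2000 MB VM leaves 40% CPU and 30% memory: allowed.
print(can_power_on(20000, 20000, 10000, 12000, 2000, 2000))  # True

# A 2000 MHz / 4000 MB VM leaves only 20% memory: refused.
print(can_power_on(20000, 20000, 10000, 12000, 2000, 4000))  # False
```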
How does the “Specify a failover host” admission control policy work?
In the “Specify a failover host” admission control policy, we can define a specific host as a dedicated failover host. When a host failure or isolation is detected, HA attempts to restart the virtual machines on the specified failover host. In this approach, the dedicated failover host sits idle and does not participate in DRS load balancing. DRS will not migrate virtual machines to the defined failover host or place newly powered-on virtual machines on it.
What is VM Monitoring status?
HA usually monitors ESX hosts and restarts the virtual machines of a failed host on other hosts in the cluster in case of host isolation. But what if I need HA to monitor for virtual machine failures as well? That is where the VM Monitoring status feature in the HA settings comes in. VM monitoring restarts a virtual machine if its VMware Tools heartbeat is not received within the time specified by the Monitoring sensitivity setting.
http://www.vmwarearena.com/2012/07/vmware-interview-questions-and-answers_20.html#uds-search-results