Showing posts with label vSphere HA (High Availability)

Wednesday, 2 April 2014

Troubleshooting VMware HA - Cannot complete the HA configuration


What are the basic troubleshooting steps if the HA agent installation fails on hosts in an HA cluster?


If you are facing issues with hosts in an HA cluster, I recommend following the 10 basic troubleshooting steps below. Most of the time, they will resolve the issue.


The error message will be similar to "Cannot complete the HA configuration".
1. Check your environment for any temporary network problems.
2. Check that DNS is configured properly.
3. Check the VMware HA agent status on the ESX host with the command below:
  service vmware-aam status
4. Check that the ESX networks are configured properly and named exactly as on the other hosts in the cluster; otherwise, you will get errors while installing or reconfiguring the HA agent.
5. Check that the HA-related ports are open in the firewall to allow communication:
   
     Incoming port: TCP/UDP 8042-8045
     Outgoing port: TCP/UDP 2050-2250
6. Try restarting (or stopping and then starting) the VMware HA agent on the affected host with the commands below. You can also try restarting vpxa and the management agents on the host.
service vmware-aam restart
service vmware-aam stop
service vmware-aam start
7. Right-click the affected host and click "Reconfigure for VMware HA" to reinstall the HA agent on that host.
8. Remove the affected host from the cluster. A host cannot be removed from the cluster until it is put into maintenance mode.
9. As an alternative to step 8, go to the cluster settings and uncheck VMware HA to turn off HA in the cluster, then re-enable VMware HA so the agent is installed from scratch.
10. For further troubleshooting, review the HA logs under the /var/log/vmware/aam directory.
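Step 5 can be partly verified from another machine. The hypothetical helper below is a sketch that uses only Python's standard `socket` module: it attempts a TCP connection to each incoming HA port (8042-8045) on a host. It assumes the HA agent is running and listening there; it cannot test the UDP side or the outgoing range 2050-2250, and a failed connection only means nothing answered, not which firewall blocked it.

```python
import socket

# Incoming HA agent ports from step 5 (8042-8045 inclusive). Only TCP can be
# probed with connect(); UDP reachability is not tested by this sketch.
HA_INCOMING_TCP_PORTS = range(8042, 8046)


def check_tcp_ports(host, ports=HA_INCOMING_TCP_PORTS, timeout=2.0):
    """Return {port: True/False} depending on whether a TCP connect succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the host name first, so a DNS
            # problem (step 2) also shows up here as a failed port.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Connection refused, timed out, unreachable, or name not resolved.
            results[port] = False
    return results
```

For example, `check_tcp_ports("esx01.example.local")` (a placeholder host name) returns one True/False entry per port, which you can compare against a healthy host in the same cluster.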
Thanks for reading!

 http://www.vmwarearena.com/2012/07/troubleshooting-vmware-ha.html

VMware HA Advanced Options





Hi, in this blog post I am going to cover some of the advanced options that can be configured at the HA cluster level. These options cannot be edited directly in the cluster settings, but they can be added as values under the HA advanced options.
Isolation Response Time:-
By default, the VMware HA isolation response timeout is 15000 milliseconds, a.k.a. 15 seconds. Let's say I want to avoid restarting virtual machines in the HA cluster because of temporary network fluctuations in my environment, so I want to increase the isolation response timeout to 60000 milliseconds (60 seconds). Below are the steps to configure the isolation response time.
1. Right-click your HA cluster.
2. Go to the HA advanced options.
3. Add the line "das.failuredetectiontime = 60000".
4. Click OK. The isolation response time is now set to 60 seconds.
Additional Isolation Address:-
By default, VMware HA pings the default gateway as the isolation address if it stops receiving heartbeats. We can add additional addresses if we are using redundant service consoles that belong to different subnets. For example, we can add the default gateway of SC1 as the first value and the gateway of SC2 as the additional one using the values below:
1. Right-click your HA cluster.
2. Go to the HA advanced options.
3. Add the line "das.isolationaddress1 = 192.168.0.1".
4. Add the line "das.isolationaddress2 = 192.168.1.1" as the additional isolation address.
Changing Default Isolation Address:-
By default, VMware HA uses the default gateway as the isolation address in case it stops receiving heartbeats. For example, if your organization's firewall policy blocks ICMP ping, or your gateway address is otherwise not pingable, HA will conclude that the host is isolated. You can stop HA from using the default gateway with the value below.
1. Right-click your HA cluster.
2. Go to the HA advanced options.
3. Add the line "das.usedefaultisolationaddress = False" to stop using the default gateway as the isolation address, and specify a pingable address instead with the das.isolationaddress options described above.
Using the vMotion Network for HA Heartbeats:-
This option allows HA heartbeats to use the NIC that carries vMotion traffic. With this value configured, HA heartbeat communication is sent over the NIC used for vMotion.
1. Right-click your HA cluster.
2. Go to the HA advanced options.
3. Add the line "das.allowVmotionNetworks = True".
Configuring Retries for Restarting Virtual Machines:-
When an ESX host is isolated, HA tries to restart the virtual machines that were running on it on the non-isolated ESX hosts in the HA cluster. By default, the retry count is 5. You can reduce the number of retries with the value below.
1. Right-click your HA cluster.
2. Go to the HA advanced options.
3. Add the line "das.maxvmrestartcount = 2" to allow only 2 restart retries for the virtual machines.
Ignoring the Network Redundancy Warning:-
You may have come across the warning that there is no management network redundancy in your HA cluster when you don't have redundant service consoles. To suppress this warning for a cluster with a non-redundant service console, set the value below to TRUE.
1. Right-click your HA cluster.
2. Go to the HA advanced options.
3. Add the line "das.ignoreRedundantNetWarning = TRUE" to suppress the warning in the HA cluster.
Configuring Manual HA Slot Size:-
By default, the HA slot size is determined by the highest CPU and memory reservations among the virtual machines. If no reservation is specified at the VM level, a default slot size of 256 MHz for CPU and 0 MB + memory overhead for RAM is used. We can control the HA slot size manually with the following values.
There are 4 options related to slot size that we can configure in the HA advanced options:
das.slotMemInMB – Maximum bound value for the HA memory slot size
das.slotCpuInMHz – Maximum bound value for the HA CPU slot size
das.vmMemoryMinMB – Minimum bound value for the HA memory slot size
das.vmCpuMinMHz – Minimum bound value for the HA CPU slot size
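The four options above act as lower and upper bounds on the slot size. Assuming they behave as a simple clamp, which is how the list describes them, their effect can be sketched as follows. `effective_slot_size` is a hypothetical helper, not a VMware API, and the 0 MB memory minimum comes from the "0 MB + memory overhead" default above.

```python
from typing import Optional, Tuple


def clamp_slot(computed: float, minimum: float, maximum: Optional[float] = None) -> float:
    """Raise a computed slot dimension to its minimum, then cap it if a maximum is set."""
    value = max(computed, minimum)
    if maximum is not None:
        value = min(value, maximum)
    return value


def effective_slot_size(
    cpu_mhz: float,
    mem_mb: float,
    vm_cpu_min_mhz: float = 256,              # das.vmCpuMinMHz
    vm_memory_min_mb: float = 0,              # das.vmMemoryMinMB
    slot_cpu_in_mhz: Optional[float] = None,  # das.slotCpuInMHz (None = not set)
    slot_mem_in_mb: Optional[float] = None,   # das.slotMemInMB (None = not set)
) -> Tuple[float, float]:
    """Return the (CPU MHz, memory MB) slot size after applying the bounds."""
    return (
        clamp_slot(cpu_mhz, vm_cpu_min_mhz, slot_cpu_in_mhz),
        clamp_slot(mem_mb, vm_memory_min_mb, slot_mem_in_mb),
    )
```

For example, `effective_slot_size(4096, 8192, slot_cpu_in_mhz=1024, slot_mem_in_mb=2048)` returns `(1024, 2048)`, so one VM with large reservations no longer sets the slot size for the whole cluster.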
Changing the HA Heartbeat Interval:-
By default, the HA heartbeat interval is 1000 milliseconds, i.e. 1 second. Primary HA nodes send heartbeats to the other primary and secondary nodes, while secondary nodes send heartbeats only to the primary nodes, every 1 second. This value can be changed with the following option.
1. Right-click your HA cluster.
2. Go to the HA advanced options.
3. Add the line "das.failuredetectioninterval = 3000" to send heartbeats every 3 seconds.
Changing the Default VM Shutdown Timeout:-
By default, if a virtual machine's isolation response is set to "Shut down", HA waits 300 seconds for the VM to shut down before it powers the virtual machine off. This value can be changed with the following option.
1. Right-click your HA cluster.
2. Go to the HA advanced options.
3. Add the line "das.isolationShutdownTimeout = 500" to change it to 500 seconds.
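Every option in this post is written as `name = value`. As a sketch, the hypothetical helper below (not a VMware tool) turns such lines into typed values so you can review them before entering each name and value in the Advanced Options dialog. It assumes only the integer, TRUE/FALSE, and plain-text values used above.

```python
def parse_das_options(lines):
    """Parse 'das.<name> = <value>' lines into a dict of typed values.

    Hypothetical helper: it only understands the integer, TRUE/FALSE and
    plain-text (e.g. IP address) values used in this post.
    """
    options = {}
    for raw in lines:
        # Drop surrounding whitespace and any straight or curly quotes.
        line = raw.strip().strip('"\u201c\u201d\u2033')
        if not line or "=" not in line:
            continue  # skip blanks and anything that is not an assignment
        key, value = (part.strip() for part in line.split("=", 1))
        if not key.startswith("das."):
            raise ValueError(f"not an HA (das.*) option: {key!r}")
        if value.lower() in ("true", "false"):
            options[key] = value.lower() == "true"
        elif value.isdigit():
            options[key] = int(value)  # milliseconds, seconds or counts
        else:
            options[key] = value  # e.g. an isolation IP address
    return options


example = [
    "das.failuredetectiontime = 60000",
    "das.isolationaddress1 = 192.168.0.1",
    "das.isolationaddress2 = 192.168.1.1",
    "das.usedefaultisolationaddress = False",
    "das.maxvmrestartcount = 2",
    "das.failuredetectioninterval = 3000",
    "das.isolationShutdownTimeout = 500",
]
opts = parse_das_options(example)
print(opts["das.failuredetectiontime"] / 1000, "seconds")  # 60.0 seconds
```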
Thanks for reading!

http://www.vmwarearena.com/2012/07/vmware-ha-advanced-options.html

VMware HA Slot Calculation





This article only covers how HA slots are calculated, step by step. For detailed information about HA, please refer to the HA Deep Dive by Duncan Epping.
What is a slot?
As per VMware's definition,
“A slot is a logical representation of the memory and CPU resources that satisfy the requirements for any powered-on virtual machine in the cluster.”
If you have configured reservations at the VM level, they influence the HA slot calculation. The highest memory reservation and the highest CPU reservation among the VMs in your cluster determine the slot size for the cluster.
Here is an example:
If the highest memory reservation among the VMs in the cluster is 8192 MB (8 GB) and the highest CPU reservation is 4096 MHz, then the memory slot size is 8192 MB (plus that VM's memory overhead) and the CPU slot size is 4096 MHz.
If no VM-level reservation is configured, a minimum CPU size of 256 MHz and a memory size of 0 MB + VM memory overhead (the highest overhead among the powered-on VMs) are used as the CPU and memory slot sizes.
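The rule above can be sketched in Python. This assumes, as I understand the vSphere Availability Guide for this generation of HA, that the memory component is each VM's reservation plus its overhead (the largest value wins) and that a VM without a CPU reservation counts as 256 MHz. The `VM` record and `slot_size` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class VM:
    """Hypothetical record for one powered-on VM."""
    name: str
    cpu_reservation_mhz: int = 0
    mem_reservation_mb: float = 0.0
    mem_overhead_mb: float = 0.0


def slot_size(vms: Iterable[VM], default_cpu_mhz: int = 256) -> Tuple[int, float]:
    """Return (CPU slot in MHz, memory slot in MB) for the given powered-on VMs."""
    vms = list(vms)
    # CPU: the largest CPU reservation, but never below the 256 MHz default.
    cpu_slot = max([default_cpu_mhz] + [vm.cpu_reservation_mhz for vm in vms])
    # Memory: the largest (reservation + overhead); with no reservations this
    # is simply the largest memory overhead in the cluster.
    mem_slot = max([0.0] + [vm.mem_reservation_mb + vm.mem_overhead_mb for vm in vms])
    return cpu_slot, mem_slot


vms = [VM("app-8vcpu", mem_overhead_mb=344.98), VM("web", mem_overhead_mb=120.5)]
print(slot_size(vms))  # (256, 344.98)
```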
Calculation for the Number of Slots in the Cluster:-
Once we have the memory and CPU slot sizes from the method above, use the calculation below:
Num of CPU slots = Total available CPU resources of the ESX host or cluster / CPU slot size
Num of memory slots = (Total available memory resources of the ESX host or cluster - memory used by the service console and ESX system) / memory slot size
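Written as formulas, with $C$ the available CPU capacity in MHz, $M$ the host or cluster memory in MB, $M_{\text{sys}}$ the memory used by the service console and ESX system, and $S_{\text{CPU}}$, $S_{\text{mem}}$ the slot sizes. The rounding down and the final minimum are my reading of the worked example that follows (235.875 becomes 235 slots, and the more restrictive count wins), not an official formula.

```latex
N_{\text{CPU}} = \left\lfloor \frac{C}{S_{\text{CPU}}} \right\rfloor,
\qquad
N_{\text{mem}} = \left\lfloor \frac{M - M_{\text{sys}}}{S_{\text{mem}}} \right\rfloor,
\qquad
N_{\text{slots}} = \min\left(N_{\text{CPU}},\, N_{\text{mem}}\right)
```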
Let's take an example.
I have 3 hosts in the cluster, 6 virtual machines running in the cluster, and each host has the following capacity:
RAM = 50 GB per host
CPU = 8 x 2.666 GHz per host
Cluster RAM resources = 50 x 3 = 150 GB, minus memory for the service console and system = 143 GB (146432 MB)
Cluster CPU resources = 8 x 2.666 x 3 ≈ 64 GHz (about 63984 MHz) of total CPU capacity in the cluster, minus the CPU capacity used by the ESX system = 60384 MHz



I don't have any memory or CPU reservations in my cluster, so the default CPU slot size of 256 MHz applies. One of my virtual machines is assigned 8 vCPUs, and its memory overhead of 344.98 MB is the highest among the 6 virtual machines in the cluster, so the memory slot size is about 345 MB.
Let's calculate the number of CPU and memory slots.
Num of CPU slots = Total available CPU resources of the cluster / CPU slot size in MHz
Num of CPU slots = 60384 MHz / 256 MHz = 235.875, i.e. about 235 slots
Num of memory slots = Total available memory resources of the cluster / memory slot size in MB
Num of memory slots = 146432 MB / 345 MB = 424.4, i.e. about 424 slots
The more restrictive of the CPU and memory slot counts determines the number of slots in the cluster. We have 235 CPU slots and 424 memory slots, so the most restrictive number is 235.
So, the total number of slots in my cluster is approximately 235. Please see the snapshot below.
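Here is the same arithmetic in Python, using only the figures from this example (60384 MHz, 146432 MB, a 256 MHz CPU slot and a 345 MB memory slot). Integer division drops partial slots, matching the rounding above. The figure HA itself reports can differ slightly (the follow-up post's snapshot shows 234 total slots), so treat this as a sketch of the method rather than a reproduction of vCenter's internal calculation.

```python
def slot_counts(cpu_mhz: int, mem_mb: int, cpu_slot_mhz: int, mem_slot_mb: int):
    """Return (CPU slots, memory slots, total slots) for a cluster.

    Partial slots cannot host a VM, so both counts are rounded down, and the
    more restrictive of the two is the cluster's slot count.
    """
    cpu_slots = cpu_mhz // cpu_slot_mhz
    mem_slots = mem_mb // mem_slot_mb
    return cpu_slots, mem_slots, min(cpu_slots, mem_slots)


# Figures from the example: 3 hosts, no reservations.
cluster_cpu_mhz = 60384   # CPU left after the ESX system's share
cluster_mem_mb = 146432   # 143 GB left after service console/system memory
print(slot_counts(cluster_cpu_mhz, cluster_mem_mb, 256, 345))  # (235, 424, 235)
```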
Please refer to my blog post "Understanding Total Slots, Used Slots & Available Slots in VMware HA Slots" to understand total, used, and available slots.
I hope this post helps you understand slot calculation. Thanks for reading!

http://www.vmwarearena.com/2012/07/ha-slots-calculation.html

Understanding Total Slots, Used Slots & Available slots in VMware HA Slots





This post is a follow-up to my previous post on VMware HA slot calculation, in which I explained step by step how to calculate HA slot information. This post explains total slots, used slots, and available slots in the VMware HA slot calculation in more detail. I strongly recommend reading my previous post on HA slot calculation first to fully understand the concepts in this post.
From the above snapshot,
Total Slots in Cluster = 234
Used Slots  = 6
Available slots  = 150 
What does that mean, and how is it calculated? Our first thought would be the calculation below:
Available slots = Total slots - Used slots, i.e. Available slots = 234 - 6 = 228. So the available slots should be 228, but why does the snapshot above show 150? Is the snapshot wrong, or did VMware do something wrong in the HA slot calculation? Absolutely not. Below is how total, used, and available slots are calculated.
Available Slots  = (Total Slots -Used Slots) – Slots reserved for failover capacity 
How are the slots reserved for failover calculated?
They come from the admission control policy of the HA cluster. I have enabled admission control and set the admission control policy "Host Failures Cluster Tolerates" to 1. This ensures that enough capacity is always reserved in the cluster to tolerate 1 ESX host failure in my 3-node HA cluster, so some capacity has to be set aside for failover.
Let's come back to the calculation:
Total Available Slots in the Cluster = 234
No of hosts in HA cluster = 3
Total Available slots per ESX host = 234 /3 = 78 Slots Per Host
"Host Failures Cluster Tolerates" admission control policy = 1 host failure. So, to tolerate 1 host failure, 78 slots are reserved for failover.
Available Slots  = (Total Slots -Used Slots) – Slots reserved for fail over by admission control policy
Available Slots  = (234 – 6)  - 78 
Available Slots  = 150 Slots
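A minimal sketch of the formula above in Python, assuming identical hosts as in this 3-node example so that each host holds an equal share of the slots. With hosts of different sizes the reservation would not be an even split, which this hypothetical `available_slots` function does not model.

```python
def available_slots(total_slots: int, used_slots: int,
                    num_hosts: int, host_failures_tolerated: int) -> int:
    """Available slots under the "Host Failures Cluster Tolerates" policy.

    Assumes identical hosts, so every host holds total_slots // num_hosts
    slots and each tolerated host failure reserves one host's worth.
    """
    slots_per_host = total_slots // num_hosts
    reserved_for_failover = slots_per_host * host_failures_tolerated
    return (total_slots - used_slots) - reserved_for_failover


print(available_slots(234, 6, 3, 1))  # 150
```

With `host_failures_tolerated=0` the result is the naive 234 - 6 = 228, which shows that the 78-slot gap comes entirely from the admission control reservation.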
I hope this post clears up any doubts regarding total slots, used slots, and available slots in the VMware HA slot calculation. For more information on admission control policies, please see my blog post on HA interview questions.
Thanks for reading!

http://www.vmwarearena.com/2012/09/understanding-total-slots-used-slots.html

VMware Interview Questions and Answers – HA (High Availability)



What is VMware HA?

As per VMware's definition,
VMware® High Availability (HA) provides easy to use, cost effective high availability for applications running in virtual machines. In the event of server failure, affected virtual machines are automatically restarted on other production servers with spare capacity.
What is AAM in HA?
AAM is Legato's Automated Availability Manager. Up to vSphere 4.1, VMware HA is a re-engineered version of Legato's Automated Availability Manager (AAM) software, adapted to work with VMs. VMware's vCenter agent (vpxa) interfaces with the VMware HA agent, which acts as an intermediary to the AAM software. From vSphere 5.0 onward, HA uses an agent called "FDM" (Fault Domain Manager).
What are the prerequisites for HA to work?
1. Shared storage for the VMs running in the HA cluster
2. Essentials Plus, Standard, Advanced, Enterprise, or Enterprise Plus licensing
3. An HA-enabled cluster
4. Management network redundancy, to avoid frequent isolation responses during temporary network issues (recommended, not required)
What is the maximum number of primary HA hosts in vSphere 4.1?
The maximum number of primary HA hosts is 5. The VMware HA cluster chooses the first 5 hosts that join the cluster as primary nodes, and all other hosts automatically become secondary nodes.
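The join-order rule can be sketched in a few lines. This hypothetical helper models only the rule stated above (first five joiners become primaries); it does not model how vSphere 4.x promotes a secondary when a primary leaves the cluster, and the host names are placeholders.

```python
def split_primary_secondary(hosts_in_join_order, max_primaries=5):
    """Split hosts into (primaries, secondaries) by the order they joined.

    Models only the rule stated above: the first five hosts to join become
    primary nodes and every later host becomes a secondary node.
    """
    hosts = list(hosts_in_join_order)
    return hosts[:max_primaries], hosts[max_primaries:]


primaries, secondaries = split_primary_secondary([f"esx{i:02d}" for i in range(1, 8)])
print(primaries)    # ['esx01', 'esx02', 'esx03', 'esx04', 'esx05']
print(secondaries)  # ['esx06', 'esx07']
```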
How do you see the list of primary nodes in an HA cluster?
View the log file "aam_config_util_listnodes.log" under /var/log/vmware/aam with the command below:
cat /var/log/vmware/aam/aam_config_util_listnodes.log
What is the command to restart/stop/start the HA agent on the ESX host?
service vmware-aam restart
service vmware-aam stop
service vmware-aam start
Where are the HA-related logs located for troubleshooting?
/var/log/vmware/aam
What are the basic troubleshooting steps if the HA agent installation fails on hosts in an HA cluster?
The steps below are taken from my blog post on troubleshooting HA:
1. Check for network issues.
2. Check that DNS is configured properly.
3. Check the VMware HA agent status on the ESX host with the command below:
  service vmware-aam status
4. Check that the networks are configured properly and named exactly as on the other hosts in the cluster; otherwise, you will get errors while installing or reconfiguring the HA agent.
5. Check that the HA-related ports are open in the firewall to allow communication:
   
     Incoming port: TCP/UDP 8042-8045
     Outgoing port: TCP/UDP 2050-2250
6. First, try restarting (or stopping and then starting) the VMware HA agent on the affected host with the commands below. You can also try restarting vpxa and the management agents on the host.
        service vmware-aam restart
        service vmware-aam stop
        service vmware-aam start
7. Right-click the affected host and click "Reconfigure for VMware HA" to reinstall the HA agent on that host.
8. Remove the affected host from the cluster. A host cannot be removed from the cluster until it is put into maintenance mode.
9. As an alternative to step 8, go to the cluster settings and uncheck VMware HA to turn off HA in the cluster, then re-enable VMware HA so the agent is reinstalled.
10. For further troubleshooting, review the HA logs under the /var/log/vmware/aam directory.
What is the maximum number of hosts per HA cluster?
The maximum number of hosts in an HA cluster is 32.
What is host isolation?
VMware HA has a mechanism to detect that a host is isolated from the rest of the hosts in the cluster. When an ESX host loses the ability to exchange heartbeats with the other hosts in the HA cluster over the management network, that host is considered isolated.
How is host isolation detected?
In an HA cluster, ESX hosts use heartbeats to communicate with the other hosts in the cluster. By default, a heartbeat is sent every 1 second.
If an ESX host in the cluster does not receive a heartbeat from any other host in the cluster for 13 seconds, it considers itself isolated and pings the configured isolation address (the default gateway by default). If the ping fails, VMware HA executes the host isolation response.
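The detection steps above can be sketched as a small decision function. This is a simplified model, not HA's agent code: the 13-second threshold comes from the text, and `ping` is a hypothetical callable supplied by the caller that returns True when an address answers.

```python
from typing import Callable, Iterable

# Seconds without any heartbeat before a host suspects isolation (from the text).
ISOLATION_DETECTION_SECONDS = 13


def isolation_state(seconds_since_last_heartbeat: float,
                    isolation_addresses: Iterable[str],
                    ping: Callable[[str], bool]) -> str:
    """Decide what an ESX host does, following the steps described above."""
    if seconds_since_last_heartbeat < ISOLATION_DETECTION_SECONDS:
        return "heartbeats ok"
    # Heartbeats lost: try the isolation address(es), e.g. the default gateway.
    if any(ping(address) for address in isolation_addresses):
        return "not isolated"
    return "isolated: run isolation response"


print(isolation_state(14, ["192.168.0.1", "192.168.1.1"],
                      lambda address: address == "192.168.1.1"))  # not isolated
```

In the usage line the first gateway is unreachable but the second answers, which is why configuring a second isolation address adds redundancy.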
What are the different types of isolation response available in HA?


Power off – All VMs are powered off when HA detects that network isolation has occurred.
Shut down – All VMs running on that host are shut down via VMware Tools when HA detects that network isolation has occurred. If the shutdown via VMware Tools does not complete within 5 minutes, the VMs are powered off. This behavior can be changed with the HA advanced options; please refer to my post on HA advanced configuration.

Leave powered on – The VMs remain powered on, i.e. their state is unchanged, when HA detects that network isolation has occurred.
How do you add an additional isolation address for redundancy?
By default, VMware HA pings the default gateway as the isolation address if it stops receiving heartbeats. We can add additional addresses if we are using redundant service consoles that belong to different subnets. For example, we can add the default gateway of SC1 as the first value and the gateway of SC2 as the additional one using the values below:
1. Right-click your HA cluster.
2. Go to the HA advanced options.
3. Add the line "das.isolationaddress1 = 192.168.0.1".
4. Add the line "das.isolationaddress2 = 192.168.1.1" as the additional isolation address.
To learn more, see my post on advanced HA options.
What is HA admission control?
As per the "VMware Availability Guide",
vCenter Server uses admission control to ensure that sufficient resources are available in a cluster to provide failover protection and to ensure that virtual machine resource reservations are respected.
What are the 2 types of settings available for admission control?



Enable: Do not power on VMs that violate availability constraints
Disable: Power on VMs that violate availability constraints
What are the different types of Admission control policy available with VMware HA?
There are 3 different types of Admission control policy available.
Host failures cluster tolerates
Percentage of cluster resources reserved as failover spare capacity
Specify a failover host
How does the "Host failures cluster tolerates" admission control policy work?



Select the maximum number of host failures that the cluster can afford while still guaranteeing failover. Up to vSphere 4.1, the minimum is 1 and the maximum is 4.
With the "Host failures cluster tolerates" admission control policy, we define the specific number of hosts that can fail in the cluster, and the policy ensures that sufficient resources remain to fail over all the virtual machines from the failed hosts to the other hosts in the cluster. VMware High Availability (HA) uses a mechanism called slots to calculate both the available and the required resources in the cluster for failing over virtual machines from a failed host to other hosts in the cluster.
What is a slot?
As per VMware's definition,
“A slot is a logical representation of the memory and CPU resources that satisfy the requirements for any powered-on virtual machine in the cluster.”
If you have configured reservations at the VM level, they influence the HA slot calculation. The highest memory reservation and the highest CPU reservation among the VMs in your cluster determine the slot size for the cluster.
How are HA slots calculated?
I have written a post about how HA slots are calculated.
How do you check the HA slot information from the vSphere Client?
Click the cluster's Summary tab and click "Advanced Runtime Info" to see the detailed HA slot information.
What is the use of the Host Monitoring status in an HA cluster?
For example, suppose you are performing network maintenance on the switches that connect one of the ESX hosts in your HA cluster.
What will happen if the switch connected to that ESX host goes down?
The host will not receive heartbeats, and its ping to the isolation address will also fail. So the host will consider itself isolated, and HA will initiate restarts of the virtual machines from that host on the other hosts in the cluster. You don't want this to happen during a scheduled maintenance window.
To avoid this situation during scheduled activities that may isolate an ESX host, uncheck "Enable Host Monitoring" until you have finished the network maintenance.
How do you manually define the HA slot size?
By default, the HA slot size is determined by the highest CPU and memory reservations among the virtual machines. If no reservation is specified at the VM level, a default slot size of 256 MHz for CPU and 0 MB + memory overhead for RAM is used. We can control the HA slot size manually with the following values.
There are 4 options related to slot size that we can configure in the HA advanced options:
das.slotMemInMB – Maximum bound value for the HA memory slot size
das.slotCpuInMHz – Maximum bound value for the HA CPU slot size
das.vmMemoryMinMB – Minimum bound value for the HA memory slot size
das.vmCpuMinMHz – Minimum bound value for the HA CPU slot size
For more HA-related advanced options, please refer to my blog post.
How does the "Percentage of cluster resources reserved as failover spare capacity" admission control policy work?




With the "Percentage of cluster resources reserved as failover spare capacity" admission control policy, we define a specific percentage of the total cluster resources to be reserved for failover. Unlike the "Host failures cluster tolerates" admission control policy, it does not use slots. Instead, this policy calculates as follows:
1. It calculates the total resource requirements of all powered-on virtual machines in the cluster, and the total host resources available for virtual machines.
2. It calculates the current CPU and memory failover capacity of the cluster.
3. It checks whether the current CPU or memory failover capacity of the cluster is less than the configured failover capacity (e.g. 25%).
4. If it is, admission control does not allow powering on virtual machines that violate the availability constraints.
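A minimal sketch of steps 1 to 4, assuming the resource requirements are already known (for example, from VM reservations) and that a power-on is checked against the capacity that would remain after the new VM starts. The function names and the cluster figures in the usage lines are hypothetical.

```python
def failover_capacity(total: float, required: float) -> float:
    """Fraction of cluster capacity not needed by powered-on VMs."""
    return (total - required) / total


def can_power_on(total_cpu_mhz: float, required_cpu_mhz: float,
                 total_mem_mb: float, required_mem_mb: float,
                 vm_cpu_mhz: float, vm_mem_mb: float,
                 configured: float = 0.25) -> bool:
    """Return False if powering on the VM would push CPU or memory failover
    capacity below the configured percentage (25% by default here)."""
    cpu_after = failover_capacity(total_cpu_mhz, required_cpu_mhz + vm_cpu_mhz)
    mem_after = failover_capacity(total_mem_mb, required_mem_mb + vm_mem_mb)
    return cpu_after >= configured and mem_after >= configured


# Hypothetical cluster with 60384 MHz and 146432 MB available to VMs.
print(can_power_on(60384, 20000, 146432, 100000, 1000, 4096))  # True
print(can_power_on(60384, 20000, 146432, 110000, 1000, 1000))  # False
```

In the second call, memory failover capacity would drop to about 24%, below the 25% target, so the power-on is refused even though CPU has plenty of headroom.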
How does the "Specify a failover host" admission control policy work?




With the "Specify a failover host" admission control policy, we define a specific host as a dedicated failover host. When a host failure is detected, HA attempts to restart the virtual machines on the specified failover host. In this approach, the dedicated failover host sits idle and does not participate in DRS load balancing: DRS will not migrate virtual machines to the failover host or use it for power-on placement.
What is VM Monitoring status?
HA normally monitors ESX hosts and, when a host fails or is isolated, restarts its virtual machines on the other hosts in the cluster. If you also need HA to monitor for virtual machine failures, use the feature called VM Monitoring status in the HA settings. VM Monitoring restarts a virtual machine if its VMware Tools heartbeat is not received within the time specified by the monitoring sensitivity.
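The VM Monitoring rule can be sketched as a check over the last VMware Tools heartbeat of each VM. This is a simplified model under stated assumptions: `failure_interval_seconds` stands in for whatever interval the chosen monitoring sensitivity implies (60 s below is a placeholder), the VM names are hypothetical, and any additional checks vSphere may apply before restarting a VM are ignored.

```python
from typing import Dict, List


def vms_to_restart(last_tools_heartbeat: Dict[str, float], now: float,
                   failure_interval_seconds: float) -> List[str]:
    """Names of VMs whose VMware Tools heartbeat is overdue.

    `last_tools_heartbeat` maps a VM name to the time (in seconds) its last
    heartbeat arrived; a VM is flagged once the gap reaches the interval.
    """
    return sorted(name for name, seen in last_tools_heartbeat.items()
                  if now - seen >= failure_interval_seconds)


print(vms_to_restart({"web01": 100.0, "db01": 150.0}, now=200.0,
                     failure_interval_seconds=60.0))  # ['web01']
```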

http://www.vmwarearena.com/2012/07/vmware-interview-questions-and-answers_20.html