Thursday, 22 May 2014

ESXi 5.0 Management Network Interface Testing - Part 1

While upgrading my ESX 4.1 server farm, I noticed that the ESXi Management Network interface behaves differently from the Service Console of ESX 4.1. So I decided to test further and understand the concept and deployment of multiple Management Network interfaces in ESXi 5.0.

Test Objectives
The objective is to test the redundancy of the ESXi Management Network interfaces for vCenter connectivity and SSH, so that ESXi can be recovered remotely should a portion of the network fail.
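
To check this objective from a test machine, something like the following could probe each management IP for SSH (TCP 22) and HTTPS management access (TCP 443). This is a minimal sketch, assuming Python 3 on the test machine. The function names are made up for illustration, and the addresses in the usage comment are documentation placeholders, not the addresses of this test setup.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def probe_hosts(hosts, ports=(22, 443), timeout=2.0):
    """Map each host to {port: reachable} for the given TCP ports."""
    return {
        host: {port: is_port_open(host, port, timeout) for port in ports}
        for host in hosts
    }


# Example usage (placeholder addresses, replace with the real management IPs):
# for host, result in probe_hosts(["192.0.2.11", "192.0.2.12"]).items():
#     print(host, result)
```

A TCP probe like this complements the ping test: a management IP can answer ping while SSH or the management service is not reachable, and the other way round if ICMP is filtered.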

Test Environment
  • Set up 2 vSwitches, each with 2 Management Network interfaces.
  • All Management Network IPs are in the same IP subnet.
  • vCenter is in the same Management Network IP subnet.
  • 1 test machine is located in the same Management Network subnet to perform the ping test.
  • Everything is set up on the same IP subnet so that other network issues, such as routing and the single vmk default gateway, are taken out of this testing.
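
The "same IP subnet" assumption is easy to verify before testing. A small sketch, assuming Python 3; the function name, the subnet and the addresses in the usage comment are placeholders for illustration and not the values of this lab.

```python
import ipaddress


def outside_subnet(addresses, subnet):
    """Return the addresses that do NOT belong to the given subnet."""
    network = ipaddress.ip_network(subnet)
    return [a for a in addresses if ipaddress.ip_address(a) not in network]


# Example usage (placeholder values):
# stray = outside_subnet(["192.0.2.11", "192.0.2.12", "198.51.100.5"], "192.0.2.0/24")
# print(stray)  # any address listed here would need routing to be reached
```

An empty result means every management IP, vCenter and the test machine can reach each other without a gateway, which is the condition this test relies on.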

Networking Configuration Capture in vCenter on the Test Setup



Management Network Interface Capture in vCenter on the Test Setup





Network Adapters Capture in DCUI on the Test Setup






IP Configuration Capture in DCUI on the Test Setup



Ping Test Captured from the Test Machine



Switch Port Connectivity Status of vmk0, vmk1, vmk3, vmk4


Test Case 1: Both network card uplinks connected to vSwitch0 are down.

Set port 1 and port 2 of the network switch to administratively down

Status captured in vCenter


Test Machine Ping Test Result 



Conclusion of Test Case 1
When the vmnic0 and vmnic1 uplinks failed, all vmk interfaces failed.

Connecting to the Management Network 3 and 4 IPs from vCenter also failed, and so did SSH to Management Network 3 and 4.

This came as a surprise to me. I expected the Management Network interfaces connected to vSwitch0 to fail, but the IP addresses connected to vSwitch1 to keep working. Instead, I nearly fell off my chair.

But after realising that the Network Adapters setting in my DCUI still pointed to vmnic0 and vmnic1, I had second thoughts. Because the DCUI Network Adapters setting still pointed to vmnic0 and vmnic1, I cannot fault ESXi for not switching the Management Network IPs to vmnic3 and vmnic4.

In this case, it seems to me that when the uplinks specified under Network Adapters in the DCUI fail, all vmk interfaces fail. This is a great concern, as vMotion, FT and iSCSI will also fail even if they are on a different vSwitch and different NICs. It is also a great concern for design: how should we protect the Management Network IP?

This highlights some weaknesses in the ESXi Management IP design compared to the ESX 4.0 Service Console.


Recovery of Test Case 1


1) Identify the network problem and restore the network link.

2) I tried to work in the DCUI and bind vmnic2 to the Network Adapters group.


But it hit the following error while binding vmnic2.


I tried rebinding using vmnic4, but there were even more errors, which can cause more problems. After a few tries and re-setups of the lab (I had no choice but to reset the network config in the DCUI back to default), I realised that after binding vmnic2 to the Management Network adapters with errors, you just have to accept the errors and reboot the ESXi system.

After the reboot, I managed to regain control of the ESXi host in my vCenter.



Conclusion of Recovery Method of Test Case 1
In this recovery test I found another problem with recovering the ESXi Management Network IP. Because the DCUI itself offers no shell the way the ESX Service Console did, you cannot run command-line tools such as vCLI from it to check the current status of the network card binding.

After binding vmnic2, vmk0 and vmk1 were still not up; a reboot was needed for the new vmnic2 binding to take effect. If VMs are running at that moment and the system has to be rebooted, there must be a way to shut down the VMs first. But because there is no shell in the DCUI, you cannot shut down any VMs from it. This poses an even bigger problem!

So if you run into this situation, the best and safest way is still to restore the physical network link.


ESXi 5.0 Management Network Interface Testing - Part 2

Part 2 of the Management Network Interface Testing is about misconfiguration of the ESXi Management Network interface.

Misconfiguration happens every day and is part of life. We learn from our mistakes and gain experience from them.

In this Part 2, three tests are performed:
  • Duplicate IP address of the current Management IP in the same network segment.
  • Misconfiguration of the Management Network IP.
  • Deletion of the Management Network IP that is currently connecting to vCenter.


Test 1 - Duplicate IP address of current Management IP in the same network segment

I powered up a Windows 7 PC on another ESXi server and configured it with the same IP address as the Management IP of the ESXi host under test.

I then used the Windows PC to ping vCenter, so that vCenter registered the PC's MAC address.

Result of Test 1
After 2 minutes, vCenter lost control of the ESXi Server that has the same IP as the Windows 7 PC.




I tried to reconnect to the ESXi Server using the same IP and got a connection error message.



After clicking Close, the following message popped up.


Once the host was connected to vCenter again, the old ESXi IP had been replaced with the new Management IP in vCenter.
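
A duplicate address like the one in Test 1 shows up on the network as one IP answering from two different MAC addresses. The sketch below, assuming Python 3, flags such conflicts from a list of (IP, MAC) observations. How the observations are collected (for example from ARP tables) is left open, and the function name and sample values are hypothetical.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Return {ip: sorted MACs} for every IP seen with more than one MAC.

    observations is an iterable of (ip, mac) pairs; MACs are compared
    case-insensitively.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Example usage (hypothetical observations):
# seen = [
#     ("192.0.2.11", "00:50:56:AA:BB:01"),  # ESXi management interface
#     ("192.0.2.11", "00:0c:29:cc:dd:02"),  # Windows 7 PC with the same IP
#     ("192.0.2.12", "00:50:56:aa:bb:03"),
# ]
# print(find_duplicate_ips(seen))
```

Running a check like this periodically against the management subnet would surface the conflict before vCenter loses the host.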





Test 2 - Misconfig of Management Network IP

Changed the VLAN of the current Management Network IP to that of a different subnet (VLAN 31).


Result of Test 2
vCenter lost connectivity to Management IP 1 with the following error message.




All the other Management Network IPs were still working fine with SSH.


Recovery
Local Recovery
  • Change to the correct IP address or VLAN in the DCUI.
Remote Recovery
  • Remove the ESXi host from vCenter.
  • Add the host back using another Management Network IP known to vCenter.
  • Correct the misconfiguration using vCenter.



Test 3 - Deletion of the Management Network IP that is currently connecting to vCenter



Current Networking Config.




I deleted the current ESXi Management IP that connects to vCenter.

After some error messages and a few clicks on "OK", vCenter changed the Management IP of the ESXi Server to the next Management IP (from vmk0 to vmk1) within the same vSwitch. Note that vmk0 is no longer there and the VMkernel port group has changed to "Virtual Machine Port Group".



I deleted the ESXi IP again. The same error message popped up and vCenter changed the Management IP of the ESXi Server to the next Management IP, in vSwitch1. Note that it has now switched from vSwitch0 vmk1 to vSwitch1 vmk2.



Since the vSwitch had changed from vSwitch0 to vSwitch1, I looked into the DCUI. To my surprise, the DCUI reflected the correct setting.





I deleted the ESXi IP again. The same error message popped up and vCenter changed the Management IP of the ESXi Server to the next Management IP, from vmk2 to vmk3.
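
The sequence seen in Test 3 (vmk0, then vmk1, vmk2 and vmk3) looks like a fallback to the next remaining management interface in order. The sketch below is only a toy model of that observation, assuming Python 3. It is not vCenter's actual logic, and the function name and the plain list of interface names are assumptions made for illustration.

```python
def next_management_interface(interfaces, deleted):
    """Return the first interface, in list order, that has not been deleted.

    Returns None when no management interface is left.
    """
    for name in interfaces:
        if name not in deleted:
            return name
    return None


# Example usage: replay the deletions from Test 3.
# order = ["vmk0", "vmk1", "vmk2", "vmk3"]
# deleted = set()
# for victim in ["vmk0", "vmk1", "vmk2"]:
#     deleted.add(victim)
#     print(victim, "deleted ->", next_management_interface(order, deleted))
```

If this model holds, the host stays manageable as long as at least one management interface remains, regardless of which vSwitch it sits on.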