Here is an interesting project: let's say you have one or more UPSes (one per power supply) attached to your ESXi vm host (or hosts; this is completely scalable). Yes, it goes without saying that providing uninterrupted power to your servers is a good idea. But unless you are a large company, chances are this power will only last so long. You can make it last even longer by having a plan that decides in which order your physical servers will be shut down based on load and remaining power. That does mean shutting down your vm servers; for the sake of this discussion, we will assume they are ESXi-based.
I have seen interesting articles on shutting down ESXi hosts in case of power failure, but many assume you are monitoring the UPS through the ESXi host. That might be thinking small; what if that is not the case? What if you have a UPS or two at the bottom feeding the entire cabinet? Chances are you will be monitoring it from a host, be it a vm or not, that is running some monitoring program such as Nagios and is set to do something in case of a power failure. Of course, if you have a monitoring vm you can talk to your UPS using either ethernet or USB passthrough, depending on how sophisticated that model is. And it will decide when to tell our ESXi box it is time to shut down.
I do not know about you, but I would like to gracefully save/shut down the vm clients running on that host before that.
The plan is to have the host monitoring the UPS tell the ESXi host to run a shutdown procedure, which would need to first save the vm guests. And, once the vm server is back up and running, it would resume -- of its own accord or by the order of another server -- the saved vm clients. Yes, you will have to worry about how the monitoring and the ESXi hosts will talk to each other and how the clients' clocks will catch up, but for this article we will focus on creating a tool that only cares about saving and resuming all of the vm guests running on this ESXi box. We can expand later.
If we want to save the running vm clients, we probably should find out which ones are running. In a previous article we wrote a script to see if a given vm client is running, off, or saved. For the script we will be creating, we want to use something else, vmdumper. Here is what the help screen for the program says.
    /tmp # vmdumper -h
    vmdumper: [options]
    -f: ignore vsi version check
    -h: print friendly help message
    -l: print information about running VMs
    -g: log specified text to the vmkernel log
    /tmp #

Note the -l shows only the running vms, which is what we want. So, let's run that and see what it spits back (I will break them a bit so they will kinda fit the screen):
    ~ # /sbin/vmdumper -l
    wid=264397 pid=-1 cfgFile="/vmfs/volumes/52a08b50-984b4bf0-219f-d067
      e51ce7b7/boot2docker/boot2docker.vmx"
      uuid="56 4d 11 2b 63 bc 88 fb-d9 e1 93 fc 69 36 66 45"
      displayName="boot2docker" vmxCartelID=264396
    wid=13080 pid=-1 cfgFile="/vmfs/volumes/52a08b50-984b4bf0-219f-d067
      e51ce7b7/Windows 2012/Windows 2012.vmx"
      uuid="56 4d e7 cb 24 11 63 13-04 0d 9b 41 08 f9 a3 be"
      displayName="Windows 2012" vmxCartelID=13079
    wid=527962 pid=-1 cfgFile="/vmfs/volumes/52a08b50-984b4bf0-219f-d067
      e51ce7b7/devcentos/devcentos.vmx"
      uuid="56 4d d7 e8 25 6c de 91-09 38 60 ce ab 5d 43 ca"
      displayName="devcentos" vmxCartelID=527961
    ~ #

As you can see, it shows the path for the config file the vm guest is using (cfgFile), its name (displayName), and something called wid. And a few other things I do not feel like caring about.

So, how do we save a vm anyway? We know we can start a vm using vim-cmd vmsvc/power.on, so maybe the command to save one sounds similar. Some frustrating searching later we find that http://www.vi-toolkit.com/wiki/index.php/Vmsvc/power.hibernate might be a candidate. Thing is it needs vmid as the argument. I will save some time and state (have faith, brother!) it can be obtained by
    vim-cmd vmsvc/getallvms | grep "${displayName}" | awk '{ print "vmid=" $1}'

But, does it really work? We shall try with devcentos, which happens to have vmid=3 (again, I cheat because I have spent loads of time testing this):
    /tmp # vim-cmd vmsvc/power.hibernate 3
    (vim.fault.ToolsUnavailable) {
       dynamicType =
       faultCause = (vmodl.MethodFault) null,
       msg = "Cannot complete operation because VMware Tools is not running in this virtual machine.",
    }
    /tmp #

And it does not seem to want to work. It needs VMware Tools, and I do not want to worry about it. So let's see what else we can use. After some looking I found that vmdumper can do it. To save devcentos we could do:
    /tmp # vmdumper 527962 suspend_vm
    Suspending VM...
    /tmp #

The weird number 527962 is the world id or wid for devcentos, which happens to be the first column in the output of vmdumper -l associated with that vm client.
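Picking the wid out of that output by eye gets old quickly, so here is a minimal sketch of a lookup by display name. The function name wid_for_name is made up for illustration, and it assumes vmdumper -l prints one vm per line with tab-separated key=value fields. The sample data is abbreviated from the listing above, with placeholder cfgFile paths and uuids, so treat it as an approximation of the real thing.

```shell
#!/bin/sh
# Hypothetical helper (not part of ESXi): print the wid of the running vm
# whose displayName equals $1. Reads `vmdumper -l` style output on stdin.
# Assumption: one vm per line, fields separated by tabs.
wid_for_name() {
    awk -v name="$1" '
        BEGIN { FS = "\t" }
        {
            wid = ""; dn = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^wid=/) { wid = substr($i, 5) }
                if ($i ~ /^displayName=/) {
                    dn = substr($i, 13)      # text after displayName=
                    gsub(/^"|"$/, "", dn)    # drop the surrounding quotes
                }
            }
            if (dn == name) { print wid }
        }'
}

# Abbreviated sample input; the paths and uuids are placeholders.
sample_vmdumper_output() {
    printf 'wid=264397\tpid=-1\tcfgFile="/vmfs/volumes/example/boot2docker/boot2docker.vmx"\tuuid="placeholder"\tdisplayName="boot2docker"\tvmxCartelID=264396\n'
    printf 'wid=13080\tpid=-1\tcfgFile="/vmfs/volumes/example/Windows 2012/Windows 2012.vmx"\tuuid="placeholder"\tdisplayName="Windows 2012"\tvmxCartelID=13079\n'
    printf 'wid=527962\tpid=-1\tcfgFile="/vmfs/volumes/example/devcentos/devcentos.vmx"\tuuid="placeholder"\tdisplayName="devcentos"\tvmxCartelID=527961\n'
}
```

On the ESXi box itself the idea would be something like vmdumper -l | wid_for_name "devcentos". Matching the whole name rather than grepping for it keeps a guest called devcentos from being confused with one called devcentos2.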
Pet Peeve: If you remember, the output of vmdumper -h, which should be the help page for that command, mentions nothing about suspend_vm. Good job, VMware! That does make me wonder what else you are not documenting...
Now that my venting is done, let's see what we need.
- We need the wid to suspend a guest with vmdumper.
- We can resume (I tested already, and so can you!) the vm client using vim-cmd vmsvc/power.on. Thing is it needs vmid as the argument, which we figured out how to get above.
- We then need a way to save the vmid so we can restore the saved vms later. Saving the names of the vms would also be a nice touch.
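One wrinkle with the vmid lookup: grep matches substrings, so a guest named devcentos would also match devcentos-old and hand back two vmids. Here is a sketch of an exact-match version. The function name vmid_for_name is hypothetical, and it assumes vim-cmd vmsvc/getallvms prints the Vmid first, then the Name, then a File column that begins with a bracketed datastore name. The sample rows, the devcentos-old guest included, are invented for the demonstration.

```shell
#!/bin/sh
# Hypothetical helper: print the vmid of the guest whose Name column equals
# $1 exactly. Reads `vim-cmd vmsvc/getallvms` style output on stdin.
# Assumptions: Vmid is the first column, and the File column that follows
# the Name starts with "[" (so a name containing "[" would break this).
vmid_for_name() {
    awk -v name="$1" '
        $1 ~ /^[0-9]+$/ {
            line = $0
            sub(/^[ \t]*[0-9]+[ \t]+/, "", line)  # drop the Vmid column
            idx = index(line, "[")                # start of the File column
            if (idx == 0) next
            vmname = substr(line, 1, idx - 1)
            sub(/[ \t]+$/, "", vmname)            # trim the column padding
            if (vmname == name) print $1
        }'
}

# Invented sample input for illustration only.
sample_getallvms_output() {
    printf '%s\n' \
        'Vmid   Name            File                                          Guest OS        Version   Annotation' \
        '3      devcentos       [datastore1] devcentos/devcentos.vmx          centos64Guest   vmx-08' \
        '7      devcentos-old   [datastore1] devcentos-old/devcentos-old.vmx  centos64Guest   vmx-08' \
        '23     Windows 2012    [datastore1] Windows 2012/Windows 2012.vmx    windows8Server64Guest   vmx-08'
}
```

With the sample data, sample_getallvms_output | vmid_for_name "devcentos" returns only the vmid for devcentos, where the grep pipeline would have returned the one for devcentos-old as well.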
Putting it all together, here is the script:

    cat > save_runningvms.sh << 'EOF'
    #!/bin/sh
    # Split on newlines only, so guest names with spaces stay in one piece
    IFS=$'\n'
    USAGE="Usage: $0 {save|resume}"
    SAVE_FILE=/var/tmp/save_vms

    if [ "$#" = "0" ]; then
        echo "$USAGE"
        exit 1
    fi

    selection=$1
    case $selection in
        # If we want to save them
        save )
            rm -f ${SAVE_FILE}
            # Find which vms are currently running
            for i in $(vmdumper -l \
                       | awk ' BEGIN { FS = "\t" }; { print $1 ";" $5 }')
            do
                # Sets $wid and $displayName for this guest
                eval $i
                # Look up the vmid now; we need it to resume later
                vmid=$(vim-cmd vmsvc/getallvms | grep "${displayName}" \
                       | awk '{ print "vmid=" $1}')
                # Start saving them
                vmdumper $wid suspend_vm
                # Write list of saved guests in $SAVE_FILE
                echo $vmid ";" $i >> ${SAVE_FILE}
            done
            ;;
        # If we want to restore them
        resume )
            # Get list of saved guests
            for i in $(cat ${SAVE_FILE})
            do
                # Wake them up
                eval $i
                vim-cmd vmsvc/power.on $vmid
            done
            ;;
        # Anything else gets the usage message
        * )
            echo "$USAGE"
            exit 1
            ;;
    esac
    EOF
    chmod +x save_runningvms.sh

You will note that I avoid using Bashisms because the shell in busybox is closer to Bourne than Bash.
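A side note on the resume branch: it runs eval on every line of the save file, so whatever ends up in that file gets executed. If that makes you nervous, here is a sketch of reading the vmids back without eval. It assumes each line of the save file starts with vmid= followed by a number and a space, which is what the save branch writes. The function names and the POWER_ON_CMD variable are made up for this example and are not part of the script above.

```shell
#!/bin/sh
# Sketch: pull the vmids out of the save file without eval.
# Assumed line format, as written by the save branch:
#   vmid=3 ; wid=5715871;displayName="devcentos"

# Print the numeric vmid from every well-formed line of the file in $1;
# lines that do not start with "vmid=<digits> " are skipped.
saved_vmids() {
    sed -n 's/^vmid=\([0-9][0-9]*\) .*/\1/p' "$1"
}

# Power on every saved guest listed in the file in $1.
# POWER_ON_CMD is a hypothetical override so the loop can be dry-run
# (for example with echo) on a machine that has no vim-cmd.
resume_saved() {
    for id in $(saved_vmids "$1"); do
        ${POWER_ON_CMD:-vim-cmd vmsvc/power.on} "$id"
    done
}
```

Setting POWER_ON_CMD=echo before calling resume_saved /var/tmp/save_vms prints the vmids that would be powered on instead of touching any guest. Note that this sketch relies on the default IFS, unlike the script above.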
I think you probably want to see it running. So, let's run it. First we do some saving:

    /tmp # ./save_runningvms.sh save
    Suspending VM...
    Suspending VM...
    Suspending VM...
    /tmp #

Did it create the /var/tmp/save_vms file? If so, what does it look like?
    /tmp # cat /var/tmp/save_vms
    vmid=24 ; wid=5718058;displayName="boot2docker"
    vmid=23 ; wid=5714001;displayName="Windows 2012"
    vmid=3 ; wid=5715871;displayName="devcentos"
    /tmp #

Ok, I am not convinced. You must be lying. Lemme go to the other vm host, vmhost, and ping devcentos:
    [raub@vmhost tmp]# ping devcentos
    PING devcentos.example.com (10.0.0.112) 56(84) bytes of data.
    From vmhost.example.com (10.0.0.19) icmp_seq=2 Destination Host Unreachable
    From vmhost.example.com (10.0.0.19) icmp_seq=3 Destination Host Unreachable
    From vmhost.example.com (10.0.0.19) icmp_seq=4 Destination Host Unreachable
    ^C
    --- devcentos.example.com ping statistics ---
    7 packets transmitted, 0 received, +3 errors, 100% packet loss, time 6125ms
    pipe 3
    [raub@vmhost tmp]#

Hmmmm, okay. But maybe it was off and you were lying to me. So, let's see about waking up the sleeping vms.
    /tmp # ./save_runningvms.sh resume
    Powering on VM:
    Powering on VM:
    Powering on VM:
    /tmp #

And then pinging devcentos:
    [raub@vmhost tmp]# ping devcentos
    PING devcentos.example.com (10.0.0.112) 56(84) bytes of data.
    64 bytes from devcentos.example.com (10.0.0.112): icmp_seq=1 ttl=64 time=212 ms
    64 bytes from devcentos.example.com (10.0.0.112): icmp_seq=2 ttl=64 time=0.316 ms
    64 bytes from devcentos.example.com (10.0.0.112): icmp_seq=3 ttl=64 time=0.313 ms
    ^C
    --- devcentos.example.com ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2078ms
    rtt min/avg/max/mdev = 0.313/70.992/212.349/99.954 ms
    [raub@vmhost tmp]#

I guess the script does work after all. What's the world coming to?
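Ping cannot tell a suspended guest from one that is simply off. For the truly suspicious, asking the host directly with vim-cmd vmsvc/power.getstate and the vmid is another option. Below is a sketch of a filter that boils that output down to one word. The function name is invented, and it assumes the command prints a line reading Powered on, Powered off, or Suspended, which you should check against your own ESXi version.

```shell
#!/bin/sh
# Hypothetical filter: reads the output of `vim-cmd vmsvc/power.getstate <vmid>`
# on stdin and prints one of: on, off, suspended, unknown.
# Assumption: the output contains "Powered on", "Powered off" or "Suspended".
classify_power_state() {
    case "$(cat)" in
        *"Powered on"*)  echo on ;;
        *"Powered off"*) echo off ;;
        *Suspended*)     echo suspended ;;
        *)               echo unknown ;;
    esac
}
```

On the ESXi box this would be used as vim-cmd vmsvc/power.getstate 3 | classify_power_state, once after the save step and once after the resume step.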