Thursday, May 28, 2015

Save/Suspend and Resume a VMware ESXi vm client command line style

Here is an interesting project: let's say you have one or more UPSes (one per power supply) attached to your ESXi vm host (or hosts; this is completely scalable). Yes, it goes without saying that providing uninterrupted power to your servers is a good idea. But, unless you are a large company, chances are this power will only last so long. You can make it last longer by having a plan that decides in which order your physical servers will be shut down based on load and remaining power. That does mean shutting down your vm servers; for the sake of this discussion, we will assume they are ESXi-based.

I have seen interesting articles on shutting down ESXi hosts in case of power failure, but many assume you are monitoring the UPS through the ESXi host. That might be thinking small; what if that is not the case? What if you have a UPS or two at the bottom feeding the entire cabinet? Chances are you will be monitoring it from a host, be it a vm or not, that is running some monitoring program such as Nagios and is set to do something in case of a power failure. Of course, if you have a monitoring vm you can talk to your UPS using either Ethernet or USB passthrough, depending on how sophisticated that model is. And it will decide when to tell our ESXi box it is time to shut down.

I do not know about you, but I would like to gracefully save/shut down the vm clients running in that host before that.

The plan is to have the host monitoring the UPS tell the ESXi host to run a shutdown procedure, which would need to first save the vm guests. And, once the vm server is back up and running, it would resume -- of its own accord or by the order of another server -- the saved vm clients. Yes, you will have to worry about how the monitoring and the ESXi hosts will talk to each other and how the clients' clocks will catch up, but for this article we will focus on creating a tool that only cares about saving and resuming all of the vm guests running in this ESXi box. We can expand later.

If we want to save the running vm clients, we probably should find out which ones are running. In a previous article we wrote a script to see if a given vm client is running, off, or saved. For the script we will be creating, we want to use something else, vmdumper. Here is what the help screen for the program says.

/tmp # vmdumper -h
vmdumper: [options]  
         -f: ignore vsi version check
         -h: print friendly help message
         -l: print information about running VMs
         -g: log specified text to the vmkernel log
/tmp #
Note that -l shows only the running vms, which is exactly what we want. So, let's run that and see what it spits back (I will break the lines a bit so they will kinda fit the screen):
~ # /sbin/vmdumper -l
wid=264397      pid=-1  cfgFile="/vmfs/volumes/52a08b50-984b4bf0-219f-d067
e51ce7b7/boot2docker/boot2docker.vmx" uuid="56 4d 11 2b 63 bc 88 fb-d9 e1 
93 fc 69 36 66 45"  displayName="boot2docker"       vmxCartelID=264396
wid=13080       pid=-1  cfgFile="/vmfs/volumes/52a08b50-984b4bf0-219f-d067
e51ce7b7/Windows 2012/Windows 2012.vmx"       uuid="56 4d e7 cb 24 11 63 
13-04 0d 9b 41 08 f9 a3 be"  displayName="Windows 2012"      vmxCartelID=13079
wid=527962      pid=-1  cfgFile="/vmfs/volumes/52a08b50-984b4bf0-219f-d067
e51ce7b7/devcentos/devcentos.vmx"     uuid="56 4d d7 e8 25 6c de 91-09 38 
60 ce ab 5d 43 ca"  displayName="devcentos" vmxCartelID=527961
~ #
As you can see, it shows the path for the config file the vm guest is using (cfgFile), its name (displayName), and something called wid, plus a few other things I do not feel like caring about. So, how do we save a vm anyway? We know we can start a vm using vim-cmd vmsvc/power.on, so maybe the command to save one sounds similar. Some frustrating searching later we find that http://www.vi-toolkit.com/wiki/index.php/Vmsvc/power.hibernate might be a candidate. Thing is it needs the vmid as the argument. I will save some time and state (have faith, brother!) it can be obtained by
vim-cmd vmsvc/getallvms | grep "${displayName}" | awk '{ print "vmid=" $1}'
But, does power.hibernate really work? We shall try with devcentos, which happens to have vmid=3 (again, I cheat because I have spent loads of time testing this):
/tmp # vim-cmd  vmsvc/power.hibernate 3
(vim.fault.ToolsUnavailable) {
   dynamicType = ,
   faultCause = (vmodl.MethodFault) null,
   msg = "Cannot complete operation because VMware Tools is not running in this virtual machine.",
}
/tmp #
And it does not seem to want to work. It needs VMware Tools, and I do not want to worry about it. So let's see what else we can use. After some looking I found vmdumper. To save devcentos we could do
/tmp # vmdumper 527962 suspend_vm
Suspending VM...
/tmp # 
The weird number 527962 is the world id or wid for devcentos, which happens to be the first column in the output of vmdumper -l associated with that vm client.
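Picking the wid out by eye gets old quickly, so here is a minimal sketch of looking it up by name. It assumes the tab-separated layout of vmdumper -l shown earlier; wid_for_name is a hypothetical helper name of mine, not something that ships with ESXi, and it reads the listing from stdin so it can be tried away from the host.

```shell
# wid_for_name NAME
# Reads `vmdumper -l` output on stdin and prints the wid of the running
# guest whose displayName is exactly NAME. Prints nothing if it is not found.
# Assumption: fields are tab-separated, as in the listing above.
wid_for_name() {
    awk -F '\t' -v want="displayName=\"$1\"" '
        {
            wid = ""; found = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^wid=/) wid = substr($i, 5)   # strip the "wid=" prefix
                if ($i == want) found = 1               # exact name match
            }
            if (found && wid != "") { print wid; exit }
        }'
}

# On the ESXi host this would be used as:
#   vmdumper -l | wid_for_name devcentos
```

Matching the whole displayName field, rather than grepping for the name, keeps a guest called devcentos from being confused with one called, say, devcentos2.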

Pet Peeve: If you remember, the output of vmdumper -h, which should be the help page for that command, mentions nothing about suspend_vm. Good job, VMware! That does make me wonder what else you are not documenting...

Now my venting is done, let's see what we need.

  1. We need the wid to suspend a vm with vmdumper.
  2. We can resume (I tested already, and so can you!) the vm client using vim-cmd vmsvc/power.on. Thing is it needs vmid as the argument, which we figured out how to get above.
  3. We then need a way to save the vmid so we can restore the saved vms later. Probably saving the names of the vms would also be a nice touch.
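One caveat about the vmid lookup from item 2: grep matches substrings, so a guest named devcentos would also match one named devcentos2. Below is a minimal sketch of an exact-match lookup. It assumes the usual column layout of vim-cmd vmsvc/getallvms (a numeric Vmid, then the Name, then a File column starting with "[datastore]"), and that guest names do not themselves contain " ["; vmid_for_name is a hypothetical helper name.

```shell
# vmid_for_name NAME
# Reads `vim-cmd vmsvc/getallvms` output on stdin and prints the Vmid of
# the guest whose Name column is exactly NAME.
# Assumption: each guest line looks like
#   <vmid>  <name, may contain spaces>  [<datastore>] <path>.vmx  ...
vmid_for_name() {
    awk -v want="$1" '
        /^[0-9]+[ \t]/ {
            vmid = $1
            name = $0
            sub(/^[0-9]+[ \t]+/, "", name)   # drop the Vmid column
            sub(/[ \t]+\[.*$/, "", name)     # drop everything from "[datastore]" on
            if (name == want) { print vmid; exit }
        }'
}

# On the ESXi host this would be used as:
#   vim-cmd vmsvc/getallvms | vmid_for_name "Windows 2012"
```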
So, here is the script I wrote to save and restore the running vms. As you can see, it is rather dumb since it is an all or nothing kinda deal. It is also unforgiving: if you run it again to save vms, the old /var/tmp/save_vms file will be overwritten. For what I wrote this script for, that is but a small annoyance.
cat > save_runningvms.sh  << 'EOF'
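#!/bin/sh
# Suspend every running vm on this ESXi host, or resume the ones we suspended.

# Split command output on newlines only, so names with spaces stay intact
IFS='
'
USAGE="Usage: $0 {save|resume}"
SAVE_FILE=/var/tmp/save_vms

if [ "$#" -eq 0 ]; then
        echo "$USAGE"
        exit 1
fi

selection=$1

case $selection in
   # If we want to save them
   save )
      rm -f "${SAVE_FILE}"

      # Find which vms are currently running; field 1 is wid=... and field 5
      # is displayName="...", giving lines like: wid=123;displayName="name"
      for i in $(vmdumper -l \
         | awk ' BEGIN { FS = "\t" }; { print $1 ";" $5 }')
      do
         # Sets $wid and $displayName for this guest
         eval $i
         # Look up the vmid now, since power.on will need it on resume.
         # Note: grep matches substrings, so similarly named guests can collide.
         vmid=$(vim-cmd vmsvc/getallvms | grep "${displayName}" \
            | awk '{ print "vmid=" $1}')
         # Start saving them
         vmdumper $wid suspend_vm

         # Write list of saved guests in $SAVE_FILE
         echo $vmid ";" $i >> "${SAVE_FILE}"
      done
      ;;
   # If we want to restore them
   resume )
      if [ ! -f "${SAVE_FILE}" ]; then
         echo "No saved guests found in ${SAVE_FILE}"
         exit 1
      fi
      # Get list of saved guests
      for i in $(cat "${SAVE_FILE}")
      do
         # Wake them up; eval sets $vmid from the saved line
         eval $i
         vim-cmd vmsvc/power.on $vmid
      done
      ;;
   # Anything else gets the usage message
   * )
      echo "$USAGE"
      exit 1
      ;;
esac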
EOF
chmod +x save_runningvms.sh
You will note that I avoid using Bashisms because the shell in busybox is closer to Bourne than Bash.
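One thing the script does lean on is eval, both on the vmdumper output and on each line read back from the save file. If running eval on a file sitting in /var/tmp makes you nervous, here is a minimal sketch of pulling the fields out with sed instead. It assumes the record format the script writes, vmid=N ; wid=N;displayName="name"; the two function names are hypothetical.

```shell
# vmid_from_record LINE
# Prints the numeric vmid from one saved record, or nothing if the line
# does not start with "vmid=<digits> ".
vmid_from_record() {
    printf '%s\n' "$1" | sed -n 's/^vmid=\([0-9][0-9]*\) .*/\1/p'
}

# name_from_record LINE
# Prints the guest name stored in the displayName="..." part of the record.
name_from_record() {
    printf '%s\n' "$1" | sed -n 's/.*displayName="\(.*\)"$/\1/p'
}

# In the resume loop this could replace the eval, for example:
#   vmid=$(vmid_from_record "$i")
#   [ -n "$vmid" ] && vim-cmd vmsvc/power.on "$vmid"
```

Because only digits are accepted for the vmid, a mangled or tampered line simply yields nothing instead of being executed.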

I think you probably want to see it running. So, let's run it. First we do some saving

/tmp # ./save_runningvms.sh save
Suspending VM...
Suspending VM...
Suspending VM...
/tmp # 
Did it create the /var/tmp/save_vms file? If so, what does it look like?
/tmp # cat /var/tmp/save_vms
vmid=24 ; wid=5718058;displayName="boot2docker"
vmid=23 ; wid=5714001;displayName="Windows 2012"
vmid=3 ; wid=5715871;displayName="devcentos"
/tmp # 
Ok, I am not convinced. You must be lying. Lemme go to another host, vmhost, and ping devcentos
[raub@vmhost tmp]# ping devcentos
PING devcentos.example.com (10.0.0.112) 56(84) bytes of data.
From vmhost.example.com (10.0.0.19) icmp_seq=2 Destination Host Unreachable
From vmhost.example.com (10.0.0.19) icmp_seq=3 Destination Host Unreachable
From vmhost.example.com (10.0.0.19) icmp_seq=4 Destination Host Unreachable
^C
--- devcentos.example.com ping statistics ---
7 packets transmitted, 0 received, +3 errors, 100% packet loss, time 6125ms
pipe 3
[raub@vmhost tmp]# 
Hmmmm, okay. But maybe it was off and you were lying to me. So, let's see about waking up the sleeping vms.
/tmp # ./save_runningvms.sh resume
Powering on VM:
Powering on VM:
Powering on VM:
/tmp #
And then pinging devcentos
[raub@vmhost tmp]# ping devcentos
PING devcentos.example.com (10.0.0.112) 56(84) bytes of data.
64 bytes from devcentos.example.com (10.0.0.112): icmp_seq=1 ttl=64 time=212 ms
64 bytes from devcentos.example.com (10.0.0.112): icmp_seq=2 ttl=64 time=0.316 ms
64 bytes from devcentos.example.com (10.0.0.112): icmp_seq=3 ttl=64 time=0.313 ms
^C
--- devcentos.example.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2078ms
rtt min/avg/max/mdev = 0.313/70.992/212.349/99.954 ms
[raub@vmhost tmp]#
I guess the script does work after all. What's the world coming to?
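One loose end: as mentioned earlier, running save a second time wipes out the previous /var/tmp/save_vms. Here is a minimal sketch of keeping the last list around instead of deleting it. The .prev suffix and the function name rotate_save_file are my own hypothetical choices; the idea is to call it where the script currently does rm -f.

```shell
# rotate_save_file FILE
# If FILE exists and is not empty, move it to FILE.prev so the previous
# list of saved guests survives the next save. Does nothing otherwise.
rotate_save_file() {
    if [ -s "$1" ]; then
        mv -f "$1" "$1.prev"
    fi
}

# In the save branch this could stand in for the rm -f, for example:
#   rotate_save_file "${SAVE_FILE}"
```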
