Monday, January 28, 2013

Restoring time on sleeping (linux) vms

One of the most annoying issues in a VM is keeping accurate time. When a vm client is saved/paused and then restored, its clock will still be set to whatever the time was when it was paused. That can lead to many issues, including not being able to log in using Kerberos. Rebooting the machine will force the clock to be reset, but what if we do not want to (or cannot) reboot? What we could use is a script that monitors the drift and, if it is too far off (say, one minute), automagically adjusts the clock in the vm client.
There are many virtualization packages out there, some of which (in no particular order) are VMware ESX/ESXi, VirtualBox, KVM, and Microsoft Virtual Server. I have personally used the first three; if you want a more complete list, I would suggest the Wikipedia entry on hypervisors. Suffice it to say that some of those programs allow the host machine (the physical machine running the program) to expose its clock to the guest machine (or vm), while others do not. So let's see how we can check and set the clock based on whether the guest can see the host clock or not. For this discussion I will stick to Linux because I feel lazy today.

Using the vm host clock

Let's say the vm program of your choice can pass the vm host's clock to the client (I know that KVM can do that and believe VMware ESXi can too). Then you could write something like this:
cat > /usr/local/bin/driftcheck << 'EOF'
#!/bin/sh
### Detect drift in the vm client vs. the vm host. If it is massive,
### adjust it.

HWCLOCK=/sbin/hwclock

# Max drift in seconds. Kerberos does not like time offsets > 5min, 
# so we set it to 1m = 1*60s
MAX_DRIFT=` echo "1*60" | bc -l`

VMHOST_TIME=`date +'%s' -d "$(${HWCLOCK} -r -u | cut -d' ' -f-7)"`
MY_TIME=`date +%s`
DRIFT=$( echo "${VMHOST_TIME} - ${MY_TIME}" | bc | tr -d -)

if [ "${DRIFT}" -gt "${MAX_DRIFT}" ]
then   
        ${HWCLOCK} -s -u
fi
EOF
chmod +x /usr/local/bin/driftcheck
What it does is read the hardware clock of this vm client, which is actually the vm host's clock. One thing this requires is that the vm client be configured to take the vm host's clock in UTC. YMMV here, but if you are using kvm and libvirt, you would have a line like this
<clock offset="utc"></clock>
somewhere in the xml file defining the client.
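If you are not sure how a given client is configured, here is a small sketch for pulling the clock offset out of the domain XML. The function name clock_offset and the guest name myguest are made up for this example; virsh dumpxml is the usual libvirt way to print a client's definition, but adapt it to however you manage your guests.

```shell
#!/bin/sh
# clock_offset: read libvirt domain XML on stdin and print the value of
# the offset attribute of the <clock> element (e.g. "utc").
# The function name is hypothetical; it accepts single or double quotes
# around the attribute value.
clock_offset() {
    sed -n "s/.*<clock offset=[\"']\([^\"']*\)[\"'].*/\1/p"
}

# Usage (assumes a guest named "myguest" and that virsh is installed):
#   virsh dumpxml myguest | clock_offset
```

If it prints utc, the script above should see the host clock the way it expects.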
While testing, you may want to add a few echo lines inside the if statement so you can see what is going on. For production you probably want to leave them out.
Now, we probably want it to be called often, but not crazily often. So, for now, let's say we create a cron job to run driftcheck every 5 minutes:
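To make the comparison in the script a bit more concrete, here is a minimal sketch of the same arithmetic using plain shell arithmetic instead of bc. The helper names abs_drift and needs_reset and the timestamps are made up for illustration; the 60 second threshold is the MAX_DRIFT from above.

```shell
#!/bin/sh
# Illustrative helpers; the function names are made up for this sketch.

# abs_drift A B: print |A - B| for two epoch timestamps (in seconds)
abs_drift() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    echo "$d"
}

# needs_reset DRIFT MAX: succeed (exit 0) only when DRIFT exceeds MAX
needs_reset() {
    [ "$1" -gt "$2" ]
}

# Example: the host clock is 75 seconds ahead of the guest
drift=$(abs_drift 1359400000 1359399925)
if needs_reset "$drift" 60; then
    echo "drift of ${drift}s is too large, would reset the clock"
fi
```

Taking the absolute value matters because the guest can be ahead of or behind the host; the script gets the same effect by stripping the minus sign with tr.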
cat > /etc/cron.d/driftcheck << 'EOF'
*/5 * * * * root /usr/local/bin/driftcheck > /dev/null 2>&1
EOF
Of course, you should adjust it to fit your needs. Do note I put driftcheck in /usr/local/bin/; it just felt like a nice place this season.

Using NTP

As we mentioned above, sometimes we cannot (or will not) use the clock off the vm host. If we are using an NTP server, we are good. OK, you might argue that if the drift/skew is too large, ntpd will refuse to adjust the clock, and that even if we force it, it will take hours or even days! Not if we nudge it a bit by using an old friend, ntpdate:
cat > /usr/local/bin/driftcheck << 'EOF'
#!/bin/sh
### Detect drift against a reliable time source. If it is massive;
### instead of relying on ntp (which will not work), do something
### a bit more drastic

NTPQ=/usr/bin/ntpq
NTPDATE=/usr/sbin/ntpdate

# Max drift in ms. Kerberos does not like time offsets > 5min, so
# we set it to 1min = 1*60*1000ms
MAX_DRIFT=` echo "60*1000" | bc -l`

# Find the ntp server we are using in this host
NTP_SERVER=`sed -ne '/^server/p' /etc/ntp.conf | awk '{ print $2 }'| head -1`
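# Get current drift: the offset column (9th field, in ms) of the peer
# marked with '*'
DRIFT=`${NTPQ} -p "${NTP_SERVER}" | grep '^\*' | awk '{ print $9 }'`
# No '*' peer found: fall back to the first peer listed by the local ntpd
[ -z "${DRIFT}" ] && DRIFT=`${NTPQ} -p | tail -n +3 | awk '{ print $9 }' | head -1`
# Truncate to an integer and drop the sign
DRIFT=$( echo "${DRIFT}/1" | bc | tr -d -)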

if [ "${DRIFT}" -gt "${MAX_DRIFT}" ]
then
        echo ${NTPDATE} -u ${NTP_SERVER} >> /tmp/ntp
        ${NTPDATE} -u ${NTP_SERVER} >> /tmp/ntp
fi
EOF
chmod +x /usr/local/bin/driftcheck
The reason we look for * is that, according to the NTP troubleshooting page, it marks the source you are currently synchronized to.
The cron file will be the same as what we had in the hardware clock section, so I decided not to copy it.
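If you have never stared at ntpq -p output before, here is a small sketch of what the script is fishing for. The sample output below is made up for illustration (the hostnames and numbers are not from a real server), and the function names are hypothetical; the point is just that the offset is the 9th column and the line we want starts with *.

```shell
#!/bin/sh
# Made-up sample of `ntpq -p` output, for illustration only
sample_ntpq_output() {
cat << 'EOF'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp1.example.com .GPS.            1 u   33   64  377    0.512   -1.234   0.210
+ntp2.example.com 10.0.0.1         2 u   40   64  377    1.104    2.871   0.355
EOF
}

# current_offset: print the offset (9th column, in ms) of the peer
# flagged with '*', i.e. the one we are currently synchronized to
current_offset() {
    awk '/^\*/ { print $9; exit }'
}

sample_ntpq_output | current_offset
```

With the sample above this prints -1.234, which the script would then truncate and strip of its sign before comparing against MAX_DRIFT.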
And that is pretty much it. Remember, these are just ideas. The scripts should work as is, but you can/should customize/adapt them. If you have any questions, concerns, or just want to confuse me, do leave a message.

References