Monday, January 28, 2013

Restoring time on sleeping (Linux) VMs

One of the most annoying issues in a VM is keeping accurate time. When a vm client is saved/paused and then restored, its clock will still be set to whatever time it was when it was paused. That can lead to many issues, including not being able to log in using Kerberos. Rebooting the machine will force the clock to be reset, but what if we do not want to (or cannot) reboot? What we could use is a script that monitors the drift and, if it is too far off (say, more than one minute), automagically adjusts the clock in the vm client.
There are many virtualization packages out there, some of which (in no particular order) are VMware ESX/ESXi, VirtualBox, KVM, and Microsoft Virtual Server. I have personally used the first three; if you want a more complete list, try the Wikipedia entry on hypervisors. Suffice it to say that some of those programs allow the host machine (the physical machine running the program) to expose its clock to the guest machine (or vm), while others do not. So let's see how we can check and set the clock depending on whether or not the guest can see the host clock. For this discussion I will stick to Linux because I feel lazy today.
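Whatever the time source, the check itself boils down to comparing two timestamps against a threshold. Below is a minimal sketch of that comparison in plain bash arithmetic, assuming both times are already expressed in seconds since the epoch; the function name drift_exceeds is hypothetical and is not used by the scripts in this post.

```shell
#!/bin/bash
# Hypothetical helper (not part of the driftcheck scripts): succeeds
# when two clocks, both given in seconds since the epoch, are further
# apart than the allowed drift.
#   $1 = reference time (vm host clock or time server)
#   $2 = local time
#   $3 = maximum allowed drift, in seconds
drift_exceeds() {
    local diff=$(( $1 - $2 ))
    # A clock can be behind or ahead, so compare the absolute value
    if (( diff < 0 )); then
        diff=$(( -diff ))
    fi
    (( diff > $3 ))
}

# Example: a guest restored after one hour, with a 1-minute limit
if drift_exceeds 1359370800 1359367200 60; then
    echo "drift too large, adjust the clock"
fi
```

A drift of exactly 60 seconds does not trigger an adjustment, which matches the strict "greater than" comparison used below.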

Using the vm host clock

Let's say the vm program of your choice can pass the vm host's clock to the client (I know KVM can do that and believe VMware ESXi can too). Then you could write something like this:
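cat > /usr/local/bin/driftcheck << 'EOF'
#!/bin/bash
### Detect drift between the vm client and the vm host. If it is
### massive, adjust it.

# Path to hwclock; adjust for your distro (/sbin or /usr/sbin).
HWCLOCK=/sbin/hwclock

# Max drift in seconds. Kerberos does not like time offsets > 5min,
# so we set it to 1m = 1*60s
MAX_DRIFT=$(echo "1*60" | bc -l)

# Hardware clock (really the vm host's clock) in epoch seconds. The
# cut drops the trailing "-0.123456 seconds" that older hwclock
# versions print; check what your hwclock's output looks like.
VMHOST_TIME=$(date +'%s' -d "$(${HWCLOCK} -r -u | cut -d' ' -f-7)")
MY_TIME=$(date +%s)
# Absolute difference: tr strips the sign of a negative result
DRIFT=$(echo "${VMHOST_TIME} - ${MY_TIME}" | bc | tr -d -)

if [ "${DRIFT}" -gt "${MAX_DRIFT}" ]
then
#        echo "Drift is ${DRIFT}s; running ${HWCLOCK} -s -u"
        ${HWCLOCK} -s -u
#        date
fi
EOF
chmod +x /usr/local/bin/driftcheck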
What it does is read the hardware clock of this vm client, which is actually the vm host's clock. This requires the vm client to be configured to take the vm host's clock in UTC. YMMV here, but if you are using kvm and libvirt, you would have a line like this
<clock offset="utc"></clock>
somewhere in the xml file defining the client.
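To confirm which offset a libvirt guest is actually configured with, you can pull the offset attribute out of its domain XML. This is a minimal sketch, assuming the XML is fed in from something like virsh dumpxml (libvirt usually writes the element with single quotes, e.g. <clock offset='utc'/>); the function name clock_offset and the domain name myvm are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the offset attribute of the <clock>
# element found in libvirt domain XML read from stdin.
# Accepts both quoting styles: offset='utc' and offset="utc".
clock_offset() {
    sed -n "s/.*<clock[^>]*offset=[\"']\([^\"']*\)[\"'].*/\1/p" | head -n 1
}

# Usage (the domain name "myvm" is only an example):
#   virsh dumpxml myvm | clock_offset
# It should print "utc" for a guest set up the way driftcheck expects.
```

If nothing is printed, the domain has no clock element with an offset, and the hardware clock approach may not behave as described.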
The commented lines inside the if statement are there just so you can see what is going on when you test it out. For production you probably want to leave them commented out.
Now, we want it to run often, but not crazy often. So, for now, let's create a cron job that runs driftcheck every 5 minutes:
cat > /etc/cron.d/driftcheck << 'EOF'
*/5 * * * * root /usr/local/bin/driftcheck > /dev/null 2>&1
EOF
Of course, you should adjust it to fit your needs. Do note I put driftcheck in /usr/local/bin/; it just felt like a nice place this season.

Using NTP

As mentioned above, sometimes we cannot (or will not) use the vm host's clock. If we are using an NTP server, we are good. OK, you might argue that if the drift/skew is too large, ntpd will refuse to adjust the clock. And even if we force it, it will take hours or even days! Not if we nudge it a bit with an old friend, ntpdate:
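cat > /usr/local/bin/driftcheck << 'EOF'
#!/bin/bash
### Detect drift against a reliable time source. If it is massive,
### instead of relying on ntp (which will not work), do something
### a bit more drastic.

# Paths; on CentOS both live in /usr/sbin. Adjust for your distro.
NTPQ=/usr/sbin/ntpq
NTPDATE=/usr/sbin/ntpdate

# Max drift in ms. Kerberos does not like time offsets > 5min, so
# we set it to 1min = 1*60*1000ms
MAX_DRIFT=$(echo "60*1000" | bc -l)

# Find the ntp server we are using in this host
NTP_SERVER=$(sed -ne '/^server/p' /etc/ntp.conf | awk '{ print $2 }' | head -1)
# Get current drift: the offset (ms) is column 9 of the local ntpd's
# peer list. Prefer the peer we are synchronized to (marked '*')...
DRIFT=$(${NTPQ} -pn | grep '^\*' | awk '{ print $9 }')
# ...otherwise use the first peer listed below the two header lines
[ -z "${DRIFT}" ] && DRIFT=$(${NTPQ} -pn | tail -n +3 | head -1 | awk '{ print $9 }')
# Nothing to compare against: give up quietly
[ -z "${DRIFT}" ] && exit 0
# Truncate to an integer and drop the sign
DRIFT=$(echo "${DRIFT}/1" | bc | tr -d -)

if [ "${DRIFT}" -gt "${MAX_DRIFT}" ]
then
        echo ${NTPDATE} -u ${NTP_SERVER} >> /tmp/ntp
        ${NTPDATE} -u ${NTP_SERVER} >> /tmp/ntp
fi
EOF
chmod +x /usr/local/bin/driftcheck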
We look for the line marked with * because, according to the NTP troubleshooting page, it marks the source you are currently synchronized to.
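The offset-picking logic can also be pulled into a function that reads ntpq -p output on stdin, which makes it easy to try against saved output before trusting it in cron. This is a minimal sketch, assuming the classic ntpq -p column layout where the offset, in milliseconds, is the ninth field; the function name ntpq_offset_ms is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read `ntpq -p` output on stdin and print the
# absolute offset (ms, truncated to an integer) of the peer marked
# with '*' (the one we are synchronized to). If no peer is marked,
# fall back to the first peer listed. Exits 1 if there are no peers.
ntpq_offset_ms() {
    awk '
        NR <= 2 { next }                # skip the two header lines
        first == "" { first = $9 }      # remember the first peer offset
        /^\*/ { sel = $9; exit }        # current sync source wins
        END {
            if (sel == "") sel = first
            if (sel == "") exit 1       # no peers at all
            v = sel + 0
            if (v < 0) v = -v
            printf "%d\n", v
        }'
}

# Usage:
#   ntpq -pn | ntpq_offset_ms
```

Because it either prints a number or fails, a caller can test the exit status instead of checking for an empty string.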
The cron file will be the same as the one in the hardware clock section, so I decided not to repeat it.
And that is pretty much it. Remember they are just ideas. Those scripts should work as is but you can/should customize/adapt them. If you have any questions, concerns, or just want to confuse me, do leave a message.



Dalek said...

ntpq in CentOS is in /usr/sbin/ntpq. Sorry if I confused/frustrated anyone because of that.

Dalek said...

Found two errors in the ntp code. Corrected.

chiranjeevi said...

Hello Dalek,

I have a similar issue: after my printer stays in power-save mode for more than 2 hours, I get a time skew error when I try to log in to it using Kerberos authentication. The error keeps coming until I reboot the device; after a reboot it does not come back unless the device goes into power-save mode again.

My Kerberos server is in windows.
Printer uses linux OS.

Dalek said...

AFAIK, ntp has a skew limit: if the time difference between the host and the ntp server is too large, it won't try to adjust. That said, you can configure ntp in a host to ignore this limit. The problem is that, left to its own devices, ntp will take its sweet time to slowly bring the host in sync with the ntp server; in my tests I have seen it take 14 days(1) to sync a host that was a couple of hours off.

This is exactly the reason I wrote this. If the printer indeed runs Linux, you should be able to upload something like what I wrote. Some hacking might be required to log in to the printer's console and/or upload the script.