Tuesday, December 31, 2013

Notes on resetting and connecting to a Juniper router

This is another of those notes I wrote primarily to myself. It has to do with a Juniper appliance, namely an SSG5, which runs ScreenOS. I had some issues with its configuration: I screwed up and could not log into it from either the network port or the console. I felt it was high time to wipe and reconfigure the little guy.

Juniper has some notes on doing the deed, but there are a few things I would like to mention:

  1. You really want to do this resetting dance while the router is not connected to any network. You know, just in case someone recognizes a router in default mode and has a field day.

  2. Having a good DB-9 RS-232-to-usb cable makes all the difference. I would strongly recommend one using the FTDI Chipset. Without that you might end up rather frustrated. There are a lot of companies, FTDI itself included, making such cables. For the lazy and curious amongst you, the one I personally own is the Sabrent USB2-to-RS-232 cable, model CB-FTDI.

  3. Find something convenient to reach the button, and a way to hold the router in place. When I first tried it, I used a trusty paperclip to press the reset button on the back. The brilliant (at the time) idea was that if I could hold both the router and the paper clip with one hand, I would then be able to see the lights on the front of the router. It would have worked fine if I had been holding the router with its back towards me. In real life, with me trying to see its blinking light, the paper clip kept sliding off the reset button, lodging itself between the button and the board (I think; I haven't opened it). What worked for me was a mechanical pencil. Its (7mm) tip was thick enough to just fit the reset button hole and its body felt just right to hold from the back.

  4. Resetting the router turned out to be close enough to what was described in Juniper's notes on resetting this router, but not exactly. Specifically,

    1. When you press the reset button, hold it until it starts blinking orange. Until that happens, just keep on pressing the button.
    2. Once it starts blinking orange, let it go. It will go green.
    3. Wait 2-4 seconds and then press the reset button again. The exact time might need some practice; in my case it was more like 3s. You know you got it right because once you press the button the LED will start blinking red.
    4. Now (LED blinking red) release the reset button and let the boot process continue.
  5. Know the serial port settings: 9600 8N1, the same as many Cisco devices. How you configure that and connect to the router is up to you. I have used screen (Linux/OSX/Others), minicom (Linux), and tip (Solaris), but I do know Windows also has a terminal program (HyperTerminal?) that comes with it and will work just fine. Or putty. I like putty.
  6. Running a packet capture program on a router LAN port is quite useful, especially if you have set up the router to use a different network/IP than the default. When I first did the reset, the USB-to-serial cable I had was not working with the router's serial port.

    While I was waiting for the CB-FTDI cable mentioned above, I used wireshark (I was feeling lazy; nothing is stopping you from using something fancier you already have... or writing your own routine) to look at the traffic on the LAN. Before it was reset, the router kept sending ARP requests in broadcast, and that told me which network it was configured to use, which was not the default. As soon as I successfully reset the router, traffic went quiet during bootup, but then I started seeing traffic from the router's default address. Since it was the only device in that network -- my ethernet cable was connected just to the laptop doing the packet capture -- that told me the reset was successful. Then I turned wireshark off, set an IP in the same network on the ethernet port connected to the router, and checked whether the web interface mentioned in the manual, which I have never used, was there. Nope, all ports were closed. So I just had to wait for USP to deliver the USB-to-serial cable.

  7. As the manual and the link state, the default login and password are both netscreen.
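For items 5 and 6, the commands could look something like the sketch below. The serial device (/dev/ttyUSB0) and the interface (eth0) are assumptions about a typical Linux laptop, not something taken from my setup, so the helpers only print the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: print (rather than run) the commands for the serial console
# and the ARP capture. The device and interface names are assumptions;
# check dmesg and "ip link" for the real ones on your machine.

serial_console_cmd() {
    # $1: serial device (hypothetical default: first USB serial adapter)
    dev=${1:-/dev/ttyUSB0}
    # 9600 is the speed from item 5
    echo "screen $dev 9600"
}

arp_capture_cmd() {
    # $1: interface wired to the router's LAN port (hypothetical default)
    iface=${1:-eth0}
    # -n: no name resolution, -e: show MAC addresses, "arp": only ARP frames
    echo "tcpdump -n -e -i $iface arp"
}

serial_console_cmd
arp_capture_cmd
```

Copy the printed lines into a root shell once the names match your hardware.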

Friday, November 08, 2013

Getting IP and MACs for hosts in a network without using nmap

Well, sometimes you need to find out which IPs in a given network are being used and which MAC addresses are associated with them. Perhaps you want to make sure the machines are the ones that should be there (MAC spoofing notwithstanding). Or, as happened to me before, you need to see which of the IPs your dhcp server has given out are actually being used. You can come up with other reasons too, even if they are not honorable! Impress your friends! Be the life of the party!

Anyway, you can do some of this using nmap, but what if it is not available or you just want to use common Linux household commands? I have done something like this before using nslookup, but since dig is supposed to replace it, how about we rewrite my old code to use it? So, let's say you want to know what is in 10.0.0.0/24. You could do something like:

subnet=10.0.0
for i in $(seq 1 254)
  do ans=`ping -qnc 1 $subnet.$i | grep -c '100% packet loss'`
  [ "$ans" == 1 ] || echo "(+) $subnet.$i (`dig -x $subnet.$i +short` `arp -an $subnet.$i|awk '{ print $4 }'`) "
done

Or something a bit fancier, which would ask you to enter the first 3 octets of the network:

nonono() {
  printf "Enter subnet (only the first 3 octets): "
  read subnet
  for i in $(seq 1 254)
    do ans=`ping -qnc 1 $subnet.$i | grep -c '100% packet loss'`
    [ "$ans" == 1 ] || echo "(+) $subnet.$i (`dig -x $subnet.$i +short` `arp -an $subnet.$i|awk '{ print $4 }'`) "
  done
}

Here is the code in action:

raub@desktop:~$ nonono
Enter subnet (only the first 3 octets): 10.0.0
(+) (router. 00:24:54:9s:2a:12) 
(+) (brownie.my.domain.com. 64:6b:b3:b0:76:e1) 
(+) (cookie.my.domain.com. 02:50:4d:c4:17:1a) 
(+) (tomato.my.domain.com. 02:50:4d:c4:17:1a) 
(+) (vmhost.my.domain.com. bc:5f:f4:ad:d7:8d) 
(+) (scan.my.domain.com. c0:ff:ee:4f:96:a9) 
(+) (pickles.my.domain.com. 00:9f:f3:46:23:90) 
(+) (pizza.my.domain.com. c0:ff:ee:67:1c:3c) 
(+) (desktop.my.domain.com. entries) 

Some stuff worth mentioning:

  1. Some machines have a MAC beginning with c0:ff:ee. Those are VMs of mine running off vmhost; I use that prefix so I can quickly identify them as VMs. You might want to follow the same idea if you have to deal with large numbers of VMs in server rooms.
  2. cookie and tomato have the same MAC. The reason is they are the same machine. I just configured its interface to do interface aliasing (eth0:0 and eth0:1 for instance) so one IP could be for a fileserver and another for a web server (really bad idea, which is why I thought you would like it). You can read about it in, say, here.
  3. Do note what we are calling subnet really isn't; it is just the first 3 octets in a class C network. In other words, it assumes your network is of the type a.b.c.0/24. You could change the code to handle any network provided the network IP and subnet mask; I will leave that as an exercise to you.
  4. If you want to run it in OSX, use arp -n instead of arp -an.
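If you do not feel like doing the exercise in item 3, here is one possible starting point: a rough sketch that lists the host addresses of a network given in CIDR form. The function names are made up for this example, and it assumes a prefix length between 1 and 30 and a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Sketch: list the usable host addresses of an IPv4 network given in
# CIDR form, so the ping loop is not limited to a.b.c.0/24.
# Assumes a prefix length between 1 and 30.

ip_to_int() {
    # Split the dotted quad into its four octets
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

hosts_in() {
    # $1 looks like 10.0.0.0/24
    net=${1%/*}
    prefix=${1#*/}
    size=$(( 1 << (32 - prefix) ))
    # Mask off the host bits in case the address given is not the network address
    base=$(( $(ip_to_int "$net") & ~(size - 1) ))
    # Skip the first (network) and last (broadcast) addresses
    n=1
    while [ "$n" -lt $(( size - 1 )) ]; do
        int_to_ip $(( base + n ))
        n=$(( n + 1 ))
    done
}
```

The loop in nonono could then become for ip in $(hosts_in 10.0.0.0/24), with $ip used wherever $subnet.$i appears.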

Tuesday, September 03, 2013

Upgrading SugarCRM command-line style

Where do we start? If you are here, chances are you know what SugarCRM is. For the lazy amongst you (like me!), the CRM in SugarCRM stands for Customer relationship management, which can be seen as a way to keep track of everything related to your customers: address, frequency and types of support questions, infomercials you sent to them, and so on.

If I can go on a detour, the concept is interesting because if you forget about the "customer" part of the acronym, you as an individual can apply it to other things. Like dealing with the IRS and other government organizations, your landlord (yes you should, trust me), job hunting, and so on. Ok, time to merge back with the main traffic.

The reason I am writing this is, as the title hints at, upgrading SugarCRM. There are two ways of upgrading this program. The first one is using the Upgrade Wizard and the other is using what they call a Silent Upgrade. Both start the same way:

  1. Find out the version you currently have installed and the one you want to upgrade to. At the time I wrote this, I had 6.5.9 installed and wanted to go to 6.5.14.
  2. Go to http://sourceforge.net/projects/sugarcrm/ and look for the upgrade file. In my case, it was in the http://sourceforge.net/projects/sugarcrm/files/1%20-%20SugarCRM%206.5.X/SugarCommunityEdition-6.5.X%20Upgrade/ directory. And the file I wanted was SugarCE-Upgrade-6.5.x-to-6.5.14.zip.
  3. Place the zip file somewhere the user running the web server can access. Since I am using Linux and am quite lazy, the location of choice was /tmp.

Now, if you were going to use the Wizard, you would follow something like http://www.siteground.com/tutorials/sugarcrm/sugarcrm-upgrade.htm... and then SugarCRM would hang for an hour or so. I did try following the instructions in http://support.sugarcrm.com/04_Find_Answers/02KB/02Administration/100Install/Troubleshooting_Upgrade_Wizard_System_Check to no avail. I posted about this on their support site but so far no replies. So, time for a silent upgrade.

Doing it command-line style

Don't get me wrong: using the command line to upgrade SugarCRM is not only about avoiding sitting on your thumbs for an hour just to see it fail. It also does not require the person doing the upgrade to be a SugarCRM admin; if you can run as the web server user, you are good. You do not even need to be a user with root rights. Privilege separation in action babe!

So we go back to the steps we used to get the zip file (see above) and then add a few more:

  1. In the same directory you downloaded the upgrade file from, find a file called silentUpgrade-CE-X.Y.Z.zip where X, Y, and Z match the version you are upgrading to. In my case that was silentUpgrade-CE-6.5.14.zip.
  2. Download the silentUpgrade file you found in the previous step to the same convenient location you downloaded the upgrade file to. Once again in my case that was /tmp.
  3. Unzip the silentUpgrade file, but leave the upgrade one alone. If you are using /tmp as the staging directory, you should end up with something like this:
    root@sugarcrm:/tmp# ls -lh /tmp/
    total 4.4M
    -rw-r--r-- 1 root     root      25K Jun 21 02:01 silentUpgrade_dce_step1.php
    -rw-r--r-- 1 root     root      33K Jun 21 02:01 silentUpgrade_dce_step2.php
    -rw-r--r-- 1 root     root     4.4K Jun 21 02:01 silentUpgrade.php
    -rw-r--r-- 1 root     root      42K Jun 21 02:01 silentUpgrade_step1.php
    -rw-r--r-- 1 root     root      21K Jun 21 02:01 silentUpgrade_step2.php
    -rw-r--r-- 1 root     root     4.9K Jun 21 02:01 SILENTUPGRADE.txt
    -rw-r--r-- 1 root     root     4.2M Aug 28 09:59 SugarCE-Upgrade-6.5.x-to-6.5.14.zip
    Note that I cheated here and am running as root. You can unzip those files as yourself as long as you have permission to set their group id to the web server user (in Ubuntu that would be www-data).
  4. Run silentupgrade, telling it where to store its log file. Just to be different, I decided my log file shall be /tmp/sucre.log:
    sudo -u www-data php -f /tmp/silentUpgrade.php /tmp/SugarCE-Upgrade-6.5.x-to-6.5.14.zip /tmp/sucre.log /var/www/sugar/ admin
  5. If successful, you should see a message like this:
    ***************This Upgrade process may take sometime***************
    *************************** SUCCESS*********************************
    ******** If your pre-upgrade Leads data is not showing  ************
    ******** Or you see errors in detailview subpanels  ****************
    ************* In order to resolve them  ****************************
    ******** Log into application as Administrator  ********************
    ******** Go to Admin panel  ****************************************
    ******** Run Repair -> Rebuild Relationships  **********************
    If you see an error, check the log file you created (in this case /tmp/sucre.log) for clues.
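Since most of my failed attempts came from a file being in the wrong place, a small pre-flight check before step 4 does not hurt. This is just a sketch; the function name is made up, and it only checks that the pieces exist as seen by whoever runs it, not by the web server user.

```shell
#!/bin/sh
# Sketch: make sure everything the silent upgrade needs is in place
# before running it. All paths are passed in; nothing is hard-coded.

preflight() {
    # $1: staging directory, $2: upgrade zip, $3: SugarCRM install directory
    stage=$1; zip=$2; sugar=$3
    [ -r "$stage/silentUpgrade.php" ] || { echo "missing $stage/silentUpgrade.php"; return 1; }
    [ -r "$zip" ]                     || { echo "cannot read $zip"; return 1; }
    [ -d "$sugar" ]                   || { echo "no such directory $sugar"; return 1; }
    echo "ok"
}
```

With the paths used above that would be preflight /tmp /tmp/SugarCE-Upgrade-6.5.x-to-6.5.14.zip /var/www/sugar, followed by the sudo -u www-data php command only if it prints ok.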

If you are curious, the actual running of silentupgrade took under a minute once all the files were properly downloaded and unzipped. I don't know about you but to me that is a big improvement from waiting an hour to find out it did not work.


Tuesday, August 27, 2013

Fail2ban and RedHat/CentOS

Fail2ban is another neat intrusion detection program. It monitors log files for suspicious access attempts and, once it has seen enough of them, edits the firewall to block the offender. The really neat part is that it will unban the offending IP later on (you define how long); that usually suffices for your garden variety automatic port scanner/dictionary attack but also gives hope to your user who just can't remember a password. There are other programs out there that will deal with ssh attacks, but fail2ban will handle many different services; I myself use it with Asterisk, mail, and web just to name a few.

But, you did not come here to hear me babbling; let's get busy and do some installing, shall we?

Installing fail2ban in RedHat/CentOS

For this example I will be using CentOS 6. YMMV.

  1. Get required packages. Need jwhois (for whois) from base and fail2ban from, say, epel or your favourite repository
    yum install jwhois fail2ban --enablerepo=epel

    whois is needed by /etc/fail2ban/action.d/sendmail-whois.conf, which is one of
    the actions used by the ssh jail we will configure below.

    You will also need ssmtp or some kind of MTA so fail2ban can let you know that it caught a sneaky bastard. I briefly mentioned about ssmtp in a previous post; seek and thou shalt find.

  2. Configure fail2ban.
    1. Disable everything in /etc/fail2ban/jail.conf. We'll be using /etc/fail2ban/jail.local:
      sed -i -e 's/^enabled.*/enabled  = false/' /etc/fail2ban/jail.conf
    2. Configure /etc/fail2ban/jail.local. For now, we will just have ssh enabled
      HOSTNAME=`hostname -f`
      cat > /etc/fail2ban/jail.local << EOF
      # Fail2Ban jail.local configuration file.
      [DEFAULT]
      actionban = iptables -I fail2ban-<name> 1 -s <ip> -m comment --comment "FAIL2BAN temporary ban" -j DROP
      # Destination email address used solely for the interpolations in
      # jail.{conf,local} configuration files.
      destemail = raub@kudria.com
      # This will ignore connection coming from our networks.
      # Note that local connections can come from other than just 127.0.0.1, so
      # this needs CIDR range too.
      ignoreip = $(dig +short $HOSTNAME)
      # ACTIONS
      # action = %(action_mwl)s
      # JAILS
      # (section name assumed here; use the name of the ssh jail in your jail.conf)
      [ssh-iptables]
      enabled = true
      port    = ssh
      filter  = sshd
      action   = iptables[name=SSH, port=ssh, protocol=tcp]
                 sendmail-whois[name=SSH, dest="%(destemail)s", sender=fail2ban@$HOSTNAME]
      logpath  = /var/log/secure
      maxretry = 5
      bantime = 28800
      EOF
      Note we are only whitelisting the host itself. You could whitelist your lan
      and other machines/networks if you want. Jail is a fail2ban term that defines a ruleset you want to check for, and ban as needed.
    3. Decide where you want fail2ban to log to. That is done in /etc/fail2ban/fail2ban.local using the logtarget variable. Some possible values could be
      cat > /etc/fail2ban/fail2ban.local << EOF
      [Definition]
      # logtarget = SYSLOG
      logtarget = /var/log/fail2ban.log
      EOF
      The file /etc/fail2ban/fail2ban.conf should provide you with examples on how to set that up.
  3. Enable fail2ban
    service fail2ban restart
    chkconfig fail2ban on
    If you now do
    chkconfig --list fail2ban
    you should then see
    fail2ban        0:off   1:off   2:on    3:on    4:on    5:on    6:off
    And then check the fail2ban log you defined just before for any funny business. If you have set it correctly, you should see an email to destemail saying fail2ban started. Now, you will get one email per jail. So, if you just did the default (ssh), you will get one email that looks like this:

    The jail SSH has been started successfully.

    When fail2ban bans someone, you will receive an email that looks like this:

    The IP has just been banned by Fail2Ban after
    3 attempts against ASTERISK.
    Here are more information about
    % This is the RIPE Database query service.
    % The objects are in RPSL format.
    % The RIPE Database is subject to Terms and Conditions.
    % See http://www.ripe.net/db/support/db-terms-conditions.pdf
    % Note: this output has been filtered.
    %       To receive output for a database update, use the "-B" flag.
    % Information related to ' -'

    Note that it is not the SSH jail but the ASTERISK one; I just want to show a
    different example. Also, the stuff after the banned message is from whois.

    If you do iptables -L, you will see which rule fail2ban added to iptables:

    Chain fail2ban-SSH (1 references)
    target     prot opt source               destination
    DROP       all  --      anywhere
    RETURN     all  --  anywhere             anywhere

    Note it creates a chain for each jail.
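If you want a feel for what a jail is doing with maxretry, the following sketch counts failed ssh logins per source IP in a log file and lists the ones that reached the limit. It is emphatically not fail2ban's real filter (that lives in /etc/fail2ban/filter.d/sshd.conf); it only looks at "Failed password" lines, and the function name is made up.

```shell
#!/bin/sh
# Rough illustration of what a jail does: count failed ssh logins per
# source IP and list the ones at or above maxretry. This is NOT
# fail2ban's real filter; it only looks at "Failed password" lines.

offenders() {
    # $1: log file (e.g. /var/log/secure), $2: maxretry
    awk -v max="$2" '
        /Failed password/ {
            # The source address is the word right after "from"
            for (i = 1; i < NF; i++)
                if ($i == "from") count[$(i + 1)]++
        }
        END {
            for (ip in count)
                if (count[ip] >= max) print ip, count[ip]
        }
    ' "$1" | sort
}
```

Running offenders /var/log/secure 5 as root gives a quick list to compare against what fail2ban actually banned.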


Installing ssmtp in RedHat/CentOS

I like ssmtp, and yes it has an extra "s" compared to smtp server. There are a lot of MTAs out there with tons of features and tweaks and so on. I myself have set up and deployed postfix in quite a few organizations and never had reason to regret it. But postfix, sendmail, and all the others are full-fledged enterprise level MTAs. Sometimes all you need is to be able to send a couple of emails from servers whenever they have something interesting to tell you ("who is banging my ssh port?" "where's my tea?"). Take CentOS, for instance. If you are not careful, it will put postfix on every single desktop you install it on. Who needs postfix on their desktop when they probably have access to a perfectly good server class mail server thingie (which could even be postfix, mind you)?

So, enter ssmtp. It is small. You can even say it is rather limited and not particularly secure. But, if all you want is to know if some machine, say, finished a batch job or added a new user and can work around its limitations, you might want to check it out.

Installing ssmtp

As I mentioned in the title, we will be installing ssmtp on some RedHat/CentOS machine. That will require configuring the box to use additional repositories. You can check if your favourite one has it; I myself like epel and know ssmtp is there. repoforge (used to be called rpmforge) might have it too. How do you add a repository? I will let you figure that out.

First thing we need to do is remove postfix just in case it is there:

yum remove postfix

Then we do some ssmtp adding:

yum install ssmtp --enablerepo=epel

Now we need to configure it. To do so you need a real smtp server; all ssmtp does is forward email to a proper MTA. Let's say your mail server is mail.domain.com. If you do not need to authenticate against it (it accepts unauthenticated email only from certain machines or your LAN), you could probably get away with this:

cat > /etc/ssmtp/ssmtp.conf  << EOF
# Config file for sSMTP sendmail
hostname=$(hostname -f)
mailhub=mail.domain.com
EOF

I will be rushing over the not-so-many settings in ssmtp because they are well explained somewhere else. What I will do is stop at the ones I think you might find interesting; at least I do:

  • UseSTARTTLS: if your smtp server can do TLS, by all means use it! Or SSL! ssmtp can handle that too; check the man pages!
  • mailhub: this is as you guessed the address for the smtp server. The nice thing about it is that you can not only define the name but also the port, in case you are not using port 25. Here are a few examples:

    • mailhub=smtp.gmail.com:587
    • mailhub=mail
    • mailhub=host363.hostmonster.com:465
    Bonus points if you recognize the ports. Here is a full example assuming you are a cox customer (I based the setup on their email setup notes) connecting using port 25 (insecure) without any auth whatsoever:
    cat > /etc/ssmtp/ssmtp.conf << EOF
    # Config file for sSMTP sendmail
    hostname=$(hostname -f)
    EOF
  • hostname: Nothing special here, besides the fact that, as you probably noticed, I was lazy and let the computer tell us its own name. If you do not get the FQDN, you probably need to check your configuration somewhere.

But, what if you need to authenticate to send an email (SMTP Auth)? Well, ssmtp allows some form of authentication. No kerberos or key pairs though, just plain old username and password. If you needed that, you would add something like this

AuthUser=server1
AuthPass=the-password-for-server1

to your /etc/ssmtp/ssmtp.conf file. The password will be in plaintext, so make sure only root can read this file. Note I used server1 as the username. The reason for that is I would think either each server would have its own email account or a common server email account would be used. You decide.

When all of that is done, it is time to do some testing. First quick test would be to send an email to you (username@somedomain.com):

echo test | ssmtp -v username@somedomain.com

The -v option is verbose, so you can see what is going on and perhaps have some clues in case it all goes boink. A fancier ssmtp test (which should give you ideas on how to use it in your own scripts) would be

victim=username@somedomain.com
ssmtp $victim << EOF
From: test@$(hostname)
To: $victim
Subject: A longer test

Hi there!
EOF

Now go check if the email arrived in your mailbox and what your spam filter thought of it. Remember, the fancier example allows you to make your email a bit more proper... just sayin'!
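To reuse the fancier form in your own scripts, you could wrap the header part in a small function. This is only a sketch; build_message and its argument order are my invention, not something ssmtp provides.

```shell
#!/bin/sh
# Sketch: build a minimal message (headers, blank line, body) that can
# be piped to ssmtp. The function name and arguments are illustrative.

build_message() {
    # $1: sender, $2: recipient, $3: subject; the body is read from stdin
    printf 'From: %s\nTo: %s\nSubject: %s\n\n' "$1" "$2" "$3"
    cat
}
```

A batch job could then end with something like echo "Batch job finished" | build_message "test@$(hostname)" "$victim" "Job status" | ssmtp "$victim".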


Tuesday, June 18, 2013

Resizing a shared partition in a Synology DiskStation

I bought one of those devices, specifically the DS212j, to use as network storage (NAS for you alphabet soup lovers) for my home. I slapped two green 2TB Western Digital drives in it, set them up as a raid 1, created a 100GB (which probably means using the fake gigabyte, not the power-of-two one) NFS share partition for users, and off I went. Now, since I used its default clickety-click interface (it's pronounced web-based), I only saw how it had set things up when I ssh'ed into the device (I am a bit of a command-line (CLI; I did not forget you) kinda bloke): it is using the standard Linux lvm (ok, this one I actually use), named the logical volume I created volume_1, formatted it as ext4, and mounted it as /volume1. Even though I personally like to name my volumes after their function and try to avoid mounting stuff on the root, I can live with that. But the point is that ext4 and lvm are involved, i.e. sane stuff. I like that. It also means that even though a lot of those devices use a scaled down version of linux, this one is not as scaled down as you would be led to believe.

This morning I received an email from the device. Since I want to make this post look long and important, I will post it here in glorious quadrovision:

Dear user,

The available space of volume 1 on spindizzy is running out; please delete some files to free space.

Total capacity: 98.43 GB
Available capacity: 0.98 GB (1.00%)

Synology DiskStation

Hmmm, that sounds kinda bad. What should I do? Well, I am lazy. Do you remember when I mentioned the sane stuff Synology is using in this device? Let's do some exploring since I still need to fill more space:

spindizzy> pvs
  PV         VG   Fmt  Attr PSize PFree
  /dev/md2   vg1  lvm2 a-   1.81T 1.72T
spindizzy> vgs
  VG   #PV #LV #SN Attr   VSize VFree
  vg1    1   2   0 wz--n- 1.81T 1.72T
spindizzy> lvs
  LV                    VG   Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  syno_vg_reserved_area vg1  -wi-a-  12.00M                                      
  volume_1              vg1  -wi-ao 100.00G                                      

So, the entire raid (minus whatever the device needs to do its thing) is a single physical volume which is allocated to a single volume group (cleverly called vg1), inside which is our logical volume. And, as the email said and df -h can show,

/dev/vg1/volume_1        98.4G     97.4G    916.5M  99% /volume1

rather full. Well, how about if we take care of that lvm-style?

spindizzy> lvextend -L +100G /dev/vg1/volume1
  Logical volume volume1 not found in volume group vg1
spindizzy> lvextend -L +100G /dev/vg1/volume_1
  Extending logical volume volume_1 to 200.00 GB
  Logical volume volume_1 successfully resized
spindizzy> resize2fs /dev/vg1/volume_1
resize2fs 1.41.12 (17-May-2010)
Filesystem at /dev/vg1/volume_1 is mounted on /volume1; on-line resizing required
old desc_blocks = 7, new_desc_blocks = 13
Performing an on-line resize of /dev/vg1/volume_1 to 52428800 (4k) blocks.
The filesystem on /dev/vg1/volume_1 is now 52428800 blocks long.

spindizzy> df -h              
Filesystem                Size      Used Available Use% Mounted on
/dev/md0                  2.3G    425.6M      1.8G  19% /
/tmp                    121.8M    264.0K    121.5M   0% /tmp
/dev/vg1/volume_1       196.9G     97.4G     99.2G  50% /volume1

What I did was to add an extra 100G, effectively doubling its size, to the logical volume volume_1. And all that was done live. Exciting huh? For those of you who do not dabble with lvm a lot, one of its nicest features is that you can increase the size of a logical volume live, without needing to unmount it first. All you need is to have some free space in the volume group (the VFree column). Going the other way around is a bit more challenging, for you need to unmount the volume first, but it can be done. I will later write an article on monkeying with lvm, I promise (remind me!).

Some of you might be like big deal, you could probably have done that using the web interface, just like in many other equivalent devices. What's so special about a Linux-based network storage thingie? Well, the fact they can (either from factory or by adding the required packages) use lvm means I do not need to recreate a partition whenever I need more space, which was a problem with other devices I had. And not only can I take care of that through the command line instead of needing a web browser, but I could also write a script to do that for me. Are they the only ones doing it? I doubt it, but it reminds me why when shopping for a NAS I look for one that runs Linux in some shape or form.
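As for the script, here is a rough idea of how it could start. It is a sketch only: the function names, the threshold, and the step are made up, and it prints the commands it would run instead of running them.

```shell
#!/bin/sh
# Sketch: decide whether a volume needs growing, based on the Use%
# column of df. It only prints the commands it would run; the
# threshold and the step are made-up values passed in by the caller.

use_percent() {
    # Reads "df" output on stdin; $1 is the mount point to look for.
    # Prints the Use% figure without the % sign.
    awk -v mp="$1" '$NF == mp { sub(/%/, "", $(NF - 1)); print $(NF - 1) }'
}

maybe_extend() {
    # $1: logical volume device, $2: current use %, $3: threshold %, $4: step
    if [ "$2" -ge "$3" ]; then
        echo "lvextend -L +$4 $1"
        echo "resize2fs $1"
    fi
}
```

On the DiskStation that would be something like maybe_extend /dev/vg1/volume_1 "$(df -h | use_percent /volume1)" 95 100G; check that the volume group still has VFree to spare before acting on what it prints.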

Tuesday, March 19, 2013

On directory/folder and group ownership

Easy problem here: we have this directory, say /export/projects/web which is supposedly owned by the group developers,
raub@banana:~$ ls -lhd /export/projects/web
drwxrwxr-x 19 bob developers 4.0K 2013-03-19 13:36 /export/projects/web
The idea is that it is a shared folder, a place where the developers can share files amongst themselves without others being able to change/delete them. In other words, we want any file or directory created inside it to inherit its group ID. At least that is the idea. In reality when any member of that group creates a file there, it is owned by that user's default group, not by developers.
raub@banana:~$ touch /export/projects/web/here
raub@banana:~$ ls -lh /export/projects/web
-rw-rw-r--. 1 raub raub       0 Mar 19 13:15 here
We can do something about it. First we set the setgid bit on /export/projects/web, so that anything created inside it from now on inherits the developers group. Note the s in the group permissions, and that the file we created earlier is left as it was:

raub@banana:~$ chmod g+s /export/projects/web
raub@banana:~$ ls -lhd /export/projects/web
drwxrwsr-x 19 bob developers 4.0K 2013-03-19 13:36 /export/projects/web
raub@banana:~$ ls -lh /export/projects/web
-rw-rw-r--. 1 raub raub       0 Mar 19 13:15 here

Then, we should find all files in that directory with different groups and set them to be owned by developers:

raub@banana:~$ find /export/projects/web -type f ! -group developers -exec chgrp developers {} +

Did it work? Let's find out!

raub@banana:~$ touch /export/projects/web/there
raub@banana:~$ ls -lh /export/projects/web
-rw-rw-r--. 1 raub developers 0 Mar 19 13:15 here
-rw-rw-r--. 1 raub developers 0 Mar 19 13:15 there
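One thing the steps above do not cover is subdirectories that already exist: they need the setgid bit too, or files created inside them go back to the creator's default group. A sketch that handles the whole tree in one go (share_tree is a made-up name, and unlike the find above it changes the group of directories as well as files):

```shell
#!/bin/sh
# Sketch: hand a whole tree over to a group. Sets the group on
# everything, and the setgid bit on every directory so that new files
# and subdirectories inherit it. Run as someone allowed to chgrp the files.

share_tree() {
    # $1: top directory, $2: group
    chgrp -R "$2" "$1" &&
    find "$1" -type d -exec chmod g+s {} +
}
```

For the example in this post that would be share_tree /export/projects/web developers.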



Friday, February 01, 2013

Grep-based decisions

So I had an interesting problem today: I want to have a script that will decide what to do based on whether a file has something or not. Specifically, I am running libvirt with KVM/QEMU and wanted to know if a given vm client was configured with PCI passthrough. Now, there are a ton of ways to do that, but I wanted to do it using Bourne or bash and grep. You know, something like

root@vmhost:~# virsh dumpxml vmclient | grep "type='pci' managed='yes'"

since if it has that pattern, it is passing PCI through. What I need then is to have grep look for that pattern and give me back some kind of return code that tells me if it is there or not.

At first I thought grep -q (quiet), as in

virsh dumpxml vmclient | grep -q "type='pci' managed='yes'" | echo $?

would work (you would need then to fish the return code using $?, which is why I put the echo $?). After all we only care whether it finds it or not. But the thing is that it returned 0 whether it found the pattern or not. Major bummer. (The culprit is the pipe: an echo $? on the receiving end of a pipe prints the exit status of whatever ran before the whole pipeline, not grep's.)

Now, grep has this -c option that counts how many times it finds the pattern. Which probably means that if it does not find it, the count should be 0, right? Let's test it out:

root@vmhost1:~# virsh dumpxml vmclient|grep -c "type='pci' managed='no'" 
root@vmhost1:~# virsh dumpxml vmclient |grep -c "type='pci' managed='yes'" 

I think you can see where I am going with this. So, here is a scaled-down version of what I ended up writing:

vmclient=$1
have_pci=$(virsh dumpxml ${vmclient} | grep -c "type='pci' managed='yes'" )

if [ "$have_pci" -eq 0 ]
then
   echo "${vmclient} is PCI passthrough-free. Rejoice!"
   # Do something interesting
else
   echo "${vmclient} has PCI passthrough. Be nice to it."
   # Do something interesting
fi

As you can see it puts our little research to work by detecting if the vm whose name you provide as command line argument ($1) is configured to do PCI passthrough or not. Since this is just a sample skeleton code, it just tells us whether it does or not (the if statement); we could then use that info to do something else. In fact, I am using that test in a larger script I wrote; I did not want to include it here because I wanted to focus on one thing.
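For completeness, counting is not the only way to do it: if can test grep's exit status directly, which is 0 when the pattern is found and 1 when it is not. A small sketch of that variant, with a made-up function name, reading the XML from stdin so it can be fed from virsh or from a saved file:

```shell
#!/bin/sh
# Sketch: let "if" test grep's exit status directly (0 = found,
# 1 = not found). The function reads the domain XML on stdin so it can
# be fed from virsh dumpxml or from a saved file.

has_pci_passthrough() {
    grep -q "type='pci' managed='yes'"
}

# Example use (commented out so this file can be sourced safely):
# if virsh dumpxml "$1" | has_pci_passthrough
# then
#    echo "$1 has PCI passthrough. Be nice to it."
# fi
```

Which of the two you use is mostly a matter of taste; the count is handy if you also care how many devices are passed through.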

I hope this might be useful to someone out there. I probably should talk about the pci passthrough madness, but let's leave that for another episode, shall we?

Monday, January 28, 2013

Restoring time on sleeping (linux) vms

One of the most annoying issues in a VM is keeping accurate time. When a vm client is saved/paused and then restored, its clock will still be set to whatever time it was when it was paused. That can lead to many issues including not being able to log in using Kerberos. Rebooting the machine will force the clock to be reset, but what if we do not want to (or cannot) reboot? What we could use is a script that will monitor the drift and, if it is too far off -- say one minute -- automagically adjust the clock in the vm client.
There are many virtualization packages out there, some of which (in no order) are VMWare ESX/ESXi, Virtual Box, kvm, and Microsoft Virtual Server. The first three I personally have used; if you want to see a more complete list I would suggest trying the Wikipedia entry on Hypervisors. Suffice to say some of those programs allow the host machine (the physical machine running the program) to expose its clock to the guest machine (or vm), others do not. So let's see how we can check and set the clock based on whether the guest can see the host clock or not. For this discussion I will stick to Linux because I feel lazy today.

Using the vm host clock

If the vm program of your choice can pass the vm host's clock to the client (I do know that KVM can do that and believe VMWare ESXi can too), you could write something like this:
cat > /usr/local/bin/driftcheck << 'EOF'
#!/bin/bash
### Detect drift between the vm client and the vm host. If it is
### massive, adjust it.

# Path to hwclock; adjust to match your distro
HWCLOCK=/sbin/hwclock

# Max drift in seconds. Kerberos does not like time offsets > 5min, 
# so we set it to 1m = 1*60s
MAX_DRIFT=` echo "1*60" | bc -l`

VMHOST_TIME=`date +'%s' -d "$(${HWCLOCK} -r -u | cut -d' ' -f-7)"`
MY_TIME=`date +%s`
DRIFT=$( echo "${VMHOST_TIME} - ${MY_TIME}" | bc | tr -d -)

if [ "${DRIFT}" -gt "${MAX_DRIFT}" ]
then
        # echo "Drift is ${DRIFT}s; setting the clock from the vm host"
        ${HWCLOCK} -s -u
fi
EOF
chmod +x /usr/local/bin/driftcheck
What it does is get the hardware clock of this vm client, which actually is the vm host's clock. One thing this requires is that the vm client is configured to take the vm host's clock in utc time. YMMV here, but if you are using kvm and libvirt, you would have a line like this
<clock offset="utc"></clock>
somewhere in the xml file defining the client.
The commented lines inside the if statement are there just so when you test it out you can see what is going on. For production you probably want to leave them commented out.
Now, we would probably want to have it called often, but not crazy often. So, for now let's say we create a cron job to run driftcheck every 5 minutes:
cat > /etc/cron.d/driftcheck << 'EOF'
*/5 * * * * root /usr/local/bin/driftcheck > /dev/null 2>&1
EOF
Of course, you should adjust it to fit your needs. Do note I put driftcheck in /usr/local/bin/; it just felt like a nice place this season.
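If bc is not installed on the vm client, the comparison at the heart of driftcheck can also be done with plain shell arithmetic. A minimal sketch, where drift_exceeds is a made-up name and both times are expected in seconds since the epoch:

```shell
#!/bin/sh
# Sketch: the comparison at the heart of driftcheck, using shell
# arithmetic instead of bc. Both times are seconds since the epoch.

drift_exceeds() {
    # $1: reference time, $2: local time, $3: max drift in seconds
    d=$(( $1 - $2 ))
    # Take the absolute value; we do not care about the direction
    [ "$d" -lt 0 ] && d=$(( -d ))
    [ "$d" -gt "$3" ]
}
```

In the script above that would read: if drift_exceeds "${VMHOST_TIME}" "${MY_TIME}" 60; then ... fi.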

Using NTP

As we mentioned above, sometimes we cannot (or will not) use the clock off the vm host. If we are using an ntp server, we are good. Ok, you might argue that if the drift/skew is too large, ntpd will refuse to adjust the clock. And, even if we force it, it will take hours or even days! Not if we nudge it a bit by using an old friend, ntpdate:
cat > /usr/local/bin/driftcheck << 'EOF'
#!/bin/bash
### Detect drift against a reliable time source. If it is massive,
### instead of relying on ntp (which will not work), do something
### a bit more drastic

# Paths to the ntp tools; adjust to match your distro
NTPQ=/usr/sbin/ntpq
NTPDATE=/usr/sbin/ntpdate

# Max drift in ms. Kerberos does not like time offsets > 5min, so
# we set it to 1min = 1*60*1000ms
MAX_DRIFT=` echo "60*1000" | bc -l`

# Find the ntp server we are using in this host
NTP_SERVER=`sed -ne '/^server/p' /etc/ntp.conf | awk '{ print $2 }'| head -1`
# Get current drift
DRIFT=`${NTPQ} -p ${NTP_SERVER}|grep '*'|awk '{print $9 }'`
[ -z "$DRIFT" ] && DRIFT=`${NTPQ} -p |tail -n +3| awk '{print $9 }'| head -1`
DRIFT=$( echo "${DRIFT}/1" | bc | tr -d -)

if [ "${DRIFT}" -gt "${MAX_DRIFT}" ]
then
        echo ${NTPDATE} -u ${NTP_SERVER} >> /tmp/ntp
        ${NTPDATE} -u ${NTP_SERVER} >> /tmp/ntp
fi
EOF
chmod +x /usr/local/bin/driftcheck
The reason we want to look for * is that according to the NTP troubleshooting page, it marks the source you are currently synchronized to.
The cron file will be the same as what we had in the hardware clock section, so I decided not to copy it.
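The grep/awk/bc/tr chain that extracts the offset can also be collapsed into a single awk call. This is a sketch under the assumption that the output has the usual ntpq -p layout, with the offset (in ms) in the ninth column; selected_offset_ms is a made-up name.

```shell
#!/bin/sh
# Sketch: pull the offset (in ms) of the currently selected peer out of
# "ntpq -p" output read from stdin, as a whole number without sign.

selected_offset_ms() {
    awk '/^\*/ {
        off = $9
        if (off < 0) off = -off
        printf "%d\n", off    # drop the fractional part
        exit
    }'
}
```

It would be used as DRIFT=$(ntpq -p | selected_offset_ms); an empty result means no peer is currently selected.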
And that is pretty much it. Remember they are just ideas. Those scripts should work as is but you can/should customize/adapt them. If you have any questions, concerns, or just want to confuse me, do leave a message.