Wednesday, December 31, 2014

Restrictive rsync + ssh

Some of you have probably used rsync to back up files and directories from one machine to another. If one of those machines is on an open network, you are probably doing it inside an ssh tunnel. If not, you should. And it is really not that hard to do.

Let's say you wanted to copy the directory called pickles inside user bob's home directory at, which is a Linux/Unix box out in the blue yonder. If you have rsync installed (most Linux distros come with it or offer it as a package), you could do something like:

rsync -az -e ssh /path/to/backup/dir/

The -e ssh is what tells rsync to do all of its monkeying about inside an ssh tunnel. When you run the above statement, it will ask for bob's password and then proceed to copy the directory ~bob/pickles inside the directory /path/to/backup/dir. That is great, but I think we can do better.

Look Ma! No passwords!

The first thing I want to get rid of is having to enter a password. Yeah, it was great while we were testing, but if we have a flyingmonkey loose on the internet, I would like to make it a bit harder for someone to break into it; I think I owe that to the Wicked Witch of the West.

The other reason is that we can then do the rsync dance automagically, using a script that runs whenever we feel like it. In other words, backup. For this discussion we will just cover backup as in copying new stuff over old stuff; incremental backup is doable using rsync but will be the subject of another episode.

So, how are we going to do that, you may ask? Well, ssh allows you to authenticate using public/private key pairs. Before we continue, let's make sure sshd on flyingmonkey is configured to accept them:

bob@flyingmonkey:~$ grep -E 'PubkeyAuthentication|RSAAuthentication' /etc/ssh/sshd_config 
#RSAAuthentication yes
#PubkeyAuthentication yes
#RhostsRSAAuthentication no
Both lines are commented out, which means sshd uses its defaults, and both PubkeyAuthentication and RSAAuthentication default to yes, so we are good to go. If flyingmonkey runs OSX, you would want to look at /etc/sshd_config instead.
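If you would rather script this check than eyeball grep output, here is a minimal sketch. It assumes a single flat sshd_config written as keyword-space-value (it ignores Match blocks, Include lines and the keyword=value form) and treats a missing or commented-out PubkeyAuthentication line as the OpenSSH default of yes. The function name pubkey_auth_setting is made up for illustration; if you have root on the box, sshd -T prints the effective settings and is the more reliable check.

```shell
#!/bin/sh
# Sketch: report whether an sshd_config leaves public key authentication on.
# Assumption: a missing or commented-out PubkeyAuthentication line means the
# OpenSSH default ("yes") applies; only an uncommented "no" turns it off.
pubkey_auth_setting() {
    config_file="$1"
    # Keywords are case-insensitive; "#PubkeyAuthentication" does not match
    # because its first field starts with "#". Stop at the first hit, since
    # sshd uses the first value it finds for this keyword.
    value=$(awk 'tolower($1) == "pubkeyauthentication" { print tolower($2); exit }' "$config_file")
    echo "${value:-yes}"
}
```

Running pubkey_auth_setting /etc/ssh/sshd_config against a config like the one above should print yes.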

A quick note on ssh keys: they are a very nice way to authenticate because they make the life of whoever is trying to break into your machine rather hard. Just guessing the password no longer does much good; you need to have the key. And, to add insult to injury, you can put a passphrase on the key itself.

Enough digressing. The next step is to create the key pair. The tool I would use in Linux/Solaris/OSX is ssh-keygen, because I like command-line thingies. So, we go back to the host that will be rsync'ing to flyingmonkey and create it by doing

ssh-keygen -b 4096 -t rsa -C backup-key -f ~/.ssh/flyingmonkey
which will create a 4096-bit RSA key pair named flyingmonkey in your .ssh directory (a lot of places still use 1024 bits, and some are now announcing new state-of-the-art, ultra-secure settings of 2048 bits; so unless your server can't handle it, use 4096 or better):
raub@backup:~$ ls -lh .ssh/flyingmonkey*
-rw------- 1 raub raub 3.2K Dec 31 11:30 .ssh/flyingmonkey
-rw-r--r-- 1 raub raub  732 Dec 31 11:30 .ssh/
During the creation process, it will ask for a passphrase. Since we are going to have a script using this key pair, it might not make sense to have a passphrase associated with it. Or it might, and there are ways to provide said passphrase to a script in some secure way, but this post is getting long so I will stick to the easy basic stuff. If you remember, we said this is public/private key authentication; that means it uses two keys, a public one and a private one. The public key is taken to the machine you want to ssh into, while the private key stays, well, private. Let's look at the public key (it is a single line):
raub@backup:~$ cat .ssh/ 
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACACsgpy/ihq31kv+Zji6Eknr46nbyx38uPE54X3STbaNC8oCheulVk
se9ZDx backup-key
You probably noticed the backup-key string at the end of the file; we put it there using the -C (comment) option. By default the comment is username@host, which is OK most of the time, but I wanted something to remind me of what this key is supposed to do. You can change it later if you want.

So we go back to flyingmonkey and place the contents of the public key in ~bob/.ssh/authorized_keys by whatever means you want. Cut-n-paste works fine.

cat >> authorized_keys
also does a great job. Or you can even use ssh-copy-id if you feel frisky. Just remember the public key is a single line. Of course, if flyingmonkey is a Windows machine, you will do something else, probably involving clicking on a few windows, but the principle is the same: get the bloody key into the account on the target host you want to connect to.
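For the cat >> route, here is a minimal sketch that appends the key only if it is not already there and sets the permissions sshd expects (with StrictModes, which is on by default, sshd ignores an authorized_keys file or .ssh directory that others can write to). add_authorized_key is a hypothetical helper, not an existing tool; ssh-copy-id does much the same job for you.

```shell
#!/bin/sh
# Sketch: idempotently add a single-line public key to authorized_keys.
# Arguments (illustrative): $1 = a copy of the .pub file from the backup host,
#                           $2 = the target .ssh directory, e.g. ~bob/.ssh
add_authorized_key() {
    pubkey_file="$1"
    ssh_dir="$2"
    auth_keys="$ssh_dir/authorized_keys"

    # Create the directory/file if needed and lock down permissions.
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$auth_keys"
    chmod 600 "$auth_keys"

    # The key is one line; match whole lines (-x) as fixed strings (-F).
    key_line=$(head -n 1 "$pubkey_file")
    if ! grep -qxF -e "$key_line" "$auth_keys"; then
        printf '%s\n' "$key_line" >> "$auth_keys"
    fi
}
```

Running it twice with the same key leaves a single entry, so it is safe to put in a provisioning script.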

Once that is done, connect using ssh, providing the key:

ssh -i .ssh/flyingmonkey

Can you log in fine? Great; now try rsync:

rsync -az -e 'ssh -i .ssh/flyingmonkey' /path/to/backup/dir/
Do not continue until the above works. Note that in a real script the private key will probably live somewhere only the user who runs the backup script can access.
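One cheap guard for such a script is to refuse to run if the private key's permissions have been loosened, since OpenSSH itself ignores a private key that group or others can access ("UNPROTECTED PRIVATE KEY FILE!"). A minimal sketch; key_is_private is a made-up helper name, and it parses ls -l output, which is portable but crude.

```shell
#!/bin/sh
# Sketch: succeed only if nobody but the owner can read or write the key.
key_is_private() {
    # Characters 5-10 of `ls -l` output are the group and other permission bits.
    perms=$(ls -l "$1" | cut -c5-10)
    [ "$perms" = "------" ]
}
```

At the top of the backup script, something like key_is_private "$HOME/.ssh/flyingmonkey" || { echo "fix key permissions" >&2; exit 1; } would stop the run before ssh complains.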


So far so good. We eliminated the need to use a password so we can write a script to use the above. But, we can still ssh using that key to do other things besides just rsync. Time to finally get to the topic of this post.

If the IP/hostname of the host you are backing up flyingmonkey from does not change, you can begin by adding it to the front of the flyingmonkey public key's entry in ~bob/.ssh/authorized_keys. If the backup server is in a private/NATed LAN, you want to use the IP of its gateway. In this example, let's say we are all inside a private LAN and the IP of the backup server is

from="" ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACACsgpy/ihq31kv+Zji6Eknr46nbyx38uPE54X3STbaNC8oCheulVk
se9ZDx backup-key
This is a small improvement: the only host that can connect using this key is the one with this IP, be it legit or faking it. Test it.

The next step is to specify which commands can be run when connected using this key. That again requires playing with ~bob/.ssh/authorized_keys; this time we will specify the command:

from="",command="/home/bob/.ssh/validate-rsync" ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACACsgpy/ihq31kv+Zji6Eknr46nbyx38uPE54X3STbaNC8oCheulVk
se9ZDx backup-key
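Writing that line by hand is easy to get wrong: options are comma-separated with no spaces, and values go in double quotes. The sketch below assembles it. restricted_key_line is a hypothetical helper, 192.0.2.10 in the usage is a documentation placeholder IP, and the no-pty/no-*-forwarding options are standard OpenSSH authorized_keys options I am adding as extra hardening, not something this setup requires.

```shell
#!/bin/sh
# Sketch: prefix a public key with from=, command= and extra restrictions.
# $1 = IP of the backup host (or its gateway), $2 = forced command,
# $3 = the single-line public key.
restricted_key_line() {
    from_addr="$1"
    forced_cmd="$2"
    pubkey="$3"
    # No spaces between options; the key follows after a single space.
    printf 'from="%s",command="%s",no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding %s\n' \
        "$from_addr" "$forced_cmd" "$pubkey"
}
```

For example, restricted_key_line 192.0.2.10 /home/bob/.ssh/validate-rsync "$(cat key.pub)" >> ~bob/.ssh/authorized_keys, after removing the old unrestricted entry for the same key.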
And define validate-rsync as:
cat > .ssh/validate-rsync << 'EOF'
#!/bin/sh
case "$SSH_ORIGINAL_COMMAND" in
rsync\ --server\ --sender\ -vlogDtprze.iLsf\ .\ pickles) $SSH_ORIGINAL_COMMAND ;;
*) echo "Rejected" ;;
esac
EOF
chmod +x .ssh/validate-rsync
And this is where it gets really exciting. All validate-rsync does is check whether the command being sent is not only an rsync command but a specific one. Once we figure out how to get the proper SSH_ORIGINAL_COMMAND, we can change the line
rsync\ --server\ --sender\ -vlogDtprze.iLsf\ .\ pickles)
to whatever it needs to be to match our backup script, and test. Note that if you change the rsync statement in the backup script, you will need to change the case pattern too.
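Before deploying a new pattern, you can exercise the same case test locally. This sketch wraps it in a hypothetical check_rsync_command function using the pattern from validate-rsync. Since case uses shell glob matching, a pattern with no wildcards only matches the exact string, so a command with anything tacked on falls through to Rejected.

```shell
#!/bin/sh
# Sketch: same match as validate-rsync, but it reports instead of executing.
check_rsync_command() {
    case "$1" in
        rsync\ --server\ --sender\ -vlogDtprze.iLsf\ .\ pickles)
            echo "allowed" ;;
        *)
            echo "Rejected" ;;
    esac
}
```

For instance, check_rsync_command 'rsync --server --sender -vlogDtprze.iLsf . pickles; rm -rf ~' prints Rejected, and so does the same command without --sender (which would be someone trying to write to flyingmonkey rather than read from it).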

Friday, December 26, 2014


Let's say you want to have an account you can ssh into but in which you can only run very specific commands. A good way to achieve that is to write a wrapper script that is called from your authorized_keys file. So you could have a wrapper that looks like this:

#!/bin/sh
case "$SSH_ORIGINAL_COMMAND" in
    "/usr/bin/rsync "*) $SSH_ORIGINAL_COMMAND ;;
    *) echo "Permission denied."; exit 1 ;;
esac
But what if you want to be really precise about the command? Using the above example, not only running rsync but also specifying the path and the arguments? You could cheat and find out what the command you are sending is supposed to look like by temporarily replacing your wrapper script with this:

#!/bin/sh
DEBUG="logger" # Linux
#DEBUG="syslog -s -l note" # OSX

if [ -n "$SSH_ORIGINAL_COMMAND" ]; then
        $DEBUG "Passed SSH command $SSH_ORIGINAL_COMMAND"
elif [ -n "$SSH2_ORIGINAL_COMMAND" ]; then
        $DEBUG "Passed SSH2 command $SSH2_ORIGINAL_COMMAND"
else
        $DEBUG Not passed a command.
fi
Then you run the ssh command and see what it looks like in the log file. Copy that to your original wrapper script, and you are good to go. For example,
ssh -t -i /home/raub/.ssh/le_key raub@virtualpork echo "Hey"
results in
Dec 26 13:34:05 virtualpork syslog[64541]: Passed SSH command echo Hey
while
rsync -avz -e 'ssh -i /home/raub/.ssh/le_key' raub@virtualpork:Public /tmp/backup/
results in
Dec 26 13:28:17 virtualpork syslog[64541]: Passed SSH command rsync --server --sender -vlogDtprze.iLs . Public
The latter means our little wrapper script would then look like:
#!/bin/sh
case "$SSH_ORIGINAL_COMMAND" in
    "rsync --server --sender -vlogDtprze.iLs . Public") $SSH_ORIGINAL_COMMAND ;;
    *) echo "Permission denied."; exit 1 ;;
esac
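Copying the command out of the log can be scripted too. This sketch assumes the log line contains the literal "Passed SSH command " prefix the debug wrapper writes, and that the command has no double quotes, dollar signs, backquotes or backslashes (those would need escaping inside a double-quoted case pattern). Both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: turn a debug-wrapper log line into a case pattern line.

# Print whatever follows "Passed SSH command " on each matching input line.
logged_command() {
    sed -n 's/.*Passed SSH command //p'
}

# Wrap each command in double quotes so case treats it as a literal string.
# Caveat: ", $, ` and \ inside the command would need extra escaping.
as_case_pattern() {
    while IFS= read -r cmd; do
        printf '    "%s")\n' "$cmd"
    done
}
```

Something like grep 'Passed SSH command' /var/log/messages | tail -n 1 | logged_command | as_case_pattern should print the pattern line shown above; the log file name varies by system (/var/log/messages on CentOS, /var/log/syslog on Debian/Ubuntu, typically /var/log/system.log on OSX).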

Saturday, December 13, 2014

Adding a disk to a libvirt/kvm vm client

So you have a kvm virtualization infrastructure which you, being lazy like me, manage using libvirt. You then created a vm client with the virtual disk partitioned just right (using LVM or not; pick your poison). But later you realized you needed another disk. Maybe it is because you need to have data encrypted to meet HIPAA or PCI requirements. Maybe you just want to keep your data on a different drive. The point is you need another drive and don't want to (or should not) resize the current virtual disk associated with the VM.
So, the first thing you do is create the new disk. If you are using KVM, chances are you have been using qcow2 disks. Or maybe vmdk, iSCSI, or, like me, lvm. No matter what, you probably know how to create a new disk, so I will assume you took the time to figure out how large it needs to be to fit your needs and created the little bastard. Because, as I mentioned before, I am lazy, I will say we are running libvirt on Linux, with a vm client called vmclient and creating LVs to use as virtual disks. So, we need a 10GB virtual disk that we'll call data because we are friends with Captain Obvious.
lvcreate -L 10G -n data vmhost_vg0
creates, as we know, a 10GB LV at /dev/vmhost_vg0/data. To make things easier on us, we will shut down the vm client. Once that is done (do check it using virsh list --all, will you?), we then run
virsh edit vmclient
Note we could have created a properly configured xml file and fed it into the config, but I always forget how to make it properly configured, so I prefer to cheat.
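For the record, the properly-configured-file route could look like this sketch: a hypothetical disk_xml helper that prints a disk element modeled on the vmclient_boot entry below, for an LV used as a raw virtio disk. You would then hand the file to virsh attach-device with --config so it lands in the persistent definition. The target name you pass must differ from every disk the VM already has.

```shell
#!/bin/sh
# Sketch: print a libvirt <disk> element for a raw LV on the virtio bus.
# $1 = source path (e.g. /dev/vmhost_vg0/data), $2 = unused target (e.g. vdb).
disk_xml() {
    source_path="$1"
    target_dev="$2"
    cat <<EOF
<disk device="disk" type="file">
  <driver cache="none" io="native" name="qemu" type="raw"/>
  <source file="$source_path"/>
  <target bus="virtio" dev="$target_dev"/>
</disk>
EOF
}
```

Usage would be along the lines of disk_xml /dev/vmhost_vg0/data vdb > data-disk.xml followed by virsh attach-device vmclient data-disk.xml --config; libvirt should fill in the PCI address on its own.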
When the config file is open, look for the disk entries; they should look like this:
    <disk device="disk" type="file">
      <driver cache="none" io="native" name="qemu" type="raw"/>
      <source file="/dev/vmhost_vg0/vmclient_boot"/>
      <target bus="virtio" dev="vda"/>
      <address bus="0x00" domain="0x0000" function="0x0" slot="0x04" type="pci"/>
    </disk>

which is how our original virtual disk is configured to be used in this vm client. Of course, if you were using qcow, vmdk, or something else, the entry might look a bit different; make a note of how it differs from my example and move on. This is why I said I like to cheat: I can see how the old disk was defined and copy that instead of trying to figure out how to do it.

Now you need to add the new drive. As you guessed from the above, we will do it by copying the above entry and changing it a bit. We do not need to copy everything; we just need enough so virsh knows what we want, and it will fill in the blanks. So, after we copy the relevant bits below the already defined disk and change the source file and the target device name (each disk needs its own), we would have something like this:

    <disk device="disk" type="file">
      <driver cache="none" io="native" name="qemu" type="raw"/>
      <source file="/dev/vmhost_vg0/data"/>
      <target bus="virtio" dev="vdb"/>
    </disk>

Save it; the editor should close without issues. If not, go back and see if you missed something. If it did not bark, use virsh dumpxml vmclient to see your handiwork. Mine looks like this:

    <disk device="disk" type="file">
      <driver cache="none" io="native" name="qemu" type="raw"/>
      <source file="/dev/vmhost_vg0/vmclient_boot"/>
      <target bus="virtio" dev="vda"/>
      <address bus="0x00" domain="0x0000" function="0x0" slot="0x04" type="pci"/>
    </disk>
    <disk device="disk" type="file">
      <driver cache="none" io="native" name="qemu" type="raw"/>
      <source file="/dev/vmhost_vg0/data"/>
      <target bus="virtio" dev="vdb"/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>

As you can see, it added the PCI bus address entry on its own. Now all we need to do is boot vmclient back up, format the new disk in whatever way we want, and start using it.
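Every disk in a domain needs its own target name, and reusing one that is already taken is an easy mistake when copying an entry. Here is a sketch of a hypothetical helper that prints the first free vdX name given the ones in use; it only handles vda through vdz.

```shell
#!/bin/sh
# Sketch: print the first unused virtio target name (vda..vdz).
# Arguments: the target names already used by the domain.
next_virtio_target() {
    used=" $* "
    for letter in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
        case "$used" in
            *" vd$letter "*) ;;                 # taken, keep looking
            *) echo "vd$letter"; return 0 ;;
        esac
    done
    return 1                                    # all 26 names are in use
}
```

Assuming virsh domblklist prints two header lines before its Target/Source rows, next_virtio_target $(virsh domblklist vmclient | awk 'NR > 2 { print $1 }') should print vdb for the original vmclient.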

Friday, November 07, 2014

Running a web app from command line using powershell

Had an interesting task: we have a web-based program that is run by connecting to a given website, with arguments passed in the url. So far so good. But we want to run the program at scheduled times, which means it must be run automagically. And the host that will be used to call the program runs Windows. How about that for requirements?

I decided that would be a good exercise in powershell. Why do it in powershell instead of something in visual-something-or-another? Well, the honest answer is that I suck at those visual thingies. However, instead of telling the truth I will say: since I came from a Linux/Unix background, I would prefer to use some sort of script if I can and save the bloatware associated with building a program for another time. If I were to do this in the Linux camp, I would probably do it in Bourne, tcsh, or bash. Or python if I felt it required some degree of sophistication. On the Windows camp, however, the closest thing to those shells is Powershell. And if you have used it, it ain't bad at all.

This might be a bit long since I will go over how this evolved; go grab some popcorn.

In Windows you can call a web browser from the command line feeding it the url for whatever site you want it to open. Something like this:

explorer ""

would pop up this very site on your default browser. Problem with that is, well, you are loading a web browser. There are times you do want to pop a browser, but if you want to run something in the background that might not be a good idea. After all, would you want to be typing the most amazing program ever written in Ook! when suddenly a web browser comes up taking you to some website you never heard of? You know, like spammers love to do, but instead point to a page without boobs or free money offers. Even if you know it is the web-based program mentioned above being called, that would get old really quickly. So, we need another option.

Powershell allows you to call some .net libraries and classes. One of those classes is System.Net.WebRequest. Here's an example of how we could use it:

$request = [System.Net.WebRequest]::Create("")
$reply = $request.GetResponse()

The first line creates the request to connect, in this case, to my nagios server. But, that does not send any traffic yet. We need to send the request, which is the job of the last line. It also captures the reply sent by the server. Now what should I get by sending a carefully improper request like that? An error message. What kind of message, you may ask. Let's see what you would get back using netcat (from my Linux laptop):

raub@black:~$ nc -v "" 80
Connection to 80 port [tcp/http] succeeded!
HTTP/1.1 400 Bad Request
Date: Mon, 06 Oct 2014 05:18:50 GMT
Server: Apache/2.2.15 (CentOS)
Content-Length: 311
Connection: close
Content-Type: text/html; charset=iso-8859-1

400 Bad Request

Bad Request

Your browser sent a request that this server could not understand.

Apache/2.2.15 (CentOS) Server at Port 80

As you can see we get a 400 Error. So, the above commands should spit back a similar message. Note at this point I do not care about getting a proper reply; I just want to connect to the web server.

Once I do that, we should take a look at the Apache access.log:
 - - [06/Oct/2014:00:18:50 -0500] "GET HTTP/1.0" 400 311 "-" "-"

As you can see, the logs register when netcat reached the server. Now, let's see if we can repeat the deed using powershell. I am going to run the two lines I mentioned above but as a powershell script, which I shall call gimmesite.ps1:

PS C:\Users\Administrator\dev> cat .\gimmesite.ps1
$request = [System.Net.WebRequest]::Create("")
$reply = $request.GetResponse()
PS C:\Users\Administrator\dev>

Let's try to run it:

PS C:\Users\Administrator\Documents\dev> .\gimmesite.ps1
.\gimmesite.ps1 : File C:\Users\Administrator\Documents\dev\gimmesite.ps1 cannot be loaded because running scripts is
disabled on this system. For more information, see about_Execution_Policies at
At line:1 char:1
+ .\gimmesite.ps1
+ ~~~~~~~~~~~~~~~
    + CategoryInfo          : SecurityError: (:) [], PSSecurityException
    + FullyQualifiedErrorId : UnauthorizedAccess
PS C:\Users\Administrator\Documents\dev>

That does not look very happy. As the error message tries to tell us, the Windows box is set up not to run any powershell scripts. You would need to sign the script, which is something I have yet to do. There is a workaround, however, which is mentioned in this thread:

PS C:\Users\Administrator\dev> powershell -ExecutionPolicy ByPass -File .\gimmesite.ps1
PS C:\Users\Administrator\dev>

As you can see, there are no error messages this time since we asked for an exception. Did apache see our connection attempt?

==> /var/log/httpd/access_log <==
 - - [06/Oct/2014:00:23:50 -0500] "GET / HTTP/1.1" 200 - "-" "-"
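When poking at a server like this, it helps to pull just the time and status code out of the access log. A sketch, assuming Apache's common or combined log format: it splits each line on double quotes, so the status is the first word after the quoted request no matter how many parts the request has. log_statuses is a hypothetical name, and the sample line in the usage uses a documentation placeholder IP.

```shell
#!/bin/sh
# Sketch: print "timestamp status" for each Apache common/combined log line.
log_statuses() {
    awk -F'"' '{
        # $1 holds "IP ident user [time zone] "; keep what is inside [ ].
        stamp = $1
        start = index(stamp, "[")
        stop = index(stamp, "]")
        ts = substr(stamp, start + 1, stop - start - 1)
        # $3 holds " status size "; its first word is the status code.
        split($3, rest, " ")
        print ts, rest[1]
    }'
}
```

For example, feeding it 192.0.2.10 - - [06/Oct/2014:00:18:50 -0500] "GET / HTTP/1.0" 400 311 "-" "-" prints 06/Oct/2014:00:18:50 -0500 400; on the server, tail -n 20 /var/log/httpd/access_log | log_statuses gives a quick summary.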

It seems we are making progress. Next step is a small correction in the url we are using. You see, the site we are using to test out is, not So we edit the gimmesite.ps1 script and try again:

PS C:\Users\Administrator\dev> powershell -ExecutionPolicy ByPass -File .\gimmesite.ps1
Exception calling "GetResponse" with "0" argument(s): "The remote server returned an error: (401) Unauthorized."
At C:\Users\Administrator\dev\gimmesite.ps1:2 char:1
+ $reply = $request.GetResponse()
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : WebException

PS C:\Users\Administrator\dev>

The error message makes sense: you need login credentials to access my nagios page. If we remember the objective of the script we are trying to develop in this article -- a powershell script that runs a script in a webpage at certain intervals -- we realize we have no interest in any error message. So, we need to get rid of it. Now, powershell has some structures similar to Java and Python. The one of interest here is the try{}catch{} one. So, let's modify the script once more:

PS C:\Users\Administrator\dev> cat gimmesite.ps1
# Connect to some site we told it about
# Run it as
# powershell -ExecutionPolicy ByPass -File .\gimmesite.ps1

$source = ""
try {
        # Let's try to reach site with our noodly tentacles:
        $request = [System.Net.WebRequest]::Create($source)
        $reply = $request.GetResponse()
}
catch {
        # I tried really hard to care about error messages but I failed
}
PS C:\Users\Administrator\dev>

I hope the comments help explain what is going on. When we run that, error messages are caught by the catch{} block. Since we do nothing with them (we could in a fancier program), they are never printed to the screen:

PS C:\Users\Administrator\dev> powershell -ExecutionPolicy ByPass -File .\gimmesite.ps1
PS C:\Users\Administrator\dev>

So we have a working script. How about running it at predetermined intervals? Windows has no cron command, but there are equivalent commands. In fact, powershell has a set of commands to schedule tasks... which I will not use. Since this blog entry is getting long and I am getting tired of typing, I will instead use schtasks, which has been around since Windows XP. Let me add a comment to the script describing how to use it:

PS C:\Users\Administrator\dev> cat gimmesite.ps1
# Connect to some site we told it about
# Run it as
# powershell -ExecutionPolicy ByPass -File .\gimmesite.ps1
# And from the cronjobbie
# Schtasks /create /tn "Connect to site" /sc daily /st 07:00 /tr "powershell -ExecutionPolicy ByPass -File .\gimmesite.ps1"

$source = ""
try {
        # Let's try to reach site with our noodly tentacles:
        $request = [System.Net.WebRequest]::Create($source)
        $reply = $request.GetResponse()
}
catch {
        # I tried really hard to care about error messages but I failed
}
PS C:\Users\Administrator\dev>

The above comment shows how to set up a "scheduled task" (fancy term for a cron job) that will run every day at 7am. Since I want to test this, let's make it run every 10 minutes instead:

PS C:\Users\Administrator\Documents\dev> schtasks /create /tn "Connect to site" /sc minute /mo 10 /tr "powershell -ExecutionPolicy Bypass -File C:\Users\Administrator\Documents\dev\gimmesite.ps1"
WARNING: The task name "Connect to site" already exists. Do you want to replace it (Y/N)? y
SUCCESS: The scheduled task "Connect to site" has successfully been created.
PS C:\Users\Administrator\Documents\dev> schtasks /query|more

Folder: \
TaskName                                 Next Run Time          Status
======================================== ====================== ===============
Connect to site                          11/6/2014 12:46:00 PM  Ready

Folder: \Microsoft
TaskName                                 Next Run Time          Status
======================================== ====================== ===============
INFO: There are no scheduled tasks presently available at your access level.

You can see that it barked because I had already created the task before. Since I want to change it, I told it to just replace it. You probably also noticed that the task is identified by its name (the /tn option). I will not spend any time at all describing the different options available in schtasks; there are beautiful pages online describing them. Or you can do schtasks /query /?. One final point: the task name is rather important, for it is how you refer to the task if you have to manipulate it somehow. How about if we delete the task?

PS C:\Users\Administrator\Documents\dev> schtasks /delete /tn "Connect to site"
WARNING: Are you sure you want to remove the task "Connect to site" (Y/N)? y
SUCCESS: The scheduled task "Connect to site" was successfully deleted.
PS C:\Users\Administrator\Documents\dev>

I hope this is enough to get you started and give you evil ideas. I put a slightly more complex version of this script on GitHub.

Thursday, July 10, 2014

Request Tracker and Admin users

Request Tracker, or RT for short, is described by wikipedia as a trouble ticket tracking system written in perl. Let's say you set RT up, tested it and put it into production. You created some queues to handle support questions, nicely organized per topic, and the associated groups. However, as of now, the only user able to manage the system as a whole is the user root, which is automagically created when you first install RT. So far so good.

During a particularly sunny day, you are told there are three users to whom you must grant all the same rights that root has. How can you do that?

Well, first thing you need to figure out is what is meant by having all the same rights that root has:

  1. Those 3 users can control every aspect of RT.
  2. Those users can control every aspect of certain queues.

They are a bit different and you might want to have that cleared up; you should make sure you will deliver what your manager wants even when said manager is not exactly sure. This will require lots of tact/diplomacy but will avoid a lot of headaches later. Since I do not know the right answer, let's talk about how to accomplish both:

  1. Those 3 users can control every aspect of RT.

    This is actually the easiest one to accomplish, but also the most dangerous. I do believe in the policy of providing the least rights required to achieve a task when dealing with production environments, and that is why I would ask for some clarification. With that in mind, here are the steps:

    1. Log into RT as root
    2. Using the menu on the left, go to Configuration->Global->User Rights. This page is called Modify global user rights and should look a lot like this:
      Note we have 3 users here: John Root, Robert Root, and Enoch Root. Enoch Root is actually the name of the default root user; don't ask me why that name. If you look at John Root, you will notice under his Current rights is SuperUser with a check mark on its left. You will see the same with Enoch Root. That means both are global admins, or have full control of RT.

      If you want either of them to no longer be a global admin, just check the box next to SuperUser under that user's Current rights and click on Modify User Rights at the bottom of the page. And if you, say, want to make Robert Root a superuser, select SuperUser from his New Rights and click on Modify User Rights.

  2. Those users can control every aspect of certain queues. This is really a special case of the above.

    1. Using the menu on the left, go to Configuration->Queues and then select the queue in question.
    2. Now, click on the User Rights link on the menu on the top of the page.
    3. By now you will notice the page looks just like the Modify global user rights page shown before, but it applies only to this specific queue. The available options are a bit different, but I think you can figure out what they all do.

And that is all there is to it. When in doubt, remember you can create a test user to play around with the different rights and options.

Monday, April 28, 2014

Booting an ESXi VM from a .iso in an NFS share

The first time I created a VM in VMware's ESXi, I placed the .iso containing the install image of the operating system on the host I run the vSphere client from, and then told the VM where to get it. Some info on this procedure can be found at At first glance, it seemed quick and easy; I would even add that for a small deployment it is quite convenient. However, it was cumbersome (too many hoops to make it work) and slow (it expects the vSphere client box to be on a fast and reliable connection to the ESXi one, which might not be the case). It really makes sense to have the images either on the server itself or mounted on the server (fileshare). I was not looking forward to having the images on the server itself because:

  • It would probably require me to download them somewhere else and then upload them to the server. That does not sound like a dealbreaker to most, since if I can ssh into my ESXi, which I do, I can scp the files into it. Well, I do not believe in having a single device/program/app/thingie that does it all; you see, I do like the Rule of Simplicity from the Unix Philosophy. And that dovetails into the next item:
  • I have a perfectly good fileserver (ok, a Synology NAS box), thank you very much. Many of my VMs already NFS mount shares from it, or just have their entire disk in an iSCSI LUN from said NAS.
  • I might want to use those images with my other vm host, which runs KVM.

What I want is an NFS share that can be mounted somewhere I can download the .iso files to, and that is available read-only to the vm hosts (ESXi, KVM/libvirt, Vbox). Let's see if we can make that happen, shall we?


We first need to begin with the fileshare itself. It is being exported read-only as an NFSv4 fileshare from the NAS; we won't go over how to do that in this discussion. Since I could not find showmount or even mount in ESXi, let's assume I know what I am doing and believe the share I want is (I cheated and verified it on a Linux box). We can add that to vmhost2 using the vSphere client:

Using the ISO

So now that we have done all this boring work, let's see if we can boot using a, well, boot .iso from the NFS share. So, on the vSphere client:

  • Select the vmhost. In my case that is vmhost2.
    1. Select Configuration->Storage in the Hardware panel.
    2. Click Datastores and click Add Storage.

    3. Select Network File System and click Next.

    4. Enter

      • server name:
      • mount point folder name: /export/public (yep NFS3)
      • [x] Mount NFS read only
      • datastore name: public

    And that should result in a new datastore entry called public.
  • Select the vm client, which is called devcentos.
    1. select the vm in question, devcentos:
    2. Edit virtual machine properties->CD/DVD->Device Type->Datastore ISO File
    3. Hit Browse
    4. Datastores->public->ISOs->CentOS.iso
    5. Boot to BIOS. For some reason I have to manually select which device to boot from. Even though most of the time the virtual hard drive is completely virgin, the BIOS does not fail over to the ISO. Maybe that has been solved by now, but just learn this step... just in case.
    6. Make sure the CD/DVD drive is set to connect at power on.


Sunday, April 20, 2014

What is that orange alarm LED light on a Juniper SRX router trying to tell me?

If you have a Juniper SRX router thingie, you might have noticed the orange light glowing on it:

It is the alarm light, and it could have been triggered for many reasons, like the one mentioned in a later post. In its defense, it is nice to know the silly router is upset about something. And it sure beats a blinking 200W light or a blaring air raid siren. Still, it is staring at me with its deep unmoving orange eyes demanding attention. So, let's do some probulating, shall we?

When I asked it what's up, this is what it told me:

root@uranus> show system alarms 
2 alarms currently active
Alarm time               Class  Description
2014-03-23 10:42:35 EDT  Minor  Autorecovery information needs to be saved
2014-03-23 10:42:33 EDT  Minor  Rescue configuration is not set


After I saw that, it hit me like a, er, something heavy (please come up with something more original than the usual ton of bricks. I have never been hit by one and plan on staying that way) and unyielding: about that time I did a full wipe and reinstall! So, since it is being so nice to tell us what it wants, let's see about pleasing it. If we look at this thread in the juniper forums, we see the command to save the rescue configuration is:

root@uranus> request system configuration rescue save 

root@uranus> show system alarms                          
1 alarms currently active
Alarm time               Class  Description
2014-03-23 10:42:35 EDT  Minor  Autorecovery information needs to be saved


One down, one to go. Now, how do we save the autorecovery info? I am going to punt and assume the command is very similar to the one we used to save the rescue configuration. Like in Cisco's IOS, you can use ? to see which arguments a given command takes. So, we try

root@uranus> request system configuration ?    
Possible completions:
  rescue               Request operation on system rescue configuration
root@uranus> request system ?                 
Possible completions:
  autorecovery         Manage autorecovery information
  certificate          Manage X509 certificates
  configuration        Request operation on system configuration
  download             Manage downloads
  firmware             Upgrade or downgrade firmware
  halt                 Halt the system
  license              Manage feature licenses
  logout               Forcibly end user's CLI login session
  power-off            Power off the system
  reboot               Reboot the system
  scripts              Manage scripts (commit, op, event)
  services             Request service applications information
  set-encryption-key   Set EEPROM stored encryption key
  snapshot             Archive data and executable areas
  software             Perform system software extension or upgrade
  storage              Request operation on system storage
  zeroize              Erase all data, including configuration and log files
root@uranus> request system ?

Aha, we found autorecovery. Which arguments does request system autorecovery take?

root@uranus> request system autorecovery ?  
Possible completions:
  state                Manage autorecovery state information
root@uranus> request system autorecovery state ?
Possible completions:
  clear                Delete previously saved autorecovery state
  recover              Check for problems and recover state if needed
  save                 Save autorecovery state
root@uranus> request system autorecovery state ?

So it seems that request system autorecovery state save will do the trick. Let's try it then:

root@uranus> request system autorecovery state save 
Saving config recovery information
Saving license recovery information
Saving BSD label recovery information


And the orange light's gone! Another mystery solved...

Sunday, March 23, 2014

When upgrades go bad: Installing JunOS from USB in a SRX router

So, I screwed up pretty badly. I decided to upgrade the JunOS release on this Juniper SRX210 router to the one recommended by Juniper at the time I type this, 11.4R10.3. When it booted up after the install, it crashed during the boot process. Well, I could have spent the time kicking myself, but I am doing this upgrade off-hours and I did account for things going badly in my downtime estimate. And this router is part of a redundant router setup using the Virtual Router Redundancy Protocol (VRRP), so it being down will not affect production. In other words, this is more of an annoyance than a real issue. Since I have to deal with this anyway, how about we learn how to restore the OS on this Juniper router?

I tried a few ways and thought that the easiest one was to use a USB drive. Of course, that will not work well if you are not physically close to said router (other things will also not work well in those circumstances, but that is another topic), but since I can, I am doing the USB upgrade.


  1. Get a USB drive. I know, this is a pretty obvious step, but it is step 1. Ideally use a 1GB/2GB USB drive formatted as FAT16/FAT32. Honestly, I do not know how critical that is, but my experience with Cisco, which seems not to like the higher-capacity ones, made me leery. On the plus side, you should be able to find those rather easily as people replace their old ones with newer, larger ones. If not, there are always the usual sources such as eBay or Amazon.
  2. Download the OS image you are going to use, say junos-srxsme-11.4R10.3-domestic.tgz, and copy it to the root of the USB drive. If you are smarter than me, you would have already gone to the Juniper downloads site, gotten all the OS images you need, and placed them on your file server. I wasn't, so I had to go to the SRX210 download page and fetch it.
  3. Grab your trusty serial cable and connect it to the router's console port. The default setup is the time-honored 9600 8N1. If you changed it, make sure you wrote that down somewhere. I am lazy and I kinda like that setting.
  4. Connect the USB drive to the router.
  5. Reboot the router after you attach the USB drive to it; it needs to see the drive as it boots up. Otherwise, when you try to install the image, it will bark like this:
    loader> install file:///junos-srxsme-11.4R10.3-domestic.tgz
    cannot open package (error 22)

  6. Now, if you boot with the USB drive already connected to the router, it will first say something like this:

    Running U-Boot CRC Test... OK.
    Flash:  4 MB
    USB:   scanning bus for devices... 4 USB Device(s) found
           scanning bus for storage devices... 2 Storage Device(s) found
    Clearing DRAM........ done
    BIST check passed.

    Some of you may have noticed the "2 Storage Device(s) found" message. It is talking about the onboard one (probably where the OS should be) and the external drive.

  7. Now, when you see

    POST Passed
    Press SPACE to abort autoboot in 1 seconds

    Please keep your fingers in your pockets. If you press the space bar here, you will end up at the U-Boot => prompt. If you wait, you will then see

    Protected 1 sectors
    Loading /boot/defaults/loader.conf
    /kernel data=0xb0f9c0+0x134788 DA(some hot action happening here)

    Have your space-bar finger on standby, for the next message will be

    Hit [Enter] to boot immediately, or space bar for command prompt.
  8. Then press the space bar and you will get the loader> prompt. Point the install command at the image on the USB drive, and it will start doing the install thingie:

    loader> install file:///junos-srxsme-11.4R10.3-domestic.tgz
    /kernel data=0xae82f0+0x12d2b8 syms=[0x4+0x88ce0+0x4+0xc6af6]
    Kernel entry at 0x801000d8 ...
    init regular console
    GDB: debug ports: uart
    GDB: current port: uart
    KDB: debugger backends: ddb gdb
    KDB: current backend: ddb
    Copyright (c) 1996-2013, Juniper Networks, Inc.
    All rights reserved.
    Copyright (c) 1992-2006 The FreeBSD Project.
    Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
            The Regents of the University of California. All rights reserved.
    JUNOS 11.4R10.3 #0: 2013-11-15 06:56:20 UTC
  9. After a while (I got bored and went to make me some tea), you will see it recreate the ssh key pairs and then finally be ready for business (apologies for the bad cut-n-pasting but my terminal console was being cute):

    |                 |
    |  .o  ..         |
    |.+o .o.o.
    |X . .. .. E      |
    |oo ..            |
    |  .+             |
    root@uranus% omplete
    Setting initial options: .
    Starting optface configuration:
    additional daemons: eventd.
    Additional rout;/boot/modules -> /bo;
    kld netpfe drv: ifpfed_dialer default_adtwork setup:.
    Starting final network daemons:.
    setting ldconfig.
    Initial rc.mips initialization:.
    Local package initializationup access
    kern.securelevel: -1 -> 1
    Creating JAIL MFS partitirade.uboot="0xBFC00000"
    clean, 78249 free (17 frags, ar 20 16:46:25 CDT 2014
    uranus (ttyu0)

    Note that it remembered the hostname for the router. I still went through the configs before letting it join the router cluster. But that is pretty much it! Router is back in business.
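If you end up doing this more than once, the copy in step 2 is easy to script. Below is a minimal Python sketch (mine, not a Juniper tool) that copies the image to the root of the mounted USB drive, compares MD5 sums of the original and the copy, and hands back the command to type at the loader> prompt. The mount point is an assumption: it will be wherever the drive shows up on your machine, and the demo uses temporary directories as stand-ins for the download folder and the drive.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def stage_image(image, usb_mount):
    """Copy a JunOS image to the root of a mounted USB drive and verify it.

    Returns the command to type at the loader> prompt, which expects the
    image at file:///<image name>, i.e. in the root of the drive.
    """
    image = Path(image)
    target = Path(usb_mount) / image.name
    shutil.copyfile(image, target)
    # This re-read may be served from the page cache; unmounting and
    # remounting the drive before comparing gives a stronger check.
    if md5sum(image) != md5sum(target):
        raise OSError(f"copy of {image.name} does not match the original")
    return f"install file:///{image.name}"


if __name__ == "__main__":
    # Temporary directories stand in for the download folder and the
    # (hypothetical) USB mount point.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as usb:
        img = Path(src) / "junos-srxsme-11.4R10.3-domestic.tgz"
        img.write_bytes(b"not a real image")
        print(stage_image(img, usb))
```

Run as is, it prints install file:///junos-srxsme-11.4R10.3-domestic.tgz, which is exactly what goes after the loader> prompt in step 8.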

Closing Thoughts

  1. The universe is Murphian; things will go wrong. Try not to stress about that.
  2. When you schedule downtime for upgrades, account for things going badly in your time estimates.
  3. The hardest thing to do is figuring out what can go wrong. But you could ask yourself, "If this upgrade halts the server, or just this service, what would be my backup plan?" and then see if you can answer that question.
  4. Next time I need to upgrade the OS in this or another router, I will have the firmware/OS on standby in a USB drive. I do not know about you, but I have found that when I am prepared, everything works out perfectly.
  5. If you can afford it, redundancy is a wonderful thing.
  6. Always save your configs somewhere, well, safe. Having to recreate them from scratch is a bit of a drag.

Monday, March 17, 2014

generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information ()

This will be a quick post about something that was biting my ass these last few days, and what the real cause was. After you read it, you are welcome to laugh at my expense. Go ahead! I deserve it!

I was working on a kerberos/ldap (Linux) server and needed to debug the connection to a given client. The ldap connection uses TLS (GnuTLS specifically, since the two machines were Ubuntu servers), which means we also had to worry about certs. And since kerberos is in the picture, we need to configure for that too. To help solve other issues, which I should write about later (at least those were clever problems, not like this one), I was running slapd in debug mode,

/usr/sbin/slapd -d 256 -h "ldap:/// ldapi:/// ldaps:///" -g openldap -u openldap -F /etc/ldap/slapd.d

and that did help solve the other issue I had. Some of you will notice I am also running ldaps (port 636), which I really do not need since TLS should take care of the encryption thingie. But I digress, so let's get back on topic. What I then noticed were some odd problems with ldap. For instance, if I created a kerberos ticket and then tried to run ldapsearch, I would get the following error:

root@services:~# export KRB5CCNAME=/tmp/host.tkt
root@services:~# ldapsearch -vvv
ldap_initialize(  )
SASL/GSSAPI authentication started
ldap_sasl_interactive_bind_s: Other (e.g., implementation specific) error (80)
        additional info: SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information ()

Here is what the server sees:

53261bde conn=1043 fd=19 ACCEPT from IP= (IP=
53261bde conn=1043 op=0 EXT oid=
53261bde conn=1043 op=0 STARTTLS
53261bde conn=1043 op=0 RESULT oid= err=0 text=
53261bde conn=1043 fd=19 TLS established tls_ssf=128 ssf=128
53261bde conn=1043 op=1 BIND dn="" method=163
53261bde SASL [conn=1043] Failure: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information ()
53261bde conn=1043 op=1 RESULT tag=97 err=80 text=SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information ()
53261bde conn=1043 op=2 UNBIND
53261bde conn=1043 fd=19 closed

Since I do not have many clever things to talk about and fill the space until the solution, how about if we talk about what some of those lines mean?

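  • IP= (IP=: the client is connecting from its port 44610 to my port 389.
  • oid=: StartTLS extended request (per RFC 2830).
  • BIND dn="" method=163: method 163 (0xA3) means a SASL bind. An empty dn would mean anonymous if we were doing a SIMPLE bind; since we are doing a SASL bind, it is not used.
  • tag=97: result of a client bind operation (a BindResponse).
  • err=80: LDAP's "other" result code, the same (80) that ldapsearch reported.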
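If you end up staring at a lot of these lines, a tiny parser can help. Here is a minimal Python sketch (my own, not part of OpenLDAP) that splits a stats line into fields and decodes the few numeric codes discussed above. It assumes, as appears to be the case with slapd 2.4 in debug mode, that the leading field (53261bde here) is the time as hexadecimal Unix seconds; the lookup tables only cover the codes in this post.

```python
import re
from datetime import datetime, timezone

# LDAP protocol codes that appear in the log above (not a complete list).
BIND_METHODS = {0x80: "SIMPLE", 0xA3: "SASL"}  # method=128 / method=163
RESULT_TAGS = {0x61: "BindResponse"}           # tag=97
RESULT_CODES = {0: "success", 80: "other"}     # err=0 / err=80

# key=value where the value is either "quoted" or runs to the next space.
KEY_VALUE = re.compile(r'(\w+)=("[^"]*"|\S*)')


def parse_slapd_line(line):
    """Parse one slapd stats line into a dict of fields."""
    stamp, rest = line.split(" ", 1)
    # Assumption: the first field is hexadecimal Unix time.
    fields = {"time": datetime.fromtimestamp(int(stamp, 16), tz=timezone.utc)}
    # text= is free-form and comes last, so split it off before tokenizing.
    rest, found, text = rest.partition(" text=")
    if found:
        fields["text"] = text
    for key, value in KEY_VALUE.findall(rest):
        fields[key] = value.strip('"')
    if "method" in fields:
        fields["method_name"] = BIND_METHODS.get(int(fields["method"]), "unknown")
    if "tag" in fields:
        fields["tag_name"] = RESULT_TAGS.get(int(fields["tag"]), "unknown")
    if "err" in fields:
        fields["err_name"] = RESULT_CODES.get(int(fields["err"]), "unknown")
    return fields
```

For instance, feeding it the BIND line above gives method_name "SASL", and the 53261bde stamp decodes to March 16, 2014 (UTC), the day before this post.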

As you noticed, at least from reading the title of this post, the error line is this generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information () thingie. Here is where it annoyed me to no end: what minor code? It is supposed to put some kind of message between the parentheses, like "No principal in keytab matches desired name" or "Ticket expired". Then I would be able to search online for something. Instead, zilch. I could not find a single entry where the minor code parentheses thingie was empty. Not very helpful today, are we?


So, what was wrong? Me. User error. Do you remember how I was running slapd? Do you also remember the part about kerberos? Well, in /etc/default/slapd (that'll be /etc/sysconfig/ldap for you RedHat/CentOS/Fedora folks) I have defined

export KRB5_KTNAME=/etc/ldap/ldap.keytab

which tells slapd where the keytab containing the ldap service principal hides. Can you see where this is going? No? Let's look again at how I am running slapd, shall we?

/usr/sbin/slapd -d 256 -h "ldap:/// ldapi:/// ldaps:///" -g openldap -u openldap -F /etc/ldap/slapd.d

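As you can see, I did not pass KRB5_KTNAME to slapd. /etc/default/slapd is read by the init script that normally starts slapd, not by slapd itself, so when I started it by hand the variable was never set and slapd could not find its keytab. As soon as I fed that to slapd, all was once again well in the Land of Ooo.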
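So slapd needs KRB5_KTNAME in its environment however it gets started. From a shell, that just means prefixing the command: KRB5_KTNAME=/etc/ldap/ldap.keytab /usr/sbin/slapd -d 256 -h "ldap:/// ldapi:/// ldaps:///" -g openldap -u openldap -F /etc/ldap/slapd.d. If you launch the debug slapd from a script instead, here is a minimal Python sketch of the same idea. The helper names are hypothetical; the keytab path and slapd arguments are the ones used in this post.

```python
import os
import subprocess

# Values taken from this post; adjust for your system.
KEYTAB = "/etc/ldap/ldap.keytab"
SLAPD_ARGV = [
    "/usr/sbin/slapd", "-d", "256",
    "-h", "ldap:/// ldapi:/// ldaps:///",  # one argument, as the shell quoting makes it
    "-g", "openldap", "-u", "openldap",
    "-F", "/etc/ldap/slapd.d",
]


def slapd_debug_env(keytab=KEYTAB, base=None):
    """Return a copy of the environment with KRB5_KTNAME set to the keytab."""
    env = dict(os.environ if base is None else base)
    env["KRB5_KTNAME"] = keytab
    return env


def run_slapd_debug():
    """Run slapd in the foreground (needs root); blocks until slapd exits."""
    return subprocess.run(SLAPD_ARGV, env=slapd_debug_env(), check=True)
```

Calling run_slapd_debug() as root starts slapd in debug mode just like the command above, except that this time it can find its keytab.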

Thursday, January 23, 2014

Internal server touching itself from behind another server's/router's external IP

Before I start, I have to say I saw this happening using virtual machines where the network was also virtualized, but not when both machines were physical ones. Your situation might be different. YMMV, use with care, this side up, rauchen verboten (no smoking). Now that that's out of the way, let's say we have one of the following networks:

 {Internet}                     OR       {LAN}
   |                                       |
   | (external IP)                         | (LAN IP)
 [ Router ]                            [ serverA ]
   |                                       |
   |                                       |
   |                                       |
   |                                       |
 [ serverB ]                           [ serverB ]

It is really the same thing for all practical purposes: one machine in an internal network behind another. In any case, we want to reach port 1234 on the internal machine (serverB) from the network outside the router or serverA. So, we can go to the router/serverA and add a simple port-forwarding iptables rule using DNAT:

iptables -t nat -A PREROUTING -d <external IP> -p tcp -m tcp --dport 1234 -j DNAT --to-destination <serverB IP>

But what if serverB also wants to connect to the same port? Well, the easiest way is to connect to localhost:1234, but what if we want to run the same script that external machines run, which connects to the external IP? That might not do what we want:

raub@serverB:~$ nc -v localhost 1234
Connection to localhost 1234 port [tcp/https] succeeded!
raub@serverB:~$ nc -v 1234
nc: connect to port 1234 (tcp) failed: Connection timed out

In broad strokes, here is what is happening: serverB wants to connect to the external IP, a non-local address, so it sends the packet to its gateway (router or serverA in the diagram). The gateway DNATs the packet to serverB's internal IP and sends it back, but the source address is still serverB's own. So serverB answers itself directly, from its internal IP, instead of sending the reply back through the gateway. The connection is expecting a reply from the external IP, so it never completes (topplewagon in #centos wrote a better explanation). The solution is mentioned in the section "Destination NAT Onto the Same Network" of the Linux 2.4 NAT HOWTO: create a SNAT rule so the connection appears to come from the gateway, which forces the replies back through it:

iptables -t nat -A POSTROUTING -d <serverB IP> -s <internal network> -j SNAT --to-source <router/serverA internal IP>
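Since the two rules have to agree with each other (same serverB address, same port), here is a minimal Python sketch that builds them as argument lists for subprocess. The addresses are hypothetical placeholders from private and documentation ranges, not the ones from my setup, and using the router's internal IP as the SNAT source follows the NAT HOWTO's example. By default it only prints the commands; actually applying them needs root on the router/serverA.

```python
import subprocess

# Hypothetical addresses; replace with your own.
EXTERNAL_IP = "203.0.113.10"   # router's external IP (documentation range)
ROUTER_LAN_IP = "192.168.1.1"  # router/serverA on the internal side
SERVER_B_IP = "192.168.1.20"   # serverB, the machine listening on PORT
LAN_NET = "192.168.1.0/24"     # internal network
PORT = 1234


def dnat_rule(external_ip, server_ip, port):
    """Port forward: TCP to external_ip:port is rewritten to go to server_ip."""
    return ["iptables", "-t", "nat", "-A", "PREROUTING",
            "-d", external_ip, "-p", "tcp", "-m", "tcp", "--dport", str(port),
            "-j", "DNAT", "--to-destination", server_ip]


def hairpin_snat_rule(server_ip, lan_net, router_ip):
    """Make internal clients' forwarded connections look like they come from
    the router, so serverB's replies go back through it to be un-NATed."""
    return ["iptables", "-t", "nat", "-A", "POSTROUTING",
            "-d", server_ip, "-s", lan_net,
            "-j", "SNAT", "--to-source", router_ip]


def apply(rules, dry_run=True):
    """Print the rules, or run them (as root on the router) when dry_run is False."""
    for rule in rules:
        if dry_run:
            print(" ".join(rule))
        else:
            subprocess.run(rule, check=True)


if __name__ == "__main__":
    apply([dnat_rule(EXTERNAL_IP, SERVER_B_IP, PORT),
           hairpin_snat_rule(SERVER_B_IP, LAN_NET, ROUTER_LAN_IP)])
```

The SNAT rule only matches traffic that passes through the router on its way to serverB, which is exactly the hairpinned connections; machines talking to serverB directly on the LAN never hit it.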