Friday, January 13, 2012

Poor Man's LDAP Replication checker (Kerberos Involved!)

Warning: this article is a bit short, perhaps even curt. As such, it might gloss over too many concepts and assume you know how to set up replication in LDAP. You have been warned!

If you are using LDAP, you probably want to have more than one LDAP server. You know, the one-dies-and-the-other-takes-over kinda thing. Now, that requires some means of keeping the databases on the different LDAP servers in sync. Currently the preferred method is called syncrepl, and you can find info on setting it up online (I might add some thoughts on that later). The problem is that you need to make sure the servers really are all in sync.

One way to do so is by monitoring the contextCSN. So if you have two LDAP servers, master.domain.com and slave.domain.com (as the names imply, one master and one slave), after you start replication you could go to the master and run:

ldapsearch -z1 -LLLQY EXTERNAL -H ldapi:/// -s base contextCSN
dn: dc=domain,dc=com
contextCSN: 20120113185836.364944Z#000000#000#000000

Then you would go to the slave, run the same command, and compare the output, namely the funny number after contextCSN:, between the two. If they match, all should be well. If not, it is time to go check the log files on the two machines; depending on how you set things up, that would mean starting with auth.log and syslog.
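If you would rather not eyeball the two numbers, the comparison can be sketched in a few lines of bash. This is only an illustration: extract_csn is a name I made up, and the LDIF below is sample text (the slave's value is invented to show a mismatch), not output from real servers.

```bash
#!/bin/bash
# Hypothetical helper: pull the contextCSN value(s) out of LDIF text on stdin.
# Sorting keeps the comparison stable if a server returns more than one value.
extract_csn() {
    awk '$1 == "contextCSN:" { print $2 }' | sort
}

# Sample LDIF modelled on the ldapsearch output above; the slave's value is
# made up to demonstrate a mismatch.
master_ldif='dn: dc=domain,dc=com
contextCSN: 20120113185836.364944Z#000000#000#000000'
slave_ldif='dn: dc=domain,dc=com
contextCSN: 20120113181502.101010Z#000000#000#000000'

master_csn=$(printf '%s\n' "$master_ldif" | extract_csn)
slave_csn=$(printf '%s\n' "$slave_ldif" | extract_csn)

if [[ "$master_csn" == "$slave_csn" ]]; then
    echo "in sync: $master_csn"
else
    echo "OUT OF SYNC: master=$master_csn slave=$slave_csn"
fi
```

In real use you would pipe the output of the ldapsearch command on each server into extract_csn instead of using the sample strings.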

Now that we know what we need, what if we could make a little script to compare these values between all the LDAP servers you have (that are replicating)? Well, any machine that can reach those LDAP servers should be able to query each of them for its contextCSN. After that, it is just a matter of comparing the values. In the following script, let's assume we have 3 LDAP servers: one master and two slaves.

#!/bin/bash

# Uncomment to authenticate with the host's Kerberos ticket cache.
# export KRB5CCNAME=/tmp/host.tkt
LDAPs=( master.domain.com slave1.domain.com slave2.domain.com )
LDAP_NUMBER=${#LDAPs[@]}
ldap_reply[0]=""

# Query each server for its contextCSN and store the value in ldap_reply[].
function getLDAPinfo()
{
   local i
   for ((i=0; i < LDAP_NUMBER; i++))
      do
         ldap_reply[$i]=$(ldapsearch -z1 -LLLQ -H "ldap://${LDAPs[$i]}" -s base contextCSN | grep contextCSN | awk '{ print $2 }')
         echo "${ldap_reply[$i]}"
   done
}

# Compare every pair of servers (i < j) and flag contextCSN values that differ.
function checkSyncStatus()
{
   local i
   local j
   for ((i=0; i < LDAP_NUMBER - 1; i++))
      do
         echo -n "$i x "
         for ((j=i + 1; j < LDAP_NUMBER; j++))
            do
               echo -n " $j"
               if [[ "${ldap_reply[$i]}" != "${ldap_reply[$j]}" ]]; then
                  echo -n "(Bad)"
               else
                  echo -n "(Ok) "
               fi
         done
         echo
   done
}

echo "Number of LDAP servers: ${LDAP_NUMBER}"
getLDAPinfo
checkSyncStatus

Note the arguments for ldapsearch might change a bit (you might need a -x -Z or whatever; you know what you need to do to run ldapsearch in your environment). If the three machines are in sync, when you run the above code, the output should look something like:

Number of LDAP servers: 3
20120113185836.364944Z#000000#000#000000
20120113185836.364944Z#000000#000#000000
20120113185836.364944Z#000000#000#000000
0 x  1(Ok) 2(Ok)
1 x  2(Ok)

Now, the code is not complete, and that is for a reason: I really wanted to show what it is doing. The echo "Number of LDAP servers: ${LDAP_NUMBER}" is there just to verify the number of LDAP servers: we have 3, and in the LDAPs array they are [0], [1], and [2]. It should be commented out or removed in the production version of the script. The echo statement in getLDAPinfo() is there just to show the output of the ldapsearch command (and how to get it) so you can see the values all match; you can take it out too. The same goes for the echo statements in checkSyncStatus(): they show which contextCSN values we are comparing, and whether they match ((Ok)) or not ((Bad)). Note we are checking not only the master against each slave but also the slaves against each other. It is a bit overkill, but why not?

As I said, this code is incomplete; what you would need to do, after removing or commenting out the echo statements, is decide how to use this information. What I have done is this: when ${ldap_reply[$i]} != ${ldap_reply[$j]}, the script writes a message saying the contextCSN values for LDAPs[$i] and LDAPs[$j] do not match into an email that is then sent to me. Maybe you want to do something else, but you get the idea.
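Here is a rough sketch of what that reporting step could look like. It is not my production script: build_report and REPORT_MAILTO are names invented for this example, the CSN values are sample data with the last server deliberately out of date, and the mail step assumes a mail(1) command is installed.

```bash
#!/bin/bash
# Sketch only: host names and CSN values below are sample data.
LDAPs=( master.domain.com slave1.domain.com slave2.domain.com )
ldap_reply=(
    "20120113185836.364944Z#000000#000#000000"
    "20120113185836.364944Z#000000#000#000000"
    "20120113170102.118230Z#000000#000#000000"
)

# Print one line for each pair of servers whose contextCSN values differ.
build_report() {
    local i j n=${#LDAPs[@]}
    for ((i = 0; i < n - 1; i++)); do
        for ((j = i + 1; j < n; j++)); do
            if [[ "${ldap_reply[i]}" != "${ldap_reply[j]}" ]]; then
                echo "contextCSN mismatch: ${LDAPs[i]} (${ldap_reply[i]}) vs ${LDAPs[j]} (${ldap_reply[j]})"
            fi
        done
    done
}

report=$(build_report)
if [[ -n "$report" ]]; then
    echo "$report"
    # REPORT_MAILTO is a hypothetical variable; mail is only attempted when it
    # is set and a mail command is actually available.
    if [[ -n "${REPORT_MAILTO:-}" ]] && command -v mail >/dev/null 2>&1; then
        echo "$report" | mail -s "LDAP replication out of sync" "$REPORT_MAILTO"
    fi
fi
```

When all the values match, the report is empty and nothing is printed or mailed, which is what you want from something that runs unattended.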

The only missing step now is to create a cron job to call this script every so often.
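As an illustration, the little script below prints a cron.d-style entry that would run the checker every 15 minutes as root. The path /usr/local/bin/ldap_sync_check.sh and the 15-minute interval are assumptions of mine; use whatever location and schedule make sense for you.

```bash
#!/bin/bash
# Hypothetical install location of the checker script; adjust to taste.
CHECKER=/usr/local/bin/ldap_sync_check.sh

# Print a cron.d-style entry (note the user field, here root) that runs the
# checker every 15 minutes.
cron_entry() {
    printf '*/15 * * * * root %s\n' "$CHECKER"
}

cron_entry
```

You could save the printed line in a file under /etc/cron.d, or drop the user field and add it to root's crontab with crontab -e.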

Ok, smart guy, you might say, what about the Kerberos part you mentioned in the title? Well, if you go back to the script, you will notice a commented-out line containing KRB5CCNAME=/tmp/host.tkt. We authenticate LDAP access against Kerberos. Also, since each machine has its own Kerberos principal and keytab, we use them to create a Kerberos cache named /tmp/host.tkt:

FQDN=$(hostname -f)
sudo kinit -k -t /etc/krb5.keytab -c /tmp/host.tkt "host/$FQDN@DOMAIN.COM"

which is owned by the root user and used by different services on each LDAP/Kerberos client. Well, if it is there, we might as well use it, right?
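To make the checker actually use that cache, the variable has to be exported so that ldapsearch sees it. A minimal sketch follows; have_ticket is a made-up helper name, and it assumes the MIT Kerberos klist, whose -s option stays silent and reports through its exit status whether the cache holds a valid ticket.

```bash
#!/bin/bash
# Point the Kerberos libraries (and so ldapsearch) at the host ticket cache.
export KRB5CCNAME=/tmp/host.tkt

# Hypothetical helper: succeed only if klist exists and the cache named by
# KRB5CCNAME is readable and not expired (assumes MIT Kerberos "klist -s").
have_ticket() {
    command -v klist >/dev/null 2>&1 && klist -s
}

if have_ticket; then
    echo "Using Kerberos cache $KRB5CCNAME"
else
    echo "No valid ticket in $KRB5CCNAME; run the kinit command above first" >&2
fi
```

Since the cache is owned by root, the script (and the cron job that calls it) needs to run as root to read it.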
