Showing posts with label ssh.

Tuesday, November 12, 2019

Capturing the output of a command sent through ssh within a script, without it asking to verify the host key

You may have noticed that when you connect to a new server whose key is not in ~/.ssh/known_hosts, ssh asks you to verify the key:

raub@desktop:~$ ssh -i $SSHKEY_FILE cc@"$server_ip"
The authenticity of host 'headless.example.com (10.0.1.160)' can't be established.
ECDSA key fingerprint is SHA256:AgwYevnTsG2m9hQLu/ROp+Rjj5At2HU0HVoGZ+5Ug58.
Are you sure you want to continue connecting (yes/no)? 

That is a bit of a drag if you want a script to connect to said server. Fortunately ssh has an option just for that, StrictHostKeyChecking, as mentioned in the man page:

ssh automatically maintains and checks a database containing identification for all hosts it has ever been used with. Host keys are stored in ~/.ssh/known_hosts in the user's home directory. Additionally, the file /etc/ssh/ssh_known_hosts is automatically checked for known hosts. Any new hosts are automatically added to the user's file. If a host's identification ever changes, ssh warns about this and disables password authentication to prevent server spoofing or man-in-the-middle attacks, which could otherwise be used to circumvent the encryption. The StrictHostKeyChecking option can be used to control logins to machines whose host key is not known or has changed.

Now we can rewrite the ssh command as

ssh -o "StrictHostKeyChecking no" -i $SSHKEY_FILE cc@"$server_ip"

and it will log in without waiting for us to verify the key. Of course, this should only be done when the balance between security and automation meets your comfort level. However, if we run the above from a script, it will connect and just sit there with the session open, not accepting any commands. That is, if we try to send a command (pwd in this case),

ssh -o "StrictHostKeyChecking no" -i $SSHKEY_FILE cc@"$server_ip"
pwd

it will never see the pwd. The only way to get to the pwd is to kill the ssh session, at which point the script moves on to the next command but runs it on the local host, not the remote one. If we want to run pwd on the host we ssh into, we need to pass it as an argument, i.e.

raub@desktop:/tmp$ pwd
/tmpraub@desktop:/tmp$ ssh -o "StrictHostKeyChecking no" -i $SSHKEY_FILE cc@"$server_ip" pwd
/home/raub
raub@desktop:/tmp$

which means it will connect, run said command, and then exit. But how do we do that from a script? The answer is to use eval or bash -c (or another available shell):

raub@desktop:/tmp$ moo="ssh -o \"StrictHostKeyChecking no\" -i $SSHKEY_FILE cc@$server_ip"
raub@desktop:/tmp$ dirlist=$(eval "$moo pwd")
raub@desktop:/tmp$ echo $dirlist
/home/raub
raub@desktop:/tmp$ dirlist2=$(sh -c "$moo pwd")
raub@desktop:/tmp$ echo $dirlist2
/home/raub
raub@desktop:/tmp$

Now, there are subtle differences between eval and bash -c. There is a good thread on Stack Exchange which explains them better than I can.

Now, why would someone want to do that?

How about doing some poor man's Ansible, i.e. for when the host on the other side does not have Python or has some other idiosyncrasy. Like my home switch, but that is another story...
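
As a concrete illustration of that poor man's Ansible idea, here is a minimal sketch built from the pattern above. The key path, the user cc, the host list, and the commands are all assumptions for the example; a real script should also think harder about quoting and error handling.

#!/bin/bash
# poor man's ansible: run a few commands on a list of hosts and capture the output
# (sketch; SSHKEY_FILE, the user cc, and the host list are assumptions)
SSHKEY_FILE=~/.ssh/automation_key
hosts="10.0.1.160 10.0.1.161"

for server_ip in $hosts; do
    moo="ssh -o \"StrictHostKeyChecking no\" -i $SSHKEY_FILE cc@$server_ip"
    uptime=$(eval "$moo uptime")
    disk=$(eval "$moo df -h /")
    echo "== $server_ip =="
    echo "$uptime"
    echo "$disk"
done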

Wednesday, January 16, 2019

Failover and swapping controllers in an EqualLogic SAN, command line involved

EqualLogic, as some of you know, is an old line of storage systems sold by Dell that can do at best RAID6 on a good day. As of now it has been out of support for a few years; Dell bought EMC and would rather we used that instead. We have those too, but this article is about the EqualLogic we still use.

On a Tuesday evening I got an email from one of them telling me it was unhappy. As is my recurring motif, I do not like GUIs for managing devices. This one decided to reinforce that belief by simply refusing to start, so it was time for plan B. Most storage appliances and hypervisors out there are built on either Linux or FreeBSD; the last time I had to probulate this one, I found it runs FreeBSD, and that usually means we can ssh into it. So we do; I do not think I need to show how, since if you can ssh into one machine, you can ssh into them all.

Now that we are inside, we need to find out what's up (I am cheating a bit because the email did tell me that the device PS4100E-01 was unhappy, which is why I am probulating it specifically):

STORGroup01> member select PS4100E-01 show
_____________________________ Member Information ______________________________
Name: PS4100E-01                       Status: online
TotalSpace: 15.98TB                    UsedSpace: 3.86TB
SnapSpace: 192.92GB                    Description:
Def-Gateway: 192.168.1.1               Serial-Number:
Disks: 12                                CN-TWO_AND_A_HALF
Spares: 1                              Controllers: 2
CacheMode: write-thru                  Connections: 4
RaidStatus: ok                         RaidPercentage: 0.000%
LostBlocks: false                      HealthStatus: critical
LocateMember: disable                  Controller-Safe: disabled
Version: V9.1.4 (R443182)              Delay-Data-Move: disable
ChassisType: DELLSBB2u12 3.5           Accelerated RAID Capable: no
Pool: default                          Raid-policy: raid6
Service Tag: H9CHSW1                   Product Family: PS4100
All-Disks-SED: no                      SectorSize: 512
Language-Kit-Version: de, es, fr, ja,  ExpandedSnapDataSize: N/A
  ko, zh                               CompressedSnapDataSize: N/A
CompressionSavings: N/A                Data-Reduction: no-capable-hardware
Raid-Rebuild-Delay-State: disabled     Raid-Expansion-Status: enabled
_______________________________________________________________________________

____________________________ Health Status Details ____________________________

Critical conditions::
Critical hardware component failure.

Warning conditions::
None
_______________________________________________________________________________


____________________________ Operations InProgress ____________________________


ID StartTime            Progress Operation Details                             

-- -------------------- -------- -----------------------------------------------
STORGroup01>

Yep, the Health Status Details section tells us it is unhappy; we knew that already. But what does the log say?

STORGroup01> show recentevents
[...]
6492:881:PS4100E-01:SP: 8-Jan-2019 19:34:51.700834:cache_driver.cc:1056:WARNING:
28.3.17:Active control module cache is now in write-through mode. Array performa
nce is degraded.

6491:880:PS4100E-01:SP: 8-Jan-2019 19:34:51.700833:emm.c:355:ERROR:28.4.85:Criti
cal hardware component failure, as shown next.
        C2F power module is not operating.

6490:879:PS4100E-01:SP: 8-Jan-2019 19:34:51.700832:emm.c:2363:ERROR:28.4.47:Crit
ical health conditions exist.
 Correct immediately before they affect array operation.
        Critical hardware component failure.
        There are 1 outstanding health conditions. Correct these conditions before they
 affect array operation.

OK, a controller thingie is not happy. But this device has two of them. Which one is it?

STORGroup01> member select PS4100E-01 show controllers
___________________________ Controller Information ____________________________
SlotID: 0                              Status: active
Model: 70-0476(TYPE 12)                BatteryStatus: failed
ProcessorTemperature: 65               ChipsetTemperature: 44
LastBootTime: 2018-04-23:15:06:55      SerialNumber:
Manufactured: 0327                       CN-I_AM_FEELING_DEPRESSED
ECOLevel: C00                          CM Rev.: A04
FW Rev.: Storage Array Firmware V9.1.4 BootRomVersion: 3.6.4
   (R443182)                           BootRomBuilDate: Mon Jun 27 10:20:45
                                          EDT 2011                             
_______________________________________________________________________________
_______________________________________________________________________________
SlotID: 1                              Status: secondary
Model: 70-0476(TYPE 12)                BatteryStatus: ok
ProcessorTemperature: 0                ChipsetTemperature: 0
LastBootTime: 2018-04-23:15:13:17      SerialNumber:
Manufactured: 031S                       CN-I_AM_FEELING_GREAT
ECOLevel: C00                          CM Rev.: A04
FW Rev.: Storage Array Firmware V9.1.4 BootRomVersion: 3.6.4
   (R443182)                           BootRomBuilDate: Mon Jun 27 10:20:45
                                          EDT 2011                             
_______________________________________________________________________________

______________________________ Cache Information ______________________________
CacheMode: write-thru                  Controller-Safe: disabled
Low-Battery-Safe: enabled                                                      
_______________________________________________________________________________
STORGroup01>

Fun fact: I haven't the foggiest idea which slot is 1 and which is 0. We will have to find that out as we go along.

SPOILER ALERT: Making ASSumptions here is a bad idea.

The Controller

We sent the above info -- model, firmware -- to the vendor we have a support contract with, which sent us a controller card. Here it is in its purple gloriousness.

The configuration is stored in an SD card in the socket my finger is pointing at. When swapping the controller, the laziest thing to do is to put the old SD card in the new controller; that way we do not have to configure it.


We know one of the controllers is unhappy, but which one? You see, this storage device only needs one controller to work. But it is an enterprise device; the reason to have two is that the second one is on standby: if the primary has a problem, service can fail over to the secondary/backup. If you look at the picture below, you will see one controller has both lights green while the one below it has the top light green and the bottom orange. That orange light indicates it is either the backup, secondary, failover, or unused controller. The "or" is very important here.

Now, this system is designed to be hot-swappable, but with one proviso: you can only swap the controller that is not currently in use. That would make sense, since chances are the bad controller failed and the backup took over; so the failed controller should be the one in standby.

So, I swapped it. And then checked again.

STORGroup01> member select PS4100E-01 show controllers
___________________________ Controller Information ____________________________
SlotID: 0                              Status: active
Model: 70-0476(TYPE 12)                BatteryStatus: failed
ProcessorTemperature: 64               ChipsetTemperature: 44
LastBootTime: 2018-04-23:15:07:14      SerialNumber:
Manufactured: 0327                       CN-I_AM_FEELING_DEPRESSED
ECOLevel: C00                          CM Rev.: A04
FW Rev.: Storage Array Firmware V9.1.4 BootRomVersion: 3.6.4
   (R443182)                           BootRomBuilDate: Mon Jun 27 10:20:45
                                          EDT 2011
_______________________________________________________________________________
_______________________________________________________________________________
SlotID: 1                              Status: secondary
Model: 70-0476(TYPE 12)                BatteryStatus: ok
ProcessorTemperature: 0                ChipsetTemperature: 0
LastBootTime: 2019-01-15:12:56:01      SerialNumber:
Manufactured: 025E                       CN-I_AM_THE_NEW_GUY
ECOLevel: C00                          CM Rev.: A03
FW Rev.: Storage Array Firmware V9.1.4 BootRomVersion: 3.6.4
   (R443182)                           BootRomBuilDate: Mon Jun 27 10:20:45
                                          EDT 2011
_______________________________________________________________________________

It seems I replaced the secondary (serial number CN-I_AM_THE_NEW_GUY), and the primary one (CN-I_AM_FEELING_DEPRESSED) is still the problematic one. Why didn't it fail over when it realized it had a problem? I don't know; what I do know is that I still need to replace the problematic controller. At least now we know SlotID 0 is the top controller and SlotID 1 the bottom. Small progress, but progress nevertheless.

It turns out there is a way to force the failover programmatically; that is achieved with the command restart. But since I have 3 different storage devices -- PS4100E-01, PS4100E-00, and PS6100E -- in this setup and am currently accessing them from the system that controls all of them, and restart does not seem to let me specify which device I want to reboot,

STORGroup01> restart
            - Optional argument to restart.
 
STORGroup01>

we should ssh into PS4100E-01 (IP 192.168.1.3 for those who are wondering) and issue the command there. We do not need to worry about it restarting the secondary controller, because it only reboots the active one, which causes it to fail over to the secondary.

STORGroup01> restart

Restarting the system will result in the active and secondary control
modules switching roles. Therefore, the current active control module
will become the secondary after the system restart.


After you enter the restart command, the active control module will fail over.
To continue to use the serial connection when the array restarts, connect the
serial cable to the new active control module.

Do you really want to restart the system? (yes/no) [no]:yes
Restarting at Tue Jan 15 13:40:48 EST 2019 -- please wait...
Waiting for the secondary to synchronize with us (max 900s)
....Rebooting the active controller
Connection to 192.168.1.3 closed by remote host.
Connection to 192.168.1.3 closed.
raub@desktop:~$

And, yes, it is that anticlimactic. I just got kicked off PS4100E-01; our network access to the storage device goes through the controller. Now we need to wait for it to come back. I am lazy, so I will use ping to let me know when the network interface is back up.

raub@desktop:~$ ping 192.168.1.3
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.

64 bytes from 192.168.1.3: icmp_seq=10 ttl=254 time=3.50 ms
64 bytes from 192.168.1.3: icmp_seq=10 ttl=254 time=3.51 ms (DUP!)
64 bytes from 192.168.1.3: icmp_seq=11 ttl=254 time=90.8 ms
64 bytes from 192.168.1.3: icmp_seq=12 ttl=254 time=1.14 ms
64 bytes from 192.168.1.3: icmp_seq=13 ttl=254 time=1.49 ms
64 bytes from 192.168.1.3: icmp_seq=14 ttl=254 time=1.64 ms
64 bytes from 192.168.1.3: icmp_seq=15 ttl=254 time=1.06 ms
^C
--- 192.168.1.3 ping statistics ---
15 packets transmitted, 6 received, +1 duplicates, 60% packet loss, time 14078m
s
rtt min/avg/max/mdev = 1.068/14.741/90.815/31.072 ms
raub@desktop:~$

What I can't show in this article is that it took a while until I started getting packets back. In fact, I went to make coffee (which required me to find coffee first), came back, and ran ping a few times until I got the above output. These devices are in no hurry when rebooting.
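
If you would rather not babysit the ping, a tiny wait loop does the trick. This is just a sketch; the controller IP and the ssh port are assumptions for this setup.

#!/bin/bash
# wait for the array controller to answer ping, then for ssh to accept connections
# (sketch; 192.168.1.3 and port 22 are assumptions)
ip=192.168.1.3
until ping -c1 -W2 "$ip" > /dev/null 2>&1; do sleep 10; done
echo "$ip answers ping; waiting for ssh"
until nc -z -w2 "$ip" 22; do sleep 10; done
echo "$ip is accepting ssh connections"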

Once we see it pinging, we can check if ssh is running again, log back in, and ask what's up with the controllers:

STORGroup01> member select PS4100E-01 show controllers
___________________________ Controller Information ____________________________
SlotID: 0                              Status: unknown
Model: ()                              BatteryStatus: unknown
ProcessorTemperature: 0                ChipsetTemperature: 0
LastBootTime: 2019-01-15:13:43:19      SerialNumber:
Manufactured:                          ECOLevel:
CM Rev.:                               FW Rev.:
BootRomVersion:                        BootRomBuilDate:
_______________________________________________________________________________
_______________________________________________________________________________
SlotID: 1                              Status: active
Model: 70-0476(TYPE 12)                BatteryStatus: ok
ProcessorTemperature: 60               ChipsetTemperature: 43
LastBootTime: 2019-01-15:12:56:00      SerialNumber:
Manufactured: 025E                       CN-I_AM_THE_NEW_GUY
ECOLevel: C00                          CM Rev.: A03
FW Rev.: Storage Array Firmware V9.1.4 BootRomVersion: 3.6.4
   (R443182)                           BootRomBuilDate: Mon Jun 27 10:20:45
                                          EDT 2011
_______________________________________________________________________________

When this output was captured, the controller in slot 0 had not yet come back up from the restart command. What matters, however, is that the controller in slot 1 became the active one, meaning we can swap the unhappy controller. If we wait a bit, we will see the controller in slot 0 come back as the secondary, still reporting its failed battery,

STORGroup01> member select PS4100E-01 show controllers
___________________________ Controller Information ____________________________
SlotID: 0                              Status: secondary
Model: 70-0476(TYPE 12)                BatteryStatus: failed
ProcessorTemperature: 0                ChipsetTemperature: 0
LastBootTime: 2019-01-15:13:43:09      SerialNumber:
Manufactured: 0327                       CN-I_AM_FEELING_DEPRESSED
ECOLevel: C00                          CM Rev.: A04
FW Rev.: Storage Array Firmware V9.1.4 BootRomVersion: 3.6.4
   (R443182)                           BootRomBuilDate: Mon Jun 27 10:20:45
                                          EDT 2011
_______________________________________________________________________________

which means we can now swap the top guy (slot 0) with the old controller we accidentally removed from slot 1. After a few minutes, all is well in the world:

STORGroup01> member select PS4100E-01 show controllers
___________________________ Controller Information ____________________________
SlotID: 0                              Status: secondary
Model: 70-0476(TYPE 12)                BatteryStatus: ok
ProcessorTemperature: 0                ChipsetTemperature: 0
LastBootTime: 2019-01-15:13:55:33      SerialNumber:
Manufactured: 031S                       CN-I_AM_FEELING_GREAT
ECOLevel: C00                          CM Rev.: A04
FW Rev.: Storage Array Firmware V9.1.4 BootRomVersion: 3.6.4
   (R443182)                           BootRomBuilDate: Mon Jun 27 10:20:45
                                          EDT 2011
_______________________________________________________________________________
_______________________________________________________________________________
SlotID: 1                              Status: active
Model: 70-0476(TYPE 12)                BatteryStatus: ok
ProcessorTemperature: 62               ChipsetTemperature: 42
LastBootTime: 2019-01-15:12:55:43      SerialNumber:
Manufactured: 025E                       CN-I_AM_THE_NEW_GUY
ECOLevel: C00                          CM Rev.: A03
FW Rev.: Storage Array Firmware V9.1.4 BootRomVersion: 3.6.4
   (R443182)                           BootRomBuilDate: Mon Jun 27 10:20:45
                                          EDT 2011
_______________________________________________________________________________

There might be a GUI way to do all of that but I am not smart enough to click on things.

Tuesday, September 26, 2017

Forcing a fuse (sshfs) network fileshare to unmount in OSX

As some of you already know, I have an old MacBook Air which I use as my main machine (as in the computer I sit in front of, not the computer I store data on; laptops can be stolen, you know) until I find a new Linux laptop replacement. For this reason I need it to play nice with other machines, and that sometimes requires mounting a fileshare. If the other host is on the same VLAN, that is rather easy because there are ways to mount a Windows (SMB/CIFS) and even a Linux/UNIX (NFS) fileshare without breaking a sweat. But what if the machine is remote? If we can ssh into it, why not use sshfs?

As we are aware (since we read the link; there are a few more sshfs examples here), sshfs requires FUSE. Since I am using OSX, which at the present time does not ship with it, I need to install it. If you are curious, the one I use is FUSE for macOS.
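
For reference, one way to install both pieces at the time of this writing was via Homebrew; treat the exact package names and brew syntax as assumptions, since they have a habit of changing (check the FUSE for macOS project page if in doubt).

# install FUSE for macOS and sshfs via Homebrew (package names are assumptions; verify against the project pages)
brew cask install osxfuse
brew install sshfs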

Mounting: business as usual

Let's say we are on the machine boris as user pickles, trying to mount my home directory off desktop. We create the mountpoint (let's use /tmp/D or ~/D so it looks more like what we would do in Linux) and mount:

boris:Documents pickles$ mkdir /tmp/D; sshfs raub@desktop.in.example.com:. /tmp/D
boris:Documents pickles$ df -h
Filesystem                     Size   Used  Avail Capacity  iused    ifree %iused  Mounted on
/dev/disk1                    112Gi   79Gi   33Gi    71% 20783599  8546848   71%   /
devfs                         364Ki  364Ki    0Bi   100%     1259        0  100%   /dev
map -hosts                      0Bi    0Bi    0Bi   100%        0        0  100%   /net
map auto_home                   0Bi    0Bi    0Bi   100%        0        0  100%   /home
raub@desktop.in.example.com:.  492Gi  389Gi  102Gi    80%   408428 32359572    1%   /private/tmp/D
boris:Documents pickles$

So far so good. To unmount it we can use diskutil, as in (Mac)

boris:Documents pickles$ diskutil umount /tmp/D
Unmount successful for /tmp/D
boris:Documents pickles$

or (Linux)

fusermount -u /tmp/D

Or go old school (both):

sudo umount /tmp/D

Since boris is a laptop, sometimes if we just let it go to sleep it will unmount the share on its own. Then all we have to do is mount it again.

Mounting again: not so fast

Thing is, sometimes it does not work.

boris:Documents pickles$ mkdir /tmp/D; sshfs raub@desktop.in.example.com:. /tmp/D
mkdir: /tmp/D: File exists
fuse: bad mount point `/tmp/D': Input/output error
boris:Documents pickles$ 

OK, maybe it did not automagically unmount while the laptop was asleep. So let's tell it to do so:

boris:Documents pickles$ diskutil umount /tmp/D
Unmount failed for /tmp/D
boris:Documents pickles$ 

Just before you ask, sudo umount /tmp/D did not work either. What if the old sshfs processes did not close cleanly and as a result are still lingering? To answer that we must enlist some help from one of grep's cousins, pgrep:

boris:Documents pickles$ pgrep -lf sshfs
384 sshfs raub@desktop.in.example.com:. /tmp/D
1776 sshfs raub@desktop.in.example.com:. /tmp/D
7356 sshfs user@other.in.example.com:. /tmp/D
boris:Documents pickles$

Just as we guessed, there is not just one but quite a few unhappy sshfs instances. Let's see if we can kill them:

boris:Documents pickles$ kill 384 1776 7356
boris:Documents pickles$ pgrep -lf sshfs
384 sshfs raub@desktop.in.example.com:. /tmp/D
1776 sshfs raub@desktop.in.example.com:. /tmp/D
boris:Documents pickles$ kill 384
boris:Documents pickles$ pgrep -lf sshfs
384 sshfs raub@desktop.in.example.com:. /tmp/D
1776 sshfs raub@desktop.in.example.com:. /tmp/D
boris:Documents pickles$ kill 1776
boris:Documents pickles$ pgrep -lf sshfs
384 sshfs raub@desktop.in.example.com:. /tmp/D
1776 sshfs raub@desktop.in.example.com:. /tmp/D
boris:Documents pickles$

Hmmm, this is going nowhere slowly. Let's crank it up a notch and force-kill them:

boris:Documents pickles$ kill -9 1776
boris:Documents pickles$ pgrep -lf sshfs
384 sshfs raub@desktop.in.example.com:. /tmp/D
boris:Documents pickles$ kill -9 384
boris:Documents pickles$ pgrep -lf sshfs
boris:Documents pickles$

Sounds like we got them all. Now let's try to mount once more:

boris:Documents pickles$ mkdir /tmp/D; sshfs raub@desktop.in.example.com:. /tmp/D
mkdir: /tmp/D: File exists
raub@desktop.in.example.com's password:
boris:Documents pickles$

I think we have a winner!
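
For the next time this happens, the whole cleanup fits in a few lines. A sketch only, assuming /tmp/D is the stale mountpoint:

# clean up a stale sshfs mount (sketch; /tmp/D assumed)
pgrep -lf sshfs                 # see which sshfs processes are lingering
pkill -9 -f sshfs               # force-kill them
diskutil unmount force /tmp/D   # macOS; on Linux use: fusermount -u -z /tmp/D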

Monday, October 31, 2016

Creating and uploading ssh keypair using ansible

As some of you have noticed from previous posts, I do use Docker. However, I also use Ansible to install packages on and configure hosts. I am not going to talk about when to use each one. Instead, I will try to make another (hopefully) quick post, this time Ansible-related.

Let's say I have a user and I want to set said user up so that I can later ssh to the target machine as that user, using an ssh key pair, from the machine I am running Ansible on. To do so I need to create an ssh key pair and then add the public key to the user's authorized_keys file. And that might require creating said file and ~/.ssh if they do not exist. Now that is a mouthful, so let's do it in steps.

  1. Decide where to put the key pair: This is rather important since we not only need to have it somewhere we can find it when we are ready to copy it to the target server, but we also might not want to recreate the key pair every time we run this task.

    For now I will put it in roles/files/, which is not ideal for many reasons. Just to name one, we might want to put it somewhere encrypted or with minimum access by suspicious individuals. But for this discussion it is a great starting place for the key pair. So, let's see how I would verify whether the pair already exists:

    - name: Check if user_name will need a ssh keypair
      local_action: stat path="roles/files/{{user_name}}.ssh"
      register: sshkeypath
      ignore_errors: True

    The first line (name) writes something to the screen to show where we are in the list of steps. It is rather useful because when we run ansible we can then see which task we are doing. The local_action uses stat to determine if the file roles/files/{{user_name}}.ssh exists. The result is then registered in the variable sshkeypath. Finally, ignore_errors tells ansible I do not care about any other complaints from stat.

  2. Create key pair: there are probably really clever, purely Ansible-native ways to do it, but I am not that bright so I will use OpenSSH instead, specifically ssh-keygen. Ansible has a way to call shell commands, the shell module. Now, I want to run ssh-keygen on the machine I am running ansible from, the local machine in Ansible parlance, and for that reason we have the local_action command. Here is one way to do the deed:

    - name: Create user_name ssh keypair if non-existent
      local_action: shell ssh-keygen -f roles/files/"{{user_name}}".ssh -t rsa -b 4096 -N "" -C "{{user_name}}"
      when: not sshkeypath.stat.exists

    As you guessed, we are using ssh-keygen to create the key pair roles/files/user_name.ssh and roles/files/user_name.ssh.pub. The other ssh-keygen options are there just because I would like my keys to be a bit more secure than those created by the average bear. So far so good: we are using local_action and shell here just like in the previous step.

    The interesting part is the when statement. What it says is to only create the key pair if it does not already exist, and that is why we checked whether roles/files/user_name.ssh existed in the previous step. Of course, that means we assumed that if roles/files/user_name.ssh exists, roles/files/user_name.ssh.pub must also exist. There are ways to check for both but I will leave that to you.

  3. Copy public key to remote server: Now that we know we have an ssh key pair, we need to copy it. You could do it using the file module, which would require a few extra tasks (creating the ~/.ssh directory with the proper permissions comes to mind). Or you can use the appropriately named authorized_key module. I am lazy, so I will pick the latter.

    - name: Copy the public key to the user's authorized_keys
      # It also creates the dir as needed
      authorized_key:
        user: "{{user_name}}"
        key: '{{ lookup("file", "roles/files/" ~ user_name ~ ".ssh.pub") }}'
        state: present

    What this does is create ~/.ssh and ~/.ssh/authorized_keys as needed and then copy the key into ~/.ssh/authorized_keys. The most important statement here is key. Since it expects us to pass the key as a string, we need to read the contents of the public key, and that is why we called upon the lookup plugin; think of it here as using cat. It is also why we have all the crazy concatenation in "roles/files/" ~ user_name ~ ".ssh.pub".

And that is pretty much it. If you want to be fancy you could use a few more local_actions to copy the private key to your (local) ~/.ssh and then create/edit ~/.ssh/config to make your life easier. I did not do it here because this user will be used as-is by Ansible; I do not need access to this user as myself outside Ansible.
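
For comparison, the manual shell equivalent of these three tasks is roughly the following. This is a sketch only; the user_name value and the target host are assumptions.

# manual equivalent of the three Ansible tasks above (sketch; user_name and target host assumed)
user_name=deploy
[ -f roles/files/${user_name}.ssh ] || \
    ssh-keygen -f roles/files/${user_name}.ssh -t rsa -b 4096 -N "" -C "${user_name}"
ssh-copy-id -i roles/files/${user_name}.ssh.pub ${user_name}@target.example.com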

Saturday, January 30, 2016

Mounting user fileshare on boot2docker boot

For docker container development, and light use, I found boot2docker to be quite convenient. I have it boot off its ISO and then mount a permanent drive for the containers (and to store config files).

/dev/sda                 19.6G     12.7G      5.9G  68% /mnt/sda/var/lib/docker/aufs

Whenever there is a new version, I shut the vm down, swap the ISOs (really just point the alias to the new file), and reboot. Nice and brainless.

I do not like to use the default account, docker, for container development and running. Also, in a nice production environment you want to have other users running their containers. So I created a user called ducker, which is a quick play on the default username. I also would prefer not to have the user's homedir on the drive where the containers are, which has been suggested before. You see, the way I see it, containers are by design disposable; blow them up if you feel like it or suspect they have been compromised. What matters is the data and the dockerfile required to rebuild the container. As a result, ducker has an account on the fileserver, which does its RAID and backup thingie as any good fileserver should. Now, if we want to have containers created and running from ducker's account when the server boots up, we need to have said account available.

So, the plan is that whenever the boot2docker server reboots, ducker will be there. And we have a few issues to deal with. First, boot2docker's ISO can't do automount. And second, any user we create is usually lost when we shut down, because we are running off an ISO.

boot2docker's ISO is built so that we can provide some permanent stuff, which is why it mounts /dev/sda. But, that is not the only place we can mount things, nor the only way. In /opt/bootscript.sh we have the following interesting lines:

# Allow local bootsync.sh customisation
if [ -e /var/lib/boot2docker/bootsync.sh ]; then
    /bin/sh /var/lib/boot2docker/bootsync.sh
    echo "------------------- ran /var/lib/boot2docker/bootsync.sh"
fi

# Launch Docker
/etc/rc.d/docker

# Allow local HD customisation
if [ -e /var/lib/boot2docker/bootlocal.sh ]; then
    /bin/sh /var/lib/boot2docker/bootlocal.sh > /var/log/bootlocal.log 2>&1 &
    echo "------------------- ran /var/lib/boot2docker/bootlocal.sh"
fi

The files in /var/lib/boot2docker live on /dev/sda. I don't know which script to pick, but the second one does have a comforting Allow local HD customisation message. So I will pick /var/lib/boot2docker/bootlocal.sh and add the following lines:

# Create local user, also creating the homedir
adduser -D -u 1003 ducker
# add user to docker group
adduser ducker docker
# Mount homedir
mount.nfs  fileserver.example.com:/export/home/ducker /home/ducker

After we create the file and reboot, we get

docker@boot2docker:~$ id ducker
uid=1003(ducker) gid=1003(ducker) groups=1003(ducker),100(docker)
docker@boot2docker:~$ df -h
Filesystem                Size      Used Available Use% Mounted on
tmpfs                   896.6M    123.8M    772.8M  14% /
tmpfs                   498.1M         0    498.1M   0% /dev/shm
/dev/sda                 19.6G     12.7G      5.9G  68% /mnt/sda
cgroup                  498.1M         0    498.1M   0% /sys/fs/cgroup
df: /mnt/hgfs: Protocol error
fileserver.example.com:/home/ducker
                        295.3G    285.1G     10.0G  97% /home/ducker
/dev/sda                 19.6G     12.7G      5.9G  68% /mnt/sda/var/lib/docker/aufs
docker@boot2docker:~$

User ducker can log in because there is a public ssh key already in place. However, we did not do anything about that user's password. There are a few ways to take care of that, such as using /var/lib/boot2docker/bootlocal.sh to copy a saved password hash back into place, and you really do want to take care of it, otherwise you will not be able to log in.
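
One way to handle that password bit in the same bootlocal.sh would be something like the sketch below. The hash file location is an assumption, as is the use of /etc/shadow; it presumes you saved ducker's hash to the persistent partition beforehand.

# still in /var/lib/boot2docker/bootlocal.sh (a sketch; assumes the hash line was saved earlier,
# e.g. with: grep '^ducker:' /etc/shadow > /var/lib/boot2docker/ducker.hash -- adjust to
# /etc/passwd if that is where your hashes live)
if [ -f /var/lib/boot2docker/ducker.hash ]; then
    grep -v '^ducker:' /etc/shadow > /tmp/shadow.new
    cat /var/lib/boot2docker/ducker.hash >> /tmp/shadow.new
    mv /tmp/shadow.new /etc/shadow
fi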

Friday, August 07, 2015

Juniper SRX router booted from backup image (or, orange LED light again!)

So, uranus is in pain once more. Power went out and it was not properly shut down before the UPS gave its last gasp (I do need to do something about that). When I rebooted it, it seemed to have come back without an issue -- it was working fine -- but it had The Light on again, just like last time. I ssh'd into it and this is what the motd looked like:

raub@desktop:~$ ssh janitor@uranus.example.com
janitor@uranus.example.com's password:
--- JUNOS 12.1X45-D25.1 built 2014-04-23 20:45:48 UTC

***********************************************************************
**                                                                   **
**  WARNING: THIS DEVICE HAS BOOTED FROM THE BACKUP JUNOS IMAGE      **
**                                                                   **
**  It is possible that the primary copy of JUNOS failed to boot up  **
**  properly, and so this device has booted from the backup copy.    **
**                                                                   **
**  Please re-install JUNOS to recover the primary copy in case      **
**  it has been corrupted.                                           **
**                                                                   **
***********************************************************************

janitor@uranus>

Hmmm, so the primary image (does that mean the entire OS partition or just the boot stuff?) got corrupted when the power went bye-bye. Good thing it keeps a backup. Out of curiosity, let's see if that is related to the LED being on:

janitor@uranus> show chassis alarms
1 alarms currently active
Alarm time               Class  Description
2015-02-26 05:38:12 EST  Minor  Host 0 Boot from backup root

janitor@uranus>

It seems to be the case. Funny that it is labelled minor but is important enough to become the motd, but I digress. Personally, I think we should take care of that as soon as we have a scheduled maintenance window.

Ok, maintenance window time. Before rebooting, let's prepare a few things. If we know the backup copy is good (configuration files are also backed up, and you can push them back with ansible or whatnot if you feel uncomfortable), you could be lazy like me and copy the backup into the primary partition.

janitor@uranus> request system snapshot slice alternate
Formatting alternate root (/dev/da0s1a)...
Copying '/dev/da0s2a' to '/dev/da0s1a' .. (this may take a few minutes)
The following filesystems were archived: /

janitor@uranus>

If all went well, we should see the primary snapshot having the creation date of when we ran the request system snapshot slice alternate command.

janitor@uranus> show system snapshot media internal
Information for snapshot on       internal (/dev/da0s1a) (primary)
Creation date: Apr 19 22:34:04 2015
JUNOS version on snapshot:
  junos  : 12.1X45-D25.1-domestic
Information for snapshot on       internal (/dev/da0s2a) (backup)
Creation date: Feb 26 05:33:18 2015
JUNOS version on snapshot:
  junos  : 12.1X45-D25.1-domestic

janitor@uranus>

As you noticed, I did this a while ago and never got around to writing an article about it, but there it is. However, I still had the evil LED staring at me. Time to turn it off:

request system reboot media internal

So I rebooted and did not get that motd anymore, nor the LED:

janitor@uranus> show chassis alarms
No alarms currently active

janitor@uranus>

Sunday, February 15, 2015

Backing up sqlite to another machine

Much has been written about backing up mysql/mariadb (mysqldump, anyone?), but what about lowly sqlite? It might not have as many features and does not run as its own process, but sqlite is rather useful in its own right (embedded, anyone?). If you have an Android phone, you are running sqlite. And that does not mean its data is not worth saving, so let's do some saving.

I will call the backup server backupbox and the machine running sqlite webserver. Note that a sqlite db is usually not run on a separate server like other databases, which is why in this example we claim it is a backend to some website.

Requirements and Assumptions

  1. sqlite3. After all, it is a bit challenging backing up a database if you do not have the database.
  2. Path to the database file you want to back up. Remember, sqlite is primarily a file; if you do not know where the file is, backing it up might pose a few challenges. In this example, it is found at /var/local/www/db/stuff.db.
  3. Both backupbox (IP 192.168.42.90) and webserver are running Linux or OSX. If there is an interest we can talk about when one (or both) of them are running Windows.
  4. We will break the backup process into two steps: backing up the database (to the local drive) and then copying the backup to the backup server. The reason is that we can then schedule each step to run at a time convenient to it. Also, if one of the steps fails, it is much easier to deal with. I know monolithic do-everything-plus-clean-the-kitchen programs are the hip and fashionable solution nowadays, but this is my blog and I prefer simple and (hopefully) easy to maintain solutions whenever possible. Deal with it.

Procedure

  1. On server to be backed up, webserver
    1. Create backup user
      useradd -m backupsqlite -G www-data
      sudo -u backupsqlite mkdir -p /home/backupsqlite/.ssh
      touch /home/backupsqlite/.ssh/authorized_keys
      chown backupsqlite:backupsqlite /home/backupsqlite/.ssh/authorized_keys
      
    2. Create script to dump the database in ~backupsqlite/sqlite-backup.bz2.
      cat > /usr/local/bin/backupsqlite << 'EOF'
      #!/bin/bash
      BACKUP_USER=backupsqlite
      BACKUP_GROUP=services
      DATABASE=/var/local/www/db/stuff.db
      
      sqlite3 ${DATABASE} ".dump" |
      sudo -u  ${BACKUP_USER} bzip2 -c > /home/${BACKUP_USER}/sqlite-backup.bz2
      chmod 0600 /home/${BACKUP_USER}/sqlite-backup.bz2
      EOF
      chmod 0700 /usr/local/bin/backupsqlite
    3. Run the above script manually as user backupsqlite. Do not continue until this step is successful.
    4. Now that you know your hard work paid off, how about running this script once a day? Maybe at 3:00am the natives will be quiet enough for you to safely run the backup script:
      cat > /etc/cron.d/backupsqlite << 'EOF'
      MAILTO=admin@example.com
      0 3 * * *  backupsqlite    /usr/local/bin/backupsqlite
      EOF
  2. On backup server, backupbox:
    1. Create an ssh key pair to authenticate the connection. Note we are assuming we will be running this script from backupbox's root account; that is not required and probably not the smartest thing to do, but it will work fine for our little test. You could have used Kerberos or LDAP or something else, but you would need to make changes as needed.
      ssh-keygen -f /root/.ssh/sqlitebackup-id-rsa
      You will need to copy sqlitebackup-id-rsa.pub to webserver and place it in ~backupsqlite/.ssh/authorized_keys by any means you want. If you are a better typist than me, you could even enter it manually.
    2. Test: can you retrieve the backup file?
      rsync -az -e "ssh -i /root/.ssh/sqlitebackup-id-rsa " \
      backupsqlite@webserver.example.com:sqlite-backup.bz2 .
      We can restrict this connection later. Right now let's just make sure this step works. If not, find out why before continuing.
    3. Now let's create the script that will get the database and put it, say, in /export/backup/databases
      cat > /usr/local/bin/backupsqlite << 'EOF'
      #!/bin/bash
      BACKUP_USER=backupsqlite
      BACKUP_PATH='/export/backup/databases'
      DATABASE_SERVER=webserver.example.com
      DATABASE=sqlite-backup.bz2
      KEY='/root/.ssh/sqlitebackup-id-rsa'
      
      cd $BACKUP_PATH
      rsync -az -e "ssh -i $KEY " $BACKUP_USER@$DATABASE_SERVER:$DATABASE .
      EOF
      chmod 0700 /usr/local/bin/backupsqlite

      Test it before continuing. Note there are other ways to do this step, like adding the above rsync statement to a larger backup script; I tried to hint at that, and at the fact that this could be the start of a function that loops over all the servers you need to grab backup files from. The final implementation is up to you.

    4. If this backup is to be run independently, create a cron job to run it at a convenient time. How does 11:45 in the evening sound?

      cat > /etc/cron.d/backupsqlite << 'EOF'
      MAILTO=admin@example.com
      45 23 * * *  root    /usr/local/bin/backupsqlite
      EOF

      Otherwise, tell your main backup script about this new entry.

  3. And back to webserver
    1. Now it is time to make access to the database more restricted. We will be rather lazy here: we will start with what we did in a previous article and make a few changes here and there.
      sudo -u backupsqlite cat > /home/backupsqlite/cron/validate-rsync << 'EOF'
      #!/bin/sh
      case "$SSH_ORIGINAL_COMMAND" in
      rsync\ --server\ --sender\ -vlogDtprze.iLsf\ --ignore-errors\ .\ sqlite-backup.bz2)
      $SSH_ORIGINAL_COMMAND
      ;;
      *)
      echo "Rejected"
      ;;
      esac
      EOF
      chmod +x /home/backupsqlite/cron/validate-rsync
      You probably noticed that I named the scripts running in both machines the same. Why not? If you do not like that, change it!
    2. Then we need to modify /home/backupsqlite/.ssh/authorized_keys to tell it to accept only connections from our backup server using our key pair and then only allow the backup script to be run. In other words, add something like
      from="192.168.42.90",command="/home/backupsqlite/cron/validate-rsync"
      to the beginning of the line in /home/backupsqlite/.ssh/authorized_keys containing the public key.
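
Since the dump is just compressed SQL, restoring it on the backup box (or anywhere else) is a one-liner. A sketch, with the target filename being an arbitrary choice:

# recreate a database from the dump (sketch; restored-stuff.db is just a name)
bunzip2 -c sqlite-backup.bz2 | sqlite3 restored-stuff.db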

References

The Sqlite dump command

Wednesday, February 04, 2015

Creating a git server in docker... with NFS and custom port and ssh key pair

I usually like to start my posts describing what we will try to accomplish, but I think I can't do any better than what the title states. So let's see if I can come up with a convincing excuse. Well, all I can come up with right now is that I think it is wasteful to create an entire VM to run a distributed version control system. At least one that does not have helpful paperclips with eyes and other features requiring you to download crap. And it is nice to know that if the docker host (or cloud) takes a dump, we can bring this service back rather quickly. For this article we will use git; some other time we can talk about svn.

The git server I will be using is gitolite because it is reasonably simple and quick to manage and get going. What I really like about it is that the accounts for the users using git are not accounts on the host itself, so they cannot log in to the machine hosting git. By default the git users log in using ssh keys.

Since I am lazy, I will store the repositories in a NFS fileshare that is mounted into the container at runtime. We talked about how to do the mounting in a previous article.

Assumptions

  1. gitolite running off /home/git
  2. We will connect to the git server on port 2022. I chose that because I need port 22 to ssh into the docker host. Yes, they are on different IPs (in my case completely different VLANs), but I am weird like that.
  3. We will use ssh key pair authentication. And will use a different key than the default one. Note you can authenticate against LDAP, but that would go against what I wrote in the name of this article.
  4. gitolite being run as user git
  5. I am running this on centos6.

Install

  1. I created a CNAME for the docker host, gitserver.example.com, so it looks pretty.
  2. In the NFS server, create a fileshare owned by the user git, which in this example has uid=1201.
  3. We will need to create a ssh key pair for the gitadmin. I created my pair by doing something like
    ssh-keygen -t rsa -C gitadmin -f ~/.ssh/gitadmin
    You will need to copy ~/.ssh/gitadmin.pub into the docker host by whatever means you desire.
  4. I create a directory in the docker host to put all the files (docker-entrypoint.sh and Dockerfile) related to this container. Here is the Dockerfile
    ############################################################
    # Dockerfile to build a gitolite git container image
    # Based on CentOS
    ############################################################
    
    # Set the base image to CentOS
    FROM centos:centos6
    
    # File Author / Maintainer
    MAINTAINER Mauricio Tavares "raubvogel@gmail.com"
    
    ################## BEGIN INSTALLATION ######################
    # We need epel
    RUN rpm -Uvhi http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm && \
        sed -i -e 's/^enabled=1/enabled=0/' /etc/yum.repos.d/epel.repo
        sed -i -e 's/^enabled=1/enabled=0/' /etc/yum.repos.d/epel.repo
    
    # We need NFS, openssh, and git
    # And ssmtp (from Epel)
    RUN yum update -y && yum install -y \
            git \
            nfs-utils \
            openssh-server && \
        yum install -y ssmtp --enablerepo=epel
    
    # Configure NFS
    RUN sed -i -e '/^#Domain/a Domain = example.com' /etc/idmapd.conf
    
    ##################### INSTALLATION END #####################
    
    # Create git user
    RUN adduser -m -u 1201 git
    
    # Configure ssmtp
    
    # Configure sshd
    RUN sed -i -e 's/^#Port .*$/Port 2022/' \
               -e 's/^#PermitRootLogin .*$/PermitRootLogin no/' \
               /etc/ssh/sshd_config && \
        sed -i -e \
            's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' \
             /etc/pam.d/sshd && \
             /etc/pam.d/sshd && \
        ssh-keygen -f /etc/ssh/ssh_host_rsa_key -N '' -t rsa && \
        ssh-keygen -f /etc/ssh/ssh_host_dsa_key -N '' -t dsa
    
    # And a mountpoint for repositories
    # Note: can't NFS mount from dockerfile, so will do it in an entrypoint script
    RUN su - git -c 'mkdir repositories'
    
    
    # And now the git server
    # Gitolite admin: gitadmin (it is based on the name of the pub key file)
    RUN su - git -c 'mkdir bin' && \
        su - git -c 'git clone git://github.com/sitaramc/gitolite' && \
        su - git -c 'mkdir -m 0700 .ssh' && \
        su - git -c 'echo "ssh-rsa AAAAB3NzaC1yc2EAASLDAQCOOKIEQDehf5hxGq9//34yrsL
    [...]
    7CfSpbiP gitadmin" > .ssh/gitadmin.pub'
    # The rest will be configured in the entrypoint script
    
    # Put the entrypoint script somewhere we can find
    COPY docker-entrypoint.sh /entrypoint.sh
    ENTRYPOINT ["/entrypoint.sh"]
    
    EXPOSE 2022
    # Start service
    CMD ["/usr/sbin/sshd", "-D"]
    

    Where

    1. You will need to put the public key gitadmin.pub between the double quotes in the line beginning with su - git -c 'echo "ssh-rsa.
    2. I am running a lot of things in the Dockerfile as user git.
    3. The NFS setup was mentioned before, so I will not bother with it right now.
    4. I forgot to add the setup for ssmtp. I will think about that sometime later.
  5. The docker-entrypoint.sh file looks vaguely like this:
    #!/bin/sh
    set -e
    
    # Mount git's repositories
    mount.nfs4 fileserver.example.com:/git /home/git/repositories
    
    su - git -c 'gitolite/install -to $HOME/bin'
    # setup gitolite with yourself as the administrator
    su - git -c 'gitolite setup -pk .ssh/gitadmin.pub'
    
    # And we are out of here
    exec "$@"
  6. So far so good? Ok, so let's build the image. I will call it git.
    docker build -t git .
    If the last build message looks like
    Successfully built 6fb1ac15b47a
    chances are the build was successful and you can go to the next step. Otherwise, figure out what went boink.
  7. Now start the service. Remember you need to run in privileged mode because of NFS. Since this is a test, I am calling the container test-git.
    docker run --privileged=true -d -P -p 2022:2022 --name test-git git

Setup and Testing

  1. Let's start testing by seeing what repositories we can see as an admin:
    $ /usr/bin/ssh -i /home/raub/.ssh/gitadmin git@gitserver.example.com info
    hello gitadmin, this is git@docker running gitolite3 v3.6.2-12-g1c61d57 on git 1.7.1
    
     R W    gitolite-admin
     R W    testing
    
    
    FYI, testing is a repo that everyone allowed to use the git server can play with. Think of it as, as its name implies, a test repo. Since that worked, we can proceed to the next step.
  2. Now we should edit our .ssh/config file to access the gitadmin repository.
    cat >> ~/.ssh/config << EOF
    Host gitadmin
            Hostname        gitserver.example.com 
            User            git
            Port            2022
            identityfile    /home/raub/.ssh/gitadmin
            protocol        2
            compression     yes
    EOF
    Yes, you can ask about what to do if you have a script that needs to pull stuff out of a repo, and I will tell you to wait for the next installment. This article deals with getting it to work.
  3. Retrieving the admin repository is now much easier. So, instead of having to do something like
    git clone ssh://git@gitserver.example.com:[port]/gitolite-admin
    which would also require us to feed the key (or rename it as the default which IMHO is a bad idea), thanks to the previous step we can now lazily do
    git clone gitadmin:/gitolite-admin
  4. Repository config file is in gitolite-admin/conf/gitolite.conf
  5. Adding a new user (and perhaps a repository)
    1. Get the user's ssh public key, say raub.pub. How did the user create the ssh key pair? Don't know, don't care.
    2. Copy raub.pub to gitolite-admin/keydir. NOTE: the file must be named after the username the user will use to connect to the git server; it does not need to have anything to do with the user's normal/real username.
    3. Create a repository for the user. Let's give it a nice and snuggly name, like somethingawful
      cat >> conf/gitolite.conf << EOF
      
      repo somethingawful
          RW      = raub
      EOF
    4. Commit changes
      git add conf keydir
      git commit -m 'Added new user and config a repo'
      git push origin master
  6. Now, let's pretend we are the user (i.e. what the user should do/see). Which repos can you, the user, see? If this looks like a step we did before, it is; we are just using a new user.
    $ ssh -i /home/raub/.ssh/raub git@gitserver.example.com info 
    hello raub, this is git@docker running gitolite3 v3.6.2-12-g1c61d57 on git 1.7.1
    
    
     R W    somethingawful
     R W    testing
    
    There is somethingawful!
  7. Let's grab somethingawful (this assumes the user has a gituser alias in their own ~/.ssh/config; see the sketch after this list)
    $ git clone gituser:/somethingawful
    Cloning into 'somethingawful'...
    warning: You appear to have cloned an empty repository.
    Checking connectivity... done.
  8. Edit, commit, and off you go
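
Here is what that user-side gituser alias might look like; a sketch only, since the alias name and key path are whatever the user picked.

cat >> ~/.ssh/config << EOF
Host gituser
        Hostname        gitserver.example.com
        User            git
        Port            2022
        identityfile    /home/raub/.ssh/raub
        protocol        2
        compression     yes
EOF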

I will put all those guys in my github account later on. And shall sneakily update this post afterwards.

Wednesday, December 31, 2014

Restrictive rsync + ssh

Some of you have probably used rsync to back up files and directories from one machine to another. If one of those machines is in an open network, you probably are doing it inside an ssh tunnel. If not, you should. And it is really not that hard to do.

Let's say you wanted to copy a directory called pickles inside user bob's home directory at flyingmonkey.example.com, which is a Linux/Unix box out in the blue yonder. If you have rsync installed (most Linux distros do come with it or offer it as a package), you could do something like:

rsync -az -e ssh bob@flyingmonkey.example.com:pickles /path/to/backup/dir/

The -e ssh is what tells rsync to do all of its monkeying about inside an ssh tunnel. And when you run the above statement, it will ask for bob's password and then proceed to copy the directory ~bob/pickles inside the directory /path/to/backup/dir. Which is great, but I think we can do better.

Look Ma! No passwords!

First thing I want to get rid of is the need to enter a password. Yeah, it was great while we were testing, but if we have a flyingmonkey loose on the internet, I would like to make it a bit harder for someone to break into it; I think I owe that to the Wicked Witch of the West.

The other reason is that then we can do the rsync dance automagically, using a script that runs whenever it feels like. In other words, backup. For this discussion we will just cover backup as in copying new stuff over old stuff; incremental backup is doable with rsync but will be the subject of another episode.

So, how are we going to do that? you may ask. Well, ssh allows you to authenticate using public/private key pairs. Before we continue, let's make sure sshd on flyingmonkey is configured to accept them:

bob@flyingmonkey:~$ grep -E 'PubkeyAuthentication|RSAAuthentication' /etc/ssh/sshd_config 
#RSAAuthentication yes
#PubkeyAuthentication yes
#RhostsRSAAuthentication no
bob@flyingmonkey:~$

Since PubkeyAuthentication and RSAAuthentication default to yes (the commented-out lines in sshd_config show the defaults), we are good to go. Now, if flyingmonkey runs OSX, the file might live at /etc/sshd_config instead.

A quick note on ssh keys: they are a very nice way to authenticate because they make the life of whoever is trying to break into your machine rather hard. Now just guessing the password does not do much good; you need to have the key. And, to add insult to injury, you can have a passphrase on the key itself.

Enough digressing. The next step is to create the key pair. The tool I would use in Linux/Solaris/OSX is ssh-keygen, because I like to do command line thingies. So we go back to the host that will be rsync'ing to flyingmonkey and create it by doing

ssh-keygen -b 4096 -t rsa -C backup-key -f ~/.ssh/flyingmonkey

which will create a 4096-bit RSA key pair (a lot of places still use 1024, and some are now announcing state-of-the-art ultra secure settings of 2048 bits; unless your server can't handle it, use 4096 or better) called flyingmonkey and flyingmonkey.pub in your .ssh directory:

raub@backup:~$ ls -lh .ssh/flyingmonkey*
-rw------- 1 raub raub 3.2K Dec 31 11:30 .ssh/flyingmonkey
-rw-r--r-- 1 raub raub  732 Dec 31 11:30 .ssh/flyingmonkey.pub
raub@backup:~$

During the creation process, it will ask for a passphrase. Since we are going to have a script using this key pair, it might not make sense to have a passphrase associated with it. Or it might, and there are ways to provide said passphrase to the script in some secure way, but this post is getting long so I will stick to the easy basic stuff. If you remember, we said this is public/private key authentication; that means it uses two keys: public and private. The public one is taken to the machine you want to ssh into while the private one stays, well, private. Let's look at the public key (it is a single line):

raub@backup:~$ cat .ssh/flyingmonkey.pub 
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACACsgpy/ihq31kv+Zji6Eknr46nbyx38uPE54X3STbaNC8oCheulVk
/+bTmrFy8Ne8RcTeWYd93wwabgBVDJYzjnsuwUgBO/JPXE4GiQrcnIz5fPsqJqslYxR5WnuUfkYPsAYgJL33XWZWi
dPu+A38OSxXf7UAfpKe5WXa93knXIERUA7NOzCKO96YzpW96i7LxAs20bsmNAw6bZbrZG8Dn3EssIK8CtEUvw4nWb
uFKZQS+b5AM8q20+IrGGVG193H6Rm3/iw0jip0VQOFozUB6yjToyZ5MTzShjb+f56o3+VUG2Bel7OMDfYXYYEKIoj
+cmTMLP5yu4v1t5dTkN3osneK/+2KHwXFTQY48TqxyqH+ZqEFy2X+kXoKff/89aD8lwj+uYKl6HNKhveKSZMNq/yc
7jCc05hLQCQkyC9/1lY9LI2UMHq2kqsgbdmR3uu3Oua2y1HhyeR9hqP9Om+kLu2K7cIXu5NBO9ro0vWBJII7T+z98
awbGH4jSryOxvAlpTT7d+POev13oOWonIwyTmkT72Q+/qJhPU/Vdtd7n5gSUomRT8dJQH+2hyA8c3+YPSW2VckBY/
Ax5aGX+AFoy1Y6WKpUWMIbwHJqizpdEd3WQWzivR1psfsjFzqrdG5SOSZFH2SHvzdNQOTz0FbYvgBV2Egq7WXv98C
se9ZDx backup-key
raub@backup:~$ 

You probably noticed the backup-key string at the end of the key; we put it there using the -C (comment) option. Usually it contains username@host, which is OK most of the time, but I wanted something to remind me of what this key is supposed to do. You can change it later if you want.

So we go back to flyingmonkey and place the contents of the public key, flyingmonkey.pub in ~bob/.ssh/authorized_keys by whatever means you want. cut-n-paste works fine.

cat flyingmonkey.pub >> authorized_keys

also does a great job. Or you can even use ssh-copy-id if you feel frisky. Just remember the contents of flyingmonkey.pub are a single line. Of course, if flyingmonkey is a Windows machine, you will do something else, probably involving clicking on a few windows, but the principle is the same: get the bloody key into the account on the target host you want to connect to.

Once that is done, connect using ssh by providing the key

ssh -i .ssh/flyingmonkey bob@flyingmonkey.example.com

Can you login fine? Great; now try rsync

rsync -az -e 'ssh -i .ssh/flyingmonkey' bob@flyingmonkey.example.com:pickles /path/to/backup/dir/

Do not continue until the above works. Note that in a real script the private key will probably be somewhere only the user that runs the backup script can access.

Limiting

So far so good. We eliminated the need to use a password so we can write a script to use the above. But, we can still ssh using that key to do other things besides just rsync. Time to finally get to the topic of this post.

If the IP/hostname of the host that will be backing up flyingmonkey does not change, you can begin by adding it to the front of the ~bob/.ssh/authorized_keys entry for the flyingmonkey public key. If the backup server sits in a private/NATed lan, use the IP of its gateway instead, since that is the address flyingmonkey will see. In this example, let's say we are all inside a private lan and the IP of the backup server is 192.168.42.24:

from="192.168.42.24" ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACACsgpy/ihq31kv+Zji6Eknr46nbyx38uPE54X3STbaNC8oCheulVk
/+bTmrFy8Ne8RcTeWYd93wwabgBVDJYzjnsuwUgBO/JPXE4GiQrcnIz5fPsqJqslYxR5WnuUfkYPsAYgJL33XWZWi
dPu+A38OSxXf7UAfpKe5WXa93knXIERUA7NOzCKO96YzpW96i7LxAs20bsmNAw6bZbrZG8Dn3EssIK8CtEUvw4nWb
uFKZQS+b5AM8q20+IrGGVG193H6Rm3/iw0jip0VQOFozUB6yjToyZ5MTzShjb+f56o3+VUG2Bel7OMDfYXYYEKIoj
+cmTMLP5yu4v1t5dTkN3osneK/+2KHwXFTQY48TqxyqH+ZqEFy2X+kXoKff/89aD8lwj+uYKl6HNKhveKSZMNq/yc
7jCc05hLQCQkyC9/1lY9LI2UMHq2kqsgbdmR3uu3Oua2y1HhyeR9hqP9Om+kLu2K7cIXu5NBO9ro0vWBJII7T+z98
awbGH4jSryOxvAlpTT7d+POev13oOWonIwyTmkT72Q+/qJhPU/Vdtd7n5gSUomRT8dJQH+2hyA8c3+YPSW2VckBY/
Ax5aGX+AFoy1Y6WKpUWMIbwHJqizpdEd3WQWzivR1psfsjFzqrdG5SOSZFH2SHvzdNQOTz0FbYvgBV2Egq7WXv98C
se9ZDx backup-key
This is a small improvement: the only host that can connect with this key is the one with that IP, be it the legitimate machine or one spoofing the address. Test it.
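
One quick way to test it: try the same ssh command from a machine other than 192.168.42.24 and confirm it gets refused; on flyingmonkey you can watch sshd complain in its log (the exact message and log location vary by distro):

ssh -i ~/.ssh/flyingmonkey bob@flyingmonkey.example.com hostname   # from some other box; should fail
tail -f /var/log/auth.log   # on flyingmonkey (Debian/Ubuntu); RedHat/CentOS uses /var/log/secure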

The next step is to specify which commands can be run when connecting with this key. That again requires playing with ~bob/.ssh/authorized_keys; this time we add the command option:

from="192.168.42.24",command="/home/bob/.ssh/validate-rsync" ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACACsgpy/ihq31kv+Zji6Eknr46nbyx38uPE54X3STbaNC8oCheulVk
[...]
se9ZDx backup-key
And define validate-rsync as
cat > .ssh/validate-rsync << 'EOF'
#!/bin/sh
case "$SSH_ORIGINAL_COMMAND" in
rsync\ --server\ --sender\ -vlogDtprze.iLsf\ .\ pickles)
$SSH_ORIGINAL_COMMAND
;;
*)
echo "Rejected"
;;
esac
EOF
chmod +x .ssh/validate-rsync
And this is where it gets really exciting. All validate-rsync does is check whether the command being sent is not just any rsync command but one specific command. Once we figure out what the proper SSH_ORIGINAL_COMMAND looks like, we can change the line
rsync\ --server\ --sender\ -vlogDtprze.iLsf\ .\ pickles)
to whatever it needs to be to match our backup script, and then test. Note that if you change the rsync command, you will need to update the case pattern to match.
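
While you are editing that entry, the stock authorized_keys options let you tighten it a bit further; with something like the line below (shortened here, but still a single line in the file) the key can only run the wrapper, only from that IP, and gets no terminal or forwarding:

from="192.168.42.24",command="/home/bob/.ssh/validate-rsync",no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding ssh-rsa AAAAB3Nza[...] backup-key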

Friday, December 26, 2014

Getting the SSH_ORIGINAL_COMMAND

Let's say you want to have an account you can ssh into but only run very specific commands in it. A good way to achieve that is to write a wrapper script that is called from your authorized_keys file. So you could have a wrapper that looks like this:

#!/bin/sh
case $SSH_ORIGINAL_COMMAND in
    "/usr/bin/rsync "*)
        $SSH_ORIGINAL_COMMAND
        ;;
    *)
        echo "Permission denied."
        exit 1
        ;;
esac
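
For context, here is a sketch of how such a wrapper gets hooked up and exercised; the wrapper path and key comment are hypothetical, while the user, host, and key file match the example further down:

# one line in ~raub/.ssh/authorized_keys on virtualpork:
command="/home/raub/.ssh/wrapper.sh",no-pty ssh-rsa AAAA[...] le_key

# from the client: the first should run, the second should print "Permission denied."
ssh -i /home/raub/.ssh/le_key raub@virtualpork /usr/bin/rsync --version
ssh -i /home/raub/.ssh/le_key raub@virtualpork uptime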
But what if you want to be really precise about the command? Using the above example, not only requiring rsync but also pinning down the path and the arguments? You could cheat and find out what the command you are sending looks like on the server side by (temporarily) replacing your wrapper script with this:
#!/bin/sh

DEBUG="logger" # Linux
#DEBUG="syslog -s -l note" # OSX

if [ -n "$SSH_ORIGINAL_COMMAND" ]; then
        $DEBUG "Passed SSH command $SSH_ORIGINAL_COMMAND"
elif [ -n "$SSH2_ORIGINAL_COMMAND" ]; then
        $DEBUG "Passed SSH2 command $SSH2_ORIGINAL_COMMAND"
else
        $DEBUG Not passed a command.
fi
Then you run the ssh command and see what it looks like in the log file. Copy that to your original wrapper script, and you are good to go. So
ssh -t -i /home/raub/.ssh/le_key raub@virtualpork echo "Hey"
Results in
Dec 26 13:34:05 virtualpork syslog[64541]: Passed SSH command echo Hey
While
rsync -avz -e 'ssh -i /home/raub/.ssh/le_key' raub@virtualpork:Public /tmp/backup/
results in
Dec 26 13:28:17 virtualpork syslog[64541]: Passed SSH command rsync --server 
--sender -vlogDtprze.iLs . Public
The latter means our little wrapper script would then look like
#!/bin/sh
case $SSH_ORIGINAL_COMMAND in
    "rsync --server --sender -vlogDtprze.iLs . Public")
        $SSH_ORIGINAL_COMMAND
        ;;
    *)
        echo "Permission denied."
        exit 1
        ;;
esac
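
One caveat: that --server flag string depends on the rsync version and the options used on the client, so it can change after an upgrade on either end. If that becomes a nuisance, a slightly looser pattern that still pins down the direction and the path is a reasonable compromise; a sketch:

#!/bin/sh
case $SSH_ORIGINAL_COMMAND in
    "rsync --server --sender "*" . Public")
        $SSH_ORIGINAL_COMMAND
        ;;
    *)
        echo "Permission denied."
        exit 1
        ;;
esac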

Tuesday, August 27, 2013

Fail2ban and RedHat/CentOS

Fail2ban is another neat intrusion detection program. It monitors log files for suspicious access attempts and, once it has seen enough of them, edits the firewall to block the offender. The really neat part is that it will unban the offending IP later on (you define how long); that usually suffices for your garden-variety automatic port scanner/dictionary attack, yet it also gives hope to the user who just can't remember a password. There are other programs out there that deal with ssh attacks, but fail2ban can handle many different services; I use it with Asterisk, mail, and web, just to name a few.

But, you did not come here to hear me babbling; let's get busy and do some installing, shall we?

Installing fail2ban in RedHat/CentOS

For this example I will be using CentOS 6. YMMV.

  1. Get the required packages. You need jwhois (for whois) from base and fail2ban from, say, epel or your favourite repository
    yum install jwhois fail2ban --enablerepo=epel

    whois is needed by /etc/fail2ban/action.d/sendmail-whois.conf, which is called
    by /etc/fail2ban/filter.d/sshd.conf.

    You will also need ssmtp or some kind of MTA so fail2ban can let you know that it caught a sneaky bastard. I briefly mentioned about ssmtp in a previous post; seek and thou shalt find.

  2. Configure fail2ban.
    1. Disable everything in /etc/fail2ban/jail.conf. We'll be using /etc/fail2ban/jail.local:
      sed -i -e 's/^enabled.*/enabled  = false/' /etc/fail2ban/jail.conf
    2. Configure /etc/fail2ban/jail.local. For now, we will just have ssh enabled
      HOSTNAME=`hostname -f`
      cat > /etc/fail2ban/jail.local << EOF
      # Fail2Ban jail.local configuration file.
      #
      
      [DEFAULT]
      actionban = iptables -I fail2ban-<name> 1 -s <ip> -m comment --comment "FAIL2BAN temporary ban" -j DROP
      
      # Destination email address used solely for the interpolations in
      # jail.{conf,local} configuration files.
      destemail = raub@kudria.com
      
      # This will ignore connection coming from our networks.
      # Note that local connections can come from other than just 127.0.0.1, so
      # this needs CIDR range too.
      ignoreip = 127.0.0.0/8 $(dig +short $HOSTNAME)
      
      #
      # ACTIONS
      #
      # action = %(action_mwl)s
      
      #
      # JAILS
      #
      [ssh]
      enabled = true
      port    = ssh
      filter  = sshd
      action   = iptables[name=SSH, port=ssh, protocol=tcp]
                 sendmail-whois[name=SSH, dest="%(destemail)s", sender=fail2ban@$HOSTNAME]
      logpath  = /var/log/secure
      maxretry = 5
      bantime = 28800
      EOF
      Note we are only whitelisting the host itself; you could whitelist your lan
      and other machines/networks if you want. A jail is the fail2ban term for a ruleset that defines what to check for and whom to ban as needed.
    3. Decide where you want fail2ban to log to. That is done in /etc/fail2ban/fail2ban.local using the logtarget variable. Some possible values could be
      cat > /etc/fail2ban/fail2ban.local << EOF
      [Definition]
      # logtarget = SYSLOG
      logtarget = /var/log/fail2ban.log
      EOF
      The file /etc/fail2ban/fail2ban.conf should provide you with examples on how to set that up.
  3. Enable fail2ban
    service fail2ban restart
    chkconfig fail2ban on
    If you now do
    chkconfig --list fail2ban
    you should then see
    fail2ban        0:off   1:off   2:on    3:on    4:on    5:on    6:off
    And then check the fail2ban log you defined just above for any funny business. If you have set things up correctly, you should also receive an email at destemail saying fail2ban started. You will get one email per jail, so if you only enabled the default (ssh), you will get a single email that looks like this:

    Hi,
    
    The jail SSH has been started successfully.
    
    Regards,
    
    Fail2Ban.

    When fail2ban bans someone, you will receive an email that looks like this:

    Hi,
    
    The IP 82.205.21.200 has just been banned by Fail2Ban after
    3 attempts against ASTERISK.
    
    
    Here are more information about 82.205.21.200:
    
    % This is the RIPE Database query service.
    % The objects are in RPSL format.
    %
    % The RIPE Database is subject to Terms and Conditions.
    % See http://www.ripe.net/db/support/db-terms-conditions.pdf
    
    % Note: this output has been filtered.
    %       To receive output for a database update, use the "-B" flag.
    
    % Information related to '82.205.16.0 - 82.205.31.255'
    [...]

    Note that it is not the SSH jail but the ASTERISK one; I just wanted to show a
    different example. Also, the text before the ban message comes from whois.

    If you do iptables -L, you will see which rule fail2ban added to iptables:

    Chain fail2ban-SSH (1 references)
    target     prot opt source               destination
    DROP       all  --  221.178.164.251      anywhere
    RETURN     all  --  anywhere             anywhere

    Note it creates a chain for each jail.
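
    A couple of fail2ban-client commands come in handy at this point (the exact subcommands available depend on your fail2ban version); they let you list the jails, see what a given jail has banned, and manually unban a legitimate user who locked themselves out. The jail name and IP below are just the ones from the examples above:

    fail2ban-client status
    fail2ban-client status ssh
    fail2ban-client set ssh unbanip 82.205.21.200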

References