Saturday, December 31, 2016

Discovering (tagged only?) vlans using tcpdump

Why use VLANs? Well, there is the part about having a bunch of different devices -- computers, printers, toasters -- connected to the same switches but in different networks, so you can control how (or whether) they see and talk to each other. Those switches are in turn connected to each other and to routers that need to route traffic between the different networks, which probably means we are trunking and tagging the VLANs between those switches and routers. If we now add virtual machines to the mix, the physical machines they run on -- the virtual servers -- need to be able to reach all the networks their virtual clients belong to. And that, once again, is done using trunking and VLAN tagging.

What if it is not working properly?

Let me elaborate with an example: I recently had a router running OpenWrt which I assumed I had configured to handle multiple VLANs; it was connected to the rest of the network through an 802.1Q trunk -- think of it as a router-on-a-stick setup. The router was configured to have one IP in each of two VLANs, which I shall call vlan2 and vlan3 -- 192.168.2.1 and 192.168.3.1, to make it easier. Problem was I could only reach it through vlan2; nothing on vlan3.

First thing I did was to recheck the switch to see if that port was configured as an 802.1Q trunk. Not only was it, but both VLANs were tagged; I did not have an untagged/default VLAN in this trunk. So for all practical purposes both VLANs should look the same to the router. So why didn't they?

At this point I decided to do some packet sniffing. This would normally be the cue to unleash Wireshark, but we are talking about a little router running an OS designed to fit a resource-limited device. Why not see if we can use a command-line alternative, like the humble tcpdump? It can show you not only any VLAN traffic on the wire but also just the traffic on a specific VLAN, the same way it can show traffic for a specific IP or protocol. For instance, since we really only care about traffic on vlan 3 on interface eth0, we can

root@router:~# tcpdump -i eth0.3
tcpdump: WARNING: eth0.3: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0.3, link-type EN10MB (Ethernet), capture size 65535 bytes

And then, after letting it run for ten minutes, all I got back were tumbleweeds. In the same time frame, if I had asked for traffic on vlan 2 it would have filled the screen before I finished this sentence (and, yes, I am a slow typist, but still). The logical thing to do now is to see if there is any traffic on the wire that carries an 802.1Q (VLAN) tag:

tcpdump -i eth0 -e -n vlan
tcpdump: WARNING: eth0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
01:32:27.440782 e2:46:9a:5e:a7:45 > c0:ff:ee:3e:06:eb, ethertype 802.1Q (0x8100), length 122: vlan 2, p 0, ethertype IPv4, 192.168.2.1.22 > 192.168.2.249.56762: Flags [P.], seq 2026124025:2026124077, ack 2216938037, win 21298, options [nop,nop,TS val 346274301 ecr 697366229], length 52
01:32:27.441234 e2:46:9a:5e:a7:45 > c0:ff:ee:3e:06:eb, ethertype 802.1Q (0x8100), length 122: vlan 2, p 0, ethertype IPv4, 192.168.2.1.22 > 192.168.2.249.56762: Flags [P.], seq 52:104, ack 1, win 21298, options [nop,nop,TS val 346274301 ecr 697366232], length 52
01:32:27.441946 e2:46:9a:5e:a7:45 > c0:ff:ee:3e:06:eb, ethertype 802.1Q (0x8100), length 138: vlan 2, p 0, ethertype IPv4, 192.168.2.1.22 > 192.168.2.249.56762: Flags [P.], seq 104:172, ack 1, win 21298, options [nop,nop,TS val 346274301 ecr 697366232], length 68

Do you see the vlan info on those lines? Let's trim it a bit:

tcpdump -i eth0 -e -n vlan | cut -f 6,10,11 -d ' '
tcpdump: WARNING: eth0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
802.1Q vlan 2,
802.1Q vlan 2,
802.1Q vlan 2,
802.1Q vlan 2,
802.1Q vlan 2,
802.1Q vlan 2,
802.1Q vlan 2,
802.1Q vlan 2,
802.1Q vlan 2,
802.1Q vlan 2,
802.1Q vlan 2,
802.1Q vlan 2,
802.1Q vlan 2,
802.1Q vlan 2,
802.1Q vlan 2,
802.1Q vlan 2,
802.1Q vlan 2,
^C21 packets captured
29 packets received by filter
0 packets dropped by kernel

So all the traffic we have seen above is on vlan 2. In fact, 192.168.2.1.22 > 192.168.2.249.56762 means someone at 192.168.2.249 is connected from its port 56762 to port 22 (ssh) on 192.168.2.1, which is the router; the packets shown are the router's replies. And that is indeed the case, because that is how I connected to the router!
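In case the field numbers in that cut look like magic: they come from counting space-separated columns in tcpdump's -e -n output. We can sanity-check that against one of the captured lines (copied from the output above, with the tail trimmed for brevity):

```shell
# One captured line from the vlan 2 output above (end truncated).
# Counting space-separated fields: 6 is "802.1Q", 10 is "vlan", 11 is "2,".
line='01:32:27.440782 e2:46:9a:5e:a7:45 > c0:ff:ee:3e:06:eb, ethertype 802.1Q (0x8100), length 122: vlan 2, p 0, ethertype IPv4, 192.168.2.1.22 > 192.168.2.249.56762: Flags [P.]'
echo "$line" | cut -f 6,10,11 -d ' '
# -> 802.1Q vlan 2,
```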

NOTE: A long time ago there was a bug in tcpdump, which would require you to run it twice,

tcpdump -Uw - | tcpdump -en -r - vlan 

But that is no longer the case.

But, where's vlan 3? That is a great question! At first I thought I had set it up wrong on the router, even though both VLANs have similar configurations save for the network info. And that kept me awake for a few nights.

It turns out there is an ongoing bug that would not allow me to run more than one tagged VLAN. The workaround is to "kick" it by adding a

option enable_vlan4k 1

to /etc/config/network
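For context, that option goes in the switch stanza of /etc/config/network. Here is a sketch of how that file section might look; the stanza layout follows the usual OpenWrt swconfig style, but the switch name, VLAN port lists, and CPU port below are invented and will differ per device:

```
config switch
	option name 'switch0'
	option reset '1'
	option enable_vlan '1'
	option enable_vlan4k '1'

config switch_vlan
	option device 'switch0'
	option vlan '2'
	option ports '0t 5t'

config switch_vlan
	option device 'switch0'
	option vlan '3'
	option ports '0t 5t'
```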

Feeling confident we have solved the problem, instead of listening to all VLANs this time we will only listen for vlan 3 traffic:

root@router:~# tcpdump -i eth0.3
tcpdump: WARNING: eth0.3: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0.3, link-type EN10MB (Ethernet), capture size 65535 bytes
19:15:25.822964 IP 0.0.0.0 > all-systems.mcast.net: igmp query v2
19:15:25.823010 IP6 fe80::e046:9aff:fe5e:a745 > ff02::1: HBH ICMP6, multicast listener query max resp delay: 10000 addr: ::, length 24
19:15:26.568728 ARP, Request who-has 192.168.3.10 tell 192.168.3.1, length 28
19:15:26.569022 ARP, Reply 192.168.3.10 is-at c0:ff:ee:14:25:ad (oui Unknown), length 42
19:15:26.569092 IP 192.168.3.1.32101 > 192.168.3.10.domain: 29094+ PTR? 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa. (90)
19:15:26.606872 IP 192.168.3.10.domain > 192.168.3.1.32101: 29094 NXDomain 0/1/0 (160)
19:15:27.797110 IP6 fe80::c2ff:eeff:fe14:25ad > ff02::1:ff14:25ad: HBH ICMP6, multicast listener report max resp delay: 0 addr: ff02::1:ff14:25ad, length 24
19:15:28.014715 IP6 fe80::be5f:f4ff:fe54:d78d > ff02::1:ff54:d78d: HBH ICMP6, multicast listener report max resp delay: 0 addr: ff02::1:ff54:d78d, length 24
19:15:28.615536 IP 192.168.3.1.16915 > 192.168.3.10.domain: 15895+ PTR? d.a.5.2.4.1.e.f.f.f.e.e.f.f.2.c.0.0.0.0.0.0.0.0.0.0.0.0.0.8.e.f.ip6.arpa. (90)
19:15:28.616059 IP 192.168.3.10.domain > 192.168.3.1.16915: 15895 NXDomain* 0/1/0 (125)
19:15:28.618233 IP 192.168.3.1.36929 > 192.168.3.10.domain: 54136+ PTR? d.a.5.2.4.1.f.f.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa. (90)
19:15:28.646935 IP 192.168.3.10.domain > 192.168.3.1.36929: 54136 NXDomain 0/1/0 (160)

And that is the kind of output I wanted to see!

Moral of the story: when VLANs and their trunks get funky, it might be time to unleash a packet analysis tool to figure out where in the network the problem is. And even though Wireshark is a pretty nice tool, sometimes you can get the job done with something a bit more lightweight.

Wednesday, November 30, 2016

Relative paths and roles in Ansible

Here is something I finally understood in Ansible that has been annoying the hell out of me for a while: relative paths. The idea of relative paths is that if you have to tell Ansible to grab a file -- like some config file or even an image or certificate -- and put it somewhere else, you do not have to give the entire source path.

Imagine you wrote your playbook on your Linux desktop, and it has a task that grabs the file /home/bob/bobs-development-stuff/server.stuff/ansible/roles/webservers/files/main-logo.png and puts it somewhere on the target machine. That is a mouthful of a path. But then, since you are clever, you have this playbook in an internal git repository and check it out on your Mac laptop to do some work. Thing is, the path now is /Users/bobremote/ansible/roles/webservers/files/main-logo.png and your task ain't working no more. And that is the short version of why absolute paths on your ansible box are not a great idea.

So we use relative paths instead, and the main question of the day is: what are those paths relative to?

Do you remember when we talked about creating and uploading ssh key pairs? Well, there was a bit of a fudge right in the key: line

- name: Copy the public key to the user's authorized_keys
  # It also creates the dir as needed
  authorized_key:
    user: "{{user_name}}"
    key: '{{ lookup("file", "roles/files/" ~ user_name ~ ".ssh.pub") }}'
    state: present

On second thought, it is really not a fudge; it hints at the answer to our question. All we are asking is for the task to grab a file inside the roles/files directory. The original layout for this playbook looks like this

ansible
 hosts
 ansible.cfg
 site.yml
 roles
  defaults
  files
   moose.ssh
   moose.ssh.pub
  handlers 
   main.yml
  meta
  tasks 
   main.yml
  template
  vars

So we really only have one playbook inside the ansible directory. When we tell the roles/tasks/main.yml task list to grab the .ssh.pub file, it knows exactly how to get it. With time we became clever and would like to plan ahead, so we decided to break the roles into the ones common to every server and the ones that build on the common one and then do the specialization stuff, say web or database server. So what we really want is a layout like this (inside the same old ansible dir)

site.yml
web.yml
group_vars
roles
 common
   defaults
   files
      epel.repo
      iptables-save
      moose.ssh
      moose.ssh.pub
   handlers 
      main.yml
   meta
   tasks 
     main.yml
   template
   vars
 website
   defaults
   files
   handlers
   meta
   tasks
     main.yml
   template
   vars
 db
   defaults
   files
   handlers
   meta
   tasks
     main.yml
   template
   vars

The task that copies the ssh key is in the common role, specifically in roles/common/tasks/main.yml (yes, I could break it out some more, but let's not confuse things just yet and hurt my brain). And we expect that if we write our task as

- name: Copy the public key to the user's authorized_keys
  # It also creates the dir as needed
  authorized_key:
    user: "{{user_name}}"
    key: '{{ lookup("file", "./files/" ~ user_name ~ ".ssh.pub") }}'
    state: present

our playbook to create roles/common/files/moose.ssh.pub and then upload it to whatever machine we are building. Instead, it gives us the following friendly message:

fatal: [ansibletest -> localhost]: FAILED! => {"changed": true, "cmd": "ssh-keygen -f ./files/\"moose\".ssh -t rsa -b 4096 -N \"\" -C \"moose\"", "delta": "0:00:01.056746", "end": "2016-11-30 15:55:19.618226", "failed": true, "rc": 1, "start": "2016-11-30 15:55:18.561480", "stderr": "Saving key \"./files/moose.ssh\" failed: No such file or directory", "stdout": "Generating public/private rsa key pair.", "stdout_lines": ["Generating public/private rsa key pair."], "warnings": []}

Clearly we have a bit of a path problem. Don't believe me? Let's ask where the playbook thinks it is running from

- name: Find out where I am locally speaking
  local_action: shell pwd > /tmp/here

All we are doing here is getting the path this is being executed from and saving it in a file so we can see it. And see it we shall:

raub@desktop:~/dev/ansible$ cat /tmp/here
/home/raub/dev/ansible
raub@desktop:~/dev/ansible$ 

My guess is that if we use relative paths, they are all relative to where we ran the playbook from -- /home/raub/dev/ansible in my case. Which really sucks! Now, there is a variable in Ansible called role_path, which I found mentioned in the docs, and which should hold the path to the role we are currently running. Let's try it out

- name: Find out where I am locally speaking
  debug: msg="Role path is supposed to be {{ role_path }} "

which gives

TASK [website : Find out where I am locally speaking] ******************************
ok: [ansibletest] => {
    "msg": "Role path is supposed to be /home/raub/dev/ansible/roles/website "
}

So, the solution is to rewrite our little function as

- name: Copy the public key to the user's authorized_keys
  # It also creates the dir as needed
  authorized_key:
    user: "{{user_name}}"
    # Note role_path is used bare inside the Jinja expression;
    # nesting another {{ }} in there would not work
    key: '{{ lookup("file", role_path ~ "/files/" ~ user_name ~ ".ssh.pub") }}'
    state: present

so the path becomes /home/raub/dev/ansible/roles/common/files/moose.ssh.pub (the debug example above happened to run from the website role; our copy task lives in common, so for it role_path points at roles/common)
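The underlying rule is nothing Ansible-specific: a relative path is resolved against the current working directory of the process, which for these lookups is wherever you invoked ansible-playbook from, not where the task file lives. A plain shell sketch of the same trap (directory names invented):

```shell
# Build a toy layout resembling the playbook tree.
mkdir -p /tmp/pathdemo/roles/common/files
touch /tmp/pathdemo/roles/common/files/moose.ssh.pub

# "Run the playbook" from the top directory: ./files does not resolve.
cd /tmp/pathdemo
ls ./files 2>/dev/null || echo "not found from $PWD"

# It only works if the cwd happens to be the role directory itself.
cd /tmp/pathdemo/roles/common
ls ./files
```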

Saturday, November 26, 2016

Creating a fixed length blank(spaces) string and padding others with spaces in powershell

PowerShell has clever ways to create an empty string or string array that can then be grown. But I needed a fixed-length empty string. Why? I needed to create a file where each line is composed of fields that are position dependent, so I needed to pad between the fields with blank spaces.

After searching for quite a while I found Ethan's reply at the bottom of a thread on limiting a string to N characters. And it was so obvious I had to smack myself on the head: if you want to create a string called moo that contains nothing but 24 spaces, you can do it by typing

PS C:\> $moo = " "*24
PS C:\> $moo.length
24

So let's do something useful with that then!

So the plan is to write a function that, given the string to put in a field and the field size, returns a properly padded string. Now, I think we should also create a second function whose only goal in life is to create a string of N blank spaces, which the first function can call. The reason I propose this is that we may need to provide some extra blank spaces, because the format of this file might be, well, rather silly. Planning ahead is always a smart move. So we start with our blank-string-generating function, which looks very similar to the $moo example above:

function BlankString($size)
{
   return " "*$size
}

If we want to test that we can modify the moo example

$moo = BlankString(24)
$moo.length

Which should return 24 as before. So far so good. Now let's create a function that will return a properly padded field. Something like

function PadString($name, $stringSize)
{
   if($name.length -lt $stringSize)
   {
      $name += BlankString ($stringSize - $name.length)
   }
   return $name
}

looks like a good starting place. Let's try it out:

$noo = PadString "My Head hurts" 48
DebugLine "[$noo]"
$noo.length

Results in

[My Head hurts                                   ]
48

If you are curious, my DebugLine function looks like this

function DebugLine($OutputMessage)
{
   Write-Host $OutputMessage -foregroundcolor red -backgroundcolor yellow
}

The [ and ] are there just so we can see where the string, blank spaces included, begins and ends. If you do not like those characters, change them to something more remarkable, say ---> and <---

Now, what if the string we are trying to put in this new string is bigger than it can handle? This would be the opposite of padding the new string: we need to chop it. So, we modify our PadString function a bit:

function PadString($name, $stringSize)
{
   if($name.length -lt $stringSize)
   {
      $name += BlankString ($stringSize - $name.length)
   }
   else
   {
      $name = $name.Substring(0, $stringSize)
   }
   return $name
}

So, if we redo the same test we did before, but this time we make our resulting string shorter,

$noo = PadString "My Head hurts" 8
DebugLine "[$noo]"
$noo.length

the result is now truncated

[My Head ]
8

And we can still create odd runs of blank spaces using BlankString as needed to meet the file format requirements.

Monday, November 14, 2016

Creating a user with random password using Ansible

So I need to create a user on a machine so I can then have a script that will log into this machine and back up its database. Which database, you might ask? For this discussion it does not matter beyond that it is running on a Linux box. But if you want a more specific example, we could be backing up a sqlite database, since we talked about how to do that deed before.

Anyway, the plan is to have the script ssh into the database server and grab the backup. Since we are using ssh in the script, we might as well use key pair authentication. Now, I have learned that, by default, if you create a user and do not assign it a password, you will not be able to log in as said user using key pair authentication. You can turn that behaviour off, but I would rather not. Instead, since I am creating said user programmatically, I can give it a long random password.

Now that we have a plan, let's see how to do it in an ansible playbook. I will present only the relevant bits since I do not know how you do your playbooks. So, we could create a user using something like

- name: Create user_name
  user:
    name={{user_name}}
    groups={{user_groups}}
    shell=/bin/bash
    state=present
    append=yes
which would create the user user_name, who will also belong to the groups defined in user_groups, where user_name and user_groups are variables defined somewhere earlier in the show. And this would create a user without a password, which does us no good. Nor would it do to make the playbook stop and ask us to type in a password. We said earlier we would create a random password, so let's see if we can make something random enough for our needs.

I plan on generating this random password on the machine we are running ansible from, not the target machine. One of the reasons is that I want to use the Linux command mkpasswd to create the password hash (note it is being called through the shell module). So, I will use a local_action to do the deed. For instance, let's say I want the password to be pickles, hashed with crypt-style SHA-512 (a hash, mind you, not encryption). I could accomplish that by writing

- name: generate random password for new user
  local_action: shell mkpasswd --method=SHA-512 pickles
  register: user_pw

This would create a hash, say

$6$rIcep9bGJTUpE$s8pka1dX6gWfyeNfi8YLaqrg/85tgtpJv809AUmO2jHhMQbSUnuNSloJSa6EmOQS02Ek4mvpiIu2DAvA9W0UL0

and assign it to the variable user_pw. This of course has to be done before the user is created. To use it with our new user, we can then modify our little user creation function to something like this:

- name: Create user_name
  user:
    name={{user_name}}
    groups={{user_groups}}
    shell=/bin/bash
    state=present
    append=yes
    update_password=on_create
    password="{{user_pw.stdout}}"

In the last line we are feeding the value of user_pw -- user_pw.stdout -- to password. But why can't we just feed user_pw? Here's an exercise for you: tell your playbook to just print user_pw. Doesn't it look very object-like?
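A side note on mkpasswd: it is not everywhere (on Debian-ish systems it ships with the whois package, if I recall correctly, and the mkpasswd you find on Red Hat systems comes from expect and behaves differently). If you cannot count on it, openssl can produce the same $6$ SHA-512 crypt-style hash -- a hedged alternative, assuming OpenSSL 1.1.1 or newer:

```shell
# Same idea as the mkpasswd task above, but with openssl:
# "-6" selects SHA-512 crypt, so the output starts with $6$.
openssl passwd -6 pickles
```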

If you run your playbook and all went well, go to the target machine and check that there is a password hash associated with the user in /etc/shadow. If the user had already been created, you will need to delete the user and let ansible recreate it.

So far we have created a way to generate a password hash and then create a new user with that password. The last step is to make the password random. Here is what I am proposing: how about we use the date in seconds since the epoch as our password and then mangle it a bit? Here is a simple mangling example:

raub@desktop:~$ date +%s
1479095572
raub@desktop:~$ date +%s | sha384sum
3d47137f4cd6bfb638deecc661b4e0ae9545ab2454b20eac51b6992807b3f518a49be9ef3b72a5abdc9167e32acc2473  -
raub@desktop:~$ date +%s | sha384sum | md5sum
65db8ecf178a505ea17e9b53cc5adf31  -
raub@desktop:~$ date +%s | sha384sum | md5sum | cut -f 1 -d ' '
b8a49ccaa4721877cf39e510c7ac3622
raub@desktop:~$

Which gives b8a49ccaa4721877cf39e510c7ac3622 as the output, which should be long enough to fulfill our needs. Of course, if you run it again it will spit out a different result, which is what we want. Perfectly random? Not by a long shot, but it is good enough for our needs. Remember: there is nothing saying you have to use the above. Have fun creating your own function!
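For instance, if you want something less guessable than mangled epoch seconds, the kernel's random pool works just as well and is still a one-liner (the length of 32 is an arbitrary choice of mine):

```shell
# Pull random bytes, keep only alphanumerics, stop at 32 characters.
tr -dc 'a-zA-Z0-9' < /dev/urandom | head -c 32; echo
```

That said, the date-based mangling above is what the rest of this post sticks with.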

So, let's apply that to our little password generating function:

- name: generate random password for new user
  local_action: shell mkpasswd --method=SHA-512 $(date +%s | sha384sum | md5sum | cut -f 1 -d ' ')
  register: user_pw

And we should be good to go. Here is how the final version should look like in a playbook:

- name: generate random password for new user
  local_action: shell mkpasswd --method=SHA-512 $(date +%s | sha384sum | md5sum | cut -f 1 -d ' ')
  register: user_pw
# Something might happen here before we create user
- name: Create user_name
  user:
    name={{user_name}}
    groups={{user_groups}}
    shell=/bin/bash
    state=present
    append=yes
    update_password=on_create
    password="{{user_pw.stdout}}"

Now that we have a user, we can then create the ssh key pair we talked about in an earlier article. Of course, we might edit the ~/.ssh/authorized_keys file to restrict what that key can do.

Thursday, November 03, 2016

Yum Manually and multiple repos

Quick post (I hope) about something I learned today that has a bit of a Captain Obvious taste to it. But I thought some people might find that amusing... even if it is at my expense.

Like many who use Red Hat products (CentOS comes to mind) and derived distros, I use repos outside the official ones. Because those repos tend to have newer versions of some packages, I try to be careful to upgrade only the packages for which I added said repo to my list in the first place. The short answer for why is the Law of Unintended Consequences. The long version is that I expect the people building the official packages to be quite careful about compatibility and security, so I should only reach out to the other repos after I find that the official packages do not do what I need.

What I have been doing -- and I do not claim it is the best solution -- is to install and then disable the non-official repos, so if I want something from them I have to ask for it explicitly. So if I want to use the remi repo, I would first disable it

sed -i -e 's/^enabled=1/enabled=0/' /etc/yum.repos.d/remi.repo

and then specifically ask for it to install, say, php

yum install php --enablerepo=remi

which would install the latest PHP version remi has. (On a side note, if you had to install PHP 5.6 from remi, you would use remi-php56.) But what about upgrading those packages later? After all, yum check-update and yum update by default will not check the disabled repos even if you have packages installed from them, so you have to use --enablerepo. Now, what I learned today is that you can list all the repos you want checked by separating them with commas.

Let me show it in action: at first we think there are no updates:

[raub@pickles ~]$ sudo yum check-update
Loaded plugins: product-id, rhnplugin, search-disabled-repos, subscription-
              : manager
This system is receiving updates from RHN Classic or Red Hat Satellite.
[raub@pickles ~]$

Now, let's list all the repos we have been using with this machine and see what it tells us:

[raub@pickles ~]$ sudo yum check-update --enablerepo=remi-php56,epel,security_shibboleth
Loaded plugins: product-id, rhnplugin, search-disabled-repos, subscription-
              : manager
This system is receiving updates from RHN Classic or Red Hat Satellite.
security_shibboleth                                      | 1.2 kB     00:00
security_shibboleth/primary                                |  15 kB   00:00
security_shibboleth                                                       96/96

libcurl-openssl.x86_64                7.51.0-2.1             security_shibboleth
opensaml-schemas.x86_64               2.6.0-1.1              security_shibboleth
shibboleth.x86_64                     2.6.0-2.1              security_shibboleth
xmltooling-schemas.x86_64             1.6.0-1.1              security_shibboleth
[raub@pickles ~]$

As you can see, we do need to update shibboleth and its dependencies! And update we shall:

[raub@pickles ~]$ sudo yum update --enablerepo=remi-php56,epel,security_shibboleth
[...]
[raub@pickles ~]$ 

Kinda neat, eh?
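As a coda: if you want to convince yourself of exactly what the repo-disabling sed one-liner near the top of this post does before pointing it at /etc/yum.repos.d, you can rehearse it on a throwaway file first (the stanza contents below are made up):

```shell
# Fake repo file with an enabled repo stanza.
cat > /tmp/sample.repo <<'EOF'
[remi]
name=Sample repo stanza
enabled=1
gpgcheck=1
EOF

# The same edit the post runs against /etc/yum.repos.d/remi.repo:
sed -i -e 's/^enabled=1/enabled=0/' /tmp/sample.repo
grep '^enabled=' /tmp/sample.repo
# -> enabled=0
```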

Monday, October 31, 2016

Creating and uploading ssh keypair using ansible

As some of you have noticed from previous posts, I do use docker. However, I also use Ansible to install packages on and configure hosts. I am not going to talk about when to use each one. Instead, I will try to make another (hopefully) quick post, this time ansible-related.

Let's say I have a user and I want to set said user up so I can later ssh to the target machine as that user using an ssh key pair, from the machine I am running ansible from. To do so I need to create an ssh key pair and then put the public key in the user's authorized_keys file. And that might require creating said file and ~/.ssh if they do not exist. Now, that is a mouthful, so let's do it in steps.

  1. Decide where to put the key pair: This is rather important, since we not only need it somewhere we can find it when we are ready to copy it to the target server, but we also might not want to recreate the key pair every time we run this task.

    For now I will put it in roles/files/, which is not ideal for many reasons; just to name one, we might want to put it somewhere encrypted or with minimal access by suspicious individuals. But for this discussion it is a great starting place for the key pair. So, let's see how I would verify whether the pair already exists:

    - name: Check if user_name will need a ssh keypair
      local_action: stat path="roles/files/{{user_name}}.ssh"
      register: sshkeypath
      ignore_errors: True

    The first line (name) writes something to the screen to show where we are in the list of steps; it is rather useful because when we run ansible we can see which task is being done. The local_action uses stat to determine whether the file roles/files/{{user_name}}.ssh exists. The result is then registered in the variable sshkeypath. Finally, ignore_errors means I do not care about any other complaints from stat.

  2. Create key pair: there are probably really clever ways to do it that are purely ansible-native, but I am not that bright, so I will use OpenSSH instead, specifically ssh-keygen. Ansible has a way to call shell commands: the shell module. Now, I want to run ssh-keygen on the machine I am running ansible from -- the local machine, in ansible parlance -- and for that reason we use the local_action command. Here is one way to do the deed:

    - name: Create user_name ssh keypair if non-existent
      local_action: shell ssh-keygen -f roles/files/"{{user_name}}".ssh -t rsa -b 4096 -N "" -C "{{user_name}}"
      when: not sshkeypath.stat.exists

    As you guessed, we are using ssh-keygen to create the key pair roles/files/user_name.ssh and roles/files/user_name.ssh.pub. The other ssh-keygen options are just because I would like my keys to be a bit more secure than those created by the average bear. So far so good: we are using local_action and shell just like in the previous step.

    The interesting part is the when statement. What it says is to only create the key pair if it does not already exist; that is why we checked whether roles/files/user_name.ssh existed in the previous step. Of course, that means we assumed that if roles/files/user_name.ssh exists, roles/files/user_name.ssh.pub must also exist. There are ways to check for both, but I will leave that to you.

  3. Copy public key to remote server: now that we know we have an ssh key pair, we need to copy it over. You could do it using the file module, which would require a few extra tasks (creating the ~/.ssh directory with the proper permissions comes to mind). Or you can use the appropriately named authorized_key module. I am lazy, so I will pick the latter.

    - name: Copy the public key to the user's authorized_keys
      # It also creates the dir as needed
      authorized_key:
        user: "{{user_name}}"
        key: '{{ lookup("file", "roles/files/" ~ user_name ~ ".ssh.pub") }}'
        state: present

    What this does is create ~/.ssh and ~/.ssh/authorized_keys as needed and then copy the key into ~/.ssh/authorized_keys. The most important statement here is key. Since it expects us to pass the key as a string, we need to read the contents of the public key file, and that is why we called upon the lookup command; think of it here as cat. And that is why we have all the crazy quoting in "roles/files/" ~ user_name ~ ".ssh.pub".
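Back in step 2 we assumed that if the private key exists, the public key does too. If you want to close that gap, the check is the same shape whether you do it in ansible or in plain shell; here is a local sketch (the helper name is invented):

```shell
# Hypothetical helper: succeeds only when both halves of the pair exist.
pair_exists() {
    [ -f "$1" ] && [ -f "$1.pub" ]
}

# Example usage, matching the layout from step 1:
#   pair_exists roles/files/moose.ssh && echo "skip keygen"
```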

And that is pretty much it. If you want to be fancy you could use a few local_actions to copy the private key to your (local) ~/.ssh and then create/edit ~/.ssh/config to make your life easier. I did not do it here because this user will be used as is by ansible; I need no access to this user as me outside ansible.

Wednesday, September 28, 2016

DHCP renewal, dynamic DNS, and changing network cards

Another adventure that started with a simple and quick plan and then fell into the rabbit hole! If you did not get the hint, this is one of those examples of why proper documentation is important. Or of what happens when I rely on my memory...

I built a Windows VM inside VMware ESXi a while ago called dr-zaius. When I did the deed, I set it up to use an emulated Intel E1000-series ethernet card. Fast forward, and I learned VMware recommends using their native NIC, vmxnet3. I am fine with that, since that is what I do in KVM -- use the native interface to the hypervisor -- and it does result in increased performance. So let's go over the steps for doing the deed.

Note: The following assumes you have the rights to edit a vm client's configuration in ESXi. If not, just contact who does and explain what you want to accomplish.

The Steps

  1. We are going to be lazy and use the vSphere client, while the vm is up and running. When we select the Edit Virtual Machine Settings option, we have the option to add an additional interface; let's add it and delete the old one later.

    By default, in this version of the vSphere client, we are offered the E1000E emulated interface. That is not what we want, so we need to change the interface (they are calling it an adapter) and select the vmxnet3 one. Don't forget to select the proper VLAN -- Network Label in this dialog box -- while you are there. Also notice that by default it will be connected at power on, which is what we want in this article. Then hit the Next button until it finally gets to the screen saying the deed is done.

  2. As soon as we do that, log in to the Windows machine via the console. It will tell us it detected a new device and start its driver installation dance. Once that is done, this vm client will end up with two network interfaces. Now we need to set them up.

  3. The best thing to do is disable the old interface from inside the vm client -- you are accessing it from the console -- just as you normally would on a physical host. If you use a static IP, you might want to delete the IP or change it to a bogus address so the other interface will not get pissed (Windows does like to be very helpful).

  4. Now configure the new interface, be it to use static IP or DHCP. In this example we will be using DHCP.

  5. After testing that everything works, we can shut the vm client down at our convenience and remove the old network interface using the vSphere client.

And that should be the end of this article. Maybe in another article we can talk about doing the same deed in KVM (and whatever else I get my hands on).

But things did not happen according to the plan

I am not very good with words, so let me just show it:

raub@desktop:~$ ping dr-zaius
PING dr-zaius.in.example.com (192.168.42.105) 56(84) bytes of data.
From desktop.in.example.com (192.168.42.102) icmp_seq=1 Destination Host Unreachable
From desktop.in.example.com (192.168.42.102) icmp_seq=2 Destination Host Unreachable
From desktop.in.example.com (192.168.42.102) icmp_seq=3 Destination Host Unreachable
^C
--- dr-zaius.in.example.com ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3036ms
pipe 3
raub@desktop:~$

But when I log into dr-zaius from a console, I see that the IP for its new interface is 192.168.42.126, which seems to work fine when I probulate it using netcat:

raub@desktop:~$ nc -v 192.168.42.126 3389
Connection to 192.168.42.126 3389 port [tcp/*] succeeded!
^C

raub@desktop:~$

Er, what is going on? Time to go to the dhcp/dns server (nameserver) and check. It uses ISC DHCPD and BIND (I never knew BIND was an acronym, for Berkeley Internet Name Domain, until I wrote this article. Who said you can't learn anything on the web?) to do its thing, and they are configured to do dynamic DNS. So we cannot just grab the zone files

[root@nameserver ~]# ls /var/named/data/in.example.com.*
/var/named/data/in.example.com.db
/var/named/data/in.example.com.db.jnl
/var/named/data/in.example.com.rev.db
/var/named/data/in.example.com.rev.db.jnl
[root@nameserver ~]#

and edit them directly. The telltale hint that we are doing some dynamic DNS'ing is the journal (.jnl) files. Let's first stop the DNS updates in both the forward and reverse zones.

rndc freeze in.example.com
rndc freeze 42.168.192.in-addr.arpa

We now can take a look at the zone files. Here is the entry for dr-zaius in the forward zone file:

dr-zaius                A       192.168.42.105
                        TXT     "316335b72f00f1bf82eb484894d5263cdf"

I do not know about you, but it sure seems like it still thinks dr-zaius's IP is 192.168.42.105. So, we delete those two lines (I like to also update the serial entry at the top of the file, just in case). The reverse zone is a bit easier as it has only one line

126                     PTR     dr-zaius.in.example.com.

to be deleted. Once those are done, we can let dynamic DNS take place in that zone again.

rndc thaw 42.168.192.in-addr.arpa
rndc thaw in.example.com
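This freeze/edit/thaw dance lends itself to scripting. Here is a minimal sketch of the zone-file surgery in Python (the zone content below is made up for illustration, and it assumes the serial line carries the conventional "; serial" comment):

```python
import re

# Hypothetical zone snippet, mimicking the records we saw above
ZONE = """\
@        SOA ns1.in.example.com. root.in.example.com. (
                2016092301 ; serial
                3600       ; refresh
                )
dr-zaius                A       192.168.42.105
                        TXT     "316335b72f00f1bf82eb484894d5263cdf"
desktop                 A       192.168.42.102
"""

def scrub_host(zone_text, host):
    """Drop the host's record line plus any indented continuation records
    (like the dhcpd TXT marker), and bump a '; serial'-commented serial."""
    out, skipping = [], False
    for line in zone_text.splitlines():
        fields = line.split()
        if fields and fields[0] == host:
            skipping = True            # the host's own A record
            continue
        if skipping and line[:1].isspace() and fields:
            continue                   # continuation line, e.g. the TXT record
        skipping = False
        m = re.match(r'^(\s*)(\d+)(\s*;\s*serial)', line)
        if m:                          # bump the serial so changes get noticed
            line = f"{m.group(1)}{int(m.group(2)) + 1}{m.group(3)}{line[m.end():]}"
        out.append(line)
    return "\n".join(out) + "\n"

cleaned = scrub_host(ZONE, "dr-zaius")
print(cleaned)
```

Of course you would still wrap this between the rndc freeze and rndc thaw calls; this only sketches the editing step itself.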

The log seems to show that it is now properly associating the hostname with the IP:

Sep 23 13:10:47 nameserver named[1536]: client 192.168.55.10#48383: updating zone 
'in.example.com/IN': adding an RR at 'dr-zaius.in.example.com' A
Sep 23 13:10:47 nameserver named[1536]: client 192.168.55.10#48383: updating zone 
'in.example.com/IN': adding an RR at 'dr-zaius.in.example.com' TXT
Sep 23 13:10:47 nameserver dhcpd: Added new forward map from dr-zaius.in.example.com 
to 192.168.42.126
Sep 23 13:10:47 nameserver named[1536]: client 192.168.55.10#51133: signer 
"dhcpupdate" approved
Sep 23 13:10:47 nameserver named[1536]: client 192.168.55.10#51133: updating zone 
'42.168.192.in-addr.arpa/IN': deleting rrset at '126.42.168.192.in-addr.arpa' PTR
Sep 23 13:10:47 nameserver named[1536]: client 192.168.55.10#51133: updating zone 
'42.168.192.in-addr.arpa/IN': adding an RR at '126.42.168.192.in-addr.arpa' PTR
Sep 23 13:10:47 nameserver dhcpd: added reverse map from 126.42.168.192.in-addr.arpa. 
to dr-zaius.in.example.com.
Sep 23 13:10:47 nameserver dhcpd: DHCPREQUEST for 192.168.42.126 from 
00:0c:29:84:95:8c (dr-zaius) via eth0
Sep 23 13:10:47 nameserver dhcpd: DHCPACK on 192.168.42.126 to 00:0c:29:84:95:8c 
(dr-zaius) via eth0

So let's try again. nameserver seems to be able to resolve the hostname and IP:

[root@nameserver ~]# nslookup dr-zaius.in.example.com
Server:         127.0.0.1
Address:        127.0.0.1#53

Name:   dr-zaius.in.example.com
Address: 192.168.42.126

[root@nameserver ~]# nslookup 192.168.42.126
Server:         127.0.0.1
Address:        127.0.0.1#53

126.42.168.192.in-addr.arpa     name = dr-zaius.in.example.com.

[root@nameserver ~]#

Now we go back to the desktop and try again

raub@desktop:~$ nslookup dr-zaius
Server:         192.168.4.1
Address:        192.168.4.1#53

Non-authoritative answer:
Name:   dr-zaius.in.example.com
Address: 192.168.42.105

raub@desktop:~$

What is going on here? I thought I had taken care of this! And who is this 192.168.4.1? nameserver is at 192.168.55.10! We probably need to see which DNS servers desktop is using

raub@desktop:~$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 192.168.4.1
nameserver 192.168.55.10
nameserver 192.168.4.1
search in.example.com mgmt.example.com
raub@desktop:~$

A candle just lit above my head! Now things make sense! You see, desktop is dual homed, sitting on both the 192.168.42.0/24 and 192.168.4.0/24 networks; the latter is called mgmt.example.com. desktop gets its IPs on both networks using dhcp, and with them its nameservers. I do not know why /etc/resolv.conf has two entries for 192.168.4.1, the nameserver as seen from 192.168.4.0/24, but the point is that it is the first nameserver listed. And this is not an IP used by the real nameserver: the Juniper router doing all this routing is configured to be a forwarding DNS server for a few networks, 192.168.4.0/24 included, and it seems it has not updated its cache. Let's see if I am right:
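My understanding is that the stub resolver walks the nameserver lines strictly in the order they appear, only moving on to the next one on timeout, so whoever is listed first wins. A quick sketch of pulling that query order out of a resolv.conf, with the duplicate entry collapsed (the file content is the one we just saw):

```python
RESOLV_CONF = """\
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 192.168.4.1
nameserver 192.168.55.10
nameserver 192.168.4.1
search in.example.com mgmt.example.com
"""

def nameservers(text):
    """Return nameserver IPs in query order, duplicates collapsed."""
    seen, order = set(), []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver" and fields[1] not in seen:
            seen.add(fields[1])
            order.append(fields[1])
    return order

print(nameservers(RESOLV_CONF))   # the forwarder at 192.168.4.1 is asked first
```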

raub@desktop:~$ nslookup dr-zaius 192.168.4.1
Server:         192.168.4.1
Address:        192.168.4.1#53

Non-authoritative answer:
Name:   dr-zaius.in.example.com
Address: 192.168.42.105

raub@desktop:~$ nslookup dr-zaius 192.168.55.10
Server:         192.168.55.10
Address:        192.168.55.10#53

Name:   dr-zaius.in.example.com
Address: 192.168.42.126

raub@desktop:~$ 

So, all I have to do is wait a bit for the router to catch up.

I feel better now

PS: if you feel that I brushed over the setup of DHCPD and BIND, or how to configure a Juniper router, you are quite right. Those topics, while interesting, were not what we were trying to solve here. And what is that, since it seems changing the Windows vm client interface in VMware is not it? Actually, it is. But things sometimes do not happen according to plan and we have to figure out what is going on. In this case, the culprit was the stale cached record at the DNS forwarding server (the router in my example) lagging behind the primary. If you think that every time I set something up it is all rainbows and unicorns and things work perfectly the first time, I have some news for you, sunshine.

Wednesday, September 14, 2016

Setting up atftp in (ubuntu) Linux

I have a switch whose firmware was in dire need of being updated. Thing is, the switch will only take the upgrade firmware if it is offered by a tftp server, don't ask me why. And that is something I tend not to have running; last time I used one was to network boot Solaris boxes. Sounds like an excuse to write another article.

Just to be different I will be deploying this on a Ubuntu Linux host instead of a CentOS/RedHat one as I usually do. Why? Doing the same all the time gets boring quickly. So, let's see which versions of tftpd we can pick and choose:

raub@desktop:~$ apt-cache search tftpd
tftpd-hpa - HPA's tftp server
atftpd - advanced TFTP server
libnet-tftpd-perl - Perl extension for Trivial File Transfer Protocol Server
tftpd - Trivial file transfer protocol server
uec-provisioning-tftpd - the UEC Provisioning TFTP server
raub@desktop:~$

After careful scientific consideration, the atftpd one sounds more interesting (Multi-threaded TFTP server implementing extension and multicast), so we will pick that one. I think this is the part of the show in which we go through the steps to do the deed:

  1. A good place to start is to install it. I like command line and apt-get, so I think

    sudo apt-get install atftpd

    should do the trick. Of course you can use aptitude or the GUI. But I am lazy.

  2. Traditionally the directory used to put stuff that will be shared by tftp is /tftpboot, but current practice is to use /srv/tftp. In fact, atftpd does create /srv/tftp for you. For the sake of showing how to customize things, let's say we want to be old school. And that means creating /tftpboot ourselves:

    sudo mkdir /tftpboot
    sudo chmod -R 777 /tftpboot
    sudo chown -R nobody:nogroup /tftpboot
  3. This might be a good time to put the files we want to share -- in this example a file called image.bin -- in /tftpboot

  4. We need to configure it by editing /etc/default/atftpd. Here is what mine looks like

    raub@desktop:~$ cat /etc/default/atftpd
    USE_INETD=false
    OPTIONS="--tftpd-timeout 300 --retry-timeout 5 --port=69 --mcast-port 1758 
    --mcast-addr 192.168.0.0-255 --mcast-ttl 1 --maxthread 100 --verbose=7 /tftpboot"
    raub@desktop:~$

    where:

    • --port=69 we are forcing it to use the default tftp port
    • --mcast-addr 192.168.0.0-255 specifies the multicast address range we will be using. Being lazy, I am using the entire 192.168.0.0/24 range
    • /tftpboot is the directory we will be sharing, as explained above. By default the config file specifies /srv/tftp, which means that if we had left the default alone and put our file in /tftpboot we would get a message like

      Sep 13 13:54:00 desktop atftpd[28045]: File /srv/tftp/image.bin 
      not found

      when we try to fetch the file

    • --verbose=7 is the highest verbosity level we can use. By default it is set to 5.

    Once it starts properly (service atftpd start should be enough to start it), you should see something like

    raub@desktop:~$ ps -ef|grep ftp
    raub     16510 11504  0 15:26 pts/11   00:00:00 grep ftp
    nobody   28161     1  0 Sep13 ?        00:00:00 /usr/sbin/atftpd --daemon --tftpd-timeout 300 --retry-timeout 5 --port=69 --mcast-port 1758 --mcast-addr 192.168.0.0-255 --mcast-ttl 1 --maxthread 100 --verbose=7 /tftpboot
    raub@desktop:~$

How to get the file using tftp is beyond this discussion because it depends on your OS and the tftp client you are using. For instance, the switch might show a webpage where you can configure the name of the tftp server -- 192.168.0.102 in my example -- and the name of the file you want to grab. What is more interesting is to see what the entire enchilada, from starting the tftp server to getting the file image.bin, looks like. By default (it can be configured) we would see that in /var/log/syslog:

Sep 13 13:55:52 desktop systemd[1]: Starting LSB: Launch atftpd server...
Sep 13 13:55:52 desktop atftpd[28160]: Advanced Trivial FTP server started (0.7)
Sep 13 13:55:52 desktop atftpd[28153]: Starting Advanced TFTP server: atftpd.
Sep 13 13:55:52 desktop atftpd[28161]:   running in daemon mode on port 69
Sep 13 13:55:52 desktop atftpd[28161]:   logging level: 7
Sep 13 13:55:52 desktop atftpd[28161]:   directory: /tftpboot/
Sep 13 13:55:52 desktop atftpd[28161]:   user: nobody.nogroup
Sep 13 13:55:52 desktop atftpd[28161]:   log file: syslog
Sep 13 13:55:52 desktop atftpd[28161]:   not forcing to listen on local interfaces.
Sep 13 13:55:52 desktop atftpd[28161]:   server timeout: Not used
Sep 13 13:55:52 desktop atftpd[28161]:   tftp retry timeout: 5
Sep 13 13:55:52 desktop atftpd[28161]:   maximum number of thread: 100
Sep 13 13:55:52 desktop atftpd[28161]:   option timeout:   enabled
Sep 13 13:55:52 desktop atftpd[28161]:   option tsize:     enabled
Sep 13 13:55:52 desktop atftpd[28161]:   option blksize:   enabled
Sep 13 13:55:52 desktop atftpd[28161]:   option multicast: enabled
Sep 13 13:55:52 desktop atftpd[28161]:      address range: 192.168.0.0-255
Sep 13 13:55:52 desktop atftpd[28161]:      port range:    1758
Sep 13 13:55:52 desktop systemd[1]: Started LSB: Launch atftpd server.
Sep 13 13:55:59 desktop atftpd[28161]: socket may listen on any address, including broadcast
Sep 13 13:55:59 desktop atftpd[28161]: Creating new socket: 192.168.0.102:45115
Sep 13 13:55:59 desktop atftpd[28161]: Serving image.bin to 192.168.0.3:2295
Sep 13 13:56:03 desktop atftpd[28161]: End of transfer
Sep 13 13:56:03 desktop atftpd[28161]: Server thread exiting

The interesting entry above is the one where image.bin is actually served. And that is pretty much all I had to do. Once the file was transferred, I stopped atftpd and then removed it

sudo apt-get remove --purge atftpd

because I do not like to have unused services running.
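By the way, the read request the switch fires at port 69 is nothing fancy. Per RFC 1350, it is a tiny UDP payload: a 2-byte opcode, then the NUL-terminated filename and transfer mode. A sketch of what asking for image.bin looks like on the wire:

```python
import struct

def tftp_rrq(filename, mode="octet"):
    """Build a TFTP read-request packet per RFC 1350 (opcode 1 = RRQ)."""
    return struct.pack("!H", 1) + filename.encode("ascii") + b"\x00" \
           + mode.encode("ascii") + b"\x00"

pkt = tftp_rrq("image.bin")
print(pkt)   # b'\x00\x01image.bin\x00octet\x00'
```

The server then replies from a fresh ephemeral port (the "Creating new socket" line in the log above) and streams the file back in 512-byte DATA blocks, each of which the client acknowledges.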

PS: Always backup your switch/network device's config before upgrading its firmware in case it reverts to default as part of the upgrade process. Guess who forgot to do that? Good thing I had good notes and could reconfigure it using the time-honored cut-n-paste technique.

Wednesday, August 31, 2016

Uninstalling annoying packages using SCCM

It seems this week is Bad Windows Developer Week for me.

Do you remember that unnamed software package that forced me to create some crappy code to convert UTF-8 to ASCII7? Well, I do need to upgrade it. And I plan on letting SCCM do most of the heavy lifting.

Now, sensible developers will make their upgrade package uninstall the old one by itself. Or at least install their packages properly with the Windows Installer, so we can query them using wmic and then uninstall them with the same tool.

And then there are certain software installers that laugh at standards and place their packages wherever they feel like. In a previous article we talked about how to find how to uninstall those packages, so let's apply that knowledge here.

For this article we will create a windows batch file to uninstall the program. In a future article we will do the same using powershell. The command we will use to look into the registry is, surprisingly enough, reg query which needs to know the path in the registry

C:\Users\raub\dev> reg query "HKLM\software\Wow6432Node\Microsoft\Win
dows\CurrentVersion\Uninstall\Careless Client 12.3.4_r3" /v UninstallString

HKEY_LOCAL_MACHINE\software\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\
Careless Client 12.3.4_r3
    UninstallString    REG_SZ    "C:\Windows\unins000.exe"

C:\Users\raub\dev>

Here is what it looks like in a batch file:

@echo off

set _uninst_string=reg query "HKEY_LOCAL_MACHINE\software\Wow6432Node\Microsoft\
Windows\CurrentVersion\Uninstall\Careless Client 12.3.4_r3" 
/v UninstallString

for /f "tokens=3" %%R in ('%_uninst_string% ^| find "C:"') do (set pickles=%%R)

rem pickles here is the command. To run it we might want to do something like
start /wait %pickles% /SILENT

Note: The set _uninst_string line ends at UninstallString; I just split it into 3 lines to make it easier to read.
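If the for /f "tokens=3" incantation looks too cryptic: it just grabs the third whitespace-delimited token of the matching line of the reg query output. Here is roughly the same extraction sketched in Python against a canned copy of that output:

```python
# Canned reg query output, mirroring what we saw earlier
REG_OUTPUT = r'''
HKEY_LOCAL_MACHINE\software\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\Careless Client 12.3.4_r3
    UninstallString    REG_SZ    "C:\Windows\unins000.exe"
'''

def uninstall_string(reg_output):
    """Mimic `for /f "tokens=3" ...`: take the 3rd token of the
    UninstallString line (the quoted uninstaller path)."""
    for line in reg_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "UninstallString":
            return fields[2]
    return None

print(uninstall_string(REG_OUTPUT))   # "C:\Windows\unins000.exe" (quotes and all)
```

Note this only works because the quoted path has no spaces in it; with a path like "C:\Program Files (x86)\..." the batch one-liner (and this sketch) would need to join the remaining tokens instead.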

Tuesday, August 30, 2016

Get info on installed packages using Powershell (and maybe also WMIC)

Warning: Expect this article to be rather vague

From what I understand, when you install a program in Windows using some kind of installing package, it writes some info about the program in the registry. And that allows us to go to the control panel and see the program being listed there.

You are probably waiting for me to ask the question I seem to ask a lot here: Can we do it from command line? Good question! First candidate as the command of choice is wmic; it really allows you to get (and set) a lot of info about the computer. Let's use some examples to see what we can do: we shall begin by asking it which CPU we have

C:\Users\raub\dev> wmic cpu get name
Name
Intel(R) Core(TM) i5-4570T CPU @ 2.90GHz

C:\Users\raub\dev>

Which bios version do we have?

C:\Users\raub\dev> wmic bios get version
Version
LENOVO - 14b0

C:\Users\raub\dev>
What about who made the motherboard?

C:\Users\raub\dev> wmic baseboard get manufacturer
Manufacturer
LENOVO

C:\Users\raub\dev>

From the previous command, we kinda expected this to be a Lenovo. So, let's call this verifying that we are in fact talking to a Lenovo. Now, would it tell us which motherboard is in the machine?

C:\Users\raub\dev> wmic baseboard get model
Model


C:\Users\raub\dev>

So it is hiding info from me. Bastard! Let's see what else we can find out about the motherboard then:

C:\Users\raub\dev> wmic baseboard get /?

Property get operations.
USAGE:

GET [<property list>] [<get switches>]
NOTE: <property list> ::= <property name> | <property name>,  <property list>

The following properties are available:
Property                                Type                    Operation
========                                ====                    =========
ConfigOptions                           N/A                     N/A
Depth                                   N/A                     N/A
Description                             N/A                     N/A
Height                                  N/A                     N/A
HostingBoard                            N/A                     N/A
HotSwappable                            N/A                     N/A
InstallDate                             N/A                     N/A
Manufacturer                            N/A                     N/A
Model                                   N/A                     N/A
Name                                    N/A                     N/A
OtherIdentifyingInfo                    N/A                     N/A
PartNumber                              N/A                     N/A
PoweredOn                               N/A                     N/A
Product                                 N/A                     N/A
Removable                               N/A                     N/A
Replaceable                             N/A                     N/A
RequirementsDescription                 N/A                     N/A
RequiresDaughterBoard                   N/A                     N/A
SKU                                     N/A                     N/A
SerialNumber                            N/A                     N/A
SlotLayout                              N/A                     N/A
SpecialRequirements                     N/A                     N/A
Status                                  N/A                     N/A
Tag                                     N/A                     N/A
Version                                 N/A                     N/A
Weight                                  N/A                     N/A
Width                                   N/A                     N/A

The following GET switches are available:

/VALUE                       - Return value.
/ALL(default)                - Return the data and metadata for the attribute.
/TRANSLATE:<table name>      - Translate output via values from <table name>.
/EVERY:<interval> [/REPEAT:<repeat count>] - Returns value every (X interval) seconds, If /REPEAT specified the command
is executed <repeat count> times.
/FORMAT:<format specifier>   - Keyword/XSL filename to process the XML results.

NOTE: Order of /TRANSLATE and /FORMAT switches influences the appearance of output.
Case1: If /TRANSLATE precedes /FORMAT, then translation of results will be followed by formatting.
Case2: If /TRANSLATE succeeds /FORMAT, then translation of the formatted results will be done.

C:\Users\raub\dev>

From the above, we could run

wmic baseboard get /all

to see all the properties and their current values. It does not look very pretty unless you format it (option /FORMAT), but you can see it has a lot of potential. In fact, here is a list of other interesting queries you can do in wmic.

All that is great, but what about packages, since that is the subject of this article? Another very good question. Let's pick an example from the control panel and see if we can find it using wmic. The example I will use is TightVNC, whose publisher is GlavSoft LLC:

C:\Users\raub> wmic product where "vendor like '%%glav%%'" get name
Name
TightVNC

C:\Users\raub>

Real life example

It is real life but I changed the name to protect the guilty

Let's try it with a program I am interested in. We use a program called Careless Data, which is produced by Cargo Cult Development. According to the control panel, it consists of the following packages:

  • Careless Client: The program the users run to connect to the Careless Server.
  • Careless Data: Data from Careless Server that is temporarily stored in an unencrypted flat file.
  • Careless Upgrades: The upgrade package that brought it to version 12.3.4 r3. Well, the Careless Client binary was also upgraded.
  • Careless Dependencies: Support libraries for the Careless Client. It is actually seen as two distinct packages, CarelessDependencies and CarelessDependenciesMSI

Let's see if we can find all those packages using wmic:

C:\Users\raub>  wmic product where "vendor like '%%cargo%%'" get name
Name
CarelessDependenciesMSI
CarelessData

C:\Users\raub>

Houston, we have a problem: three of the five packages are missing. What is going on here?

Enter Powershell

So I got annoyed and decided to go back to what I know better: Powershell. Short version (this article is getting long): I knew the information about installed packages was in HKLM:\Software, so I searched for a few installed packages and found some info in

  • HKLM:\Software\Microsoft\Windows\CurrentVersion\Uninstall
  • HKLM:\SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall

Here is an example

PS C:\Users\raub> get-itemproperty -path 'HKLM:\software\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\{09DA5EE2-7E46-4DC4-96F9-BFEE50D40659}' 


PSPath       : Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\software\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\{09DA5EE2-7E46-4DC4-96F9-BFEE50D40659}
PSParentPath : Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\software\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall
PSChildName  : {09DA5EE2-7E46-4DC4-96F9-BFEE50D40659}
PSDrive      : HKLM
PSProvider   : Microsoft.PowerShell.Core\Registry
DisplayName  : Citrix Online Launcher
[...]


PS C:\Users\raub>

I know I did not show it above, but one of the attributes we can get, besides PSPath and DisplayName, is UninstallString. So, I think we could write a script that searches those paths for programs and fetches some attributes. Let's use the vendor name as the search criteria:

param($vendor)

$locations = ("software\microsoft\windows\currentversion\uninstall",
              "software\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall")
$count = 0

foreach ($location in $locations)
{
   foreach ($obj in (get-childitem "hklm:\$location" | get-itemproperty | where {$_.publisher -match $vendor} ))
   {  
      write-host "Name: $($obj.displayname)"
      write-host "   PSChildName: $($obj.PSChildName)"
      write-host "   PSPath $($obj.PSPath)"
      write-host "   Publisher: $($obj.Publisher)"
      write-host "   Version: $($obj.DisplayVersion)"
      write-host "   BuildVersion: $($obj.Version)"
      write-host "   UninstallString: $($obj.UninstallString)"

      $script:count++
   }
}

write-host "Done"
write-host " $($count) entries found"

Now let's see what we can find about our careless program:

.\findinstalledprogram.ps1 cargo
Name: Careless Client 12.3.4
   PSChildName: Careless Client 12.3.4_r3
   PSPath Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\software\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\Careless Client 12.3.4_r3
   Publisher: CargoCultDevelopment LLC.
   Version:
   BuildVersion:
   UninstallString: "C:\Windows\unins000.exe"
Name: CarelessDependenciesMSI
   PSChildName: {45B39321-107B-4F40-A7DC-A4CB4BFC3051}
   PSPath Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\software\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\{45B39321-107B-4F40-A7DC-A4CB4BFC3051}
   Publisher: CargoCultDevelopment
   Version: 1.00.0000
   BuildVersion: 16777216
   UninstallString: MsiExec.exe /I{45B39321-107B-4F40-A7DC-A4CB4BFC3051}
Name: CarelessDependencies
   PSChildName: {4509C469-6B21-4999-BE60-3449EE7B6EDF}
   PSPath Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\software\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\{4509C469-6B21-4999-BE60-3449EE7B6EDF}
   Publisher: CargoCultDevelopment.com LLC
   Version: 1.00.0000
   BuildVersion: 16777216
   UninstallString: MsiExec.exe /I{4509C469-6B21-4999-BE60-3449EE7B6EDF}
Name: CarelessUpgrades
   PSChildName: {E20A19AD-27C1-4F59-AED2-A04636BD2575}
   PSPath Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\software\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\{E20A19AD-27C1-4F59-AED2-A04636BD2575}
   Publisher: CargoCultDevelopment LLC.  
   Version: 12.3.4
   BuildVersion: 167772240
   UninstallString: MsiExec.exe /I{E20A19AD-27C1-4F59-AED2-A04636BD2575}
Name: CarelessData 12.3.4 r3
   PSChildName: {EA737DCD-71C9-4a06-97B5-BB2D4EE45564}_r3
   PSPath Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\software\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\{EA737DCD-71C9-4a06-97B5-BB2D4EE45564}_r3
   Publisher: CargoCultDevelopment, LLC
   Version:
   BuildVersion:
   UninstallString: "C:\Program Files (x86)\CargoCultDevelopment\unins000.exe"
Done
 5 entries found

Back to WMIC and why you should not use it

Let's revisit using wmic again.

What is happening is that our syntax uses a WMI class, Win32_Product, that only lists products installed using the Windows Installer. And we really should not be using Win32_Product anyway: querying it is quite dangerous because every time you run a query (wmic product is really shorthand for wmic "select * from Win32_Product"), Windows runs a consistency check of every installed package, helpfully trying to verify and repair each install. And that opens the door to some corruption. Microsoft suggests using Win32Reg_AddRemovePrograms instead, but that class comes with the SCCM client and may not show everything wmic does.

Bottom line, wmic is great but looking for package info is better done by Powershell.

Monday, August 29, 2016

Poor Man's Downconverting UTF-8 to ASCII7 in powershell

I will warn you this is really lousy and inelegant code. But I needed something quick. I will also warn that this article was going to focus primarily on doing the UTF-8 to ASCII7 conversion, but it went a bit sideways with me learning something new about Powershell. If you feel a bit disappointed, oh well.

I will do my best to take the high road and not bitch and make snide comments about a certain software company whose product caused me to write this script. However, after rereading this article I think I failed miserably. You have been warned!

The Problem

So, I am dealing with yet another not-so-bright piece of software. Specifically, one that has a database whose fields for usernames, addresses, and even phone numbers and such assume that its users all live in the United States and have names like Bob or Sue or John or Travis (OK, maybe not that one since it has two syllables) and live in towns like Cleveland or Tampa. This vendor developed the code in early 2000(!) hoping it would never have to deal with umlauts and accents and other such characters. Probably they did that to save space, since an ASCII7-only DB field is smaller than a UTF-aware one. Imagine how much money they would make their customers save on storage!

Then someone told their customers that there are lands past the dragons and the turtles, where strange people with mysterious habits and incomprehensible languages roam.

Like Sweden.

Imagine how shocked they were when their software had to deal with names like Фёдор, Ljungström, Simón, 宮崎, and Häyhä and towns like Malmö. To address that, in 1987 a formal work group, which later became the Unicode Consortium, was formed and, after much head-smacking, released the first version of the Unicode standard in 1991. Unfortunately, someone failed to mention that to this company. Fear not, however, for they were not the first and will not be the last to develop their code in ASCII and then realize they have to support other languages.

This is all nice and all, but we feed data to this program as text files. These are in UTF-8 format, since they contain people and places with names using non-ASCII characters. And as a result the database gets very unhappy.

The Solution

I called them to see if they had plans to support Unicode in the future. And they replied. Bottom line is they will not change their code to handle Unicode. So, if we are going to use this program, we need to massage the input data.

As many times before in many different operating systems, if we are going to do this, it had better be doable using some kind of scripting language. This being Windows, my language of choice will once again be Powershell. Before we write a single line of code, let's do something dangerously subversive and think about what we need to do.

  1. We need to detect the non-ASCII characters. Specifically, we are talking about the characters that are not within the original 7-bit ASCII range (0 to 127 in decimal, or 0x00 to 0x7F in hexadecimal); we talked a bit before about detecting different file formats. From there we know each UTF-8 character we are looking for is represented by a pair of bytes: "ö" has a UTF-8 encoding of 0xc3 0xb6. Do note both bytes are > 0x7F.

  2. Once we detect said character pair, we need to do something with them. We are going to convert them into a single or a combination of characters, and that is where it gets interesting. Take our friend ö: should it become o or oe? And the answer is it depends.

    Some countries came up with their own ASCII7 representations of their non-ASCII7 characters. And they are not really universal. Also, if the import file uses some kind of fixed-column formatting, we have to convert each character to one single character. This is really not a coding problem; we could easily create a lookup table to replace the characters. So I will just leave this here in the open for you to ponder deeply about. In this article we will cheat, and badly.

Let's talk about that cheating. Instead of bothering to do some proper converting, all we will do is replace the non-ASCII7 char pair with a single question mark (63 or 0x3F). This way, whoever sees the data will know we did something there and can (if they are the affected user) replace that "?" with whatever character they feel like. Talk about passing the hot potato around...

So, if we find a byte whose decimal value is > 127, we can replace it with a "?" (decimal 63). Of course, this being UTF-8, the next byte in the pair is also > 127. So we need to replace the pair of bytes with a single "?".
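Before writing any Powershell, we can sanity-check that byte-collapsing idea with a quick sketch (Python here purely for the demonstration; it mirrors the same kludge of folding a whole run of high bytes into one "?"):

```python
def to_ascii7(data: bytes) -> bytes:
    """Replace each run of bytes > 0x7F (i.e. each multi-byte UTF-8
    sequence, possibly several in a row) with a single b'?'."""
    out = bytearray()
    prev_high = False
    for b in data:
        if b > 0x7F:
            if not prev_high:       # only the first byte of the run gets the '?'
                out.append(0x3F)
            prev_high = True
        else:
            out.append(b)
            prev_high = False
    return bytes(out)

print(to_ascii7("Olivenöl".encode("utf-8")))   # b'Oliven?l'
print("ö".encode("utf-8").hex())               # c3b6 -- both bytes > 0x7F
```

Note it has the same limitation as the script below: two different non-ASCII characters sitting side by side also collapse into one "?".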

With that in mind, let's try coding that:

param($infile) 

# Convert UTF-8 document into ASCII7
function UTF2ASCII($encoded)
{
   $oldi = ''
   $converted = ''

   foreach ($line in $encoded)
   {
      # Convert each character in $line into a decimal number
      foreach($i in [int[]][char[]]$line)
      {
         # 0 <= ASCII7 <= 127
         if ($i -gt 127)
         {
            # KLUDGE: only change to "?" if previous char isn't.
            if ($oldi -ne 63)
            {
               $i = 63
            }
            else
            { continue }
         }
         $converted = $converted + [char]$i
         $oldi = $i
      }
      # Preserve the line break that get-content stripped
      $converted = $converted + "`n"
   }

   return $converted
}

# Read in the contents of $infile and feed them to our function
UTF2ASCII(get-content $infile)

The if statements are the lousiest part of the code, especially the part that assumes that if the previous char is "?" we must be dealing with the second byte of the two-byte representation of the UTF-8 character; that is a rather tall assumption, but I was in a hurry and could not think of an edge case. Now that I have admitted my kludge, let's test it. The way you run this script is

utf2ascii.ps1 infile

which will output the modified text to the screen. If we want to save it into a file,

utf2ascii.ps1 infile > outfile

We also need a test file. I created it by grabbing a few paragraphs from the wikipedia page on the Automatgevär m/42 plus an extra line:

Olivenöl

Ag m/42, kitaip AG42, AG-42 ar Ljungman (šved. Automatgevär m/42) – švediškas savitaisis šautuvas, nuo 1942 m. iki 7-ojo dešimtmečio naudotas Švedijos armijos.

Šautuvą 1941 m. sukonstravo Erikas Eklundas iš kompanijos AB C. J. Ljungmans Verkstäder (Malmė, Švedija). Juos gaminti 1942 m. ėmė kompanija Carl Gustafs Stads Gevärsfaktori (Eskilstuna, Švedija). Švedijos armijai buvo pagaminta apie 30 000 Ag m/42. Tačiau šis šautuvas nebuvo pagrindinis Švedijos armijos šautuvas, juo buvo 6,5 mm pertaisomas šautuvas Mauser m/96.

Po kurio laiko buvo aptikti kai kurie šautuvo trūkumai (pvz., dujų vamzdelio rūdijimas), todėl 1953–1956 Švedijos armijos šautuvai Ag m/42 buvo modernizuoti ir pavadinti Ag m/42B. Modernizaciniai pakeitimai:

cat utftest.txt | wc -l (yes, it is Linux, sue me) tells me the file is 7 lines long. So, let's run it and see what happens.

$ utf2ascii.ps1 infile
Oliven?l

Ag m/42, kitaip AG42, AG-42 ar Ljungman (?ved. Automatgev?r m/42) ? ?vedi?kas s
avitaisis ?autuvas, nuo 1942 m. iki 7-ojo de?imtme?io naudotas ?vedijos armijos
.

?autuv? 1941 m. sukonstravo Erikas Eklundas i? kompanijos AB C. J. Ljungmans Ve
rkst?der (Malm?, ?vedija). Juos gaminti 1942 m. ?m? kompanija Carl Gustafs Stad
s Gev?rsfaktori (Eskilstuna, ?vedija). ?vedijos armijai buvo pagaminta apie 30
000 Ag m/42. Ta?iau ?is ?autuvas nebuvo pagrindinis ?vedijos armijos ?autuvas,
juo buvo 6,5 mm pertaisomas ?autuvas Mauser m/96.

Po kurio laiko buvo aptikti kai kurie ?autuvo tr?kumai (pvz., duj? vamzdelio r?
dijimas), tod?l 1953?1956 ?vedijos armijos ?autuvai Ag m/42 buvo modernizuoti i
r pavadinti Ag m/42B. Modernizaciniai pakeitimai:
$

Looks great, right? Not quite. The problem with the above code is the following line:

UTF2ASCII(get-content $infile)

It tries to read the entire file into memory in one sitting and then passes all of that to UTF2ASCII(). That works fine with small files, but we will be filling memory/swap space, and maybe even running out of them, if the file is big enough. And that is a bit of shooting ourselves in the foot. Let's rewrite it a bit by creating a second function, ConvertFile(), which processes the file line by line. This way we need not care how large the file is.
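As an aside, the same slurp-versus-stream distinction exists in a Linux shell: command substitution slurps the whole file, while a while read loop touches one line at a time. A sketch with a throwaway file:

```shell
# Process a file one line at a time instead of loading it all at once:
printf 'one\ntwo\nthree\n' > /tmp/streamdemo.$$
while IFS= read -r line; do
    echo "got: $line"
done < /tmp/streamdemo.$$
rm -f /tmp/streamdemo.$$
```

No matter how large the input grows, the loop body only ever holds one line in memory.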

param($infile)
$count = 0

function ConvertFile($encoded)
{
   $oldi = ''
   $converted = ''

   foreach ($line in $encoded)
   {
      ConvertLine($line)
   }
}

function ConvertLine($line)
{
   foreach($i in [int[]][char[]]$line)
   {
      if ($i -gt 127)
      {
         if ($oldi -ne 63)
         {
            $i = 63
            $global:count++
         }
         else
         { continue }
      }
      $converted = $converted + [char]$i
      $oldi = $i
   }

   $converted = $converted + "`r`n"
   return $converted
}

ConvertFile(get-content $infile)
"$($count) characters were converted"

The $count global variable is just there to count how many times the script found and converted UTF-8 characters.

When we test the script with a large file, we find that we did indeed solve the problem in the original code mentioned above; you can try it yourself if you do not believe me.

So, let's now do a few more tests.

$ utf2ascii.ps1 infile > outfile
$ cat outfile
Oliven?l

Ag m/42, kitaip AG42, AG-42 ar Ljungman (?ved. Automatgev?r m/42) ? ?vedi?kas s
avitaisis ?autuvas, nuo 1942 m. iki 7-ojo de?imtme?io naudotas ?vedijos armijos
.

?autuv? 1941 m. sukonstravo Erikas Eklundas i? kompanijos AB C. J. Ljungmans Ve
rkst?der (Malm?, ?vedija). Juos gaminti 1942 m. ?m? kompanija Carl Gustafs Stad
s Gev?rsfaktori (Eskilstuna, ?vedija). ?vedijos armijai buvo pagaminta apie 30
000 Ag m/42. Ta?iau ?is ?autuvas nebuvo pagrindinis ?vedijos armijos ?autuvas,
juo buvo 6,5 mm pertaisomas ?autuvas Mauser m/96.

Po kurio laiko buvo aptikti kai kurie ?autuvo tr?kumai (pvz., duj? vamzdelio r?
dijimas), tod?l 1953?1956 ?vedijos armijos ?autuvai Ag m/42 buvo modernizuoti i
r pavadinti Ag m/42B. Modernizaciniai pakeitimai:
35 characters were converted
$

Yeah, the fact that the line stating how many characters were converted ends up in the outfile is a drag, but I will survive. I really want to count the number of lines though:

$ cat outfile | wc -l
16

$

Now that was unexpected to me, especially coming from a Linux background. How did my 7 lines become 16? You see, we ran the script as

utf2ascii.ps1 infile > outfile

Thing is, in PowerShell the > will wrap a long line at the width of the console window we are using. I read the explanation quite a few times and still am not satisfied. But, since we need this code to work, time to change it one more time!

param($ifile, $ofile)
$count = 0

function ConvertFile($originalFile, $convertedFile)
{
   $oldi = ''
   $converted = ''
   rm $convertedFile -ea SilentlyContinue

   foreach ($line in (get-content -path $originalFile))
   {
      ConvertLine($line) | `
      Out-File -Append -Encoding ASCII $convertedFile
   }
}

function ConvertLine($line)
{
   foreach($i in [int[]][char[]]$line)
   {
      if ($i -gt 127)
      {
         if ($oldi -ne 63)
         {
            $i = 63
            $global:count++
         }
         else
         { continue }
      }
      $converted = $converted + [char]$i
      $oldi = $i
   }

   return $converted
}

if ( [string]::IsNullOrEmpty($ifile) )
{
   Write-Host "Usage: powershell -ExecutionPolicy ByPass -File", `
               $MyInvocation.MyCommand.Name, "infile [outfile]"
   exit
}

# Create a $ofile if one was not given
if ( [string]::IsNullOrEmpty($ofile) )
{
   $ofile = ([io.path]::GetFileNameWithoutExtension("$ifile")) + `
            (get-date -format yyyyMMdd) + `
            ([io.path]::GetExtension("$ifile"))
}

ConvertFile $ifile $ofile
"$($count) characters were converted"

Notes:

  • Instead of sending output to the console, which would then need to be redirected to a file, this script takes the input and output filenames as parameters. If an output filename is not provided, it creates one based on that of the input file.
  • Corrects the issue of splitting the processed lines before saving them.
  • Gives a usage message if no parameters are given.
  • The output file is overwritten without asking.
  • The only output to stdout is the number of conversions.

I am bored, let's run it

$ utf2ascii.ps1 infile outfile
35 characters were converted

$

Note that it now wrote the converted character count message to the screen as promised. How about the number of lines?

$ cat outfile | wc -l
7

$

Much better! Now all we need to do is decide if we want to keep the "?" or build some conversion table, a decision I will leave up to you.
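As an aside, if the file ever lands on a Linux box, iconv can do a similar lossy conversion in one line; a sketch (note that -c silently drops the offending characters instead of replacing them with "?"):

```shell
# Drop any character that cannot be represented in 7-bit ASCII:
printf 'Oliven\303\266l\n' | iconv -c -f UTF-8 -t ASCII
# prints: Olivenl
```

If you want transliteration instead of dropping, -t 'ASCII//TRANSLIT' is worth a look, though its output depends on the locale.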

Friday, July 29, 2016

Checking if RedHat/CentOS has new updates

If you use Debian Linux derivatives, especially Ubuntu, you probably noticed that when you log in it will tell you (using the MOTD) that there are new updates waiting for you. And you can use that if, say, you are writing a script to let you know about it; someone I know uses that to monitor his Ubuntu boxes in Nagios. But what about doing the same in RedHat and derivatives the lazy way?

Let's do some thinking aloud and see if we can come up with something. We know that if we run yum check-update, it should reply with the list of packages needing to be upgraded, if any:

[raub@duckwitch ~]$ yum check-update
Loaded plugins: fastestmirror
Determining fastest mirrors
 * base: mirror.supremebytes.com
 * extras: mirror.hostduplex.com
 * updates: mirror.scalabledns.com

chkconfig.x86_64                        1.3.61-5.el7_2.1                 updates
device-mapper.x86_64                    7:1.02.107-5.el7_2.5             updates
device-mapper-libs.x86_64               7:1.02.107-5.el7_2.5             updates
dracut.x86_64                           033-360.el7_2.1                  updates
glibc.x86_64                            2.17-106.el7_2.6                 updates
glibc-common.x86_64                     2.17-106.el7_2.6                 updates
iproute.x86_64                          3.10.0-54.el7_2.1                updates
kernel.x86_64                           3.10.0-327.22.2.el7              updates
kpartx.x86_64                           0.4.9-85.el7_2.5                 updates
libxml2.x86_64                          2.9.1-6.el7_2.3                  updates
ntpdate.x86_64                          4.2.6p5-22.el7.centos.2          updates
pcre.x86_64                             8.32-15.el7_2.1                  updates
selinux-policy.noarch                   3.13.1-60.el7_2.7                updates
selinux-policy-targeted.noarch          3.13.1-60.el7_2.7                updates
systemd.x86_64                          219-19.el7_2.11                  updates
systemd-libs.x86_64                     219-19.el7_2.11                  updates
systemd-python.x86_64                   219-19.el7_2.11                  updates
systemd-sysv.x86_64                     219-19.el7_2.11                  updates
tzdata.noarch                           2016f-1.el7                      updates
[raub@duckwitch ~]$

As you can see, it does not require you to run that command as root. And you can even check a repo you use but have configured to be normally disabled, like epel in the following example:

[raub@duckwitch ~]$ yum check-update --enablerepo=epel
Loaded plugins: fastestmirror
epel/x86_64/metalink                                     |  11 kB     00:00
epel                                                     | 4.3 kB     00:00
(1/3): epel/x86_64/group_gz                                | 170 kB   00:00
(2/3): epel/x86_64/updateinfo                              | 584 kB   00:00
(3/3): epel/x86_64/primary_db                              | 4.2 MB   00:00
Loading mirror speeds from cached hostfile
 * base: mirror.supremebytes.com
 * epel: mirror.chpc.utah.edu
 * extras: mirror.hostduplex.com
 * updates: mirror.scalabledns.com

chkconfig.x86_64                        1.3.61-5.el7_2.1                 updates
device-mapper.x86_64                    7:1.02.107-5.el7_2.5             updates
device-mapper-libs.x86_64               7:1.02.107-5.el7_2.5             updates
dracut.x86_64                           033-360.el7_2.1                  updates
epel-release.noarch                     7-7                              epel
glibc.x86_64                            2.17-106.el7_2.6                 updates
glibc-common.x86_64                     2.17-106.el7_2.6                 updates
iproute.x86_64                          3.10.0-54.el7_2.1                updates
kernel.x86_64                           3.10.0-327.22.2.el7              updates
kpartx.x86_64                           0.4.9-85.el7_2.5                 updates
libxml2.x86_64                          2.9.1-6.el7_2.3                  updates
ntpdate.x86_64                          4.2.6p5-22.el7.centos.2          updates
pcre.x86_64                             8.32-15.el7_2.1                  updates
selinux-policy.noarch                   3.13.1-60.el7_2.7                updates
selinux-policy-targeted.noarch          3.13.1-60.el7_2.7                updates
systemd.x86_64                          219-19.el7_2.11                  updates
systemd-libs.x86_64                     219-19.el7_2.11                  updates
systemd-python.x86_64                   219-19.el7_2.11                  updates
systemd-sysv.x86_64                     219-19.el7_2.11                  updates
tzdata.noarch                           2016f-1.el7                      updates
[raub@duckwitch ~]$

What if there are no upgrades?

[root@server1 ~]# yum check-update
Loaded plugins: fastestmirror
base                                                     | 3.6 kB     00:00
extras                                                   | 3.4 kB     00:00
updates                                                  | 3.4 kB     00:00
Loading mirror speeds from cached hostfile
 * base: mirror.us.leaseweb.net
 * extras: mirror.us.leaseweb.net
 * updates: reflector.westga.edu
[root@server1 ~]#

Sounds like we need to check for the first blank line. We can do that using sed, with the /^\s*$/ search pattern. So we could start with something like (the -n is there because we only care when we find said blank line)

yum check-update | sed -n '/^\s*$/p'

which, if you run it, does not seem to do much. The reason is that what we really want is to know whether the match was successful or not. And sed actually has the answer: look at these entries I stole from its man page:

q [exit-code]
              Immediately  quit  the  sed  script  without processing any more
              input, except that if auto-print is  not  disabled  the  current
              pattern  space will be printed.  The exit code argument is a GNU
              extension.

       Q [exit-code]
              Immediately quit the sed  script  without  processing  any  more
              input.  This is a GNU extension.

Let's try it then. First we will find a machine that has a package that needs to be updated. In this case it will be my KVM server, which you have met before when we talked about USB passthrough:

[raub@vmhost ~]$ yum check-update 
Loaded plugins: fastestmirror, security
Loading mirror speeds from cached hostfile
 * base: mirror.vcu.edu
 * extras: centos.mirror.constant.com
 * updates: mirror.vcu.edu

samba4-libs.x86_64                    4.2.10-7.el6_8                     updates
[raub@vmhost ~]$

So as of this writing it has one update. We then try our one-liner, which we want to return 0 if it did not find any pending upgrades and 1 if it did. And that is done by adding q1 after the search pattern, as in /^\s*$/q1, which means quit with exit status 1 if the search is successful. Of course, we now need to print the result, which can be done with echo $?. So, let's try it:

[raub@vmhost ~]$ yum check-update | sed -n '/^\s*$/q1' ; echo $?
1
[raub@vmhost ~]$

It thinks it found it. I am not completely sure it works, so we also need to verify it behaves as it should when there are no updates:

[root@server1 ~]# yum check-update
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.netdepot.com
 * extras: mirror.solarvps.com
 * updates: ftpmirror.your.org
[root@server1 ~]# yum check-update | sed -n '/^\s*$/q1' ; echo $?
0
[root@server1 ~]# 

Sounds like we have a winner. Now all that is left is to wrap something around it that does something with the result. I will leave that to you. Note that it does not differentiate between normal and security updates though.
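The sed exit-status trick can also be sanity-checked on canned input, without needing a box with pending updates (the sample strings here are made up, mimicking the yum output shape):

```shell
#!/bin/sh
# Feed sed canned yum-like output: the first sample has the blank line
# that precedes a package list, the second does not.  sed quits with
# status 1 (q1) as soon as it sees a blank line; otherwise it exits 0.
if printf 'mirror line\n\npkg-1.0 updates\n' | sed -n '/^\s*$/q1'; then
    echo "sample 1: no updates"
else
    echo "sample 1: updates pending"
fi
if printf 'mirror line only\n' | sed -n '/^\s*$/q1'; then
    echo "sample 2: no updates"
else
    echo "sample 2: updates pending"
fi
# prints: sample 1: updates pending
#         sample 2: no updates
```

Wrapping the real yum check-update | sed pipeline in an if like the above, with OK/WARNING messages for your monitoring system of choice, is all the finished script needs.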

Wednesday, July 27, 2016

Starting Tomcat manually in Docker

So I needed to test some settings between CentOS 6 + Apache Tomcat 6 + Java 6 and CentOS 7 + Tomcat 7 + Java 8. The best way I found to do these quick tests is to build them in docker: write a quick dockerfile, spin it up, test, and blow it away.

Now, when I built and ran the tomcat 6 + CentOS 6 + Java 6 setup, I connected to the container and manually started tomcat by typing

service tomcat6 start

and it worked fine. So I then built the tomcat 7 + CentOS 7 + Java 8 one, and tried to start it:

[root@tomcat ~]# systemctl start tomcat.service
Failed to get D-Bus connection: Operation not permitted
[root@tomcat ~]# 

Since systemd could not start it, and I could not figure out why (I do not take solace knowing I am not the only one), I tried starting it even more manually:

[root@tomcat /]# /usr/sbin/tomcat start
/usr/sbin/tomcat: line 21: .: /etc/sysconfig/: is a directory
/usr/sbin/tomcat: line 39: /logs/catalina.out: No such file or directory
[root@tomcat /]#

This is the time to do what everyone does at a time like this: look for answers online. All I got was someone asking the very same question. It seems that if we want some answers we will need to do more exploring on our own.

With that in mind, let's look at the two lines it is barking about:

[root@tomcat /]# sed -n '21p' /usr/sbin/tomcat
    . /etc/sysconfig/${NAME}
[root@tomcat /]# sed -n '39p' /usr/sbin/tomcat
  ${JAVACMD} $JAVA_OPTS $CATALINA_OPTS \
[root@tomcat /]#

Not much help here, but we will revisit that later. First, let's run the bash script again, but this time in debugging (-x) mode:

[root@tomcat /]# bash -x /usr/sbin/tomcat start
+ '[' -r /usr/share/java-utils/java-functions ']'
+ . /usr/share/java-utils/java-functions
++ _load_java_conf
++ local IFS=:
++ local java_home_save=
++ local java_opts_save=
++ local javaconfdir
++ local conf
++ unset _javadirs
++ unset _jvmdirs
++ set -- /etc/java
++ _log 'Java config directories are:'
++ '[' -n '' ']'
++ for javaconfdir in '"$@"'
++ _log '  * /etc/java'
++ '[' -n '' ']'
++ for javaconfdir in '"$@"'
++ conf=/etc/java/java.conf
++ '[' '!' -f /etc/java/java.conf ']'
++ local IFS
++ local JAVA_LIBDIR
++ local JNI_LIBDIR
++ local JVM_ROOT
++ '[' -f /etc/java/java.conf ']'
++ _log 'Loading config file: /etc/java/java.conf'
++ '[' -n '' ']'
++ . /etc/java/java.conf
+++ JAVA_LIBDIR=/usr/share/java
+++ JNI_LIBDIR=/usr/lib/java
+++ JVM_ROOT=/usr/lib/jvm
++ _javadirs=/usr/share/java:/usr/lib/java
++ _jvmdirs=/usr/lib/jvm
++ _load_java_conf_file /root/.java/java.conf
++ local IFS
++ local JAVA_LIBDIR
++ local JNI_LIBDIR
++ local JVM_ROOT
++ '[' -f /root/.java/java.conf ']'
++ _log 'Skipping config file /root/.java/java.conf: file does not exist'
++ '[' -n '' ']'
++ _javadirs=/usr/share/java:/usr/lib/java
++ _jvmdirs=/usr/lib/jvm
++ '[' -d '' ']'
++ '[' -n '' ']'
++ '[' _ '!=' _off -a -f /usr/lib/abrt-java-connector/libabrt-java-connector.so
-a -f /var/run/abrt/abrtd.pid ']'
++ _log 'ABRT Java connector is disabled'
++ '[' -n '' ']'
+ '[' -z '' ']'
+ TOMCAT_CFG=/etc/tomcat/tomcat.conf
+ '[' -r /etc/tomcat/tomcat.conf ']'
+ . /etc/tomcat/tomcat.conf
++ TOMCAT_CFG_LOADED=1
++ TOMCATS_BASE=/var/lib/tomcats/
++ JAVA_HOME=/usr/lib/jvm/jre
++ CATALINA_HOME=/usr/share/tomcat
++ CATALINA_TMPDIR=/var/cache/tomcat/temp
++ SECURITY_MANAGER=false
+ '[' -r /etc/sysconfig/ ']'
+ . /etc/sysconfig/
/usr/sbin/tomcat: line 21: .: /etc/sysconfig/: is a directory
+ set_javacmd
+ local IFS
+ local cmd
+ '[' -x '' ']'
+ set_jvm
+ local IFS=:
+ local cmd
+ local cmds
+ _set_java_home
+ local IFS=:
+ local jvmdir
+ local subdir
+ local subdirs
+ '[' -n /usr/lib/jvm/jre ']'
+ '[' -z '' ']'
++ readlink -f /usr/lib/jvm/jre/..
+ JVM_ROOT=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el7_2.x86_64
+ return
+ '[' -n /usr/lib/jvm/jre ']'
+ return
+ for cmd in jre/sh/java bin/java
+ JAVACMD=/usr/lib/jvm/jre/jre/sh/java
+ '[' -x /usr/lib/jvm/jre/jre/sh/java ']'
+ for cmd in jre/sh/java bin/java
+ JAVACMD=/usr/lib/jvm/jre/bin/java
+ '[' -x /usr/lib/jvm/jre/bin/java ']'
+ _log 'Using configured JAVACMD: /usr/lib/jvm/jre/bin/java'
+ '[' -n '' ']'
+ '[' -n '' ']'
+ return 0
+ cd /usr/share/tomcat
+ '[' '!' -z '' ']'
+ '[' -n '' ']'
+ CLASSPATH=/usr/share/tomcat/bin/bootstrap.jar
+ CLASSPATH=/usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-jul
i.jar
++ build-classpath commons-daemon
+ CLASSPATH=/usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-jul
i.jar:/usr/share/java/commons-daemon.jar
+ '[' start = start ']'
+ '[' '!' -z '' ']'
[root@tomcat /]# + /usr/lib/jvm/jre/bin/java -classpath /usr/share/tomcat/bin/bo
otstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar:/usr/share/java/commons-daemon
.jar -Dcatalina.base= -Dcatalina.home=/usr/share/tomcat -Djava.endorsed.dirs= -D
java.io.tmpdir=/var/cache/tomcat/temp -Djava.util.logging.config.file=/conf/logg
ing.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
 org.apache.catalina.startup.Bootstrap start
/usr/sbin/tomcat: line 39: /logs/catalina.out: No such file or directory
[root@tomcat /]#

If you have never run a bash script with the -x option, you should: it shows the steps being performed by the script, including tests, as it runs. For instance, you can see it starts by learning a lot about the current Java installation. After that, it loads some file, TOMCAT_CFG=/etc/tomcat/tomcat.conf, and then gives the first error message:

+ TOMCAT_CFG=/etc/tomcat/tomcat.conf
+ '[' -r /etc/tomcat/tomcat.conf ']'
+ . /etc/tomcat/tomcat.conf
++ TOMCAT_CFG_LOADED=1
++ TOMCATS_BASE=/var/lib/tomcats/
++ JAVA_HOME=/usr/lib/jvm/jre
++ CATALINA_HOME=/usr/share/tomcat
++ CATALINA_TMPDIR=/var/cache/tomcat/temp
++ SECURITY_MANAGER=false
+ '[' -r /etc/sysconfig/ ']'
+ . /etc/sysconfig/
/usr/sbin/tomcat: line 21: .: /etc/sysconfig/: is a directory

Now, /etc/tomcat/tomcat.conf (same thing as /usr/share/tomcat/conf/tomcat.conf) defines a few variables that are global to tomcat. The top of the file also explains it is the file where you should define variables that are custom to your system but global to all tomcat instances being run there. For instance, when I built the tomcat6 container, I had

JAVA_HOME="/usr/lib/jdk1.6.0_41"

because that was the specific java version I wanted to run. Now, if we look not only at line 21 in /usr/sbin/tomcat but also around said line, we can see it wants to load a file in /etc/sysconfig:

# Get instance specific config file
if [ -r "/etc/sysconfig/${NAME}" ]; then
    . /etc/sysconfig/${NAME}
fi

If we look at /etc/sysconfig,

[root@tomcat ~]# ls /etc/sysconfig/
network  network-scripts  rdisc  tomcat
[root@tomcat ~]#

It sure makes me think that $NAME should be "tomcat", but $NAME is not defined.
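That theory is easy to check in a shell (a quick sketch): with NAME unset, the path on line 21 degenerates into the directory itself, which is exactly what the error message complains about.

```shell
# With NAME unset, the expansion leaves only the directory, so the
# script ends up trying to source /etc/sysconfig/ itself:
unset NAME
echo "/etc/sysconfig/${NAME}"
# prints: /etc/sysconfig/
```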

For the second error message, we should examine the following lines:

[root@tomcat /]# + /usr/lib/jvm/jre/bin/java -classpath /usr/share/tomcat/bin/bo
otstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar:/usr/share/java/commons-daemon
.jar -Dcatalina.base= -Dcatalina.home=/usr/share/tomcat -Djava.endorsed.dirs= -D
java.io.tmpdir=/var/cache/tomcat/temp -Djava.util.logging.config.file=/conf/logg
ing.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
 org.apache.catalina.startup.Bootstrap start
/usr/sbin/tomcat: line 39: /logs/catalina.out: No such file or directory

That really looks like it wants to write to the log file catalina.out but can't find it. So we take a look at the lines around line 39:

if [ "$1" = "start" ]; then
  ${JAVACMD} $JAVA_OPTS $CATALINA_OPTS \
    -classpath "$CLASSPATH" \
    -Dcatalina.base="$CATALINA_BASE" \
    -Dcatalina.home="$CATALINA_HOME" \
    -Djava.endorsed.dirs="$JAVA_ENDORSED_DIRS" \
    -Djava.io.tmpdir="$CATALINA_TMPDIR" \
    -Djava.util.logging.config.file="${CATALINA_BASE}/conf/logging.properties" \
    -Djava.util.logging.manager="org.apache.juli.ClassLoaderLogManager" \
    org.apache.catalina.startup.Bootstrap start \
    >> ${CATALINA_BASE}/logs/catalina.out 2>&1 &
    if [ ! -z "$CATALINA_PID" ]; then
      echo $! > $CATALINA_PID
    fi

where we find the line

>> ${CATALINA_BASE}/logs/catalina.out 2>&1 &

That makes me think that $CATALINA_BASE should be "/usr/share/tomcat", since

[root@tomcat ~]# ls /usr/share/tomcat/logs/
catalina.out
[root@tomcat ~]#

Now, /etc/sysconfig/tomcat knows about $CATALINA_BASE, even though it does not really define it (it is commented out):

#CATALINA_BASE="/usr/share/tomcat"

Sounds like we need to define $NAME and $CATALINA_BASE somewhere. My vote would be for /etc/tomcat/tomcat.conf because it claims it is where we put custom stuff.

# For tomcat.service it's /etc/sysconfig/tomcat, for
# tomcat@instance it's /etc/sysconfig/tomcat@instance.

# THE TWO LINES I MENTIONED IN THE ARTICLE
NAME="tomcat"                                   
CATALINA_BASE="/usr/share/tomcat"

# This variable is used to figure out if config is loaded or not.
TOMCAT_CFG_LOADED="1"

# In new-style instances, if CATALINA_BASE isn't specified, it will
# be constructed by joining TOMCATS_BASE and NAME.
TOMCATS_BASE="/var/lib/tomcats/"

After that, I was able to start it and verify it was indeed running:

[root@tomcat tomcat]# ps -ef|grep tomcat
root       352     1  8 13:09 ?        00:00:01 /usr/lib/jvm/jre/bin/java -classpath /usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar -Dcatalina.base=/usr/share/tomcat -Dcatalina.home=/usr/share/tomcat -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/cache/tomcat/temp -Djava.util.logging.config.file=/usr/share/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager org.apache.catalina.startup.Bootstrap start
root       372     1  0 13:09 ?        00:00:00 grep --color=auto tomcat
[root@tomcat tomcat]#

I wrote a shorter version of this article as a reply to the question I found online and mentioned earlier in this article. I hope it will be useful to the original poster.



Wednesday, June 29, 2016

Killing the VMWare vSphere Client from command line

Hopefully this will be a quick article; I am just putting it here so I will know how to do it next time. It took a bit longer to post since I forgot to do the screen capture. Shame on me!

First of all, I will admit I use VMWare's vSphere client on occasion (at least enough to remember how they like their camelcase to be). I know vmware wants us to use the web interface (they even said all new features will be going to the web version), but there are times the C#-based vsphere client is more convenient, even though it only runs on Windows. This convenience, however, is not flawless. There are times it might crash, and crash horribly. Like now. I needed to access a vm client's console window because it was refusing to let me RDP into it.

Actually, we need to back this out a bit: I had first to remote into the windows desktop where I have the vsphere client so I could then have console access to the problematic vm. You see, we keep this special desktop in a separate vlan that can connect to the vm server. I am making a point to explain this because it might be important down the line.

When I launched the vsphere client, as soon as I passed the window where it asks for my authentication it barked about certificates (it does that all the time; I did not set that vm server up, but will talk about deploying certs in a future article), as shown on the picture to the right. And then it decided to freeze: I could not get out of the modal dialog box (the error message window), which would hover over every other window on the desktop. I could click on the window to my heart's content and it would never get focus, which in plain English means I could not click on any of its buttons.

Usually this is a pain in the you-know-exactly-what, but when you are remoting into a machine to deal with that problem, it is even more so. The standard way to deal with this is to run the task manager, which you can get by pressing the famous 3-finger salute (no, not that one), CTRL-ALT-DEL. Sometimes you can't send that key sequence out; in my case I was connecting from a MacBook Pro and was having a bit of an issue passing it to the Windows desktop. So, the next option is to do what I like to do anyway (translation: I could not be bothered to figure out how to send CTRL-ALT-DEL), which is to use the command line. Typing

taskmgr

on the DOS or PowerShell prompt will run the task manager GUI. Once it is up, find vmware client under the applications list,


And click on the End Task button. But I think we can do better than that. We have the terminal window (DOS or PowerShell) open; why not do it all there? Let's first see if we can list the applications (tasks) that are being run (I used more because I was using a powershell console; you could just run the program and then scroll until you find what you want):

PS C:\Users\dr-gori> tasklist|more

Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
System Idle Process              0 Services                   0         24 K
System                           4 Services                   0         68 K
smss.exe                       324 Services                   0      1,000 K
csrss.exe                      480 Services                   0      6,704 K
csrss.exe                      572 Console                    1     25,272 K
wininit.exe                    580 Services                   0      1,744 K
winlogon.exe                   636 Console                    1      4,912 K
services.exe                   684 Services                   0     19,176 K
[...]
vmware-usbarbitrator64.ex    12560 Services                   0      8,016 K
audiodg.exe                   9008 Services                   0     18,608 K
taskmgr.exe                  15548 Console                    1     11,528 K
VpxClient.exe                 4800 Console                    1    169,556 K
vmware-vmrc.exe               2520 Console                    1     38,836 K
tasklist.exe                  9412 Console                    1      6,040 K

PS C:\Users\dr-gori> 

Yes, this shows more than just the applications. In fact, it feels more like listing the processes being run in Linux by typing, say

ps -aux

The only thing we have to know is that the name of the vsphere client here is VpxClient.exe; don't look at me like that: I didn't come up with that name. So, just like in Linux (and Unix in general), we can see VpxClient.exe has a process ID (PID) of 4800. Now, if Microsoft is calling it a process ID, why are they calling the process a task? But I digress. In any case, let's kill the process, er, task:

PS C:\Users\dr-gori> tskill 4800
PS C:\Users\dr-gori> 

And away the client went. Still need to figure out what is going on with it, but that will be for another time. Right now I can use the web-based client, which is not tied down to one single operating system, and do what I need to do.
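For the record, the Linux two-step this compares to (find the PID, then kill it) looks something like the sketch below, with sleep standing in for a hung client:

```shell
# Start a stand-in "hung" process, grab its PID, and kill it -- the
# same dance as tasklist + tskill on Windows:
sleep 300 &
pid=$!
kill "$pid"              # polite SIGTERM first; kill -9 only as a last resort
wait "$pid" 2>/dev/null || true
echo "killed $pid"
```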