Monday, December 09, 2019

Configuring autofs in a generic way on redhat/debian derived distros in ansible

Let's say you want to install and configure autofs so that network fileshares are mounted on demand. That sounds like a good task for Ansible.

Installing AutoFS

The installing part, as you know, is fairly easy. In both RedHat and Debian derivatives the package name is autofs. Since we are not doing anything special, we can create a rather generic Ansible task using the Ansible package module. That means that instead of having to maintain one version for, say, Ubuntu and another for RedHat, the package module uses whatever package manager is the default for the distro in question:

- name: setup autofs
  package:
    name: autofs
    state: latest
NOTE: I know package works in the redhat/debian derived distros, but I am not sure it will work in other Linux flavours. I would also check the autofs package name for those other distros.

Configuring AutoFS

The next thing we want to do is ensure we are using the right NFS version. That is done by finding the line beginning with mount_nfs_default_protocol and editing its value. In my case I want to make sure it is using NFS v4, which is the default anyway in a modern autofs package. So, why bother? Well, call me paranoid: I want to have exactly what I want. Or call it a simple example of using the Ansible lineinfile module. Or maybe someone out there is using NFS v3 and wants to see how to change it.

- name: Ensure nfs v4
  lineinfile:
    path: /etc/autofs.conf
    regexp: '^mount_nfs_default_protocol '
    line: 'mount_nfs_default_protocol = 4'

Let's use this as an excuse to talk about lineinfile: the regexp here is looking for a line that starts with the string 'mount_nfs_default_protocol '; I wrote it in quotes because it includes the blank space after mount_nfs_default_protocol. Note that the search pattern also means "and anything else to the end of the line," so a line looking like this

mount_nfs_default_protocol could be something you do not want to touch
is fair game. I know that usually in a regexp you would end the pattern with (.*)$ to include everything to the end of the line, but just nod a lot and move on. Now if the line looked like this:
#mount_nfs_default_protocol could be something you do not want to touch
the regexp would not work because it expects the line to begin with m. line defines what we want the line to look like. If the line already matches, no changes are made.
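That match-and-replace behavior can be sketched in plain shell: sed below plays the role of lineinfile, using the same anchored pattern (the sample file contents are made up for illustration).

```shell
#!/bin/sh
# A line beginning with the bare key (plus trailing space) is rewritten,
# ".*" covering the rest of the line; the commented copy starts with '#',
# so the anchored pattern skips it.
input='#mount_nfs_default_protocol could be something you do not want to touch
mount_nfs_default_protocol = 3'
result=$(printf '%s\n' "$input" |
  sed 's/^mount_nfs_default_protocol .*/mount_nfs_default_protocol = 4/')
printf '%s\n' "$result"
```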

The next step is to define what we want to use autofs for. In my case, I want to mount user home directories off the fileserver. That means creating a /etc/auto.home file which describes which fileserver we are using. For this I suggest using the template module since we can define the name of the fileserver somewhere earlier in the playbook or in a config file (I am thinking here of a file in host_vars/ or group_vars/ associated with the host in question). In my task file I use something like

- name: configure auto.home
  template:
    src: auto.home.j2
    dest: /etc/auto.home
    mode: 0644
    owner: root
    group: root
    serole: _default
    setype: _default
    seuser: _default
  notify: restart autofs
which
  1. Grabs the template templates/auto.home.j2 and puts it in /etc/auto.home
  2. Sets the permissions, ownership, and SELinux parameters for /etc/auto.home. The _default means that if there is a default SELinux setting for that file/directory, we will use it.

And roles/common/templates/auto.home.j2 looks like this:

#
# File: /etc/auto.home
#
*   -fstype=nfs4,hard,intr,rsize=8192,wsize=8192 {{ nfs_server }}:/home/&
where nfs_server is the name of the nfs server defined somewhere else.
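Conceptually the template step boils down to a variable substitution; a rough shell approximation of what Jinja2 does when rendering auto.home.j2 (the server name below is just an example value):

```shell
#!/bin/sh
# Substitute {{ nfs_server }} into the map line, roughly what the template
# module does when it renders auto.home.j2 into /etc/auto.home.
nfs_server='fileserver.example.com'   # example value
rendered=$(sed "s/{{ nfs_server }}/$nfs_server/" <<'EOF'
*   -fstype=nfs4,hard,intr,rsize=8192,wsize=8192 {{ nfs_server }}:/home/&
EOF
)
printf '%s\n' "$rendered"
```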

Now, there are two ways to deal with it: the old autofs way and the new one.

  • Old autofs: In the old days, you would edit the /etc/auto.master file, adding a line underneath +auto.master that would tell us how to mount user home directory. In the following example,
    +auto.master
    /home   /etc/auto.home --timeout=300
    
    the bottom line is saying "if you notice someone trying to access a file/directory in /home, go to /etc/auto.home to see how to mount it. But, if there is no activity for 300 seconds, unmount it." I use a timeout of 300 seconds; change it to fit your needs. Now we need to add that line to /etc/auto.master, which we will do using the lineinfile module once more:

    - name: Enable auto.home in auto.master
      lineinfile:
        path: /etc/auto.master
        regexp: '^\/home'
        insertafter: '^\+auto.master'
        line: /home   /etc/auto.home --timeout=300

    As you can see, it is a little more complex than the previous task:

    • The regexp statement has to escape the /
    • insertafter is used to look for a line matching that pattern and then insert/change the line we want below it. This is useful when more than one line matches the regexp pattern, or when you want the line we are creating/replacing to be in a specific location. Without it, if the line does not exist, it is appended at the end of the file.
  • New autofs: The more modern /etc/auto.master file has the following lines in it:
    #
    # Include /etc/auto.master.d/*.autofs
    # The included files must conform to the format of this file.
    #
    +dir:/etc/auto.master.d
    #

    Instead of editing the /etc/auto.master file, which might be overwritten by an upgrade, we simply drop a file inside the /etc/auto.master.d directory, which is then included by /etc/auto.master. This file, say /etc/auto.master.d/home.autofs, looks very much like what we added in the old autofs example; the main difference is that it is its own file

    raub@desktop:~/dev/ansible$ cat roles/common/files/home.autofs
    /home   /etc/auto.home --timeout=300
    raub@desktop:~/dev/ansible$
    that needs to be uploaded using the Ansible copy module:

    - name: Enable auto.home in auto.master.d
      copy:
        src: home.autofs
        dest: /etc/auto.master.d/home.autofs
        owner: root
        group: root
        serole: _default
        setype: _default
        seuser: _default
        mode: 0644

Which one to pick? You know your setup, so pick the one that fits your needs.
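Whichever you pick, the insertafter behavior from the old-autofs task is worth picturing; a rough shell approximation (with a made-up auto.misc entry for company) is

```shell
#!/bin/sh
# Print every line; right after the +auto.master line, emit the new entry.
# This mirrors where insertafter places the line when no line matches ^/home.
master='+auto.master
/misc   /etc/auto.misc'
result=$(printf '%s\n' "$master" | awk '
  { print }
  /^\+auto\.master/ { print "/home   /etc/auto.home --timeout=300" }')
printf '%s\n' "$result"
```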

(Re)Starting autofs

The final step is to (re)start the autofs service after all this configuring. We do that using handlers:

- name: start autofs
  service:
    name: autofs
    state: started
    enabled: yes

- name: restart autofs
  service:
    name: autofs
    state: restarted
    enabled: yes

The way I use them is to start autofs when the package is installed (see the notify statement),

- name: setup autofs
  package:
    name: autofs
    state: latest
  notify: start autofs
and then restart it after finishing with auto.home:
- name: configure auto.home
  template:
    src: auto.home.j2
    dest: /etc/auto.home
    mode: 0644
    owner: root
    group: root
    serole: _default
    setype: _default
    seuser: _default
  notify: restart autofs

After I unleash Ansible, I ssh in as the non-root user who is allowed to log into vmhost2 and check whether the fileshare is mounted:

[raub@vmhost2 ~]$ df -h
Filesystem                            Size  Used Avail Use% Mounted on
devtmpfs                               16G     0   16G   0% /dev
tmpfs                                  16G     0   16G   0% /dev/shm
tmpfs                                  16G  8.9M   16G   1% /run
tmpfs                                  16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/vmhost-root               2.0G   71M  2.0G   4% /
/dev/mapper/vmhost-usr                4.0G  1.8G  2.2G  46% /usr
/dev/sda2                             976M  179M  731M  20% /boot
/dev/sda1                             200M  6.8M  194M   4% /boot/efi
/dev/mapper/vmhost-var                4.0G  376M  3.7G  10% /var
/dev/mapper/vmhost-vg_backup           10G  104M  9.9G   2% /var/lib/libvirt/qemu/save
fileserver.example.com:/home/raub     690G  629G   61G  92% /home/raub
tmpfs                                 3.2G     0  3.2G   0% /run/user/1001
[raub@vmhost2 ~]$
[raub@vmhost2 ~]$ systemctl status autofs
● autofs.service - Automounts filesystems on demand
   Loaded: loaded (/usr/lib/systemd/system/autofs.service; enabled; vendor pres>
   Active: active (running) since Mon 2019-12-09 14:17:26 EST; 2min 39s ago
 Main PID: 28387 (automount)
    Tasks: 6 (limit: 26213)
   Memory: 3.3M
   CGroup: /system.slice/autofs.service
           └─28387 /usr/sbin/automount --foreground --dont-check-daemon
[raub@vmhost2 ~]$

I do not know about you, but it seems we have a winner. I will put a cleaner version of this playbook and supporting files in my GitHub account later on.

Thursday, December 05, 2019

Replacing a VMWare ESXi host with a KVM one

Why?

Ok, I think I did my due diligence. With all the entries in this blog as proof, I think I put up with the ESXi box long enough. It was not as bad as Xen, but I got tired of not being able to get it to behave as it should. And when I could not do PCI passthrough -- not just with the Netronome card, but with every single PCI or PCIe card I had available, none of which had problems being passed to a vm guest using KVM as the hypervisor -- it was time to move on. The writing was on the wall after almost a year of not getting an answer from VMware.

The Plan

  1. While the ESXi server, vmhost2, is still running, export the guests in .ovf format to a safe location. Just to be on the safe side, write down somewhere the specs for each guest (memory, cpu, OS, which network it is using, etc).
  2. Build the new vmhost2 using Debian or CentOS as the base OS and kvm as the hypervisor. Some things to watch out for:
    • Setup Network trunk and bridges to replicate old setup.
    • Use the same IP as before since we are keeping the same hostname.
    • Setup the logical volume manager so I can move things around later on.
    • Configure ntp to use our internal server
    • Configure DNS to use our internal server
    • Accounts of users who need to access the vm host itself will be mounted through autofs. If that fails, one can log in as root using ssh keypair authentication. If that is also down (say, network issues), switch to the console.
    • Like on the old vmhost2, ISOs for the install images are available through NFS.
    • Add whatever kernel options we might need. Remember we are building this from scratch, not just dropping a prebuilt system like xenserver. Or ESXi.
  3. Import enough vm guests to validate the system. Might take the opportunity to do some guest housecleaning.
  4. Add any PCI/PCIe cards we want to passthrough.
  5. Import the rest of the vm guests.
  6. (Future:) set it up so it can move/load balance vm guests with vmhost, the other KVM host.
Note: I did not wipe the original hard drive; instead I just bought a 111.8GB (128GB in fake numbers) SSD to run the OS on. I did not get a second drive to make a RAID for now, since I plan on running only the OS on that disk, which will be configured using Ansible so I can rebuild it quickly. Any vm guest running on this vm host will either run from the fileserver (iSCSI) or on a local RAID setup of sorts. Or I might simply deploy ZFS and be done with it. With that said, I might run a few vm guests from that drive to validate the system.
Note: This project will be broken down into many articles; otherwise it would be long and boring to read in a single sitting (some of the steps are applicable to other projects besides going from ESXi to KVM). I will try to come back and add links to those articles, treating this post as the index.

Tuesday, November 12, 2019

Capturing the output of a command sent through ssh within script without it asking to verify host key

You have noticed that when you connect to a new server, if its key is not in ~/.ssh/known_hosts, ssh will ask you to verify the key:

raub@desktop:~$ ssh -i $SSHKEY_FILE cc@"$server_ip"
The authenticity of host 'headless.example.com (10.0.1.160)' can't be established.
ECDSA key fingerprint is SHA256:AgwYevnTsG2m9hQLu/ROp+Rjj5At2HU0HVoGZ+5Ug58.
Are you sure you want to continue connecting (yes/no)? 

That is a bit of a drag if you want to run a script to connect to said server. Fortunately ssh has an option just for that, StrictHostKeyChecking, as mentioned in the man page:

ssh automatically maintains and checks a database containing identification for all hosts it has ever been used with. Host keys are stored in ~/.ssh/known_hosts in the user's home directory. Additionally, the file /etc/ssh/ssh_known_hosts is automatically checked for known hosts. Any new hosts are automatically added to the user's file. If a host's identification ever changes, ssh warns about this and disables password authentication to prevent server spoofing or man-in-the-middle attacks, which could otherwise be used to circumvent the encryption. The StrictHostKeyChecking option can be used to control logins to machines whose host key is not known or has changed.

Now we can rewrite the ssh command as

ssh -o "StrictHostKeyChecking no" -i $SSHKEY_FILE cc@"$server_ip"

and it will log in without waiting for us to verify the key. Of course, this should only be done when the balance between security and automation meets your comfort level. However, if we run the above from a script, it will connect and just sit there with the session open, not accepting any commands. That is, if we try to send a command, pwd in this case,

ssh -o "StrictHostKeyChecking no" -i $SSHKEY_FILE cc@"$server_ip"
pwd

it will never see the pwd. The only way to get to the pwd is to kill the ssh session, at which point the script moves on to the next command but runs it on the local host, not the remote one. If we want to run pwd on the host we ssh into, we need to pass it as an argument, i.e.

raub@desktop:/tmp$ pwd
/tmp
raub@desktop:/tmp$ ssh -o "StrictHostKeyChecking no" -i $SSHKEY_FILE cc@"$server_ip" pwd
/home/raub
raub@desktop:/tmp$

which means it will connect, run said command, and then exit. But how do we do that from a script? The answer is to use eval or bash -c (or another available shell):

raub@desktop:/tmp$ moo="ssh -o \"StrictHostKeyChecking no\" -i $SSHKEY_FILE cc@$server_ip"
raub@desktop:/tmp$ dirlist=$(eval "$moo pwd")
raub@desktop:/tmp$ echo $dirlist
/home/raub
raub@desktop:/tmp$ dirlist2=$(sh -c "$moo pwd")
raub@desktop:/tmp$ echo $dirlist2
/home/raub
raub@desktop:/tmp$

Now, there are subtle differences between eval and bash -c. There is a good thread on Stack Exchange which explains them better than I can.
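One of those differences can be shown in a few lines: eval runs the stored command in the current shell, while sh -c runs it in a child process whose variables vanish when it exits (a contrived sketch, not from the thread):

```shell
#!/bin/sh
cmd='greeting=hello'
sh -c "$cmd"                  # runs in a child shell; greeting dies with it
after_sh=${greeting:-unset}
eval "$cmd"                   # runs in the current shell; greeting sticks
after_eval=${greeting:-unset}
echo "sh -c: $after_sh, eval: $after_eval"
```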

Now, why would someone want to do that?

How about doing some poor man's Ansible, i.e. when the host on the other side does not have python or has some other idiosyncrasy? Like my home switch, but that is another story...

Saturday, November 09, 2019

Creating an RSA key for rsync for Android using docker

I use Rsync4Android to back up my phone. It is convenient: you can even run it from a cronjob so it does its thing without your intervention (think of it backing up while you are having dinner somewhere), and it backs up to wherever you want, so you are in control. Then, one day it stopped working. Not knowing what was going on, I did what one usually does when dealing with ssh issues: ran sshd in debug mode:

raub@desktop:~$ sudo /usr/sbin/sshd -D
[...]
debug1: rexec_argv[0]='/usr/sbin/sshd'
debug1: rexec_argv[1]='-D'
debug1: inetd sockets after dupping: 3, 3
debug1: list_hostkey_types: ssh-rsa,rsa-sha2-512,rsa-sha2-256,ecdsa-sha2-nistp256 [preauth]
debug1: SSH2_MSG_KEXINIT sent [preauth]
debug1: SSH2_MSG_KEXINIT received [preauth]
debug1: kex: algorithm: curve25519-sha256@libssh.org [preauth]
debug1: kex: host key algorithm: ecdsa-sha2-nistp256 [preauth]
debug1: kex: client->server cipher: aes128-ctr MAC: hmac-sha1 compression: none [preauth]
debug1: kex: server->client cipher: aes128-ctr MAC: hmac-sha1 compression: none [preauth]
debug1: expecting SSH2_MSG_KEX_ECDH_INIT [preauth]
debug1: rekey after 4294967296 blocks [preauth]
debug1: SSH2_MSG_NEWKEYS sent [preauth]
debug1: expecting SSH2_MSG_NEWKEYS [preauth]
debug1: SSH2_MSG_NEWKEYS received [preauth]
debug1: rekey after 4294967296 blocks [preauth]
debug1: KEX done [preauth]
debug1: userauth-request for user raub service ssh-connection method none [preauth]
debug1: attempt 0 failures 0 [preauth]
debug1: PAM: initializing for "raub"
debug1: PAM: setting PAM_RHOST to "10.0.0.129"
debug1: PAM: setting PAM_TTY to "ssh"
debug1: userauth-request for user raub service ssh-connection method publickey [preauth]
debug1: attempt 1 failures 0 [preauth]
userauth_pubkey: key type ssh-dss not in PubkeyAcceptedKeyTypes [preauth]
Connection closed by 10.0.0.129 port 39739 [preauth]
debug1: do_cleanup [preauth]
debug1: monitor_read_log: child log fd closed
debug1: do_cleanup
debug1: PAM: cleanup
debug1: Killing privsep child 10239
debug1: audit_event: unhandled event 12
raub@desktop:~$

The line that tells what is going on is

userauth_pubkey: key type ssh-dss not in PubkeyAcceptedKeyTypes [preauth]

When creating a key pair, Rsync4Android uses the DSA algorithm. As we know, DSA has been considered insecure for a while, and current releases of OpenSSH do not support it by default. So, if I want to keep using Rsync4Android, I either configure my ssh server to accept DSA keys or find a way to convince it to use an RSA key. I chose the RSA route, but how to do it?
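For the record, the other route also exists: OpenSSH 7.0 and later can be told to accept DSA client keys again with a server-side option. Shown only for completeness; I did not go this way:

```
# /etc/ssh/sshd_config -- re-enable DSA client keys (not recommended)
PubkeyAcceptedKeyTypes +ssh-dss
```

After editing, sshd needs a restart for the change to take effect.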

Create Key

Nothing special here.

ssh-keygen -t rsa -b 4096 -C "Phone_backup" -f ~/.ssh/phonebackup

Don't like 4096 bits? Double it; Rsync4Android does not care.

Convert private key to something Rsync4Android can use

The public key will go to the Linux host, which runs openssh and can handle those keys just fine. Rsync4Android, on the other hand, needs dropbear-style keys. As mentioned in the Rsync4Android docs, the best way to convert is to use the dropbearconvert command, which on Ubuntu comes in the dropbear package. As I did not want to install it on desktop, I quickly created a docker container, copied the private key into it, and installed the package. Then I ran it (note the path), telling it I am feeding it an openssh-format key and want a dropbear-style key:

root@b3f7ed8c4f24:/home# apt-get install dropbear
[...]
root@b3f7ed8c4f24:/home# /usr/lib/dropbear/dropbearconvert openssh dropbear phonebackup phonebackup.dropbear
Key is a ssh-rsa key
Wrote key to 'phonebackup.dropbear'
root@b3f7ed8c4f24:/home#

Now that I have the dropbear-formatted private key, phonebackup.dropbear, I can finally set the circus up.

Copy public and private keys to the proper locations

The public key goes to the computer we are backing the phone up to (in my case desktop): it is added to the account's ~/.ssh/authorized_keys file.

The private key goes to the Android device. I put it in the same directory where Rsync4Android placed the original (DSA) key it created, /sdcard. Then it was a matter of renaming the key in the Rsync4Android config and running it again.

Test it

Rsync4Android has a dry run mode so you can see if it works. When testing, I also ran sshd in debug mode. Then I ran the backup in "production mode"; to the left is a screen capture of my phone. The reason you see a lot of output is that I have --partial --progress as rsync options; you should configure them to fit your needs. I suggest also checking out the --exclude option; I think you will find it quite useful.

Now that we know it works, we can worry about running it as a cron job (the Rsync4Android docs have a link on how to do that) and then making it work from an external network. But that is for another article.

Saturday, June 22, 2019

Bitten by double quotes while passing arguments to a bash function

So I wrote a script to set up some networks. The script is not that important for this article; in fact I do not like said script anymore and will replace it with some wholesome Ansible playbooks. What matters is this bit of code:

The script is called nething.sh and kinda looks like this:

#! /bin/bash
# Populate openstack networks

HEAD_NODE_IPADDRESS="10.0.0.30"
COMPUTE_NOTE=$HEAD_NODE_IP_ADDRESS
SEC_GROUP="custom-1"
NET_TYPE="flat"
PHYSICAL_NETWORK_DEFAULT="extnet"
NET_NAME="public"
SUBNET_NAME="${NET_NAME}_subnet"
CIDR="10.0.0.0/24"
DNS_NAMESERVER1="10.0.0.10"
GATEWAY_IP="10.0.0.2"
DHCP_ALLOCATION_START="10.0.0.249"
DHCP_ALLOCATION_END="10.0.0.254"

# Boring things here
function CreateNetwork
{
   net_name=$1
   net_id=$2          # vlan tag, segmentation ID
   net_type=$3
   subnet_name=$4
   subnet_range=$5    # Subnet is not necessarily as wide as its parent network
   gateway=$6
   subnet_dhcp_start=$7
   subnet_dhcp_end=$8
   nameserver=$9
   physical_net=$10

   if [[ ${net_id} == "EXT" ]]; then
      openstack network create --provider-network-type ${net_type} \
[...]
}

# Public Network
   CreateNetwork \
      $NET_NAME \
      "EXT" \
      $NET_TYPE \
      $SUBNET_NAME \
      $CIDR \
      $GATEWAY_IP \
      $DHCP_ALLOCATION_START \
      $DHCP_ALLOCATION_END \
      $DNS_NAMESERVER1 \
      $PHYSICAL_NETWORK_DEFAULT

which defines a few constants at the top and then passes them to a function at the end of the script.

Every time I ran the script, it crashed and burned, and I did not know why. So I ran it as bash -x nething.sh so it would spit out each line/variable as it passed through them. Here is what the output looks like:

+ HEAD_NODE_IPADDRESS=10.0.0.30
+ COMPUTE_NOTE=
+ SEC_GROUP=custom-1
+ NET_TYPE=flat
+ PHYSICAL_NETWORK_DEFAULT=extnet
+ NET_NAME=public
+ SUBNET_NAME=public_subnet
+ CIDR=10.0.0.0/24
+ DNS_NAMESERVER1=10.0.0.10
+ GATEWAY_IP=10.0.0.2
+ DHCP_ALLOCATION_START=10.0.0.249
+ DHCP_ALLOCATION_END=10.0.0.254
+ CreateNetwork public BLANK flat public_subnet 10.0.0.0/24 ' '
+ net_name=public
+ net_id=BLANK
+ net_type=flat
+ subnet_name=public_subnet
+ subnet_range=10.0.0.0/24
+ gateway=' '
+ subnet_dhcp_start=
+ subnet_dhcp_end=
+ nameserver=
+ physical_net=public0
+ [[ BLANK == \E\X\T ]]
+ [[ BLANK == \B\L\A\N\K ]]

The line starting with CreateNetwork public BLANK means we are entering the function CreateNetwork(); what comes after it is the list of local variables inside the function which were populated by the function call. The [[ BLANK == \E\X\T ]] is how an if test looks under the -x option, specifically

if [[ ${net_id} == "EXT" ]]; then

To save time I will spoil the fun: take a look at the gateway variable. Why is it blank? And why is every variable after it blank? Note that the line

+ CreateNetwork public BLANK flat public_subnet 10.0.0.0/24 ' '
tells us that
  • gateway is being passed as ' ' right from the function call.
  • No other argument after gateway is being passed.

Why?

The problem is CIDR="10.0.0.0/24", specifically the double quotes. They expand the string 10.0.0.0/24, interpreting the /24 instead of taking it literally. The solution is to use single quotes, as in

HEAD_NODE_IPADDRESS="10.0.0.30"
COMPUTE_NOTE=$HEAD_NODE_IP_ADDRESS
SEC_GROUP="custom-1"
NET_TYPE="flat"
PHYSICAL_NETWORK_DEFAULT="extnet"
NET_NAME="public"
SUBNET_NAME="${NET_NAME}_subnet"
CIDR='10.0.0.0/24'
DNS_NAMESERVER1="10.0.0.10"
GATEWAY_IP="10.0.0.2"
DHCP_ALLOCATION_START="10.0.0.249"
DHCP_ALLOCATION_END="10.0.0.254"

If we then run that, the strings now behave as expected:

[root@stakola ~(keystone_admin)]# bash -x nething.sh
+ HEAD_NODE_IPADDRESS=10.0.0.30
+ COMPUTE_NOTE=
+ SEC_GROUP=custom-1
+ NET_TYPE=flat
+ PHYSICAL_NETWORK_DEFAULT=extnet
+ NET_NAME=public
+ SUBNET_NAME=public_subnet
+ CIDR=10.0.0.0/24
+ DNS_NAMESERVER1=10.0.0.10
+ GATEWAY_IP=10.0.0.2
+ DHCP_ALLOCATION_START=10.0.0.249
+ DHCP_ALLOCATION_END=10.0.0.254
+ CreateNetwork private 6855 vlan private_subnet 192.168.55.0/24 192.168.55.1 192.
168.55.100 192.168.55.200 10.0.0.10 physnet1
+ net_name=private
+ net_id=6855
+ net_type=vlan
+ subnet_name=private_subnet
+ subnet_range=192.168.55.0/24
+ gateway=192.168.55.1
+ subnet_dhcp_start=192.168.55.100
+ subnet_dhcp_end=192.168.55.200
+ nameserver=10.0.0.10
+ physical_net=private0
+ [[ 6855 == \E\X\T ]]
+ [[ 6855 == \B\L\A\N\K ]]
[...]

Moral of the story: know when to use single and double quotes!
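The underlying word-splitting mechanism can also be demonstrated in isolation (the sample value below is contrived): an unquoted expansion that contains whitespace is split into several arguments, so a function's positional parameters shift out from under you.

```shell
#!/bin/sh
# A function that just reports how many arguments it received.
count_args() { echo $#; }
value='10.0.0.0 /24'              # contrived value containing a space
unquoted=$(count_args $value)     # unquoted: split into two arguments
quoted=$(count_args "$value")     # quoted: stays one argument
echo "unquoted: $unquoted, quoted: $quoted"
```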

Thursday, May 23, 2019

Programming a Netronome network card (inside a VM) from command line

This is the same card(s) we got to work inside a vm guest using the magic of PCI passthrough. Netronome wants us to use a Windows-only IDE to do development work on it while the card sits in a Linux box we can reach; some of its features remind me of the IDE Google has for Android, which allows you to run an emulator and do some real-time debugging. The difference is the Google one works on Linux, Windows, and Mac, and it only requires one computer (which could be remotely accessed).

Do we really need to use the Netronome SDK? I guess it depends on what we want to do. For now, let's see if we can get something running using the command line only, to the point where we can compile Micro-C and run something on the card.


Get the packages

  1. First we need a few packages available for either CentOS or Ubuntu.
    Note: The Netronome Linux SDK officially only supports CentOS and Ubuntu, so we will only be covering those distros.
    • Ubuntu:
      apt-get install libftdi1 libjansson4 build-essential \
       linux-headers-`uname -r` dkms git
    • CentOS (still using yum; I will do a dnf version when I feel like it):
      yum -y install epel-release && yum update -y
      yum -y install libftdi jansson pciutils kernel-devel dkms wget git

    Netronome does require you to have an account to get the SDK packages. I can't help with that; what I can tell you is that once I got the account, I downloaded everything that was available at the time I wrote this article:

    raub@desktop:~$ ls Downloads/netronome/
    SDK
    agilio-nfp-driver-dkms-2018.01.11.2333.f40482a-1.el7.noarch.rpm
    agilio-nfp-driver-dkms_2018.01.11.2333.f40482a_all.deb
    firmware
    readme
    raub@desktop:~$ ls Downloads/netronome/SDK/
    6.0.4.1 6.1.0.1
    raub@desktop:~$ ls Downloads/netronome/SDK/6.1.0.1/
    nfp-sdk-6.1.0.1-preview-3286-setup.exe
    nfp-sdk-6.1.0.1_preview-0-3243.x86_64.rpm
    nfp-sdk-p4-rte-6.1.0.1-preview-3202.centos.x86_64.tar
    nfp-sdk-p4-rte-6.1.0.1-preview-3214.ubuntu.x86_64.tar
    nfp-sdk-sim-6.1.0.0-preview-3179.x86_64.tar
    nfp-sdk_6.1.0.1-preview-3243-2_amd64.deb
    nfp-toolchain-6.1.0.1-preview-3243.x86_64.tar
    raub@desktop:~$

    and then copied them all to the development vm guest we created in the previous article, desktop1.

    What are those files? I put what I have gathered about their function in the readme file (it covers an older file version, but it should give you an idea):
    Programmer Studio IDE
    nfp-sdk-6.0.4.1-3276-setup.exe - Windows
    
    Run Time Environment (RTE)
    nfp-sdk-p4-rte-6.0.4.1-3195.ubuntu.x86_64.tgz
    nfp-sdk-p4-rte-6.0.4.1-3191.centos.x86_64.tgz
    
    Hosted Toolchain (to be used with BSP and SmartNIC)
    nfp-sdk_6.0.4.1-3227-2_amd64.deb
    nfp-sdk-6.0.4.1-0-3227.x86_64.rpm
    
    NFP Simulator
    nfp-sdk-sim-6.0.4.1-3177.x86_64.tgz
    
    Hosted Toolchain (to be used with NFP Simulator)
    nfp-toolchain-6.0.4.1-3227.x86_64.tgz
    Note: There are two versions of the SDK. Just pick the latest.
  2. Then install the basic SDK
    • Ubuntu:
      sudo dpkg -i nfp-sdk_6.1.0.1-preview-3243-2_amd64.deb
    • CentOS:
      sudo rpm -ivh nfp-sdk-6.1.0.1_preview-0-3243.x86_64.rpm

    This creates a /opt/netronome directory.

  3. Then add the directory where the binaries are installed to the path.
    cat >> ~/.bash_profile << 'EOF'
    
    # Netronome SDK
    PATH=$PATH:/opt/netronome/bin
    export PATH
    EOF
    source ~/.bash_profile
    Note: If the user you are building your code as does not have rights to write to the card, you should edit root's .bash_profile as well.
  4. We do need Netronome's modified but open source nfp driver, which has the development features we need (specifically, a nfp_dev_cpp option that exposes the low-level user space access ABIs of non-netdev mode). So we install it, which requires adding the Netronome repo:

    • Ubuntu:
      wget https://deb.netronome.com/gpg/NetronomePublic.key
      apt-key add NetronomePublic.key
      add-apt-repository "deb https://deb.netronome.com/apt stable main"
      apt-get update
      apt-get install agilio-nfp-driver-dkms
    • CentOS:
      wget https://rpm.netronome.com/gpg/NetronomePublic.key
      rpm --import NetronomePublic.key
      cat << 'EOF' > /etc/yum.repos.d/netronome.repo
      [netronome]
      name=netronome
      baseurl=https://rpm.netronome.com/repos/centos/
      gpgcheck=0
      enabled=1
      EOF
      yum makecache
      yum install -y agilio-nfp-driver-dkms --nogpgcheck
    and then reboot.
  5. Now we can install the RTE
    • Ubuntu:
      tar xvf nfp-sdk-p4-rte-6.1.0.1-preview-3214.ubuntu.x86_64.tar
      cd nfp-sdk-6-rte-v6.1.0.1-preview-Ubuntu-Release-r2750-2018-10-10-ubuntu.binary/
      sudo ./sdk6_rte_install.sh install
    • CentOS:
      tar xvf nfp-sdk-p4-rte-6.1.0.1-preview-3202.centos.x86_64.tar
      cd nfp-sdk-6-rte-v6.1.0.1-preview-CentOS-Release-r2749-2018-10-09-centos.binary/
      sudo ./sdk6_rte_install.sh install
    This should cause /opt/netronome/bin/ to fill with many more files; this is a good way to check progress.
    NOTE: Chances are it will get pissed:
    [...]
    Loaded plugins: fastestmirror
    Examining /home/centos/netronome/SDK/6.1.0.1/nfp-sdk-6-rte-v6.1.0.1-preview-CentOS-Release-r2749-2018-10-09-centos.binary/dependencies/nfp-bsp/rpm//nfp-bsp-dkms_2018.08.17.1104_all.rpm: nfp-bsp-dkms-2018.08.17.1104-1dkms.noarch
    Marking /home/centos/netronome/SDK/6.1.0.1/nfp-sdk-6-rte-v6.1.0.1-preview-CentOS-Release-r2749-2018-10-09-centos.binary/dependencies/nfp-bsp/rpm//nfp-bsp-dkms_2018.08.17.1104_all.rpm to be installed
    Resolving Dependencies
    --> Running transaction check
    ---> Package nfp-bsp-dkms.noarch 0:2018.08.17.1104-1dkms will be installed
    --> Processing Conflict: agilio-nfp-driver-dkms-2019.04.02.0225.bf81349-1.el7.noarch conflicts nfp-bsp-dkms
    Loading mirror speeds from cached hostfile
     * base: packages.oit.ncsu.edu
     * epel: mirror.umd.edu
     * extras: packages.oit.ncsu.edu
     * updates: packages.oit.ncsu.edu
    No package matched to upgrade: nfp-bsp-dkms
    --> Finished Dependency Resolution
    Error: agilio-nfp-driver-dkms conflicts with nfp-bsp-dkms-2018.08.17.1104-1dkms.noarch
     You could try using --skip-broken to work around the problem
     You could try running: rpm -Va --nofiles --nodigest
    Error! There are no instances of module: nfp-bsp-dkms
    located in the DKMS tree.
    [centos@desktop1 nfp-sdk-6-rte-v6.1.0.1-preview-CentOS-Release-r2749-2018-10-09-centos.binary]$
    but it will get over it and work fine.
  6. Ensure that nfp_dev_cpp = 1
    theuser@desktop1:~$ cat /sys/module/nfp/parameters/nfp_dev_cpp
    1
    theuser@desktop1:~$ 

    If not, say you instead get an error message like this

    [theuser@desktop1 ~]# cat /sys/module/nfp/parameters/nfp_dev_cpp
    cat: /sys/module/nfp/parameters/nfp_dev_cpp: No such file or directory
    [theuser@desktop1 ~]#

    uninstall nfp and install it back with the option set. There are ways to load said option at boot time; I will leave that as an exercise to the reader.

    theuser@desktop1:~$ sudo modprobe -r -v nfp && sudo modprobe nfp nfp_dev_cpp=1
    theuser@desktop1:~$
  7. Ensure nfp-hwinfo is talking to the card. The expected outcome should look like this:
    theuser@desktop1:~$ sudo /opt/netronome/bin/nfp-hwinfo
    nfp.interface=pci.0.0
    nfp.model=0x40010010
    nfp.serial=00:15:4d:13:5d:2b
    board.exec=bootloader.bin
    uart.baud=115200
    preinit.setup.version=nfp-bsp-6000-b0 (4ef1e19ba176)
    pcie0.type=ep
    assembly.revision=11
    assembly.model=lithium
    assembly.partno=AMDA0096-0001
    assembly.serial=17290647
    assembly.vendor=SMC
    ddr0.spd=spi:1:0:0x3F0F00
    ddr1.spd=spi:1:0:0x3F0F00
    ddr2.spd=none
    ddr3.spd=none
    ddr4.spd=none
    ddr5.spd=none
    emu1.type=cache
    emu2.type=cache
    ethm.mac=00:15:4d:13:5d:2b
    eth.mac=00:15:4d:13:5d:2c
    eth.macs=2
    vpd=fis:1:0:vpd.bin
    board.setup.version=nfp-bsp-6000-b0 (4ef1e19ba176)
    chip.model=NFP4001
    chip.revision=B0
    core.speed=633
    me.speed=633
    arm.speed=475
    chip.model.device=0x62006c20
    chip.identifier=0x219b8546c
    chip.model.hard=0x5
    chip.model.soft=0x40010096
    chip.route=0xc96f1e8e
    chip.island=0x1001f13000112
    mem.setup.version=nfp-bsp-6000-b0 (4ef1e19ba176)
    ddr0.mem.size=1024
    ddr1.mem.size=1024
    ddr0.mem.speed=1600
    ddr1.mem.speed=1600
    emu0.mem.size=2048
    emu0.mem.base=0x2000000000
    emu1.mem.size=3
    [...]
    theuser@desktop1:~$

    If it looks like this:

    theuser@desktop1:~$ sudo /opt/netronome/bin/nfp-hwinfo
    /opt/netronome/bin/nfp-hwinfo: Failed to open NFP device 0 (No such device)
    Please check that:
     -lspci -d 19ee: shows atleast one Netronome device
     -the nfp device number is correct
     -the user has read and write permissions to the Netronome device
     -the nfp.ko module is loaded
     -the nfp_dev_cpp option is enabled (please try modinfo nfp to see all params)
    theuser@desktop1:~$ 
    stop. Go back and check that nfp_dev_cpp = 1 and that the vm was configured to support PCIe cards; do not continue until you have checked and addressed both items.
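The nfp_dev_cpp check comes up often enough to be worth a tiny helper. A sketch, using the sysfs path from step 6 (the function name is made up):

```shell
# Sketch: report whether the nfp module was loaded with nfp_dev_cpp=1.
# Prints a short status string so it can be reused in other scripts.
check_nfp_dev_cpp() {
    p=/sys/module/nfp/parameters/nfp_dev_cpp
    if [ -r "$p" ] && [ "$(cat "$p")" = "1" ]; then
        echo "enabled"
    else
        # fix with: modprobe -r nfp && modprobe nfp nfp_dev_cpp=1
        echo "missing-or-disabled"
    fi
}
check_nfp_dev_cpp
```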

Coding, at last!

This Hello World was stolen from Netronome's appropriately named Hello World example. I will rush through it, concentrating on getting it to compile and on some common issues; look up what each line does in the example docs.

  1. So we create our hello world project using lab_template as the, well, template.
    mkdir dev
    cd dev
    git clone https://github.com/open-nfpsw/c_packetprocessing.git
    cd c_packetprocessing/apps/
    cp -r lab_template lab_hello_world
    cd lab_hello_world
    NOTE: This creates a ~/dev/c_packetprocessing/apps/lab_hello_world directory. If you want to move it to a different location, edit the line
    ROOT_SRC_DIR  ?= $(realpath $(app_src_dir)/../..)
    in the Makefile.
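If you do move it, that Makefile edit can be scripted with sed. A sketch against a scratch copy of the line (the absolute path used as the new value is just an example):

```shell
# Sketch: replace the relative ROOT_SRC_DIR default with an absolute
# path after moving the app directory. A throwaway file stands in for
# apps/lab_hello_world/Makefile here.
mk="$(mktemp)"
printf 'ROOT_SRC_DIR  ?= $(realpath $(app_src_dir)/../..)\n' > "$mk"
sed -i 's|^ROOT_SRC_DIR.*|ROOT_SRC_DIR  ?= $(HOME)/dev/c_packetprocessing|' "$mk"
cat "$mk"
```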
  2. So far the hello world directory looks rather bare:

    theuser@desktop1:~/dev/c_packetprocessing/apps/lab_hello_world$ ls
    Makefile  README
    theuser@desktop1:~/dev/c_packetprocessing/apps/lab_hello_world$

    so let's start populating it.

    cat > hello_world.c << 'EOF'
    #include <nfp.h>
    __declspec(ctm) int old[] = {1,2,3,4,5,6,7,8,9,10};
    __declspec(ctm) int new[sizeof(old)/sizeof(int)];
    
    int main(void)
    {
            if (__ctx() == 0)
            {
                    int i, size;
                    size = sizeof(old)/sizeof(int);
                    for (i=0; i < size; i++)
                    {
                            new[i] = old[size - i - 1];
                    }
            }
            return 0;
    }
    EOF
  3. We add a few lines to the Makefile; their explanation is in the example docs.

    sed -i -e '/^# Application definition starts here/ a\
    $(eval $(call micro_c.compile_with_rtl,hello_world_obj,hello_world.c)) \
    $(eval $(call fw.add_obj,hello_world,hello_world_obj,i32.me0 i32.me1)) \
    $(eval $(call fw.link_with_rtsyms,hello_world))' Makefile
  4. Time for some compiling!

    theuser@desktop1:~/dev/c_packetprocessing/apps/lab_hello_world$ make
    /opt/netronome/bin/nfcc -Fo/home/theuser/dev/c_packetprocessing/apps/lab_hello_world/ -Fe/home/theuser/dev/c_packetprocessing/apps/lab_hello_world/hello_world_obj.list -W3 -chip nfp-4xxx-b0 -Qspill=7 -Qnn_mode=1 -Qno_decl_volatile -single_dram_signal -Qnctx_mode=8 -I. -I/home/theuser/dev/c_packetprocessing/microc/include -I/home/theuser/dev/c_packetprocessing/microc/lib   /opt/netronome/components/standardlibrary/microc/src/rtl.c /home/theuser/dev/c_packetprocessing/apps/lab_hello_world/hello_world.c
    /opt/netronome/bin/nfld -chip nfp-4xxx-b0 -mip -rtsyms -o /home/theuser/dev/c_packetprocessing/apps/lab_hello_world/hello_world.fw -map /home/theuser/dev/c_packetprocessing/apps/lab_hello_world/hello_world.map -u i32.me0 /home/theuser/dev/c_packetprocessing/apps/lab_hello_world/hello_world_obj.list -u i32.me1 /home/theuser/dev/c_packetprocessing/apps/lab_hello_world/hello_world_obj.list
    theuser@desktop1:~/dev/c_packetprocessing/apps/lab_hello_world$

    which creates a few intermediate files:

    theuser@desktop1:~/dev/c_packetprocessing/apps/lab_hello_world$ ls
    hello_world.c   hello_world.map  hello_world_obj.list  README
    hello_world.fw  hello_world.obj  Makefile              rtl.obj
    theuser@desktop1:~/dev/c_packetprocessing/apps/lab_hello_world$
    theuser@desktop1:~/dev/c_packetprocessing/apps/lab_hello_world$ cat hello_world.map
    Memory Map file: /home/theuser/dev/c_packetprocessing/apps/lab_hello_world/hello_world.map
    Date: Tue May  7 10:39:30 2019
    
    nfld version: 6.0.4.1,  NFFW: /home/theuser/dev/c_packetprocessing/apps/lab_hello_world/hello_world.fw
    
    Address       Region     ByteSize        Symbol
    ===================================================
    0x0000000000800000    i24.emem      108                 .mip
    0x0000000000000000    i32.ctm       704                 i32.me0.ctm_40$tls
    0x00000000000002c0    i32.ctm       704                 i32.me1.ctm_40$tls
    
    ImportVar                       Uninitialized Value
    ===================================================
    theuser@desktop1:~/dev/c_packetprocessing/apps/lab_hello_world$
  5. Next, upload the firmware we created into the card. This needs to be run either as root or as a user who can write to the card.
    root@desktop1:/home/theuser/dev/c_packetprocessing/apps/lab_hello_world# make load_hello_world
    nfp-nffw load --no-start /home/theuser/dev/c_packetprocessing/apps/lab_hello_world/hello_world.fw
    root@desktop1:/home/theuser/dev/c_packetprocessing/apps/lab_hello_world#
    NOTE: If you see the following error message
    theuser@desktop1:~/dev/c_packetprocessing/apps/lab_hello_world$ make load_hello_world
    nfp-nffw load --no-start /home/centos/dev/c_packetprocessing/apps/lab_hello_world/hello_world.fw
    nfp-nffw: Failed to open NFP device 0 (No such device)
    Please check that:
     -lspci -d 19ee: shows atleast one Netronome device
     -the nfp device number is correct
     -the user has read and write permissions to the Netronome device
     -the nfp.ko module is loaded
     -the nfp_dev_cpp option is enabled (please try modinfo nfp to see all params)
    nfp-nffw: Command 'load' failed
    make: *** [load_hello_world] Error 1
    theuser@desktop1:~/dev/c_packetprocessing/apps/lab_hello_world$
    you should check that
    • you are running make load_hello_world as a user who can write to the card,
    • nfp_dev_cpp = 1, and
    • the vm was configured to support PCIe cards.
    Go back in this document for instructions on how to do so.

    Now, if you see this error message

    [F] nfp6000_nffw.c:4643: Firmware already loaded. Unload first.
    Failed to load firmware: Operation not permitted
    nfp-nffw: Command 'load' failed
    Makefile:43: recipe for target 'load_hello_world' failed
    either you or someone else has already loaded firmware into the card. All you have to do is unload it
    theuser@desktop1:~/dev/c_packetprocessing/apps/lab_hello_world$ nfp-nffw unload
    theuser@desktop1:~/dev/c_packetprocessing/apps/lab_hello_world$
    and then run make load_hello_world again.

  6. In the hello world instructions, the next step is to look at the card memory, since later on we will be writing to it. So, here it is (as in hexdump, a lone * marks a run of repeated identical lines):
    root@desktop1:/home/theuser/dev/c_packetprocessing/apps/lab_hello_world# nfp-rtsym --len 176 i32.me0.ctm_40\$tls:0
    0x0000000000:  0x00000001 0x00000002 0x00000003 0x00000004
    0x0000000010:  0x00000005 0x00000006 0x00000007 0x00000008
    0x0000000020:  0x00000009 0x0000000a 0x00000000 0x00000000
    0x0000000030:  0x00000000 0x00000000 0x00000000 0x00000000
    *
    0x0000000050:  0x00000000 0x00000000 0x00000001 0x00000002
    0x0000000060:  0x00000003 0x00000004 0x00000005 0x00000006
    0x0000000070:  0x00000007 0x00000008 0x00000009 0x0000000a
    0x0000000080:  0x00000000 0x00000000 0x00000000 0x00000000
    *
    
    root@desktop1:/home/theuser/dev/c_packetprocessing/apps/lab_hello_world#
  7. Unleash the code so it does things:
    root@desktop1:/home/theuser/dev/c_packetprocessing/apps/lab_hello_world# make fw_start
    nfp-nffw start
    root@desktop1:/home/theuser/dev/c_packetprocessing/apps/lab_hello_world# 
  8. If everything worked, we can now see the memory contents have changed:
    root@desktop1:/home/theuser/dev/c_packetprocessing/apps/lab_hello_world# nfp-rtsym --len 176 i32.me0.ctm_40\$tls:0
    0x0000000000:  0x00000001 0x00000002 0x00000003 0x00000004
    0x0000000010:  0x00000005 0x00000006 0x00000007 0x00000008
    0x0000000020:  0x00000009 0x0000000a 0x00000000 0x00000000
    0x0000000030:  0x0000000a 0x00000009 0x00000008 0x00000007
    0x0000000040:  0x00000006 0x00000005 0x00000004 0x00000003
    0x0000000050:  0x00000002 0x00000001 0x00000001 0x00000002
    0x0000000060:  0x00000003 0x00000004 0x00000005 0x00000006
    0x0000000070:  0x00000007 0x00000008 0x00000009 0x0000000a
    0x0000000080:  0x00000000 0x00000000 0x00000000 0x00000000
    *
    
    root@desktop1:/home/theuser/dev/c_packetprocessing/apps/lab_hello_world#
  9. Don't forget to unload the firmware by typing nfp-nffw unload!
  10. Checking that we are done
    root@desktop1:/home/theuser/dev/c_packetprocessing/apps/lab_hello_world# nfp-rtsym --len 176 i32.me0.ctm_40\$tls:0
    No runtime symbol named 'i32.me0.ctm_40$tls'
    root@desktop1:/home/theuser/dev/c_packetprocessing/apps/lab_hello_world#

So congratulations! You not only installed the SDK but also wrote and ran your first Netronome program! You may want to look into the Network Flow C Compiler User's Guide for further info on what you can do; I would post the link, but right on the front page it states it is Proprietary and Confidential.

Next time we will do some OpenFlow or P4 coding. Don't ask me which one it will be because I have not decided yet. Brain hurts!

What about the simulator? Maybe one day.

Friday, May 03, 2019

Passing a Network card to a KVM vm guest because we are too lazy to configure SR-IOV

This can be taken as a generic how-to about passing PCIe cards to a vm guest.

Why

I can come up with a lot of excuses. The bottom line is that you want the vm guest to do something with the card that the vm host can't or shouldn't do. For instance, what if we want to give a wireless card to a given vm guest, and the card is not supported by the vm host (I am looking at you, VMware ESXi), or the vm host does not know how to virtualize it in a meaningful way?

Note: What we are talking about here should work with any PCI/PCIe card, but we said we will be talking about network cards, so there.

The Card

The card is a PCIe network card; for this article it should be seen as a garden-variety network card. You probably will not let me leave it at that, so here is the info on the specific card I will be using: it is a Netronome Agilio CX 2x10GbE (the one in the picture is a CX 1x40GbE, which I happen to own, hence the crappy picture), built around their NFP-4000 flow processor. A basic infomercial on it can be found at https://www.netronome.com/m/documents/PB_Agilio_CX_2x10GbE.pdf (it used to be https://www.netronome.com/media/documents/PB_Agilio_CX_2x10GbE.pdf, but I guess they thought media was too long a word. It also means that sometime after this article is posted the link will change again; no point in making them orange links). It is supposed to do things like KVM hypervisor support (SR-IOV comes to mind) right out of the box, so why would we want to pass the entire card through to a vm guest? Here are some reasons:

  • What if the card can do something and the VM abstraction layers do not expose that?
  • What if we want to program the card to do our bidding?
  • What if we want to change the firmware of the card? Some cards allow you to upgrade the firmware, or change it completely to use it for other thingies (the Netronome card in question fits this second option, details about that might be discussed in a future article).
  • Why did you pick this card? Hey, this is not a reason to pass the entire card, but I will answer it anyway: because I have a box with 3 of them I was going to use for something else (we may talk about that in a future article). With that said, I avoided going over any of the special sauce this card has. For the purpose of this article, it is just a PCIe card I want to give to a vm guest.

How

Finding the card

Ok, card is inserted into the vm host, which booted properly. Now what? Well, we need to find where the card is so we can tell our guests. Most Linux distros come with lspci, which probulates the PCI bus. The trick is to search for the right pattern. Let's for instance look for network devices in one of my ESXi nodes:

[root@vmhost2:~] lspci | grep 'Network'
0000:00:19.0 Network controller: Intel Corporation 82579LM Gigabit Network Connection [vmnic0]
0000:04:00.0 Network controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) [vmnic1]
0000:04:00.1 Network controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) [vmnic2]
0000:05:00.0 Network controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) [vmnic3]
0000:05:00.1 Network controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) [vmnic4]
[root@vmhost2:~]

Notes

  1. ESXi is really not Linux; it runs VMware's own VMkernel with a busybox-flavored userland sprinkled over it
  2. I just mentioned ESXi here because I needed another system I could run lspci on.
  3. The lspci options in ESXi are not as extensive as in garden-variety Linux, but they are good enough to show it in action.
  4. If we had searched for Intel Corporation we would have gotten many more matches, including the CPU itself. So, taking the time to get the right search string pays off.

If we were going to probulate in a Linux host, Ethernet works better than Network as the search pattern. We can even look at virtual interfaces KVM is feeding to a vm guest:

theuser@desktop1:~$ lspci |grep Ethernet
00:03.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:06.0 Ethernet controller: Red Hat, Inc. Virtio network device
theuser@desktop1:~$

Note that the 0000: domain prefix is implied. A very useful option available in the Linux version of lspci but not the ESXi one is -nn:

theuser@desktop1:~$ lspci -nn |grep Ethernet
00:03.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1000]
00:06.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1000]
theuser@desktop1:~$

The [1af4:1000] means [vendor_id:product_id]; remember it well.
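Those IDs allow precise filtering: lspci -d takes a vendor:device pair (either side may be left empty). Since you may not have the hardware handy, here is the same idea as plain text processing over a captured lspci -nn line; the parsing below is a sketch, not part of any tool:

```shell
# Pull the [vendor:device] pair out of a captured "lspci -nn" line.
line='00:03.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1000]'
ids=$(printf '%s\n' "$line" | sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\]$/\1/p')
vendor=${ids%:*}    # 1af4 -> Red Hat, Inc.
device=${ids#*:}    # 1000 -> Virtio network device
echo "vendor=$vendor device=$device"
```

On a live system, lspci -nn -d 1af4:1000 lists only devices with those exact IDs, and a vendor-only lspci -d 19ee: is exactly what the Netronome tools themselves suggest.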

For the Netronome cards we can just look for netronome since there should be no other devices matching that name besides the cards made by them:

raub@vmhost ~$ sudo lspci -nn|grep -i netronome
11:00.0 Ethernet controller [0200]: Netronome Systems, Inc. Device [19ee:4000]
raub@vmhost ~$

The card's PCI address is 11:00.0

Handing out the card to the guest

Two things we need to do when passing a PCI device to a vm guest (a.k.a. desktop1 in this example):

  1. Tell the vm host to keep its hands off it. The reason is that, in the case of a network card, the host might want to configure it, creating network interfaces which either the host server (vmhost) can use for its own nefarious purposes or which KVM can virtualize (as a Virtio network device or some other emulation) to hand out to the guests. Since we want to use said card for our own private nefarious purposes within a specific vm guest (desktop1), we are not going to be nice and share it.

    So we need to tell vmhost to leave it alone.

    • KVM knows it exists because it can look in the PCI chain by itself:
      [root@vmhost ~]# virsh nodedev-list | grep pci_0000_11
      pci_0000_11_00_0
      [root@vmhost ~]#
    • So now we can tell vmhost to leave pci-0000:11:00.0 alone:

      [root@vmhost ~]$ sudo virsh nodedev-dettach pci_0000_11_00_0
      Device pci_0000_11_00_0 detached
      
      [root@vmhost ~]$
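As an aside, the nodedev name virsh uses is just the PCI address with the separators turned into underscores and a pci_ prefix. A quick sketch of the conversion, handy when scripting the detach:

```shell
# Derive the virsh nodedev name from an lspci-style PCI address.
addr="0000:11:00.0"
nodedev="pci_$(printf '%s' "$addr" | tr ':.' '__')"
echo "$nodedev"
```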
  2. Tell the vm guest there is this shiny card it can lay its noodly appendages on.
    1. Shut the vm guest down.
    2. Edit the vm guest definition.
      virsh edit desktop1
    3. Add something like
      <hostdev mode='subsystem' type='pci' managed='yes'>
            <source>
                <address domain='0x0000' bus='0x11' slot='0x00' function='0x0'/>
            </source>
          </hostdev>
      to the end of the devices section. When you save it, virsh will properly place
      and configure the entry.
    4. Restart the vm guest and check whether it can see the card using dmesg (Ubuntu 19.04 example; note the card is listed as pci-0000:04:00.0 inside the vm guest). I expect to see something like

      [    7.348276] Netronome NFP CPP API
      [    7.352347] nfp-net-vnic: NFP vNIC driver, Copyright (C) 2010-2015 Netronome Systems
      [    7.361865] nfp 0000:04:00.0: Netronome Flow Processor NFP4000/NFP5000/NFP6000 PCIe Card Probe
      [    7.372133] nfp 0000:04:00.0: RESERVED BARs: 0.0: General/MSI-X SRAM, 0.1: PCIe XPB/MSI-X PBA, 0.4: Explicit0, 0.5: Explicit1, free: 20/24
      [    7.396094] nfp 0000:11:00.0: Model: 0x40010010, SN: 00:15:4d:13:5d:58, Ifc: 0x10ff

      But what I am getting is something more like this:

      [    1.768683] nfp: NFP PCIe Driver, Copyright (C) 2014-2017 Netronome Systems
      [    1.773014] nfp 0000:00:07.0: Netronome Flow Processor NFP4000/NFP5000/NFP6000 PCIe Card Probe
      [    1.774066] nfp 0000:00:07.0: 63.008 Gb/s available PCIe bandwidth (8 GT/s x8 link)
      [    1.775212] nfp 0000:00:07.0: can't find PCIe Serial Number Capability
      [    1.776252] nfp 0000:00:07.0: Interface type 15 is not the expected 1
      [    1.777285] nfp 0000:00:07.0: NFP6000 PCI setup failed

      What is going on? The answer to that is the next topic. You see,

PCIe is more demanding

Do you remember the can't find PCIe Serial Number Capability message? This is a PCIe card, meaning we need to set the vm guest machine type to q35, which emulates the ICH9 chipset and therefore can handle a PCIe bus. The default (i440FX) can only do a PCI bus. QEMU has a nice description of the difference. So, let's give it a try by recreating the KVM guest:

virt-install \
   --name desktop1 \
   --disk path=/home/raub/desktop1.qcow2,format=qcow2,size=10 \
   --ram 4098 --vcpus 2 \
   --cdrom /export/public/ISOs/Linux/ubuntu/ubuntu-16.04.5-server-amd64.iso  \
   --os-type linux --os-variant ubuntu19.04 \
   --network network=default \
   --graphics vnc --noautoconsole \
   --machine=q35 \
   --arch x86_64

When we try to build that vm guest, we get an error message stating that

ERROR    No domains available for virt type 'hvm', arch 'x86_64', machine type 'q35'

What now? You see, at the time I wrote this, the CentOS KVM packages did not support q35 out of the box. We need more packages!

yum install centos-release-qemu-ev
yum update
reboot

And we try again. This time, when we log in to the guest, desktop1, things look more promising (note the PCI address changed to 0000:01:00.0; this is a new vm guest):

theuser@desktop1:~$ dmesg |grep -i netro
[    1.922051] nfp: NFP PCIe Driver, Copyright (C) 2014-2017 Netronome Systems
[    1.954196] nfp 0000:01:00.0: Netronome Flow Processor NFP4000/NFP5000/NFP6000 PCIe Card Probe
[    2.239018] nfp 0000:01:00.0: nfp:   netronome/serial-00-15-4d-13-5d-46-10-ff.nffw: not found
[    2.239059] nfp 0000:01:00.0: nfp:   netronome/pci-0000:01:00.0.nffw: not found
[    2.239913] nfp 0000:01:00.0: nfp:   netronome/nic_AMDA0096-0001_2x10.nffw: found, loading...
[   11.954477] nfp 0000:01:00.0 eth0: Netronome NFP-6xxx Netdev: TxQs=2/32 RxQs=2/32
[   11.971175] nfp 0000:01:00.0 eth1: Netronome NFP-6xxx Netdev: TxQs=2/31 RxQs=2/31
theuser@desktop1:~$

Which then becomes

theuser@desktop1:~$ ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3:  mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:d4:9e:50 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.105/24 brd 192.168.122.255 scope global dynamic enp0s3
       valid_lft 3489sec preferred_lft 3489sec
    inet6 fe80::5054:ff:fed4:9e50/64 scope link
       valid_lft forever preferred_lft forever
3: enp1s0np0:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:15:4d:13:5d:47 brd ff:ff:ff:ff:ff:ff
4: enp1s0np1:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:15:4d:13:5d:48 brd ff:ff:ff:ff:ff:ff
theuser@desktop1:~$

And now we can do something useful with it.

References

  • https://stackoverflow.com/questions/14061840/kvm-and-libvirt-wrong-cpu-type-in-virtual-host
  • https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/sect-kvm_guest_virtual_machine_compatibility-supported_cpu_models
  • https://github.com/libvirt/libvirt/blob/v4.0.0/src/util/virarch.c#L37

Wednesday, January 16, 2019

Getting the profile name for installing a ESXi patch from command line

So I went to the VMware page for ESXi patches and downloaded the patch

ESXi670-201811001

Product: ESXi (Embedded and Installable) 6.7.0

Download Size: 317.5 MB

But when I tried to install it, I was asked for a "profile" name, whatever that is:

[root@vmhost2:/tmp] esxcli software profile update -d /vmfs/volumes/VOL1/ISO/ESXi670-201811001.zip
Error: Missing required parameter -p|--profile

Usage: esxcli software profile update [cmd options]

Description:
  update                Updates the host with VIBs from an image profile in a
                        depot. Installed VIBs may be upgraded (or downgraded
                        if --allow-downgrades is specified), but they will
                        not be removed. Any VIBs in the image profile which
                        are not related to any installed VIBs will be added
                        to the host. WARNING: If your installation requires a
                        reboot, you need to disable HA first.

Cmd options:
  --allow-downgrades    If this option is specified, then the VIBs from the
                        image profile which update, downgrade, or are new to
                        the host will be installed. If the option is not
                        specified, then the VIBs which update or are new to
                        the host will be installed.
  -d|--depot=[  ... ]
                        Specifies full remote URLs of the depot index.xml or
                        server file path pointing to an offline bundle .zip
                        file. (required)
  --dry-run             Performs a dry-run only. Report the VIB-level
                        operations that would be performed, but do not change
                        anything in the system.
[...]
  -p|--profile=    Specifies the name of the image profile to update the
                        host with. (required)
  --proxy=         Specifies a proxy server to use for HTTP, FTP, and
                        HTTPS connections. The format is proxy-url:port.
[root@vmhost2:/tmp]

Maybe it is named after the file? Let's take a guess:

[root@vmhost2:/tmp] esxcli software profile update -d /vmfs/volumes/VOL1/ISO/ESXi670-201811001.zip  -p ESXi-6.7.0-201811001-standard
 [NoMatchError]
 No image profile found with name 'ESXi-6.7.0-201811001-standard'
         id = ESXi-6.7.0-201811001-standard
 Please refer to the log file for more details.
[root@vmhost2:/tmp]

After feeling a bit frustrated, I found some docs on updating a host using image profiles with the following example:

esxcli --server=server_name software sources profile list --depot=http://webserver/depot_name

Well, even though the docs seem to imply the "depot" is a web server somewhere hosting the patch bundles as .zip files, we know we can just download the bundle and point at its local path. So, let's see if that applies here too:

[root@vmhost2:/tmp] esxcli software sources profile list -d /vmfs/volumes/VOL1/ISO/ESXi670-201811001.zip
Name                             Vendor        Acceptance Level  Creation Time        Modification Time
-------------------------------  ------------  ----------------  -------------------  -------------------
ESXi-6.7.0-20181104001-no-tools  VMware, Inc.  PartnerSupported  2018-11-08T08:39:27  2018-11-08T08:39:27
ESXi-6.7.0-20181104001-standard  VMware, Inc.  PartnerSupported  2018-11-08T08:39:27  2018-11-08T08:39:27
[root@vmhost2:/tmp]

Aha! I was close, but no cookie. I really would never have guessed it was 20181104001 instead of 201811001; that is why it pays to ask whoever knows the right answer, in this case the file itself. Let's try it:

[root@vmhost2:/tmp] esxcli software profile update -d /vmfs/volumes/VOL1/ISO/ESXi670-201811001.zip \
 -p ESXi-6.7.0-20181104001-standard
Update Result
   Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
   Reboot Required: true 
   VIBs Installed: VMW_bootbank_bnxtroce_20.6.101.0-20vmw.670.1.28.10302608, VMW_bootbank_brcmfcoe_11.4.1078.5-11vmw.670.1.28.10302608, VMW_bootbank_elxnet_11.4.1095.0-5vmw.670.1.28.10302608, VMW_bootbank_i40en_1.3.1-22vmw.670.1.28.10302608, VMW_bootbank_ipmi-ipmi-devintf_39.1-5vmw.670.1.28.10302608, VMW_bootbank_ipmi-ipmi-msghandler_39.1-5vmw.670.1.28.10302608, VMW_bootbank_ipmi-ipmi-si-drv_39.1-5vmw.670.1.28.10302608, VMW_bootbank_iser_1.0.0.0-1vmw.670.1.28.10302608, VMW_bootbank_ixgben_1.4.1-16vmw.670.1.28.10302608, VMW_bootbank_lpfc_11.4.33.3-11vmw.670.1.28.10302608, VMW_bootbank_lsi-mr3_7.702.13.00-5vmw.670.1.28.1030260
[...]
otbank_shim-vmklinux-9-2-1-0_6.7.0-0.0.8169922, VMW_bootbank_shim-vmklinux-9-2-
2-0_6.7.0-0.0.8169922, VMW_bootbank_shim-vmklinux-9-2-3-0_6.7.0-0.0.8169922, VM
W_bootbank_uhci-usb-uhci_1.0-3vmw.670.0.0.8169922, VMW_bootbank_usb-storage-usb
-storage_1.0-3vmw.670.0.0.8169922, VMW_bootbank_usbcore-usb_1.0-3vmw.670.0.0.8169922, VMW_bootbank_vmkata_0.1-1vmw.670.0.0.8169922, VMW_bootbank_vmkplexer-vmkplexer_6.7.0-0.0.8169922, VMW_bootbank_xhci-xhci_1.0-3vmw.670.0.0.8169922, VMware_bootbank_elx-esx-libelxima.so_11.4.1184.0-0.0.8169922, VMware_bootbank_esx-dvfilter-generic-fastpath_6.7.0-0.0.8169922, VMware_bootbank_esx-xserver_6.7.0-0.0.8169922, VMware_bootbank_lsu-lsi-lsi-msgpt3-plugin_1.0.0-8vmw.670.0.0.8169922, VMware_bootbank_lsu-lsi-megaraid-sas-plugin_1.0.0-9vmw.670.0.0.8169922, VMwa
re_bootbank_lsu-lsi-mpt2sas-plugin_2.0.0-7vmw.670.0.0.8169922, VMware_bootbank_
native-misc-drivers_6.7.0-0.0.8169922, VMware_bootbank_qlnativefc_3.0.1.0-5vmw.
670.0.0.8169922, VMware_bootbank_rste_2.0.2.0088-7vmw.670.0.0.8169922
[root@vmhost2:/tmp]

Much better.
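If you patch hosts often, the standard profile name can be pulled straight out of the listing instead of copied by hand. A sketch over a trimmed copy of the table above; on a live host you would pipe the esxcli software sources profile list output instead:

```shell
# Sketch: grab the "-standard" image profile name from the listing.
list='Name                             Vendor        Acceptance Level
-------------------------------  ------------  ----------------
ESXi-6.7.0-20181104001-no-tools  VMware, Inc.  PartnerSupported
ESXi-6.7.0-20181104001-standard  VMware, Inc.  PartnerSupported'
profile=$(printf '%s\n' "$list" | awk '$1 ~ /-standard$/ {print $1}')
echo "$profile"
```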

Failover and swapping controllers in an Equallogic SAN, command line involved

Equallogic, as some of you know, are old storage systems sold by Dell that can do at best RAID6 on a good day. They have been out of support for a few years now; Dell bought EMC and would rather we used that instead. We do have those too, but this article is about the Equallogic we still use.

On a Tuesday evening I got an email from one of them telling me it was unhappy. As per my recurring motif, I do not like GUIs for managing devices. This one decided to reinforce that belief by simply refusing to start, so it was time for plan B. Most storage appliances and hypervisors out there are built on either Linux or FreeBSD; the last time I had to probulate this one, I found it runs FreeBSD. That usually means we can ssh into it, and so we do. I think I do not need to show how: if you can ssh into one machine, you can ssh into them all.

Now that we are inside, we need to find out what's up (I am cheating a bit because the email did tell me that the device PS4100E-01 was unhappy, which is why I am probulating it specifically):

STORGroup01> member select PS4100E-01 show
_____________________________ Member Information ______________________________
Name: PS4100E-01                       Status: online
TotalSpace: 15.98TB                    UsedSpace: 3.86TB
SnapSpace: 192.92GB                    Description:
Def-Gateway: 192.168.1.1               Serial-Number:
Disks: 12                                CN-TWO_AND_A_HALF
Spares: 1                              Controllers: 2
CacheMode: write-thru                  Connections: 4
RaidStatus: ok                         RaidPercentage: 0.000%
LostBlocks: false                      HealthStatus: critical
LocateMember: disable                  Controller-Safe: disabled
Version: V9.1.4 (R443182)              Delay-Data-Move: disable
ChassisType: DELLSBB2u12 3.5           Accelerated RAID Capable: no
Pool: default                          Raid-policy: raid6
Service Tag: H9CHSW1                   Product Family: PS4100
All-Disks-SED: no                      SectorSize: 512
Language-Kit-Version: de, es, fr, ja,  ExpandedSnapDataSize: N/A
  ko, zh                               CompressedSnapDataSize: N/A
CompressionSavings: N/A                Data-Reduction: no-capable-hardware
Raid-Rebuild-Delay-State: disabled     Raid-Expansion-Status: enabled
_______________________________________________________________________________

____________________________ Health Status Details ____________________________

Critical conditions::
Critical hardware component failure.

Warning conditions::
None
_______________________________________________________________________________


____________________________ Operations InProgress ____________________________


ID StartTime            Progress Operation Details                             

-- -------------------- -------- -----------------------------------------------
STORGroup01>

Yep, the Health Status Details section tells us it is unhappy; we knew that already. But what does the log say?

STORGroup01> show recentevents
[...]
6492:881:PS4100E-01:SP: 8-Jan-2019 19:34:51.700834:cache_driver.cc:1056:WARNING:
28.3.17:Active control module cache is now in write-through mode. Array performa
nce is degraded.

6491:880:PS4100E-01:SP: 8-Jan-2019 19:34:51.700833:emm.c:355:ERROR:28.4.85:Criti
cal hardware component failure, as shown next.
        C2F power module is not operating.

6490:879:PS4100E-01:SP: 8-Jan-2019 19:34:51.700832:emm.c:2363:ERROR:28.4.47:Crit
ical health conditions exist.
 Correct immediately before they affect array operation.
        Critical hardware component failure.
        There are 1 outstanding health conditions. Correct these conditions before they
 affect array operation.

OK, a controller thingie is not happy. But this device has 2 of them. Which one is it?

STORGroup01> member select PS4100E-01 show controllers
___________________________ Controller Information ____________________________
SlotID: 0                              Status: active
Model: 70-0476(TYPE 12)                BatteryStatus: failed
ProcessorTemperature: 65               ChipsetTemperature: 44
LastBootTime: 2018-04-23:15:06:55      SerialNumber:
Manufactured: 0327                       CN-I_AM_FEELING_DEPRESSED
ECOLevel: C00                          CM Rev.: A04
FW Rev.: Storage Array Firmware V9.1.4 BootRomVersion: 3.6.4
   (R443182)                           BootRomBuilDate: Mon Jun 27 10:20:45
                                          EDT 2011                             
_______________________________________________________________________________
_______________________________________________________________________________
SlotID: 1                              Status: secondary
Model: 70-0476(TYPE 12)                BatteryStatus: ok
ProcessorTemperature: 0                ChipsetTemperature: 0
LastBootTime: 2018-04-23:15:13:17      SerialNumber:
Manufactured: 031S                       CN-I_AM_FEELING_GREAT
ECOLevel: C00                          CM Rev.: A04
FW Rev.: Storage Array Firmware V9.1.4 BootRomVersion: 3.6.4
   (R443182)                           BootRomBuilDate: Mon Jun 27 10:20:45
                                          EDT 2011                             
_______________________________________________________________________________

______________________________ Cache Information ______________________________
CacheMode: write-thru                  Controller-Safe: disabled
Low-Battery-Safe: enabled                                                      
_______________________________________________________________________________
STORGroup01>

Fun fact: I haven't the foggiest idea of which slot is 1 and which one is 0. We will have to find that out as we go along.

SPOILER ALERT: Making ASSumptions here is a bad idea.

The Controller

We sent the above info -- model, firmware -- to the vendor we have a support contract with, who sent us a replacement controller card. Here it is in its purple gloriousness.

The configuration is stored on an SD card in the socket my finger is pointing at. When swapping the controller, the laziest thing to do is move the old SD card into the new controller. This way we do not have to reconfigure it.


We know one of the controllers is unhappy, but which one? You see, this storage device only needs one controller to work. But, it is an enterprise device; the reason to have two is that the second one is on standby: if the primary has a problem, service can fail over to the secondary/backup. If you look at the picture below, you will see one controller has both lights green while the one below it has the top green and the bottom orange. That orange light indicates it is either the backup, secondary, failover, or unused controller. The OR is very important here.

Now, this system is designed to be hot swappable, but with one proviso: you can only swap the device that is not currently in use. That makes sense, since chances are the bad controller failed and the backup took over. So, the failed device should be the one in standby.

So, I swapped it. And then checked again.

STORGroup01> member select PS4100E-01 show controllers
___________________________ Controller Information ____________________________
SlotID: 0                              Status: active
Model: 70-0476(TYPE 12)                BatteryStatus: failed
ProcessorTemperature: 64               ChipsetTemperature: 44
LastBootTime: 2018-04-23:15:07:14      SerialNumber:
Manufactured: 0327                       CN-I_AM_FEELING_DEPRESSED
ECOLevel: C00                          CM Rev.: A04
FW Rev.: Storage Array Firmware V9.1.4 BootRomVersion: 3.6.4
   (R443182)                           BootRomBuilDate: Mon Jun 27 10:20:45
                                          EDT 2011
_______________________________________________________________________________
_______________________________________________________________________________
SlotID: 1                              Status: secondary
Model: 70-0476(TYPE 12)                BatteryStatus: ok
ProcessorTemperature: 0                ChipsetTemperature: 0
LastBootTime: 2019-01-15:12:56:01      SerialNumber:
Manufactured: 025E                       CN-I_AM_THE_NEW_GUY
ECOLevel: C00                          CM Rev.: A03
FW Rev.: Storage Array Firmware V9.1.4 BootRomVersion: 3.6.4
   (R443182)                           BootRomBuilDate: Mon Jun 27 10:20:45
                                          EDT 2011
_______________________________________________________________________________

It seems I replaced the secondary (serial number CN-I_AM_THE_NEW_GUY), while the primary (CN-I_AM_FEELING_DEPRESSED) is still the problematic one. Why didn't it fail over when it realized it had a problem? I don't know; what I do know is that I still need to replace the problematic controller. At least now we know SlotID 0 is the top controller and SlotID 1 the bottom. Small progress, but progress nevertheless.
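Eyeballing that wall of `show controllers` output gets old quickly. A small shell sketch can summarize which slot's battery is unhappy; everything here -- the function name, the idea of feeding it the captured output on stdin -- is my own convenience, not an EqualLogic feature, and the field positions assume the PS-series layout shown above:

```shell
# battery_report: read a "show controllers" dump on stdin and print one
# summary line per controller: slot, role, and battery status.
# Assumes lines like "SlotID: 0 ... Status: active" and
# "Model: ... BatteryStatus: failed", as in the transcripts above.
battery_report() {
  awk '
    /^SlotID:/       { slot = $2; role = $4 }
    /BatteryStatus:/ { print "slot " slot " (" role "): battery " $NF }
  '
}
```

Something like `ssh grpadmin@192.168.1.3 'member select PS4100E-01 show controllers' | battery_report` would then boil the output above down to `slot 0 (active): battery failed` and `slot 1 (secondary): battery ok` (the `grpadmin` user is an assumption; use whatever admin account your group has).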

It turns out there is a way to programmatically force the failover; that is achieved using the command restart. But, since I have three different storage devices -- PS4100E-01, PS4100E-00, and PS6100E -- in this setup and am currently accessing them from the system that controls all of them, and restart does not seem to allow me to specify which device I want to reboot,

STORGroup01> restart
            - Optional argument to restart.
 
STORGroup01>

we should ssh into PS4100E-01 (IP 192.168.1.3 for those who are wondering) and then issue the command there. We do not need to worry about it restarting the secondary controller because it only reboots the active one, which causes it to fail over to the secondary.

STORGroup01> restart

Restarting the system will result in the active and secondary control
modules switching roles. Therefore, the current active control module
will become the secondary after the system restart.


After you enter the restart command, the active control module will fail over.
To continue to use the serial connection when the array restarts, connect the
serial cable to the new active control module.

Do you really want to restart the system? (yes/no) [no]:yes
Restarting at Tue Jan 15 13:40:48 EST 2019 -- please wait...
Waiting for the secondary to synchronize with us (max 900s)
....Rebooting the active controller
Connection to 192.168.1.3 closed by remote host.
Connection to 192.168.1.3 closed.
raub@desktop:~$

And, yes, it is that anticlimactic. I just got kicked off PS4100E-01; our network access to the storage device goes through the controller. Now we need to wait for it to come back. I am lazy, so I will use ping to let me know when the network interface is back up.

raub@desktop:~$ ping 192.168.1.3
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.

64 bytes from 192.168.1.3: icmp_seq=10 ttl=254 time=3.50 ms
64 bytes from 192.168.1.3: icmp_seq=10 ttl=254 time=3.51 ms (DUP!)
64 bytes from 192.168.1.3: icmp_seq=11 ttl=254 time=90.8 ms
64 bytes from 192.168.1.3: icmp_seq=12 ttl=254 time=1.14 ms
64 bytes from 192.168.1.3: icmp_seq=13 ttl=254 time=1.49 ms
64 bytes from 192.168.1.3: icmp_seq=14 ttl=254 time=1.64 ms
64 bytes from 192.168.1.3: icmp_seq=15 ttl=254 time=1.06 ms
^C
--- 192.168.1.3 ping statistics ---
15 packets transmitted, 6 received, +1 duplicates, 60% packet loss, time 14078ms
rtt min/avg/max/mdev = 1.068/14.741/90.815/31.072 ms
raub@desktop:~$

What I can't show in this article is that it took a while until I started getting packets back. In fact, I went to make coffee (which required me to find coffee first), came back, and ran ping a few times until I got the above output. These devices are in no hurry when rebooting.
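If you do not feel like rerunning ping between sips of coffee, a generic retry loop can babysit the array for you. This is only a sketch under my own assumptions -- the helper name, attempt counts, and nap time are all mine, and `nc` being installed is also an assumption:

```shell
# retry_until: run a command repeatedly until it succeeds, giving up
# after N attempts. Returns 0 on success, 1 if it never came back happy.
retry_until() {
  tries=$1; shift
  n=0
  until "$@"; do
    n=$((n + 1))
    [ "$n" -ge "$tries" ] && return 1
    sleep 1   # nap between attempts; bump this up for a slow array
  done
}

# e.g. wait for the array to answer ping, then for its ssh port:
#   retry_until 600 ping -c 1 -W 2 192.168.1.3 > /dev/null 2>&1
#   retry_until 600 nc -z -w 2 192.168.1.3 22
```

Once both checks pass, you know ssh is back and you can log in without blindly mashing reconnect.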

Once we see replies, we can check that ssh is running again, log back in, and ask what's up with the controllers:

STORGroup01> member select PS4100E-01 show controllers
___________________________ Controller Information ____________________________
SlotID: 0                              Status: unknown
Model: ()                              BatteryStatus: unknown
ProcessorTemperature: 0                ChipsetTemperature: 0
LastBootTime: 2019-01-15:13:43:19      SerialNumber:
Manufactured:                          ECOLevel:
CM Rev.:                               FW Rev.:
BootRomVersion:                        BootRomBuilDate:
_______________________________________________________________________________
_______________________________________________________________________________
SlotID: 1                              Status: active
Model: 70-0476(TYPE 12)                BatteryStatus: ok
ProcessorTemperature: 60               ChipsetTemperature: 43
LastBootTime: 2019-01-15:12:56:00      SerialNumber:
Manufactured: 025E                       CN-I_AM_THE_NEW_GUY
ECOLevel: C00                          CM Rev.: A03
FW Rev.: Storage Array Firmware V9.1.4 BootRomVersion: 3.6.4
   (R443182)                           BootRomBuilDate: Mon Jun 27 10:20:45
                                          EDT 2011
_______________________________________________________________________________

When this output was captured, the controller in slot 0 had not yet come back up from the restart command. What matters, however, is that the controller in slot 1 became the active one, meaning we can swap the unhappy controller. If we wait a bit, we will see the controller in slot 0 come back as the secondary, its battery still reported as failed,

STORGroup01> member select PS4100E-01 show controllers
___________________________ Controller Information ____________________________
SlotID: 0                              Status: secondary
Model: 70-0476(TYPE 12)                BatteryStatus: failed
ProcessorTemperature: 0                ChipsetTemperature: 0
LastBootTime: 2019-01-15:13:43:09      SerialNumber:
Manufactured: 0327                       CN-I_AM_FEELING_DEPRESSED
ECOLevel: C00                          CM Rev.: A04
FW Rev.: Storage Array Firmware V9.1.4 BootRomVersion: 3.6.4
   (R443182)                           BootRomBuilDate: Mon Jun 27 10:20:45
                                          EDT 2011
_______________________________________________________________________________

which means we can now swap the top guy (slot 0) with the good controller we accidentally removed from slot 1 earlier. After a few minutes, all is well in the world:

STORGroup01> member select PS4100E-01 show controllers
___________________________ Controller Information ____________________________
SlotID: 0                              Status: secondary
Model: 70-0476(TYPE 12)                BatteryStatus: ok
ProcessorTemperature: 0                ChipsetTemperature: 0
LastBootTime: 2019-01-15:13:55:33      SerialNumber:
Manufactured: 031S                       CN-I_AM_FEELING_GREAT
ECOLevel: C00                          CM Rev.: A04
FW Rev.: Storage Array Firmware V9.1.4 BootRomVersion: 3.6.4
   (R443182)                           BootRomBuilDate: Mon Jun 27 10:20:45
                                          EDT 2011
_______________________________________________________________________________
_______________________________________________________________________________
SlotID: 1                              Status: active
Model: 70-0476(TYPE 12)                BatteryStatus: ok
ProcessorTemperature: 62               ChipsetTemperature: 42
LastBootTime: 2019-01-15:12:55:43      SerialNumber:
Manufactured: 025E                       CN-I_AM_THE_NEW_GUY
ECOLevel: C00                          CM Rev.: A03
FW Rev.: Storage Array Firmware V9.1.4 BootRomVersion: 3.6.4
   (R443182)                           BootRomBuilDate: Mon Jun 27 10:20:45
                                          EDT 2011
_______________________________________________________________________________

There might be a GUI way to do all of that but I am not smart enough to click on things.