Friday, May 03, 2019

Passing a Network card to a KVM vm guest because we are too lazy to configure SR-IOV

This can be taken as a generic how-to about passing PCIe cards to a vm guest; I will be using a network card as my example.

Why

I can come up with a lot of excuses. The bottom line is that you want the vm guest to do something with the card that the vm host can't or shouldn't. For instance, what if we want to give a wireless card to a given vm guest, and the card is not supported by the vm host (I am looking at you, VMware ESXi), or the vm host does not know how to virtualize it in a meaningful way?

Note: What we are talking about here should work with any PCI/PCIe card, but we said we will be talking about network cards, so there.

The Card

The card is a PCIe network card; for this article it should be seen as a garden-variety network card. You probably will not let me leave it at that, so here is the info on the specific card I will be using in this article: it is a Netronome Agilio CX 2x10GbE (the one in the picture is a CX 1x40GbE, which I happen to own, hence the crappy picture), which is built around their NFP-4000 flow processor. The basic infomercial on it can be found at https://www.netronome.com/m/documents/PB_Agilio_CX_2x10GbE.pdf (it used to be https://www.netronome.com/media/documents/PB_Agilio_CX_2x10GbE.pdf, but I guess they thought media was too long a word. It also means that sometime after this article is posted the link will change again; no point in making them orange links). It is supposed to do things like KVM hypervisor support (SR-IOV comes to mind) right out of the box, so why would we want to pass the entire card through to a vm guest? Here are some reasons:

  • What if the card can do something and the VM abstraction layers do not expose that?
  • What if we want to program the card to do our bidding?
  • What if we want to change the firmware of the card? Some cards allow you to upgrade the firmware, or change it completely to use it for other thingies (the Netronome card in question fits this second option, details about that might be discussed in a future article).
  • Why did you pick this card? Hey, this is not a reason to pass the entire card, but I will answer it anyway: because I have a box with 3 of them I was going to use for something else (we may talk about that in a future article). With that said, I avoided going over any of the special sauce this card has. For the purpose of this article, it is just a PCIe card I want to give to a vm guest.

How

Finding the card

Ok, the card is inserted into the vm host, which booted properly. Now what? Well, we need to find where the card is so we can tell our guests. Most Linux distros come with lspci, which probulates the PCI bus. The trick is to search for the right pattern. Let's, for instance, look for network devices in one of my ESXi nodes:

[root@vmhost2:~] lspci | grep 'Network'
0000:00:19.0 Network controller: Intel Corporation 82579LM Gigabit Network Connection [vmnic0]
0000:04:00.0 Network controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) [vmnic1]
0000:04:00.1 Network controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) [vmnic2]
0000:05:00.0 Network controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) [vmnic3]
0000:05:00.1 Network controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) [vmnic4]
[root@vmhost2:~]

Notes

  1. ESXi is really not Linux (nor FreeBSD); it is VMware's own VMkernel with a busybox-flavored userland sprinkled on top.
  2. I just mentioned ESXi here because I needed another system I could run lspci on.
  3. The lspci options in ESXi are not as extensive as in garden-variety Linux, but they are good enough to show it in action.
  4. If we had searched for Intel Corporation we would have gotten many more matches, including the CPU itself. So, taking the time to get the right search string pays off.

If we were going to probulate in a Linux host, Ethernet works better than Network as the search pattern. We can even look at virtual interfaces KVM is feeding to a vm guest:

theuser@desktop1:~$ lspci |grep Ethernet
00:03.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:06.0 Ethernet controller: Red Hat, Inc. Virtio network device
theuser@desktop1:~$

Note that the leading 0000: (the PCI domain) is assumed. A very useful option available in the Linux version of lspci but not the ESXi one is -nn:

theuser@desktop1:~$ lspci -nn |grep Ethernet
00:03.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1000]
00:06.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1000]
theuser@desktop1:~$

The [1af4:1000] means [vendor_id:device_id]; remember it well.
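Once we know that pair, lspci can filter on it directly with -d (and -D prints the otherwise-assumed domain), so we do not even have to fish with grep. Reusing the virtio IDs from the output above, we should get something like:

theuser@desktop1:~$ lspci -D -nn -d 1af4:1000
0000:00:03.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1000]
0000:00:06.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1000]
theuser@desktop1:~$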

For the Netronome cards we can just look for netronome since there should be no other devices matching that name besides the cards made by them:

raub@vmhost ~$ sudo lspci -nn|grep -i netronome
11:00.0 Ethernet controller [0200]: Netronome Systems, Inc. Device [19ee:4000]
raub@vmhost ~$

The card's PCI address is therefore 11:00.0, or 0000:11:00.0 once we spell out the domain.
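While we are at it, it does not hurt to ask lspci (the -k option) which kernel driver on the vm host, if any, has grabbed that address. Here is a sketch of what that could look like on a host that has Netronome's nfp driver loaded; your "Kernel driver in use" line may differ or be missing entirely:

raub@vmhost ~$ sudo lspci -nnk -s 11:00.0
11:00.0 Ethernet controller [0200]: Netronome Systems, Inc. Device [19ee:4000]
        Kernel driver in use: nfp
        Kernel modules: nfp
raub@vmhost ~$

After we detach the card in the next section, the same command should report a stub driver (typically vfio-pci, or pci-stub on older setups) instead.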

Handing out the card to the guest

There are two things we need to do when passing a PCI device to a vm guest (desktop1 in this example):

  1. Tell the vm host to keep its hands off it. The reason is that, in the case of a network card, the host might want to configure it, creating network interfaces (they show up under /sys/class/net) which the host server (vmhost) can either use for its own nefarious purposes or have KVM virtualize (as a Virtio network device or some other emulation) and hand out to the guests. Since we want to use said card for our own private nefarious purposes within a specific vm guest (desktop1), we are not going to be nice and share it.

    So we need to tell vmhost to leave it alone.

    • KVM knows it exists because it can look in the PCI chain by itself:
      [root@vmhost ~]# virsh nodedev-list | grep pci_0000_11
      pci_0000_11_00_0
      [root@vmhost ~]#
    • So now we can tell vmhost to leave pci-0000:11:00.0 alone:

      [root@vmhost ~]$ sudo virsh nodedev-dettach pci_0000_11_00_0
      Device pci_0000_11_00_0 detached
      
      [root@vmhost ~]$
  2. Tell the vm guest there is this shiny card it can lay its noodly appendages on.
    1. Shut the vm guest down.
    2. Edit the vm guest's definition:
      virsh edit desktop1
    3. Add something like
      <hostdev mode='subsystem' type='pci' managed='yes'>
        <source>
          <address domain='0x0000' bus='0x11' slot='0x00' function='0x0'/>
        </source>
      </hostdev>
      to the end of the <devices> section. When you save it, libvirt will properly place
      and configure the entry (filling in a guest-side PCI address for you).
    4. Restart the vm guest and check if it can see the card using dmesg (this is an Ubuntu 19.04 example; note the card is being listed as 0000:04:00.0 inside the vm guest). I expect to see something like

      [    7.348276] Netronome NFP CPP API
      [    7.352347] nfp-net-vnic: NFP vNIC driver, Copyright (C) 2010-2015 Netronome Systems
      [    7.361865] nfp 0000:04:00.0: Netronome Flow Processor NFP4000/NFP5000/NFP6000 PCIe Card Probe
      [    7.372133] nfp 0000:04:00.0: RESERVED BARs: 0.0: General/MSI-X SRAM, 0.1: PCIe XPB/MSI-X PBA, 0.4: Explicit0, 0.5: Explicit1, free: 20/24
      [    7.396094] nfp 0000:04:00.0: Model: 0x40010010, SN: 00:15:4d:13:5d:58, Ifc: 0x10ff

      But what I am getting is something more like this:

      [    1.768683] nfp: NFP PCIe Driver, Copyright (C) 2014-2017 Netronome Systems
      [    1.773014] nfp 0000:00:07.0: Netronome Flow Processor NFP4000/NFP5000/NFP6000 PCIe Card Probe
      [    1.774066] nfp 0000:00:07.0: 63.008 Gb/s available PCIe bandwidth (8 GT/s x8 link)
      [    1.775212] nfp 0000:00:07.0: can't find PCIe Serial Number Capability
      [    1.776252] nfp 0000:00:07.0: Interface type 15 is not the expected 1
      [    1.777285] nfp 0000:00:07.0: NFP6000 PCI setup failed

      What is going on? The answer to that is the next topic. You see,

PCIe is more demanding

Do you remember the can't find PCIe Serial Number Capability message? This is a PCIe card, meaning we need to set the vm guest machine type to q35, which emulates the ICH9 chipset and can therefore handle a PCIe bus. The default (i440FX) can only do plain PCI. QEMU has a nice description of the difference.
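We can confirm that is what bit us by peeking at the machine type the current guest was given; a quick sketch (the grep just pulls the machine attribute out of the domain XML, and on a stock CentOS 7 host it should come back as some pc-i440fx flavor):

virsh dumpxml desktop1 | grep -o "machine='[^']*'"

So, let's give it a try by recreating the KVM guest: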

virt-install \
   --name desktop1 \
   --disk path=/home/raub/desktop1.qcow2,format=qcow2,size=10 \
   --ram 4098 --vcpus 2 \
   --cdrom /export/public/ISOs/Linux/ubuntu/ubuntu-16.04.5-server-amd64.iso  \
   --os-type linux --os-variant ubuntu19.04 \
   --network network=default \
   --graphics vnc --noautoconsole \
   --machine=q35 \
   --arch x86_64

When we try to build that vm guest, we get an error message stating that

ERROR    No domains available for virt type 'hvm', arch 'x86_64', machine type 'q35'

What now? You see, at the time I wrote this, the stock CentOS KVM (QEMU) package did not support q35 out of the box. We need more packages!

yum install centos-release-qemu-ev  # enables the CentOS Virt SIG qemu-kvm-ev repo
yum update                          # pulls in the newer qemu-kvm-ev packages
reboot
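After the reboot, and before recreating anything, we can double-check that the updated QEMU actually knows about q35. A quick sketch, assuming the usual CentOS location of the qemu-kvm binary:

/usr/libexec/qemu-kvm -machine help | grep q35   # machine types straight from QEMU
virsh capabilities | grep q35                    # or the same list as libvirt sees it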

And we try again. This time, when we log in to the guest, desktop1, things look more promising (note the PCI address changed to 0000:01:00.0; this is a new vm guest):

theuser@desktop1:~$ dmesg |grep -i netro
[    1.922051] nfp: NFP PCIe Driver, Copyright (C) 2014-2017 Netronome Systems
[    1.954196] nfp 0000:01:00.0: Netronome Flow Processor NFP4000/NFP5000/NFP6000 PCIe Card Probe
[    2.239018] nfp 0000:01:00.0: nfp:   netronome/serial-00-15-4d-13-5d-46-10-ff.nffw: not found
[    2.239059] nfp 0000:01:00.0: nfp:   netronome/pci-0000:01:00.0.nffw: not found
[    2.239913] nfp 0000:01:00.0: nfp:   netronome/nic_AMDA0096-0001_2x10.nffw: found, loading...
[   11.954477] nfp 0000:01:00.0 eth0: Netronome NFP-6xxx Netdev: TxQs=2/32 RxQs=2/32
[   11.971175] nfp 0000:01:00.0 eth1: Netronome NFP-6xxx Netdev: TxQs=2/31 RxQs=2/31
theuser@desktop1:~$

Which then becomes

theuser@desktop1:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:d4:9e:50 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.105/24 brd 192.168.122.255 scope global dynamic enp0s3
       valid_lft 3489sec preferred_lft 3489sec
    inet6 fe80::5054:ff:fed4:9e50/64 scope link
       valid_lft forever preferred_lft forever
3: enp1s0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:15:4d:13:5d:47 brd ff:ff:ff:ff:ff:ff
4: enp1s0np1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:15:4d:13:5d:48 brd ff:ff:ff:ff:ff:ff
theuser@desktop1:~$

And now we can do something useful with it.
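For instance, here is a minimal sketch of putting the first passed-through port to work from inside desktop1. The interface name comes from the ip a output above; the 10.0.0.0/24 addresses are made-up placeholders for whatever happens to live at the other end of the cable:

sudo ip link set enp1s0np0 up                # bring the first NFP port up
sudo ip addr add 10.0.0.2/24 dev enp1s0np0   # give it a throwaway test address
ip -br addr show enp1s0np0                   # confirm it is UP and addressed
ping -c 3 10.0.0.1                           # poke the far end of the link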

