Monday, January 30, 2017

Using tcpdump to see vlan traffic in xenserver

Short version: it is a bit convoluted, bordering on the Rube Goldberg domain.

If you are in a hurry, you can now move on to something more interesting. If you instead want to hear me ranting and typing annoying commands, read on.

If we are talking about building servers to virtualize hosts, we will end up talking about multiple networks organized in vlans which then need to be fed to this vm server. Of course running an 802.1q trunk sometimes does not work perfectly, so we need to be prepared to look behind the curtain. If the vm server is running Linux, as it would be with KVM or Xen, we can unleash tcpdump just like we did when trying to diagnose some trunking issues with a router. What about the Xenserver you mentioned in the title of this article? I thought it was Linux based. Good question. Very good question. I started this assuming it would be just Linux business as usual. You know what they say about assuming.

What I found out is that you cannot just use eth0.12 if you want to look for vlan 12 like you would do in KVM. In fact, it does not use /proc/net/vlan. It just ain't there:

[root@thexen ~]# ls /proc/net/
anycast6      ip6_flowlabel        netfilter            route         tcp
arp           ip_conntrack         netlink              rpc           tcp6
dev           ip_conntrack_expect  netstat              rt6_stats     udp
dev_mcast     ip_mr_cache          nf_conntrack         rt_acct       udp6
dev_snmp6     ip_mr_vif            nf_conntrack_expect  rt_cache      udplite
fib_trie      ip_tables_matches    packet               snmp          udplite6
fib_triestat  ip_tables_names      protocols            snmp6         unix
icmp          ip_tables_targets    psched               sockstat
if_inet6      ipv6_route           ptype                sockstat6
igmp          mcfilter             raw                  softnet_stat
igmp6         mcfilter6            raw6                 stat
[root@thexen ~]#

You see, there is no eth0.12 defined here; by default xenserver will try to configure all available network cards (NICs for those craving for acronyms) in a managed (by xenserver) mode. Once they are added, it creates bridges, called xapiN, and then associates them with each network. And how do we find out which of those bridges is being used by our vlan? Er, it requires a few steps using the xen commands (xe something-or-another) which I have not found out how to automate yet.

  1. We begin by finding out which vlans are defined in this server. And that can be done using xe pif-list:
    [root@thexen ~]# xe pif-list
    uuid ( RO)                  : 540f3b24-0606-6380-c10c-c2f8c2f4c2ce
                    device ( RO): eth1
        currently-attached ( RO): true
                      VLAN ( RO): 2
              network-uuid ( RO): a874cb50-1c87-0bde-390d-66d0a4e1576c
    
    
    uuid ( RO)                  : 8684e63f-3d1c-241b-8e75-3b2e37f8c859
                    device ( RO): eth0
        currently-attached ( RO): true
                      VLAN ( RO): -1
              network-uuid ( RO): ed2325c5-1f3b-7f25-6104-61902a13d3ac
    
    
    uuid ( RO)                  : 82b9dee1-52db-a6ae-cb42-11ae7f6d3d25
                    device ( RO): eth1
        currently-attached ( RO): true
                      VLAN ( RO): -1
              network-uuid ( RO): 993c9237-5961-9808-36cd-729827e005d8
    
    uuid ( RO)                  : 592339bf-cc03-4048-9075-946f5bcc47fb
                    device ( RO): eth1
        currently-attached ( RO): true
                      VLAN ( RO): 12
              network-uuid ( RO): 0d28f847-3da6-11f3-3600-8a033435168c
    
    
    uuid ( RO)                  : 3d60399c-bb8d-5e5a-e01b-8986b8808f12
                    device ( RO): eth0
        currently-attached ( RO): true
                      VLAN ( RO): 3
              network-uuid ( RO): c8726e09-a0a5-b026-013e-2c5edd5062b3
    
    
    uuid ( RO)                  : 19f1fe37-16d1-6fcd-4bbd-4e566abc74c4
                    device ( RO): eth0
        currently-attached ( RO): true
                      VLAN ( RO): 8
              network-uuid ( RO): 2f94b1c8-be16-14d5-a149-90ae35528c22
    
    
    uuid ( RO)                  : 6df7b741-9cef-d34f-e487-fa2abe422068
                    device ( RO): eth1
        currently-attached ( RO): true
                      VLAN ( RO): 100
              network-uuid ( RO): 9ec62435-ec2a-2bfc-9f29-2ea5c9756971
    
    
    [root@thexen ~]#

    What can we gather from this output:

    • This machine has 2 physical interfaces, eth0 and eth1. And each of them has a few vlans going through it. So, there are two 802.1q trunks. Deal with it.
    • We have two uuid entries per interface: the uuid (the one after "uuid ( RO) ") and a network-uuid.
    • If we only wanted to see the VLAN number, both the uuids, and the physical NIC/device each virtual interface is using, we could have instead said
      xe pif-list params=device,VLAN,network-uuid,uuid

      But if we wanted to know everything about each virtual interface,

      xe pif-list params=all
    • To get more info on a given interface (or bridge) you need the uuid associated with uuid ( RO). So if you wanted to know everything about VLAN 100, you could say
      xe pif-list uuid=6df7b741-9cef-d34f-e487-fa2abe422068 params=all
    • The ones with VLAN ( RO): -1 are the untagged networks; we have one per interface even if we do not have it defined.
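
    The default pif-list output is a bit hard on the eyes, so here is a minimal sketch of how it could be squashed into one line per virtual interface. The function name pif_table is made up for this example, and it assumes the "name ( RO): value" layout shown above:

```shell
# Hypothetical helper (not part of xe): turn `xe pif-list` records into
# one line per PIF, printed as: device VLAN network-uuid uuid
pif_table() {
    awk -F': ' '
        # Print the record collected so far, if there is one.
        function emit() {
            if (uuid != "") print dev, vlan, net, uuid
            uuid = ""
        }
        # A plain "uuid" line starts a new record.
        /^ *uuid \( RO\)/         { emit(); uuid = $2 }
        /^ *device \( RO\)/       { dev  = $2 }
        /^ *VLAN \( RO\)/         { vlan = $2 }
        /^ *network-uuid \( RO\)/ { net  = $2 }
        END { emit() }
    '
}
```

    It would be fed like this: xe pif-list params=uuid,device,VLAN,network-uuid | pif_table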
OK, smart guy, how do we go from this to this crazy xapiN interface? Oh, you mean what xenserver calls a bridge? We shall use the xe network-list command. If you run that, it will tell you which xapiN is associated with which vlan. It will also show which bridge is being used for the console to this xenserver, which usually is an untagged vlan. There are ways to make that a tagged vlan but that will be for another episode. What is important is that the uuid being shown is the network-uuid we got using xe pif-list. And we can feed it to the network-list command if we just care about, say, VLAN 12:

[root@thexen ~]# xe network-list uuid=0d28f847-3da6-11f3-3600-8a033435168c params=bridge,name-label
name-label ( RW)    : vlan 12
        bridge ( RO): xapi4


[root@thexen ~]#
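
The two lookups above could probably be chained into one step. This is only a sketch: vlan_to_bridge is a name I made up, it leans on the --minimal flag of xe to get bare values back, and it assumes that filtering pif-list by VLAN returns the PIF we care about (if several PIFs carry the same VLAN, it just takes the first one).

```shell
# Hypothetical helper: print the bridge (xapiN) that carries a given VLAN.
# Usage: vlan_to_bridge 12
vlan_to_bridge() {
    local vlan=$1 net_uuid

    # Step 1: VLAN number -> network-uuid (comma-separated if several match).
    net_uuid=$(xe pif-list VLAN="$vlan" params=network-uuid --minimal)
    net_uuid=${net_uuid%%,*}    # keep only the first match

    if [ -z "$net_uuid" ]; then
        echo "no PIF found for VLAN $vlan" >&2
        return 1
    fi

    # Step 2: network-uuid -> bridge name.
    xe network-list uuid="$net_uuid" params=bridge --minimal
}
```

With the server used in this article, vlan_to_bridge 12 should come back with xapi4.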

We finally found out that vlan 12 is attached to xapi4. Time for some tcpdumping:

[root@thexen ~]# tcpdump -i xapi4 -e
tcpdump: WARNING: xapi4: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on xapi4, link-type EN10MB (Ethernet), capture size 65535 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
[root@thexen ~]#

Crickets

What is going on here? I let it run for 10s; that should have been enough to fill the screen. Let's try again, letting it run for longer (4 minutes?) while we do, say, tracepath to the gateway.

[root@thexen ~]# tcpdump -i xapi4 -e -n
tcpdump: WARNING: xapi4: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on xapi4, link-type EN10MB (Ethernet), capture size 65535 bytes
10:46:10.696176 c0:ff:ee:70:63:a4 > Broadcast, ethertype ARP (0x0806), length 42
: Request who-has 192.168.12.241 tell 192.168.12.242, length 28
10:46:11.698310 c0:ff:ee:70:63:a4 > Broadcast, ethertype ARP (0x0806), length 42
: Request who-has 192.168.12.241 tell 192.168.12.242, length 28
10:46:12.700355 c0:ff:ee:70:63:a4 > Broadcast, ethertype ARP (0x0806), length 42
: Request who-has 192.168.12.241 tell 192.168.12.242, length 28
10:46:13.705644 c0:ff:ee:70:63:a4 > Broadcast, ethertype ARP (0x0806), length 42
: Request who-has 192.168.12.241 tell 192.168.12.242, length 28
10:46:14.706364 c0:ff:ee:70:63:a4 > Broadcast, ethertype ARP (0x0806), length 42
: Request who-has 192.168.12.241 tell 192.168.12.242, length 28
10:46:15.708375 c0:ff:ee:70:63:a4 > Broadcast, ethertype ARP (0x0806), length 42
: Request who-has 192.168.12.241 tell 192.168.12.242, length 28
10:46:18.710913 c0:ff:ee:70:63:a4 > Broadcast, ethertype ARP (0x0806), length 42
: Request who-has 192.168.12.241 tell 192.168.12.242, length 28
[...]
: Request who-has 192.168.12.241 tell 192.168.12.242, length 28
10:46:30.724414 c0:ff:ee:70:63:a4 > Broadcast, ethertype ARP (0x0806), length 42
: Request who-has 192.168.12.241 tell 192.168.12.242, length 28

^C
15 packets captured
15 packets received by filter
0 packets dropped by kernel
[root@thexen ~]#

Fifteen packets in four minutes? I could have done that by hand! What is going on here?

But it really does not work as well as it should

Do I sound bitter? I am just being factual. I can't use tcpdump when this bridge thing is only giving me like 4 packets a minute. Even if there was no talking between servers, just the ARP requests should have been more frequent. So, we need to rethink this.

We did the proper thing so far. Now it's time to cheat.

We know the network associated with vlan 12 is 192.168.12.0/24, so why not tell tcpdump to look at eth1 for anything that matches that?

tcpdump -i eth1 -e -n | grep '192.168.12'

I will not bother to show the output of that but it will look much more like what we would have expected. Of course, it will only show traffic on that wire whose printed line matches that pattern, so if you have a host trying to reach out for a DHCP server in that network you will not detect that. Nor would it find IPv6 traffic (you would need to feed the proper pattern). But it is better than using xapi4.
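
A different cheat, which I am only sketching here, is to let tcpdump do the filtering with its "vlan" filter primitive instead of grepping for addresses. It matches on the 802.1q tag itself, so DHCP and IPv6 in that vlan would be picked up as well. The assumption is that the tag is still on the frames when tcpdump sees them on eth1, which can depend on the NIC and its offload settings. The wrapper name capture_vlan is made up.

```shell
# Hypothetical helper: watch one VLAN on a trunk NIC.
# Usage: capture_vlan eth1 12
capture_vlan() {
    local nic=$1 vlan=$2
    # -e prints the link-level header (including the 802.1q tag),
    # -n skips name resolution, "vlan N" keeps only frames tagged N.
    tcpdump -i "$nic" -e -n vlan "$vlan"
}
```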
