Using a virtual machine as a DPDK compute node

“Historically”, when designing a Contrail cluster, there have always been some “rules” we have almost blindly followed:

  • if you do not need high performance, deploy a kernel-mode vRouter, which will work with any NIC your server uses
  • if you need high performance, then deploy a DPDK vRouter, but be sure to have the right NICs on your server

Moreover, we have always assumed that:

  • compute nodes must be bare metal servers
  • avoid turning virtual machines into compute nodes

Those last two lines are “mandatory” if we think about production environments. There are a few reasons behind that “rule”. Using a virtual machine as a compute node means that a virtual machine will host other virtual machines. This scenario is called nested virtualization and it is not ideal in terms of performance. This will no longer be entirely true once we move to containers, as containers are just processes and can run inside a virtual machine without the same penalty. Anyhow, here we are talking about OpenStack clusters, where workloads are virtual machines.
Another reason is I/O performance, as traffic has to go through the virtual machine vNIC first, then through the physical server NIC.

All those considerations are true and the mantra “compute nodes on bare metal servers, and DPDK if performance is needed” is absolutely valid for production environments!

Anyhow, if we only want to build a lab, we might consider breaking the rules. This means not only using a virtual machine as a compute node but also deploying a DPDK vRouter on it. Of course, performance will be poor but, from a functional point of view, it might be very useful.

For example, we might want to verify new features or get familiar with some tools. In this case, having a DPDK vRouter inside a “virtual compute node” is more than enough. We are not interested in performance; we only want a running DPDK vRouter!

So let’s see how this can be achieved.

As mentioned before, with DPDK we have to use specific, DPDK-compatible NICs. The DPDK website contains a list of supported NICs.

Often, it is not a matter of supporting a specific NIC model but rather a specific driver (e.g. ixgbe, i40e). If we look at the DPDK documentation, we find support for VM emulated drivers: https://doc.dpdk.org/guides/nics/e1000em.html . This means that a DPDK application (e.g. the Contrail vRouter) might work inside a VM.

Those are the supported “vNICs” and drivers:

  • qemu-kvm emulated Intel® 82540EM Gigabit Ethernet Controller (qemu e1000 device; see the example after this list)
  • VMware emulated Intel® 82545EM Gigabit Ethernet Controller
  • VMware emulated Intel® 82574L Gigabit Ethernet Controller
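
If the lab VM is built with virt-install on a KVM host, the qemu e1000 device can be requested explicitly when defining the vNICs. The sketch below is hypothetical: the VM name, network names and disk path are assumptions, only the model=e1000 parts matter here.

# hypothetical virt-install call; the vNIC model is the only relevant bit
virt-install --name os-compute --memory 16384 --vcpus 4 \
  --disk path=/var/lib/libvirt/images/os-compute.qcow2 \
  --network network=default,model=e1000 \
  --network network=ctrl-data,model=e1000 \
  --import --os-variant centos7.0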

Among all the DPDK supported NICs, there is a subset that has also been tested and validated with Contrail. The official list of supported NICs (mapped to Contrail versions) is available here https://www.juniper.net/documentation/en_US/release-independent/contrail/topics/reference/contrail-nic-support-matrix.pdf .

If you look at that list, the VM emulated drivers are not present. This does not necessarily mean the DPDK vRouter will not work, but that there was no official testing/validation process to qualify the DPDK vRouter with VM emulated drivers. This should be no surprise: Juniper puts its effort into validating the physical NICs that will be mounted on the physical servers used as compute nodes in real deployments. From a business perspective, it does not make much sense to dedicate energy to this kind of validation (DPDK vRouter with VM emulated drivers).

Anyhow, nothing prevents us from trying to get our DPDK vRouter running inside a VM…and this is what we are going to do!

Here, I’m going to use a VM running CentOS 7.9:

[root@os-compute ~]# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)

Next, we check NICs:

[root@os-compute ~]# lspci | grep thernet
00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03)
00:03.1 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03)

The Intel 82540EM is DPDK supported, as seen before.
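
Since what matters is mostly the driver, we can also double-check which kernel driver is currently bound to those NICs (standard tools; interface names will differ from one environment to another):

# show the kernel driver in use for each PCI network device
lspci -nnk | grep -A3 Ethernet
# or, per interface
ethtool -i eth0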

Then, we want to be sure nested virtualization is enabled:

[root@os-compute ~]# cat /sys/module/kvm_intel/parameters/nested
N

“N” means no. In this case, populate the following file:

[root@os-compute ~]# vi /etc/modprobe.d/kvm-nested.conf
options kvm-intel nested=1
options kvm-intel enable_shadow_vmcs=1
options kvm-intel enable_apicv=1
options kvm-intel ept=1

and run

modprobe -r kvm_intel
modprobe -a kvm_intel

which should lead to

[root@os-compute ~]# cat /sys/module/kvm_intel/parameters/nested
Y
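
Keep in mind that kvm_intel can only be unloaded while no virtual machine is using it; if modprobe -r fails, shut down any running guests first (or simply reboot, since the options in /etc/modprobe.d/ are applied at every module load). A quick check:

# the "Used by" counter must be 0 before kvm_intel can be removed
lsmod | grep kvm_intel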

Moreover, VT-x should be enabled and several CPU flags should be available:

[root@os-compute ~]# lscpu
...
Virtualization:        VT-x
Hypervisor vendor:     KVM
Virtualization type:   full
...
Flags:                 fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx rdtscp lm rep_good nopl xtopology eagerfpu pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 x2apic tsc_deadline_timer aes xsave avx hypervisor lahf_lm tpr_shadow vnmi flexpriority ept vpid

A word about flags: some parts of the vRouter code are compiled using specific instruction sets. Those instruction sets must be available inside the VM. When creating the VM, I made sure the following flags were available:

vmx,+ssse3,+sse4_1,+sse4_2,+aes,+avx,+pat,+pclmulqdq,+rdtscp,+syscall,+tsc-deadline,+x2apic,+xsave
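
How those flags end up inside the guest depends on how the VM was created; with libvirt/KVM, a host-passthrough CPU model (or an explicit CPU flag list like the one above) is the usual way. As a minimal sanity check from inside the guest, we can verify that each required instruction set is actually exposed:

# check that the instruction sets the vRouter build relies on are visible in the guest
for flag in vmx ssse3 sse4_1 sse4_2 aes avx pclmulqdq rdtscp x2apic xsave; do
  grep -q -w "$flag" /proc/cpuinfo && echo "$flag: present" || echo "$flag: MISSING"
done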

Now, everything should be ready!

To install Contrail and deploy the DPDK vRouter inside our “virtual compute” I’m going to use the Ansible deployer which, personally, I consider the best tool to build lab environments.

To learn something about Contrail Ansible deployer, please have a look here.

Inside the instances.yaml file, we define a DPDK compute node as follows:

  os-compute:
    provider: bms
    ip: 10.102.244.61
    roles:
      vrouter:
        VROUTER_GATEWAY: 192.168.200.1
        AGENT_MODE: dpdk
        CPU_CORE_MASK: "0x0c"
        DPDK_UIO_DRIVER: uio_pci_generic
        HUGE_PAGES: 2000
      openstack_compute:

Comparing it to a kernel vRouter definition, we have some additional elements:

  • AGENT_MODE set to dpdk
  • CPU_CORE_MASK set to 0x0c. Translating the hexadecimal number into binary gives 0000 1100, which means assigning VM vCPUs 2 and 3 to the vRouter forwarding cores (see the one-liner after this list)
  • DPDK_UIO_DRIVER set to uio_pci_generic
  • HUGE_PAGES set to 2000. As a result, the deployer will create 2000 huge pages (2 MB each)
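
If you want to double check which vCPUs a given mask selects, a quick one-liner does the trick (bit i set in the mask means vCPU i is assigned to the forwarding cores):

# translate CPU_CORE_MASK into the list of vCPU ids it selects
python -c 'mask = 0x0c; print([cpu for cpu in range(32) if mask >> cpu & 1])'
# prints: [2, 3]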

Let’s run all the playbooks needed to install Contrail. If everything goes well, we should see no errors.
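
For reference, with contrail-ansible-deployer the sequence usually looks like the one below; exact paths and extra variables depend on the deployer version and on where instances.yaml lives, so take it as a sketch:

# run from the contrail-ansible-deployer directory
ansible-playbook -i inventory/ playbooks/configure_instances.yml
ansible-playbook -i inventory/ -e orchestrator=openstack playbooks/install_openstack.yml
ansible-playbook -i inventory/ -e orchestrator=openstack playbooks/install_contrail.yml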

At this point, let’s connect to the compute node and verify the DPDK vRouter agent is up and running:

[root@os-compute ~]# docker ps | grep dpdk
fc1addf8dc7f        hub.juniper.net/contrail/contrail-vrouter-agent-dpdk:2008.121   "/entrypoint.sh /usr…"   42 hours ago        Up 42 hours                             vrouter_vrouter-agent-dpdk_1
[root@os-compute ~]# docker exec -it vrouter_vrouter-agent-dpdk_1 bash
(vrouter-agent-dpdk)[root@os-compute /]$ 

From within the container, we check the DPDK-bound NICs:

(vrouter-agent-dpdk)[root@os-compute /]$ /opt/contrail/bin/dpdk_nic_bind.py --status

Network devices using DPDK-compatible driver
============================================
0000:00:03.1 '82540EM Gigabit Ethernet Controller' drv=uio_pci_generic unused=e1000

Network devices using kernel driver
===================================
0000:00:03.0 '82540EM Gigabit Ethernet Controller' if=eth0 drv=e1000 unused=uio_pci_generic *Active*

Other network devices
=====================
<none>
(vrouter-agent-dpdk)[root@os-compute /]$

Interface ens3f1 was bound and is used as the physical interface for vhost0.

This is confirmed by the fact that ens3f1 is no longer visible to the kernel:

[root@os-compute ~]# ifconfig ens3f1
ens3f1: error fetching interface information: Device not found

and by checking vhost0 configuration file:

[root@os-compute ~]# cat /etc/sysconfig/network-scripts/ifcfg-vhost0
DEVICE=vhost0
BOOTPROTO=static
ONBOOT=yes
USERCTL=yes
IPV6INIT=no
IPADDR=192.168.200.12
NETMASK=255.255.255.0
GATEWAY=192.168.200.1
DEFROUTE=no
TYPE=dpdk
NM_CONTROLLED=no
BIND_INT=0000:00:03.1

Notice the bind interface PCI address (0000:00:03.1); it is the same one highlighted by dpdk_nic_bind.py.
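
While we are at it, we can also check that the 2000 huge pages requested in instances.yaml were allocated and that vhost0 got its address (vhost0 stays visible to the kernel even in DPDK mode):

# 2 MB huge pages created by the deployer
grep -i hugepages /proc/meminfo
# vhost0 carries the compute node data plane IP
ip addr show vhost0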

Last, as usual, let’s run contrail-status:

[root@os-compute ~]# contrail-status
Pod      Service      Original Name                Original Version  State    Id            Status
         rsyslogd                                  2008-121          running  cafa692f02e1  Up 4 hours
vrouter  agent        contrail-vrouter-agent       2008-121          running  0455abfba50c  Up 4 hours
vrouter  agent-dpdk   contrail-vrouter-agent-dpdk  2008-121          running  fc1addf8dc7f  Up 4 hours
vrouter  nodemgr      contrail-nodemgr             2008-121          running  bc61a05cf178  Up 4 hours
vrouter  provisioner  contrail-provisioner         2008-121          running  36df3ce80091  Up 4 hours

WARNING: container with original name '' have Pod or Service empty. Pod: '' / Service: 'rsyslogd'. Please pass NODE_TYPE with pod name to container's env

vrouter DPDK module is PRESENT
== Contrail vrouter ==
nodemgr: active
agent: active
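
As a final sanity check, the vif utility can list the vRouter interfaces (vhost0, the bound physical NIC and, later on, the VM tap interfaces). Depending on the image, vif may live in the agent container rather than the dpdk one, so treat this as a hint:

# list vRouter interfaces from inside the container
docker exec -it vrouter_vrouter-agent-dpdk_1 vif --list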

There it is! Our “virtual DPDK compute node” up and running!

Again, this is not a recommended/supported deployment. Use with care and for lab environments only!

This is a learning/education solution to get familiar with DPDK and its ecosystem.

To know more about production-ready DPDK best practices, please have a look here (it does not include the latest improvements).

Ciao
IoSonoUmberto