I recently started at the Secure Mobile Network Lab at TU Darmstadt, where I work on Software Defined Wireless Networks. For the time being, I am looking into real-time data processing on normal PCs (aka the GNU Radio runtime). I’m really happy about the chance to work in this area, since it’s something that has interested me for years. And now I have some time to take a close look at the topic.

Over the last weeks, I experimented quite a lot with the GNU Radio runtime and with new tools that helped me get some data out of it. Since I didn’t find a lot of information on the topic, I wanted to start a series of blog posts about the more technical bits. (A more verbose description of the setup is available in [1]. The code and scripts are also on GitHub.)

Setting up the Environment

The first question was how to set up an environment that allows me to conduct reproducible performance measurements. (While this was my main goal, I think most information in this post can also be useful to get a more stable and deterministic throughput out of your flow graph.) Usually, GNU Radio runs on a normal operating system like Linux, which does a lot of stuff in the background. Things like file synchronization, search index updates, cron jobs, etc. might interfere with measurements. Ideally, I want to measure the performance of GNU Radio and nothing else.

CPU Sets

The first thing that I had to get rid of was interference from other processes running on the same system. Fortunately, Linux comes with CPU sets, which allow partitioning the CPUs and migrating all processes to a system set, leaving the other CPUs exclusively to GNU Radio. New processes will, by default, also end up in the system set.

On my laptop, I have 8 CPUs (4 cores with hyper-threads) and wanted to dedicate 2 cores with their hyper-threads to GNU Radio. Initially, I assumed that CPUs 0-3 would correspond to the first two cores and CPUs 4-7 to the others. As it turns out, this is not the case. The lstopo command (on Ubuntu, part of the hwloc package) gives a nice overview:

lstopo output

As we can see from the figure, it’s actually CPUs 2, 3, 6, and 7 that correspond to cores 2 and 3. To create a CPU set sdr with these CPUs, I ran:

sudo cset shield --sysset=system --userset=sdr --cpu=2,3,6,7 --kthread=on

sudo chown -R root:basti /sys/fs/cgroup/cpuset
sudo chmod -R g+rwx /sys/fs/cgroup/cpuset

The latter two commands allow my user to start processes in the SDR CPU set. The kthread option tries to also migrate kernel threads to the system CPU set. This is not possible for all kernel threads, since some have CPU-specific tasks, but it’s the best we can do.

Starting a GNU Radio flow graph in the SDR CPU set can be done with:

cset shield --userset=sdr --exec -- ./run_flowgraph
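To double-check that the flow graph really ended up on the intended CPUs, the shield status and the affinity of the process can be inspected. A minimal sketch (the PID is a placeholder):

# show the shield configuration and the processes running in each set
cset shield

# print the CPU affinity of the flow graph process (1234 is a placeholder PID)
taskset -cp 1234

# when done, tear the shield down again
sudo cset shield --reset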

(As a side note: Linux has a kernel parameter isolcpus that allows isolating CPUs from the Linux scheduler. I also tried this approach but, as it turns out, it really means what it says, i.e., these cores are excluded from the scheduler. When I started GNU Radio with an affinity mask of the whole set, it always ended up on a single core. Without a scheduler, there are no task migrations to other cores, which renders this approach useless.)

IRQ Affinity

Another issue is interrupts. If, for example, the GPU or the network interface constantly interrupts the CPU where GNU Radio is running, we get lots of jitter, and the throughput of the flow graph might vary significantly over time. Fortunately, many interrupts are programmable, which means that they can be assigned to (a set of) CPUs. This is called interrupt affinity and can be adjusted through the proc file system.

With watch -n 1 cat /proc/interrupts it’s possible to monitor the interrupt counts per CPU:

            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:         12          0          0          0          0          0          0          0  IR-IO-APIC    2-edge      timer
  1:        625          0          0          0         11          0          0          0  IR-IO-APIC    1-edge      i8042
  8:          0          0          0          0          0          1          0          0  IR-IO-APIC    8-edge      rtc0
  9:     103378      43143          0          0          0          0          0          0  IR-IO-APIC    9-fasteoi   acpi
 12:      17737          0          0        533         67          0          0          0  IR-IO-APIC   12-edge      i8042
 14:          0          0          0          0          0          0          0          0  IR-IO-APIC   14-fasteoi   INT34BB:00
 16:          0          0          0          0          0          0          0          0  IR-IO-APIC   16-fasteoi   i801_smbus, i2c_designware.0, idma64.0, mmc0
 31:          0          0          0          0          0          0          0     100000  IR-IO-APIC   31-fasteoi   tpm0
120:          0          0          0          0          0          0          0          0  DMAR-MSI      0-edge      dmar0
[...]

In my case, I wanted to exclude the CPUs of the SDR CPU set from as many interrupts as possible. So I tried to set a mask of 0x33 for all interrupts. In binary, this corresponds to 0b00110011, which selects CPUs 0, 1, 4, and 5 of the system CPU set.

for irq in /proc/irq/*/smp_affinity
do
    # smp_affinity interprets the value as a hex mask, i.e., 33 means 0x33
    echo 33 | sudo tee $irq
done
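Not all interrupts can be reassigned; for those, the write simply fails and the kernel keeps the old mask. Whether the changes were accepted can be verified by reading the affinities back, most conveniently in list format:

# print the allowed CPUs per interrupt, prefixed with the file name
grep . /proc/irq/*/smp_affinity_list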

Another potential pitfall is the irqbalance daemon, which might work against us by reassigning interrupts to the CPUs that we want to use for signal processing. I, therefore, disabled the service during the measurements.

sudo systemctl stop irqbalance.service

CPU Governors

Finally, there is CPU frequency scaling, i.e., individual cores might adapt their frequencies based on the load of the system. While this shouldn’t be an issue if the system is fully loaded, it might make a difference for bursty loads. In my case, I mainly wanted to avoid initial transients and, therefore, set the CPU governor to performance, which should minimize frequency scaling.

#!/bin/bash

set -e

# use the governor given as the first argument; default to performance
if [ "$#" -lt "1" ]
then
    GOV=performance
else
    GOV=$1
fi

CORES=$(getconf _NPROCESSORS_ONLN)
i=0

echo "New CPU governor: ${GOV}"

# set the governor on every core (cpufreq-set is part of cpufrequtils)
while [ "$i" -lt "$CORES" ]
do
    sudo cpufreq-set -c $i -g $GOV
    i=$(( $i+1 ))
done
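To verify that the governor is actually in effect, the current setting of each core can be read back from sysfs:

# print the active governor per CPU
grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor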

I hope this was somewhat helpful. In the next post, I will do some performance measurements.

  1. Bastian Bloessl, Marcus Müller and Matthias Hollick, “Benchmarking and Profiling the GNU Radio Scheduler,” Proceedings of 9th GNU Radio Conference (GRCon 2019), Huntsville, AL, September 2019. [BibTeX, PDF and Details…]



I just ported my GNU Radio out-of-tree modules to GNU Radio 3.8 and adopted the new proposed development scheme. That means:

  • Legacy GNU Radio 3.7 support is provided through the maint-3.7 branch.
  • GNU Radio 3.8 support is provided through the maint-3.8 branch.
  • Future API-breaking changes towards GNU Radio 3.9 will be in the master branch. For now, master and maint-3.8 are similar.

The modules need some testing. So feel free to try them (and complain :-)).
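The build follows the usual CMake workflow for out-of-tree modules. A minimal sketch, using gr-foo as an example:

git clone https://github.com/bastibl/gr-foo.git
cd gr-foo
git checkout maint-3.8
mkdir build
cd build
cmake ..
make -j$(nproc)
sudo make install
sudo ldconfig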

Finally, some screenshots of the new flow graphs. (I still have to get used to these Bézier connections…)


I’m currently looking into WLAN physical layer simulation models of network simulators to figure out how well they perform. While studying the ns-3 implementation of the PHY error rate model, I noticed that it uses the number of coded bits to calculate the frame error rate. This seems weird, since all papers on the topic consider the number of data bits.

The bug just got confirmed and the issue should be fixed soon.

It means that for the last eight years or so, ns-3 used the wrong number of bits to calculate the error rate for a WLAN transmission. A 500 byte BPSK-1/2 frame, for example, was treated as a 1000 byte frame.
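As a back-of-the-envelope sketch of why the factor is exactly two (assuming the simple model of independent bit errors; the actual ns-3 model is more involved): the frame error rate is FER = 1 - (1 - BER)^n, where n is the number of bits per frame. A rate-1/2 code outputs two coded bits per data bit, so evaluating the model with n = 2 · 8 · 500 coded bits instead of 8 · 500 data bits yields exactly the FER of a 1000 byte frame.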

Fortunately, the impact is not too big. It is shown in the figure below: the bug causes the error curves to shift to the next higher one (i.e., 100 byte → 200 byte, 200 byte → 400 byte, etc.).

Impact of bug.

I also recently published a more comprehensible paper about the Veins PHY implementation.

  • Bastian Bloessl and Aisling O’Driscoll, “A Case for Good Defaults: Pitfalls in VANET Physical Layer Simulations,” Proceedings of IFIP Wireless Days Conference 2019, Manchester, UK, April 2019. [BibTeX, PDF and Details…]


I just reviewed a paper that claimed that reactive jamming of ZigBee would only be possible with sophisticated devices that cost over $3000. There was, however, already a paper in 2011 that implemented a reactive ZigBee jammer by modifying the FPGA of a USRP2 SDR [1]. Furthermore, it was shown that low-cost CC2420-based nodes can be turned into reactive jammers [2]. Since I recently bought a $40 ATUSB ZigBee USB dongle from Sysmocom, I wanted to give it a try with this device.

ATUSB

(The whole idea is very similar to my blog post about jamming WiFi ACKs with WLAN cards, which was based on Mathy Vanhoef’s work, who modified the Atheros ath9k_htc firmware [3].)


Some weeks ago, I attended FOSDEM and was lucky enough to get a free PlutoSDR from Analog Devices. Yay! It’s mainly an SDR, but it’s also the first FPGA board that I got my hands on. Having the board, my motivation to learn more about FPGAs reached a local maximum. However, being an FPGA noob, I found the free-to-use development framework from Xilinx not very educational. I felt like there were many levels of abstraction between me clicking around in the GUI and the actual hardware.

Me

However, I also knew that there was something going on around Clifford Wolf, who is working on Open Source FPGA tools. So it was a good occasion to catch up with his work. Long story short, he built a complete Open Source workflow for Lattice iCE40 FPGAs (and support for others seems to be work in progress). I bet, some years ago, most people wouldn’t have believed that this was possible and assumed that we would have to deal with vendor software for the rest of our days.

While iCE40 FPGAs are not the most capable FPGAs, they are cheap and – together with the Open Source tools – an ideal learning platform. If you are interested in the topic, I’d recommend Clifford’s recent talk at 35C3 about the state of the project. I had to try this out, so I bought a cheap iCEstick for around 25 EUR. (I didn’t get the memo that the cool kids are now using the iCEBreaker board.)

iCEstick

I had no idea about FPGAs, but I found an excellent tutorial on GitHub. It is in Spanish, but Google Translate can jump in here.

The general workflow of IceStorm, Clifford’s Open Source tools, is to write your design in Verilog and

  • synthesize it with Yosys
  • place and route with NextPNR
  • create the bitstream with icepack
  • copy the bitstream on the device with iceprog

    …no worries, a Makefile has you covered.
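Spelled out, the Makefile essentially runs these four commands (a sketch; the file names are placeholders and the FPGA part matches the iCEstick):

# synthesize the Verilog design into a JSON netlist
yosys -p "synth_ice40 -top top -json top.json" top.v

# place and route for the iCEstick (iCE40 HX1K, TQ144 package)
nextpnr-ice40 --hx1k --package tq144 --json top.json --pcf top.pcf --asc top.asc

# turn the ASCII output into a binary bitstream
icepack top.asc top.bin

# flash the bitstream onto the device
iceprog top.bin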


I got a paper accepted at Wireless Days 2019! The paper is about the physical layer simulation model of a popular network simulation framework for vehicular networks.

It shows that the default parameters used in the simulator are (1) highly unrealistic and (2) adopted by many users.

  • Bastian Bloessl and Aisling O’Driscoll, “A Case for Good Defaults: Pitfalls in VANET Physical Layer Simulations,” Proceedings of IFIP Wireless Days Conference 2019, Manchester, UK, April 2019. [BibTeX, PDF and Details…]