Not in Love with the GNU Radio Scheduler (Part 1)

28 May 2016 GNU Radio / Scheduler

While I usually try to avoid digging into the GNU Radio scheduler, there’s one problem that bugs me again and again: run-to-completion flow graphs all too often do not run to completion, but hang forever, waiting for someone to press CTRL-C. This is especially annoying with unit tests, simulations, and, generally, with flow graphs that are intended to run without user interaction.

The problem with run-to-completion flow graphs was also reported in GNU Radio Issue #797. This post provides some context and a more detailed description of the pull request I submitted several weeks ago.

With the introduction of asynchronous messages, the scheduler became much more involved. The current implementation is full of redundant code, (throttled) busy waiting, goto statements, and synchronization ugliness. While it could clearly use some cleanup, the code is at the core of GNU Radio, it’s highly complex, and nobody seems to want to touch it.

System Handler

One issue that has come up several times is shutting down flow graphs. At some point, a system port (or system handler) was introduced to simplify this process. The idea is that each block has a system message port, hidden in GNU Radio Companion, that can be used to signal termination. If a special message is received on this port, the handler sets the d_finished variable:

void
block::system_handler(pmt::pmt_t msg)
{
  //std::cout << "system_handler " << msg << "\n";
  pmt::pmt_t op = pmt::car(msg);
  if(pmt::eqv(op, pmt::mp("done"))){
      d_finished = pmt::to_long(pmt::cdr(msg));
      global_block_registry.notify_blk(alias());
  } else {
      std::cout << "WARNING: bad message op on system port!\n";
      pmt::print(msg);
  }
}

The first strange thing is that it expects a long, only to convert it to a boolean. I think this should rather be a boolean, or in this case pmt::PMT_T, so I changed it in the pull request. d_finished is a boolean as well, so why confuse people here?
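The difference can be illustrated with a small Python model of the handler. This is a sketch, not GNU Radio code: messages are modeled as plain (op, value) tuples instead of PMT pairs, and ModelBlock, system_handler_long, and system_handler_bool are hypothetical names.

```python
class ModelBlock:
    """Hypothetical stand-in for a block's system-port state (not GNU Radio API)."""

    def __init__(self):
        self.finished = False  # plays the role of block::d_finished

    def system_handler_long(self, msg):
        # Current behaviour: the payload is read as an integer and then
        # squeezed into a boolean, so any non-zero value means "done".
        op, value = msg
        if op == "done":
            self.finished = bool(int(value))
        else:
            print("WARNING: bad message op on system port!", msg)

    def system_handler_bool(self, msg):
        # Proposed behaviour: the payload is already a boolean
        # (pmt::PMT_T in the real code), matching the type of d_finished.
        op, value = msg
        if op == "done" and isinstance(value, bool):
            self.finished = value
        else:
            print("WARNING: bad message op on system port!", msg)


a = ModelBlock()
a.system_handler_long(("done", 42))  # 42 silently becomes True
b = ModelBlock()
b.system_handler_bool(("done", True))
print(a.finished, b.finished)
```

With the boolean variant, the payload type matches the variable it sets, so there is no silent integer-to-boolean conversion to reason about.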

The second thing is that it calls notify_blk and, thus, notifies itself that something interesting happened. It does so by signaling the input and output condition variables. This would make sense if the block were sleeping while blocked on input or output, but since it is currently processing a message, the call is basically a no-op and was, therefore, removed in the pull request. (Maybe it made sense in a prior version, when the system handler was a public function that could be called by another block, i.e., from another thread.)
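Why the self-notification has no effect can be seen with Python’s threading.Condition, which behaves like the condition variables here in the relevant respect: a notification only wakes threads that are blocked in wait() at that moment and is not stored for later. In this sketch, handle_done_message and wait_after_self_notify are hypothetical names; the former stands in for the system handler running on the block’s own thread.

```python
import threading

cond = threading.Condition()


def handle_done_message():
    # Runs on the block's own thread while it processes a message, so no
    # thread is blocked in cond.wait(): the notification wakes nobody.
    with cond:
        cond.notify_all()


def wait_after_self_notify(timeout=0.1):
    handle_done_message()
    with cond:
        # Condition.wait() returns False if it timed out; the earlier
        # notify_all() is not remembered.
        return cond.wait(timeout)


print(wait_after_self_notify())  # False
```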

OK, so the whole story was about setting d_finished.

block::d_finished

Despite its name, d_finished doesn’t mean that the block is finished, i.e., stops processing samples. In fact, the value of d_finished is ignored for all blocks except those that don’t have any stream ports, i.e., pure message blocks. Only in this particular case is d_finished an indicator to shut down the block. This is implemented through:

bool
block::finished()
{
  if((detail()->ninputs() != 0) || (detail()->noutputs() != 0))
    return false;
  else
    return d_finished;
}

This has some unfortunate implications, as blocks with at least one stream port cannot be shut down through the message port. In Issue #797, Paul Graver linked a minimal example that I created to highlight the problem. The flow graph looks like this:

[Figure: minimal example flow graph]

The vector source outputs 20 complex samples, which are subsequently split into two PDUs of 10 samples each, converted back to a stream, and dropped. I did not expect that this rather simple flow graph cannot be shut down. The problem is the PDU to Tagged Stream block, which ignores the done message. It could only be terminated through its stream port, which is, however, not possible in this scenario.
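A minimal Python model of block::finished() makes the consequence for this example concrete. ModelBlock and its fields are hypothetical; the port counts reflect that PDU to Tagged Stream has a message input and one stream output, while a pure message block such as Message Debug has no stream ports at all.

```python
from dataclasses import dataclass


@dataclass
class ModelBlock:
    name: str
    ninputs: int   # number of stream inputs
    noutputs: int  # number of stream outputs
    d_finished: bool = False

    def finished(self):
        # Mirrors block::finished(): d_finished only counts for blocks
        # without any stream ports.
        if self.ninputs != 0 or self.noutputs != 0:
            return False
        return self.d_finished


# Both blocks have already received the "done" message on their system port.
pdu_to_stream = ModelBlock("PDU to Tagged Stream", ninputs=0, noutputs=1, d_finished=True)
msg_debug = ModelBlock("Message Debug", ninputs=0, noutputs=0, d_finished=True)

print(pdu_to_stream.finished())  # False: the done message is ignored
print(msg_debug.finished())      # True
```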

While PDU to Tagged Stream is the only block of this kind in GNU Radio, such flow graphs are frequently used in simulations and unit tests. The GNU Radio unit tests, for example, use some strange workarounds:

self.tb.start()

# post the message
src.to_basic_block()._post(port, msg) # eww, what's that smell?

while dbg.num_messages() < 1:
    time.sleep(0.1)
self.tb.stop()
self.tb.wait()
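As long as such polling loops are needed, a bounded variant at least makes a broken test fail instead of hanging. The helper below is a hypothetical sketch, not part of GNU Radio; wait_until and its timeout and interval parameters are assumptions.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.1):
    """Poll predicate() until it returns True or timeout seconds have passed.

    Returns True on success and False on timeout, so a test can assert on
    the result instead of looping forever.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return bool(predicate())  # one last check at the deadline
```

In the test above, the while loop could then become self.assertTrue(wait_until(lambda: dbg.num_messages() >= 1)) before stopping the top block.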

In my opinion, it would be a good idea to allow shutting down a block through the system port if it has only message inputs and no longer produces output. That means its work function was called, but it didn’t produce any samples even though there was space available in its output buffers. This is exactly what the pull request is supposed to implement:

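// Shut down a finished block only once a call to its work function
// produced nothing although output space was available.
if(block->finished() && s == block_executor::READY_NO_OUTPUT) {
    s = block_executor::DONE;
    d->set_done(true);
}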
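The effect of this condition can be checked with a small Python simulation of the example’s PDU to Tagged Stream block. This is a sketch under simplifying assumptions, not the scheduler code: the state names mirror block_executor’s, both PDUs and the done message are already queued before the first call to work, and the output buffer is assumed to have room for 8 samples per call (drained completely by the sink in between). The wait_for_no_output=False branch is a hypothetical naive rule for comparison.

```python
from collections import deque

READY, READY_NO_OUTPUT, DONE = "READY", "READY_NO_OUTPUT", "DONE"


def simulate(pdu_lengths, buffer_space=8, wait_for_no_output=True):
    """Return (samples produced, work calls) until the block is done.

    Models the block after "done" has arrived on its system port,
    i.e., finished() is True from the start.
    """
    pending = deque(pdu_lengths)  # samples still queued, one entry per PDU
    finished = True
    produced_total = 0
    calls = 0
    while True:
        if finished and not wait_for_no_output:
            # Hypothetical naive rule: stop as soon as "done" was received.
            return produced_total, calls
        calls += 1
        # work(): copy as many queued samples as fit into the output buffer.
        produced = 0
        while pending and produced < buffer_space:
            n = min(pending[0], buffer_space - produced)
            produced += n
            pending[0] -= n
            if pending[0] == 0:
                pending.popleft()
        produced_total += produced
        # Output space was available, so producing nothing is READY_NO_OUTPUT.
        s = READY if produced > 0 else READY_NO_OUTPUT
        # Rule from the pull request.
        if finished and s == READY_NO_OUTPUT:
            s = DONE
        if s == DONE:
            return produced_total, calls


print(simulate([10, 10]))                            # (20, 4)
print(simulate([10, 10], wait_for_no_output=False))  # (0, 0)
```

All 20 samples leave the block before it is marked DONE on the fourth call, whereas stopping immediately on the done message would drop them.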

Waiting for READY_NO_OUTPUT ensures that the remaining samples are still processed, even if the shutdown message has already arrived. The rest of the pull request consists only of whitespace changes and of using the new possibility to make the unit tests nicer. The test code above, for example, then becomes:

src.to_basic_block()._post(port, msg)
src.to_basic_block()._post(pmt.intern("system"), pmt.cons(pmt.intern("done"), pmt.PMT_T))

self.tb.start()
self.tb.wait()

There are some more things that, I think, should be addressed, but I will write about them in a future post.