Lambdatask and Monitors in C++: Combining Advanced Language Features for a Simple Interface

Lambdatask is a project that I was hoping to put off for a while. C++11 provides many useful thread primitives, but it has major shortcomings. The most notable of these is the lack of concurrent and threadsafe data structures in the standard library. I'm not sure why nobody seriously considered at least making std::queue threadsafe, but it isn't, and we get to live with that.

Unfortunately, a threadsafe queue is among the many features that Libaudioverse needs. More unfortunately, it needs one now, so I can implement callbacks properly. I have identified three reusable things that it would be incredibly useful to have, the first two of which are surprisingly simple: a monitor, a threadsafe queue, and a library for the automatic parallelization of tasks with dependencies on other tasks. These libraries exist in various forms, but my research failed to turn up any that use only C++11 features. A library using only C++11 features will run on the 5 major platforms, and more are being added as we speak.

It is monitors that I want to talk about today. The code in lambdatask marks my transition from intermediate to advanced C++ developer. This alone isn't worth a blog post, but how the Monitor<T> works in Lambdatask is. The code here took me about 3 hours to work out, and I can thank here for the idea and this Stack Overflow question for the last piece of the puzzle.

What's a Monitor and What Does It Look Like in Lambdatask?

An object is a monitor if it is guaranteed that only one thread at a time is allowed to be executing code inside it. A class which is said to be a monitor will conceptually have an internal lock which is acquired at the beginning of every function and released at the end. Things like std::lock_guard make this easy enough, even if we exit because of an exception.

What Lambdatask's monitor does is quite simple. std::vector<int> is the standard C++ vector of ints, which is only guaranteed to be safe for concurrent reading. lambdatask::Monitor<std::vector<int>> exposes an interface much like that of std::shared_ptr<std::vector<int>>, but only one thread is allowed inside at a time. Here's an example of something that is safe with a Lambdatask monitor but not a regular std::vector.

lambdatask::Monitor<std::vector<int>> mon;
//We have no guarantee on order of arrival.
//But we do have a guarantee that they will all arrive.
#pragma omp parallel for
for(int i = 0; i < 1000; i++) mon->push_back(i);

(Yes, I'm cheating. I don't want to type out an entire example of std::thread, so you get an OpenMP pragma instead.)

This is safe only by virtue of the lambdatask monitor. The astute C++ programmer who doesn't know all the things I'm about to talk about is going to wonder how implementing this is even possible, and in truth there are two shortcomings I have not yet fixed (I have ideas, though). You cannot access overloaded operators on whatever the monitor wraps, and any object returned by whatever the monitor wraps is not itself a monitor. This will make STL containers threadsafe, but it will not make STL iterators threadsafe. The first may be fixable; the second is not something that can be done without an addition to the interface.
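
To make those two limitations concrete, here is a small illustration using the monitor from the example above (the commented-out line shows what the interface cannot express):

lambdatask::Monitor<std::vector<int>> mon;
mon->push_back(5); //Fine: this goes through the monitor's operator->, so the lock is held for the call.
//mon[0]; //Not possible: the wrapped vector's operator[] is not reachable through the monitor.
auto it = mon->begin(); //Compiles, but the returned iterator is an ordinary iterator, not a monitor.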

So let's talk about how this kind of thing is even possible. Here's a direct link to the source. I suggest following along there: the features described herein are hard to contrive short examples for.

So How Is It Possible?

Time for C++ features that you probably never want to use. Or at least, you don't unless you want to do exactly what I did.

Some Interesting Facts About operator->

The first key has to do with operator->. Most people who haven't worked with C++11 smart pointers don't know that it can be overloaded, but the fact I'm going to introduce here is even more obscure. I didn't know about it until yesterday.

If operator-> returns a pointer to a class, the compiler does the normal member lookup, exactly as if foo->bar() were (*foo).bar(). The only difference is that it calls your function to get the pointer to do this lookup on.

But if it instead returns a class, the compiler recursively applies operator-> to the result. That may again return a class, and again, as many times as you want. There can be side effects and everything that's allowed in a normal function here, e.g. you might log the calling thread, create a temporary, or wait on a lock for 5 minutes. So long as a class is returned, the compiler keeps looking for more operator-> overloads until it finally gets a pointer.
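
Here is a tiny standalone sketch, not from lambdatask, showing the drill-down (the class names are made up for illustration):

#include <cstdio>

struct Target {
    void hello() { std::puts("hello"); }
};

//Innermost proxy: its operator-> finally returns a raw pointer, so the compiler stops here.
struct InnerProxy {
    Target* t;
    Target* operator->() { std::puts("InnerProxy::operator->"); return t; }
};

//Outer proxy: its operator-> returns a class, so the compiler applies operator-> again to the result.
struct OuterProxy {
    Target* t;
    InnerProxy operator->() { std::puts("OuterProxy::operator->"); return InnerProxy{t}; }
};

int main() {
    Target target;
    OuterProxy p{&target};
    //One -> in the source, but three things run: OuterProxy::operator->, InnerProxy::operator->, and Target::hello.
    p->hello();
    return 0;
}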

Some Guarantees About Destructors and Temporaries

When you execute an expression that involves temporaries, the standard makes a guarantee: all temporaries live until the end of the outermost (full) expression, whereupon their destructors are called. I did not dig far enough to find out exactly where this boundary is, but it is definitely far enough that the function operator-> is trying to call gets called as part of it, and not so far that we end up holding locks for unreasonable amounts of time.
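
A quick way to see this guarantee in action (a contrived sketch, nothing to do with locks yet): the temporary below is not destroyed until the whole statement finishes, so its destructor runs only after the called function returns.

#include <cstdio>

struct Tracer {
    Tracer() { std::puts("constructor: a lock would be acquired here"); }
    ~Tracer() { std::puts("destructor: the lock would be released here"); }
    int value() const { return 42; }
};

int use(int x) {
    std::puts("use() runs while the temporary is still alive");
    return x;
}

int main() {
    //Prints: constructor, then the line from use(), then destructor.
    int result = use(Tracer().value());
    return result == 42 ? 0 : 1;
}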

This means that we can overload operator->, return a temporary, and expect the temporary to live until after the function called by the original expression has returned. The key to the implementation is that the above features of operator->, together with these guarantees, provide a way to almost magically inject code: a helper class locks a mutex in its constructor and unlocks it in its destructor, so the lock is held for at least as long as the function call. It is technically possible for these locks to be held a bit longer in complex expressions, but the cases where this matters are rare. It is not common to have an expression that calls 5 or 10 things that each want to block for a few milliseconds, and the convenience of this interface far outweighs that particular bad programming practice (no, seriously. If you're doing that, I despair for the general debuggability and comprehensibility of your code).

We Can Nest Classes Inside Classes Even When Outer Classes Are Templates

The above is sufficient to implement a monitor that can wrap exactly one type, but that's not useful in the general case. The last piece of the puzzle is that, if you nest a class inside another class and the outer class is a template, the inner class "knows" about the outer class's private members and template parameters.

The latter is actually profound: by moving your helper class inside another class that happens to be a template, you get to write it as though it weren't templated and have it "always do the right thing". In Lambdatask's monitor, there is a helper class nested inside Monitor<T>: LockedMonitor. The complete specification of the type would be lambdatask::Monitor<T>::LockedMonitor, so different values of T produce different types of LockedMonitor. But LockedMonitor is in scope for the body of Monitor<T>, so we can just use it as LockedMonitor in the implementation. LockedMonitor is still allowed to say T*, in which case it refers to the monitor's template parameter.
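
Putting the pieces together, here is a minimal sketch of the shape all of this takes. To be clear, this is my own illustrative reconstruction and not lambdatask's actual source; follow the link above for the real thing.

#include <mutex>
#include <vector>

template <typename T>
class Monitor {
    T wrapped;
    std::mutex mutex;

    //Nested inside the template: a distinct LockedMonitor type exists for every T,
    //but inside Monitor<T> we can simply write LockedMonitor, and it may say T*.
    class LockedMonitor {
        T* target;
        std::unique_lock<std::mutex> lock;
        public:
        LockedMonitor(T* t, std::mutex& m): target(t), lock(m) {}
        T* operator->() { return target; }
    };

    public:
    //Returns a temporary LockedMonitor; the lock is acquired in its constructor
    //and released when the temporary dies at the end of the calling expression.
    LockedMonitor operator->() { return LockedMonitor(&wrapped, mutex); }
};

int main() {
    Monitor<std::vector<int>> mon;
    mon->push_back(5); //Locks, pushes, unlocks.
    return 0;
}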

The High-Level Overview

So I've just deluged everyone reading this with some really obscure C++ that assumes a lot of background knowledge not included here. With that in mind, let me list the steps that happen when you use the overloaded operator-> of a Monitor<T>.

First, the compiler sees that we're going to return a LockedMonitor instead of a pointer, so it chains two operator-> calls together: the one on our class and the one on the LockedMonitor. This gives the return type of the expression, a T*, so the compiler can do the usual compile-time checks. There are no tricks here, and everything is still 100% type-safe.

Next, we get to runtime and the first operator-> gets called. The number of temporaries that can be created here is constrained by the fact that I explicitly deleted everything that allows moving a LockedMonitor, so the compiler is limited to at most one. The compiler is limited to at least one by common sense: if it chose not to create the temporary, it wouldn't be able to call the second operator->. LockedMonitor's constructor acquires the lock and, through std::unique_lock, ensures that it will be released at destruction. This temporary gets allocated somewhere and queued up to be destroyed at the end of the current expression. The particular constructor invocation in question will block until any other thread inside the monitor exits, whereupon it finally grabs the mutex.

The method you wanted to call gets called next. There is nothing special here.

And then the lock gets released. While LockedMonitor's own destructor does nothing interesting, the destructor of its std::unique_lock member does. The LockedMonitor goes away, immediately followed by its member variables. When the std::unique_lock is destroyed, the mutex is unlocked and other threads may again enter.
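
For reference, here is the whole sequence traced against the illustrative sketch from earlier:

Monitor<std::vector<int>> mon;
mon->push_back(5);
//1. Monitor<T>::operator->() runs and constructs a temporary LockedMonitor, blocking
//   in std::unique_lock's constructor until the mutex is free.
//2. The compiler applies LockedMonitor::operator->() to the temporary, which returns a std::vector<int>*.
//3. push_back(5) runs through that pointer while the lock is still held.
//4. At the end of the full expression the temporary is destroyed; its std::unique_lock
//   member unlocks the mutex and other threads may again enter.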

This isn't the most performant thing in all the world, but it's surprisingly powerful. So long as you don't need to use overloaded operators on some type T, you can now instantly make a threadsafe version of it. At the moment this only works for default-constructible classes, but that's merely for lack of time: a quick bit of variadic templates (which I still consider insanely complicated) can fix that up nicely.
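
For what it's worth, here is a hedged sketch of what that fix might look like: a variadic forwarding constructor added to the Monitor sketch from earlier (again illustrative, not the actual lambdatask interface; the mutex, LockedMonitor, and operator-> members are unchanged and omitted for brevity).

#include <utility>
#include <vector>

template <typename T>
class Monitor {
    T wrapped;
    public:
    //Forward any constructor arguments straight through to the wrapped T,
    //so T no longer needs to be default constructible.
    template <typename... Args>
    explicit Monitor(Args&&... args): wrapped(std::forward<Args>(args)...) {}
};

int main() {
    Monitor<std::vector<int>> mon(100, 0); //A vector of 100 zeros, constructed in place inside the monitor.
    return 0;
}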

As for performance, well, there is an extra temporary, and it is possible that locks will get held a bit longer than expected. But I've got a test that makes a configurable number of threads and a counter, wraps the counter in a monitor, and makes the counter count to a configurable number. The fast version usually finishes before I can blink. There is unfortunately a slower one that runs after it to make sure it really works; it takes several minutes because it inserts sleeps everywhere, but such is the price of being confident in code intended to provide thread safety.