lauantai 13. toukokuuta 2017

Emulating the Rust borrow checker with C++ move-only types

Perhaps the greatest thing to come out of C++11 is the notion of move semantics. Originally it was designed to make it efficient to return big objects like matrices from functions. In move operations the destination "steals the guts" of the source object rather than making a copy of it. Usually this means taking hold of some pointer to a reserved memory block and assigning the source's pointer to nullptr.

This mechanism is not reserved to pointers and can be used for any data type. As an example let's write a move-only integer class. A typical use for this would be to store file descriptors to ensure that they are closed. The code is straightforward, let's go through it line by line:

class MoveOnlyInt final {
    int i;

    void release() { /* call to release function here */ }

This is basic boilerplate. The release function would typically be close, but can be anything.

    explicit MoveOnlyInt(int i) : i(i) {}
    ~MoveOnlyInt() { release(); }

We can construct this from a regular integer and destroying this object means calling the release function. Simple. Now to the actual meat and bones.

    MoveOnlyInt() = delete;
    MoveOnlyInt(const MoveOnlyInt &) = delete;
    MoveOnlyInt& operator=(const MoveOnlyInt &) = delete;

These declarations specify that this object can not be copied (which is what C++ does by default). Thus creating two objects that hold the same underlying integer is not possible unless you explicitly create two objects with the same bare integer.

    MoveOnlyInt(MoveOnlyInt &&other) { i = other.i; other.i = -1; }
    MoveOnlyInt& operator=(MoveOnlyInt &&other) { release(); i = other.i; other.i = -1; return *this; }

Here we define the move operators. A move means releasing the currently held integer (if it exists) and grabbing the source's integer instead. This is all we need to have a move only type. The last thing to add is a small helper function.

    operator int() const { return i; }

This means that this object is convertible to a plain integer. In practice this means that if you have a function like this:

void do_something(int x);

then you can call it like this:

MoveOnlyInt x(42);

You could achieve the same by creating a get() function that returns the underlying integer but this is nicer and more usable. This is not quite as useful for integers but extremely nice when using plain C libraries with opaque structs. Then your move only wrapper class can be used directly when calling plain C functions.

What does this allow us to do?

All sorts of things, the object behaves much in the same way as Rust's movable types (but is not 100% identical). You can for example return it from a function, which transfers ownership in a compiler enforced way:

MoveOnlyInt returnObject() {
    MoveOnlyInt retval(42);
    return retval;

If you try to pass an object as an argument like this:

int byValue(MoveOnlyInt mo);

a regular class would get copied but for a move-only type you get a compiler error:

./prog.cpp:39:13: error: call to deleted constructor of 'MoveOnlyInt'
../prog.cpp:13:5: note: 'MoveOnlyInt' has been explicitly marked deleted here
    MoveOnlyInt(const MoveOnlyInt &) = delete;
../prog.cpp:22:28: note: passing argument to parameter here
int byValue(MoveOnlyInt mo) {

Instead you have explicitly tell the compiler to move the object:


Some of you might have spotted a potential issue. Since a MoveOnly object can be converted to an int and there is a constructor that takes an int, that could create two objects for the same underlying integer. Like this:

MoveOnlyInt mo(42);
MoveOnlyInt other(mo);

The compiler output looks like the following:

../prog.cpp:37:14: error: call to deleted constructor of 'MoveOnlyInt'
    MoveOnlyInt other(mo);
             ^     ~~
../prog.cpp:13:5: note: 'MoveOnlyInt' has been explicitly marked deleted here
    MoveOnlyInt(const MoveOnlyInt &) = delete;

The compiler prevents creating invalid objects.

The wrapper object has zero memory overhead compared to a plain int and code generation changes are minimal. The interested reader is encouraged to play around with the compiler explorer to learn what those are.

Is this just as good as native Rust then?

No. Rust's borrow checker does more and is stricter. For example in C++ you can use a moved-from object, which may yield a null pointer dereference if your underlying type was a pointer. You won't get a use-after-free error, though. On the other hand this class is 18 lines of code that can be applied to any existing C++ code base immediately whereas Rust is a whole new programming language, framework and ecosystem.

Anyone is also free to do the wrong thing and take a copy of the integer value without telling anyone but this issue remains in every other language as well, even in unsafe Rust.

2 kommenttia:

  1. Hi,
    thanks for the article!

    As a sidenote, it is a good idea to mark your `move` operators with `noexpect`. See [] for details.

  2. Sure, but it was left out for clarity as it has no direct bearing on the issue at hand.