Tuesday, November 8, 2016

Building code faster and why recursive Make is so slow

One of the most common reactions to Meson we have gotten has been that build time is dominated by the compiler, thus trying to make the build system faster is pointless. After all, Make just launches subproject processes and waits for them to finish, so there's nothing to be gained.

This is a reasonable assumption but also false. Even if we ignore the most obvious slow parts, such as Configure that can take up to 15 minutes to run on an embedded device to Libtool, which slows everything down for no good reason, there are still gains to be had.

Meson uses Ninja as its main backend. One of the (many) cool things about it is that it is being actively developed and maintained. A recent addition is the Ninja tracer framework. It takes the output of Ninja's log file and converts it to the timing format understood by Chrome's developer tools. It creates output that looks like this (thanks to Nirbheek for gathering the data).


This is a trace of building all of GStreamer using their new Meson based aggregate repo called gst-build. This build has been done using six parallel processes making up the six horizontal lanes in the diagram. Each colored square represents one build task such as compiling or linking with time increasing from left to right. This picture shows us both why Meson is so efficient and also why Make based builds are slow (though the latter requires some analysis).

The main reason for speed is that Ninja keeps all processor cores working all the time. It can do that because it has a global view of the problem. As an example if we build two targets A and B and B links to A, Ninja can start compiling the source code of B before it has finished linking A. This allows it to keep all cores pegged.

The perceptive reader might have noticed the blank space at the left end of the diagram. During that time only one core is doing anything, all others are idle. The process that is running is the GIR tool processing the gstreamer-1.0 library, which all other subprojects depend on. In theory we should be able to run compiles on other projects but for reliability reasons Meson assumes that some source might include the output of the tool as a header so it does not start compilations until the file is generated. This may seem weird, but there are projects that do these kinds of things and require that they work.

The white gap is also what causes Make to be so slow. Most projects write recursive makefiles because maintaining a non-recursive makefile for even a moderate sized project is a nightmare. When building recursively Make goes into each subdirectory in turn, builds it to completion and only then goes to the next one. The last step in almost every case is linking, which can only be done using one process. If we assume that linking takes one second, then for a project that has 20 subdirectories that adds up to 20 wasted seconds of wall time. For an n core machine it corresponds to (n-1)*(num_directories) seconds of CPU time. Those things add up pretty quickly. In addition to Make, the same issue crops up in Visual Studio project files and pretty much any build system that does not have a global view of the dependency tree.

Those interested in going through the original data can do so. Just open this link with either Chromium or Chrome and the chart will be displayed in devtools. Unfortunately this won't work with Firefox.

Bonus challenge

The output format that Chrome's dev tools understands is straightforward. It would be interesting if someone modified Make to output the same format and ran it on a moderate sized project using Make. GStreamer is an obvious candidate as it has both Meson and Autotools setups in master, though gst-build only works with Meson.

Wednesday, November 2, 2016

Exposing a C++ library with a stable plain C API

There has been a lot of talk recently about using Rust to create shared libraries that have a plain C ABI. This allows you to transition libraries piecewise to Rust. This is seen as a contrast to C++ which has an unstable ABI and thus is hard to maintain and to integrate with other programs and libraries (Python, Ruby etc) that only work with plain C calling conventions.

However just like you can export Rust code with a C ABI by writing the required boilerplate, the same can be done in C++. Since C++ has native support for C, this is very simple. As an example a class that looks like this (lot of stuff omitted for clarity):

class Foo {
public:
    Foo();
    ~Foo();

    void poke(int x);
};

becomes this:

struct CFoo;

struct CFoo* foo_create();
void foo_destroy(struct CFoo *f);
void foo_poke(struct CFoo *f, int x, char **error);

The implementation for each of these just casts CFoo* into Foo* and then calls the corresponding method. The last function handles errors roughly in the same way as GLib's GError. In case of an error the error message is put in the error parameter and it is the responsibility of the caller to check it. For simplicity this is a string, but it could be an error object as well.

This is straightforward but the big problem comes in reporting errors. Writing code to convert exceptions to error messages for each and every function is tedious and error-prone. Fortunately there is a simple solution called an hourglass shaped interface using a Lippincott function. With it the implementation of foo_poke looks like this:

void convert_exception(char **error) noexcept {
    char *msg;
    try {
        throw; // The magic happens here
    } catch(const std::exception &e) {
        msg = strdup(e.what());
    } catch(...) {
        msg = strdup("An unknown error happened.");
    }
    *error = msg;
}

void foo_poke(struct CFoo* f, int x, char **error) {
    try {
        reinterpret_cast<Foo*>(f)->poke(x);
    } catch(...) {
        convert_exception(error);
    }
}

Here we have moved all error handling away from the wrapper into a standalone helper function. All interface functions have now been reduced into just calling to the real function and, in case of errors, calling the converter. The reason this works is that C++ stores the exception that is currently in flight and allows it to be rethrown. You can think of it as an invisible argument to the converter function.

This simple construct allows us to expose C++ libraries with a plain C ABI with all the stability guarantees and interoperability goodness that entails. A full sample project can be found here. It also contains a C++ "unwrapper" on top of the plain C API that makes exceptions travel transparently over the interface (that is, using the exact same ABI).