torstai 15. kesäkuuta 2017

Of XML, GUIDs and installers

Every now and then I have to use Windows. I'd really like to use Emacs for editing so I need to install it. Unfortunately the official releases from the FSF are zip archives that you need to manually unpack, add shortcuts and all that stuff. This gets tedious and it would be nice if there were MSI installer packages to do all that for you. So I figured I'd create some.

Conceptually what the installer needs to do is the following:

  1. Create the c:\Program Files\GNU Emacs directory
  2. Unzip contents of the official release zip file in said directory

Seems simple, right? There are two main tools for creating installers. The first one is NSIS, but it only creates exe installers, not msi. It also has a scripting language designed by a shaman who has licked a few frogs too many. So let's ignore that.

The other tool is WiX. It creates nice installers but it is Enterprise. Very, Very Enterprise. And XML. But mostly Enterprise. For starters the installer needs to have a GUID (basically a 128 bit random number). It also needs an "upgrade GUID". And a "package GUID". The first two must be the same over all installer versions but the latter must not be.

Having conjured the necessary amount of GUIDs (but not too many), you are ready to tackle copying files around. As you probably guessed, each one of them needs its own GUID. But if you thought that each one would require an XML element of its own, you are sadly mistaken. Every file needs two nested XML elements. The files also need a container. And a container-container.

Did I mention that the documentation consists almost entirely of XML fragments? So it is all but impossible to tell which tag should follow which and which ones should be nested?

The Emacs distribution has a lot of files, so obviously writing the XML by hand is out of the question. Unfortunately the WiX toolkit ships a helper utility called heat that generates the XML from a directory tree. Yes, I really did say unfortunately. While the tool pretends to work, in reality it does not. Following the documentation you might try doing something like this (simplified for purposes of exposition):

heat <directory with unpacked files> <output xml file> other_flags

This does create an installer which installs all the files. But it also does a little bit extra. If your unpack directory was called unpack, then the files will be installed to c:\Program Files\GNU Emacs\unpack. How might we get rid of that extra directory segment? The correct answer is that the system is optimized for in-source Visual Studio builds and trying to do anything else is doomed to fail. But let's try it anyway.

First one might look for a command line switch, like patch's -p1, that discards the first path segment. It does not seem to exist.

Next one might try to be clever and cd inside the unpack dir and do this (again simplified):

heat . <output xml file> other_flags

The system reports no error but the output is identical to the previous attempt. The tool helpfully notices that you are using a period as the directory name, looks up in the parent directory what it is actually called and substitutes that in. Because!

Since there are only a few directories in the top level one might try something along the lines of:

heat dir1 dir2 dir3 dir4 <output xml file> other_flags

Which again succeeds without error. However the output installer only has files from dir1. The tool parses all the input and then dutifully throws away all entries but the first without so much as a warning.

The way to make this work is to generate the XML files and then monkey patch all paths in them before passing them on to the installer generator. For That Is the Enterprise Way! Just be glad there are no files at the root directory.

Join us next time to learn how a randomly generated daddy GUID comes together with a randomly generated mommy GUID to produce a new entry in the start menu.

Is this available somewhere?

The script is here. There are no downloadable binaries because I don't have the resources to maintain those. It would be cool if someone (possibly even the FSF) would provide them.

keskiviikko 7. kesäkuuta 2017

Optimizing code: even the simplest things are unbelievably complex

In the previous post we looked at optimizing this simple function.

uint64_t result = 0;
for(size_t i=0; i<bufsize; i++) {
  if(buf[i] >= 128) {
    result += buf[i];
  }
}

We shall now do more measurements with real world hardware and compilers. The algorithms we use are the following:

  • simple: the above loop as-is
  • lookup: create a lookup table where entries less than 128 have value zero and the rest have the same value as the index
  • bit fiddling: convert the if into a branchless bitmask operation
  • partition: run std::partition on the data and only add the first half
  • zeroing: go over the data and set values not matching to zero, then add all
  • bucket: keep an array of 256 entries and count the number of times each number appears
  • multiply: convert if to a multiplication by 0 or 1, then add
  • parallel add: add several chars in a single packed 64 bit addition

Those interested in the actual implementations should look them up in the repo.
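
To give a flavor of what these look like, here is a minimal sketch of the multiply variant (the function name is made up here; the real implementations in the repo may differ in details):

#include <cstdint>
#include <cstddef>

// Sketch of the "multiply" variant: the comparison evaluates to 0 or 1,
// so multiplying by it removes the branch from the loop body.
uint64_t sum_multiply(const uint8_t *buf, size_t bufsize) {
    uint64_t result = 0;
    for(size_t i=0; i<bufsize; i++) {
        result += buf[i] * static_cast<uint64_t>(buf[i] >= 128);
    }
    return result;
}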

The hardware used is the following:

  • Raspberry Pi, Raspbian, GCC 4.9.2, Clang 3.5.0
  • Ubuntu zesty, GCC 6.3.0, Clang 4.0.0
  • Macbook Pro i5, XCode 8
  • Windows 10, Visual Studio 2017, run in VirtualBox

The test suite runs all available compilers with a selection of optimization types and CPU features (SSE, AVX, Neon etc.) and measures the times taken.

The results


Let's start by looking at the simplest build setup.

This seems quite reasonable. Parallel addition is the fastest, the others are roughly equally fast and the two algorithms that reorder the input array are the slowest. For comparison, the Raspberry Pi looks like this:
Everything is much flatter as one would expect. Since everything is going smoothly, let's look at the first measurement again, except this time we sort the input data before evaluating. One would expect that the simple loop becomes faster because the branch predictor has an easier task, partitioning becomes faster and nothing becomes noticeably slower.
Well ... ummm ... one out of three ain't bad, I guess. At this point I should probably confess that I don't have a proper grasp on why these things are happening, so any speculation that follows might be completely wrong. The reason bucket slows down is the easier of the two to explain. Since the input is sorted, consecutive iterations of the loop attempt to write to the same memory address, which leads to contention. When the data was random, each iteration wrote to a different location, which led to fewer collisions.

The reason the simple loop does not get faster may be that the processor evaluates both branches of the if clause in any case, so better branch prediction does not matter. On the other hand Visual Studio does this:

Bucket is slower for sorted as above, but the simple loop is an order of magnitude slower on unsorted data. Ideas on what could be the cause of that are welcome.

What is the fastest combination?

The fastest combination for each hardware platform is the following.
  • Ubuntu: bit fiddle, g++, release build, -msse, unsorted
  • Raspi: bit fiddle, g++, release build, -mfpu=neon, sorted
  • OSX: simple loop, Clang++, debugoptimized build, -msse4.2, sorted
  • VS2017: lut, debugoptimized build, unsorted

This is basically random. There does not seem to be any one algorithm that is consistently the fastest; every one of them is noticeably slower than the others under some circumstances. Even weirder, things that you would expect to be straightforward and true are not. Here are some things to scratch your head over:
  • AVX instructions are never the fastest, and on an i7 the fastest is plain old SSE (for the record MMX was not tested)
  • With Clang, enabling Neon instructions makes everything a lot slower
  • On the Raspberry Pi doing a read only table lookup using Neon is slower than with regular instructions
  • On an i7 multiplication is sometimes faster than arithmetic shifting

keskiviikko 31. toukokuuta 2017

Gee, optimization sure is hard

In a recent Reddit discussion the following piece of code was presented:

for (unsigned c = 0; c < arraySize; ++c) {
    if (data[c] >= 128)
        sum += data[c];
}

This code snippet seems optimal, but is it? There is a hard-to-predict branch and a data dependency, both of which can cause slowdowns. To see whether this can be implemented faster, I created a test repo with a bunch of alternative implementations. Let's see how they stack up.

The implementations

The simplest is the simple loop as described above.

A version using a lookup table has a helper table of 256 entries. Values larger than or equal to 128 map to themselves and values smaller than 128 map to zero. The body of the loop then becomes result += lut[buf[i]], which is branch free.
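
A minimal sketch of what this might look like (the actual code in the repo may differ):

#include <cstdint>
#include <cstddef>

// Sketch of the lookup table variant: entries below 128 are zero, the rest
// map to their own index, so the summing loop has no branch at all.
uint64_t sum_lookup(const uint8_t *buf, size_t bufsize) {
    uint8_t lut[256];
    for(int i=0; i<256; i++) {
        lut[i] = static_cast<uint8_t>(i < 128 ? 0 : i);
    }
    uint64_t result = 0;
    for(size_t i=0; i<bufsize; i++) {
        result += lut[buf[i]];
    }
    return result;
}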

The bit fiddling approach goes through the values one by one and first calculates a mask value which is b >> 7. This is an arithmetic right shift whose outcome is either all zeros or all ones depending on whether the value of b is less than 128 or not. Then we add the value to the result after ANDing it with the mask. The core of this loop is result += buf[i] & (((int8_t)buf[i]) >> 7).

The partitioning approach divides the data array with std::partition and then only the part with values we want are added.

The zeroing approach goes over the array and sets all values that are less than 128 to zero. Then it goes over it again and adds all entries unconditionally. This goes over the data twice (and mutates it) but there are no data dependencies.

The bucket approach has an array of 256 entries. The loop goes over the data and increments the count for each input byte like this: ++counts[buf[i]]. The result is then obtained by going over entries 128-255 and evaluating result += i*counts[i].
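
A sketch of the whole bucket computation (again, names are illustrative and the repo version may differ):

#include <cstdint>
#include <cstddef>

// Sketch of the bucket variant: first count how many times each byte value
// occurs, then sum only the buckets for values 128..255.
uint64_t sum_bucket(const uint8_t *buf, size_t bufsize) {
    uint64_t counts[256] = {};
    for(size_t i=0; i<bufsize; i++) {
        ++counts[buf[i]];
    }
    uint64_t result = 0;
    for(uint64_t i=128; i<256; i++) {
        result += i*counts[i];
    }
    return result;
}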

So which one is the fastest?

Well that depends. A lot. On pretty much everything.

On x86_64 lookup table and bucket are the fastest, but only when using -O2. And GCC. For some reason Clang can't optimize the bucket version and it is consistently slower. With -O3 bit fiddling becomes faster.

On a Raspberry Pi and -O2, the fastest are bit fiddling and bucket. But when using -O3, the simple loop is the fastest by a noticeable margin.

There does not seem to be a version that is consistently the fastest. However the versions that mutate their input array (partition and zeroing) are consistently the slowest.

What should I do to get teh fastest codez for my program?

This is a difficult question and depends on many things. The most straightforward solution is to write the simplest code you possibly can. It is easiest to understand and modify and is also the simplest for the compilers to optimize for the largest number of targets.

After that find out if you have a bottleneck. Measure. Measure again. Write an optimized version if you really need it. Measure. Measure with many different compilers and platforms. Keep measuring and fixing issues until the project reaches end of life.

maanantai 15. toukokuuta 2017

Emulating the Rust borrow checker with C++ part II: the borrowining

The most perceptive among you might have noticed that the last blog post did not actually do any borrowing, only single owner semantics with moves. Let's fix that. But first a caveat.

As far as I can tell, it is not possible to emulate Rust's borrow checker fully at compile time with C++. You can get pretty close (for some features at least) but there is some runtime overhead and violations are detected only at runtime, not at compile time. See the end of this post for a description of why that is. Still, it's better than nothing.

At the core is a resource whose ownership we want to track. We want to allow either one accessor that can alter the object or many read-only accessors. We put the item inside a class that looks like this:

template<typename T>
class Owner final {
 private:
    T i;
    int roBorrows = 0;
    bool rwBorrow = false;
<rest of class omitted for brevity>

The holder object does not give out pointers or references to the underlying object. Instead you can only request either a read only or read-write (or mutable) reference to it. The read only function looks like this (the rw version is almost identical):

template<typename T>
RoBorrow<T> Owner<T>::borrowReadOnly() {
    if(rwBorrow) {
        throw std::runtime_error("Tried to create read only borrow when a rw borrow exists.");
    } else {
        return ::RoBorrow<T>(this); // increments borrow count
    }
}

Creating read only borrows only succeeds if there are no rw borrows. This code throws an exception but it could also call std::terminate or something else. A rw borrow would fail in a similar way if there are any ro borrows.
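
For completeness, the read-write version would presumably look something like this (a sketch only; the exact code is in the repo):

template<typename T>
RwBorrow<T> Owner<T>::borrowReadWrite() {
    if(rwBorrow || roBorrows > 0) {
        throw std::runtime_error("Tried to create a rw borrow when other borrows exist.");
    } else {
        return ::RwBorrow<T>(this); // sets the rw borrow flag
    }
}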

The interesting piece is the implementation of the proxy object RoBorrow<T>. It is the kind of move-only type that was described in part I. When it goes out of scope its destructor decrements the owner's borrow count:

~RoBorrow() { if (o) { o->decrementRoBorrows();} }

The magic lies in the conversion operator that is slightly different than the original version:

operator const T&() const { return owner->i; }

The conversion operator only gives out a const reference to the underlying object, which means only const operations can be invoked on it. Obviously this does not work for e.g. raw file descriptors, because a "const int" does not protect the state of the kernel fd object. In order to call non-const functions you first need to get an RwBorrow, whose conversion operator gives a non-const reference, and Owner will only provide one if there are no other borrows outstanding.
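
Presumably the rw version's conversion operator simply drops the const, something along the lines of:

operator T&() { return owner->i; } // RwBorrow's version, a sketch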

When using this mechanism the following code fails (at runtime):

 auto b1 = owner.borrowReadOnly();
 auto b2 = owner.borrowReadWrite();
 
But this code works:

{
    auto b1 = owner.borrowReadOnly();
    auto b2 = owner.borrowReadOnly();
}
{
    auto b3 = owner.borrowReadWrite();
}
{
    auto b4 = owner.borrowReadOnly();
}

because the borrows go out of scope and are destroyed, decrementing the borrow counts to zero.

Why can't this be done at compile time?

I'm not a TMP specialist, so maybe it can be, but as far as I know it is not possible. I'd love to be proven wrong on this one. The issue is due to limitations of constexpr, which can be distilled into this dummy function:

template<typename T>
constexpr void foo(bool b) {
    T x;
    constexpr int refcount = 1;
    if constexpr(b) {
        // do something
        --refcount;
    } else {
        // do something else
        --refcount;
    }
    static_assert(refcount == 0);
}

The static assert is obviously true no matter which code path is executed but the compiler can't prove that. First of all the if clause does not compile because its argument is not a compile-time constant. Second of all, the refcount decrements do not compile because constexpr variables can not be mutated during evaluation. You can create a new variable with the value refcount - 1 but it would go out of scope when the subblock ends and there is no phi-node-like concept to get the value out to the outer scope. Finally, destructors can not be constexpr, so even if we could mutate the count variable, it would not be decremented automatically (even though in theory the compiler has all the information it needs to do that).

lauantai 13. toukokuuta 2017

Emulating the Rust borrow checker with C++ move-only types

Perhaps the greatest thing to come out of C++11 is the notion of move semantics. Originally it was designed to make it efficient to return big objects like matrices from functions. In move operations the destination "steals the guts" of the source object rather than making a copy of it. Usually this means taking hold of some pointer to a reserved memory block and assigning the source's pointer to nullptr.

This mechanism is not reserved to pointers and can be used for any data type. As an example let's write a move-only integer class. A typical use for this would be to store file descriptors to ensure that they are closed. The code is straightforward, let's go through it line by line:

class MoveOnlyInt final {
 private:
    int i;

    void release() { /* call to release function here */ }

This is basic boilerplate. The release function would typically be close, but can be anything.

 public:
    explicit MoveOnlyInt(int i) : i(i) {}
    ~MoveOnlyInt() { release(); }

We can construct this from a regular integer and destroying this object means calling the release function. Simple. Now to the actual meat and bones.

    MoveOnlyInt() = delete;
    MoveOnlyInt(const MoveOnlyInt &) = delete;
    MoveOnlyInt& operator=(const MoveOnlyInt &) = delete;

These declarations specify that this object can not be copied (which is what C++ does by default). Thus creating two objects that hold the same underlying integer is not possible unless you explicitly create two objects with the same bare integer.

    MoveOnlyInt(MoveOnlyInt &&other) { i = other.i; other.i = -1; }
    MoveOnlyInt& operator=(MoveOnlyInt &&other) { release(); i = other.i; other.i = -1; return *this; }

Here we define the move operators. A move means releasing the currently held integer (if it exists) and grabbing the source's integer instead. This is all we need to have a move only type. The last thing to add is a small helper function.

    operator int() const { return i; }
};

This means that this object is convertible to a plain integer. In practice this means that if you have a function like this:

void do_something(int x);

then you can call it like this:

MoveOnlyInt x(42);
do_something(x);

You could achieve the same by creating a get() function that returns the underlying integer but this is nicer and more usable. This is not quite as useful for integers but extremely nice when using plain C libraries with opaque structs. Then your move only wrapper class can be used directly when calling plain C functions.
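
As an illustration (not from the class above, but following the exact same pattern), here is what such a wrapper might look like around a C opaque handle, using FILE* as a stand-in for any C library handle:

#include <cstdio>

class MoveOnlyFile final {
 private:
    FILE *f;

    void release() { if(f) { fclose(f); } }

 public:
    explicit MoveOnlyFile(FILE *f) : f(f) {}
    ~MoveOnlyFile() { release(); }

    MoveOnlyFile() = delete;
    MoveOnlyFile(const MoveOnlyFile &) = delete;
    MoveOnlyFile& operator=(const MoveOnlyFile &) = delete;

    MoveOnlyFile(MoveOnlyFile &&other) { f = other.f; other.f = nullptr; }
    MoveOnlyFile& operator=(MoveOnlyFile &&other) { release(); f = other.f; other.f = nullptr; return *this; }

    operator FILE*() const { return f; } // usable directly in C API calls
};

With this, any plain C function taking a FILE* can be called directly with the wrapper object, while accidental copies and double closes are prevented by the compiler.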

What does this allow us to do?

All sorts of things, the object behaves much in the same way as Rust's movable types (but is not 100% identical). You can for example return it from a function, which transfers ownership in a compiler enforced way:

MoveOnlyInt returnObject() {
    MoveOnlyInt retval(42);
    return retval;
}

If you try to pass an object as an argument like this:

int byValue(MoveOnlyInt mo);
...
byValue(mo);

a regular class would get copied but for a move-only type you get a compiler error:

./prog.cpp:39:13: error: call to deleted constructor of 'MoveOnlyInt'
    byValue(mo);
            ^~
../prog.cpp:13:5: note: 'MoveOnlyInt' has been explicitly marked deleted here
    MoveOnlyInt(const MoveOnlyInt &) = delete;
    ^
../prog.cpp:22:28: note: passing argument to parameter here
int byValue(MoveOnlyInt mo) {

Instead you have to explicitly tell the compiler to move the object:

byValue(std::move(mo));

Some of you might have spotted a potential issue. Since a MoveOnlyInt object can be converted to an int and there is a constructor that takes an int, that could create two objects for the same underlying integer. Like this:

MoveOnlyInt mo(42);
MoveOnlyInt other(mo);

The compiler output looks like the following:

../prog.cpp:37:14: error: call to deleted constructor of 'MoveOnlyInt'
    MoveOnlyInt other(mo);
             ^     ~~
../prog.cpp:13:5: note: 'MoveOnlyInt' has been explicitly marked deleted here
    MoveOnlyInt(const MoveOnlyInt &) = delete;

The compiler prevents creating invalid objects.

The wrapper object has zero memory overhead compared to a plain int and code generation changes are minimal. The interested reader is encouraged to play around with the compiler explorer to learn what those are.

Is this just as good as native Rust then?

No. Rust's borrow checker does more and is stricter. For example in C++ you can use a moved-from object, which may yield a null pointer dereference if your underlying type was a pointer. You won't get a use-after-free error, though. On the other hand this class is 18 lines of code that can be applied to any existing C++ code base immediately whereas Rust is a whole new programming language, framework and ecosystem.
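
To make the difference concrete, something like this compiles without complaint in C++ (reusing the names from above), even though the last line operates on a moved-from object:

MoveOnlyInt mo(42);
byValue(std::move(mo));
do_something(mo); // mo has already been moved from; Rust would reject this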

Anyone is also free to do the wrong thing and take a copy of the integer value without telling anyone but this issue remains in every other language as well, even in unsafe Rust.


tiistai 9. toukokuuta 2017

A list of common Meson antipatterns

In the vein of a blog post I wrote years ago, here are some Meson antipatterns that should be avoided in build definitions.

Adding -g, -Wall etc manually

The basic design principle of Meson is that common operations and features should work out of the box without any user intervention. That means that debug information should be enabled for debug builds (which are the default). Similarly basic warning flags should always be on, because anyone coding without -Wall is not doing software development, but practicing astrology.

Many people seem to be burned by existing build systems and add these flags manually as the first thing they do. There is no need, we do it for you. Similarly instead of manually adding flags such as -Wextra, look into changing the option warning_level instead.

Adding these flags manually is also bad for portability because there are compilers that do not support them.

Building full paths to files in the source tree

This pattern looks something like this:

myfiles = files('@0@/@1@'.format(meson.current_source_dir(), 'file.c'), [other files here])

This makes sense if you are used to Make's "everything is a string" semantics. Meson is not like that, it tries to take care of mundane things for you. The simpler version of the above is this:

myfiles = files('file.c', [other files here])

This will verify the existence of the file when called and will store the full location. The output can be used in any other subdirectory in any target and Meson will evaluate the correct path for you. There may be some corners where file objects are not accepted even though they should be. Please file bugs if you encounter one of these.

The former syntax works in current releases but is already an error in Git master.

Using target_machine in cross compilation

Meson supports the full Canadian Cross environment. It is very powerful but unfortunately quite confusing until you wrap your brain around it. Many people choose to use target_machine instead of host_machine for their cross compilation, even though that is incorrect and leads to interesting bugs. Fortunately there is a simple rule of thumb which is correct 99% of the time:
  • build_machine is the desktop or laptop computer you are using to do your development
  • host_machine is the low powered IoT device, ARM board or equivalent that will run your program

Unless you really know what you are doing and are building one of the very few packages that need to care (Binutils, GCC, Clang, some others), ignore target_machine. Pretend it does not even exist and everything will become simpler and more correct.

Manually setting up dependencies between targets

This happens often when defining custom targets or run targets that operate on the output of other custom targets. In Meson, outputs carry dependency information with them. If you use the output of one target as an input to another, Meson will automatically set up all the dependencies, as sketched below. A common antipattern seen in projects in the wild is going through strange contortions to build a string representing the path of a target's output and then adding a dependency object to the consuming target's extra dependency keyword argument. That argument is only meant for programs that do not take all their inputs as arguments because, for example, they do file globbing internally.
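
As a sketch (with a made-up generator program), this is all that is needed; no extra dependency arguments anywhere:

mygen = find_program('generate-source.py') # hypothetical generator script

gen_src = custom_target('gen-src',
  output : 'generated.c',
  command : [mygen, '@OUTPUT@'])

# Using the custom target as a source sets up the dependency automatically.
exe = executable('prog', 'main.c', gen_src)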

Like with files there may be cases where using the output object directly fails. These are all bugs, please report them.

maanantai 1. toukokuuta 2017

The Infinity Horse

The early 1900s saw the popularisation of the automobile. Everyone was jumping on the proverbial bandwagon of this new transport technology. It was cool and exciting so people wanted to be part of it, even if they might not have had full technical understanding of the domain.

One of those movers was a company in the business of raising, selling, renting and feeding horses. The managers understood that horses were a dead-end business and that the future was in motorized vehicles. Thus they started putting a lot of money and effort into creating their own car.

Engineers were hired, meetings were had and factories were being prepped for production. After a few months of work the main obstacles were overcome and the plan to build and launch a car brand was looking good. That is when the CEO scheduled a surprise meeting with the chief engineer of the project.

"We had a top management meetin last week and what we came up with was spectacular! We are going to completely revolutionize the way people think about cars!"

"Really," said the surprised engineer.

"Yes! What is the most annoying thing about cars?" he asked and without giving the engineer a chance to reply continued "Running out of fuel. That never happens with horses, so this is really a step backwards."

The engineer felt like pointing out that there is such a thing as a horse that has run out of fuel. It's called a dead horse, and the difference between it and a car is that adding fuel to a dead horse does not make it come back alive. But he chose not to mention this and instead wanted to hear the new master plan.

"So what we are going to do - and this is going to be a world first - is to create a car that never runs out of fuel. We shall call it The Infinity Horse. Marketing department is already creating a campaign for this. I have seen some of the sketches and they are just out of this world!"

"Errrm, never runs out of fuel?" the engineer said with a puzzled look on his face.

"Never!"

"That's not really possible, how are you going to do that?"

"That's your job isn't it? Just make it happen, it can't be that hard."

A very unpleasant two weeks followed. The engineer did everything in his power: he tried to explain that the aim was impossible, which did not work, and to suggest alternative solutions, which did not work either. Nothing short of perfection was acceptable. As an example, making the gas tank bigger was not an option because while it made the problem occur less often, it was not enough.

"The Vision is The Infinity Horse. Unless we have this feature we have nothing," the CEO repeated every time issues were raised.

This drove our engineer to despair. He was being crushed between the unbreakable laws of physics and the immovable opinion of the CEO. Grasping at straws he started going through automotive parts catalogues. There he saw a new product: a warning light that activated when the car was running out of fuel. And then it came to him.

He took the warning light and modified it slightly. Instead of turning on a light it shut down the engine. This would fulfill the requirement. The car would never run out of fuel, because it could be easily proven that there was still gas in the tank. Said fuel could not be used for driving but it was there. The only thing this modification did was to reduce the maximum range making the car worse by every conceivable objective metric.

The CEO loved it. The Infinity Horse had been achieved and it was perfect! If this system caused people to get stranded, well, that's their own fault for not doing due diligence and adding enough fuel to the tank before embarking on their journey.

sunnuntai 16. huhtikuuta 2017

Why don't you just rewrite it in X?

Whenever a new programming language becomes popular its fanpersons start evangelizing its virtues by going to existing projects and filing bug reports that look like this.

Hi, I noticed that this project is written in [programming language X]. You really should just rewrite it in [programming language Y] because it is better at [some feature Z]. Kthxbye!

When put like that this seems like a no-brainer. Being better at Z is good so surely everyone should just port their projects to Y.

Recently there has been a movement to convert the tooling used by various software projects in the Gnome stack from a mishmash of shell, Awk and Perl into Python 3. The main reasoning for this is that having only one "scripting" dependency, on a modern and well maintained project, makes it simple to compile applications using Gnome technologies on platforms such as Windows. Moving between projects also becomes easier.

One tool undergoing this transition is GTK-doc, which is a documentation generation tool written mostly in Perl. I have been working together with upstream to convert it to Python 3. This has been an educational experience in many ways. One of the first things learned is that converting between any two languages usually breaks down into three distinct phases.

  1. Manual syntax conversion
  2. Fixing bugs caused by conversion errors
  3. Converting code to idiomatic target language

A Perl to Python conversion is relatively straightforward in the case of gtk-doc. It mostly deals with regular expressions, arrays and dictionaries. All three of these behave pretty much the same in both languages so step one is mostly manual work. Step two consists of fixing all the bugs and behavioural changes introduced in phase one (many caused by typos and lapses of concentration during step one). This phase is basically debugging. The third step is then a question of converting regular expressions and global variables into objects and other sane and readable constructs.

When doing the conversion I have been mostly focusing on step one, while the gtk-doc maintainer has agreed to finalize the work with steps two and three. While converting the 6000+ line file gtkdoc-mkdb, I did some measurements and it turns out that I could do the conversion at a peak rate of 500 lines an hour, meaning roughly 7 seconds per line of code.

This was achieved only on code that was easy to convert and was basically an exercise in Emacs fingering. Every now and then the code used "fancy" Perl features. Converting those parts was 10x, 100x, and sometimes up to 1000x slower. If the port had required architectural rework (which might happen when converting a lax language to one that has a lifetime checker in the compiler, for example) it would have slowed things down even more.

I don't know how much work steps 2 and 3 entail, but based on comments posted on certain IRC channels, it is probably quite a lot. Let's be generous and say overall these three items come to 250 lines of converted code per hour.

Now comes the truly sad part. This speed of conversion is not sustainable. Manually converting code from one format to another is the most boring, draining and soul-crushing work you can imagine. I could only do this for a maximum of a few hours per day and then I had to stop because all I could see was a flurry of dollar signs, semicolons and curly braces. Based on this we can estimate that a sustained rate of conversion one person can maintain is around 100 lines of code per hour (it is unlikely that this speed can be maintained if the project goes on for weeks but since there are no measurements let's ignore it for now).

The cURL project consists of roughly 100 thousand lines of C code according to Ohloh. If we assume that converting it to some other language is just as easy as converting simple Perl to Python (which seems unlikely), the conversion would take 1000 person hours. At 8 hours per day that comes to around 5 months of full time work. Once that is done you get to port all the changes made in trunk since starting the conversion. Halting the entire project while converting it from one language to another is not an option.

This gives us a clear answer on why people don't just convert their projects from one language to another:

There is no such thing as "just rewrite it in X".

Post scriptum

There are tools that automatically convert from one language to another. They can help, but only in step one. Steps two and three are still there, and may take even more work than with manually converted code, because manual conversion usually produces more human-understandable output. Sadly Turing-completeness says that we can't have nice things.

keskiviikko 12. huhtikuuta 2017

How Meson is tested

A build system is a very important part of the development workflow and people depend on it to work reliably. In order to achieve this we do a lot of testing on Meson. As this is mostly invisible to end users I thought I'd write some information about our testing setup and practices.

Perhaps the most unconventional thing is that Meson has no unit tests in the traditional sense of the word. The code consists of functions, classes and modules as you would expect, but there are no tests for these individual items. Instead all testing is done on full projects. The bulk of the tests are what would traditionally be called integration tests. That is, we take an entire project that does one thing, such as compile and install a shared library, then configure it, build it, run its tests, run the install step and check that the output is correct.

There are also smaller scale tests which are called unit tests. They also configure a project but then inspect the result directly. As an example, a test might set up a project and build it, then touch one file and verify that this triggers a rebuild. Some people might claim that these are not unit tests either, or that they should instead be tested with mock classes and the like. This is a valid point to make, but this is the terminology we have converged on.

Enter The (Testing) Matrix

Our testing matrix is big. Really big. At its core are the three main supported platforms: Linux, Windows and OSX. We also support BSDs and the like, but we currently don't have CI machines for them.

On Windows our Appveyor setup tests VS2010, 2015 and 2017 and, in addition, MinGW and Cygwin. Apart from Cygwin these are all tested in both 32 and 64 bit variants. Visual Studio is tested with both the Ninja and Visual Studio project generator backends.

OSX is the simplest: it is tested only with the Ninja backend, using both regular and unity builds.

Linux tests do a lot more. In addition to running the basic tests (both in unity and regular modes), it also runs the entire test suite in a cross compilation setup.

All of these tests are only for the core code. Meson supports a large amount of frameworks and libraries, such as GLib, Qt and Doxygen, and many programming languages. Every one of these has an associated test or, usually, several. This means that running the full test suite on Debian requires installing all of these:
  • Boost
  • D, Rust, Java, Fortran, C# and Swift compilers
  • Qt 5, WxWidgets, GTK+ 3
  • Valgrind
  • GNUstep
  • Protocol Buffers
  • Cython
  • GTest and GMock

And a bunch of other packages as well. The CI Docker image that has all of these installed takes 2 gigabytes of space. On many distros not all dependencies are available, so packagers have to disable tests. Having a build dependency on all these packages sometimes yields interesting problems. As an example the Rust dependency means that Meson depends on LLVM. Every now and then it breaks on s390x, meaning that Meson and every package that uses it to build get flagged for removal from Debian.

Every merge proposal to Meson master is run through all of these tests and is eligible for merging only if they all pass. There are no exceptions to this rule. 

There are some downsides to this, the biggest being that every now and then Appveyor and/or Travis get clogged and getting the green light takes forever. We looked briefly into getting paid instances but for our usage the bill would be in the neighborhood of $300 per month. Given that you can buy your own hardware for that kind of money, this has not been seen as a worthwhile investment. 

lauantai 8. huhtikuuta 2017

Meson project status update

The text below is a copy of this post to the Meson development mailing list.

Hello everyone

The last few weeks have been an amazing ride for the Meson project. We
have gone from "interesting but niche" to being seriously considered
for such core infrastructure projects as Mesa, Wayland, Xorg and even
systemd. I would like to thank everyone who has contributed in making
this possible. Thanks to all contributors, evangelists, those who have
converted their projects, or even proposed it. We would not be here
without you.

However having this much growth brings with it new problems. The main
one of these, as most of you have probably noticed, is the growth in
pull request backlog. I know there are MRs that have been waiting for
quite a while and that this is very frustrating to those people who
have filed them. My apologies to you, we are trying to make this
better.

The second issue is the perennial problem of documentation.
Interestingly these two issues are quite tied together in unexpected
ways.

One of the things which is taking a lot of time for me personally is
making sure that all the documentation is up to date after merge
requests are done. This is a major problem and is limiting the number
of changes that can go in.

We have examined various ways of solving this problem. We have a PoC
for moving all documentation away from the Github wiki and inside the
main repository. This way we can require that all merge requests come
with corresponding documentation changes. This would reduce the review
burden.

A preview of the site can be seen here: http://mesonbuild.com/experimental/

The downside of this approach is that we would lose the ability to do
in-browser wiki editing, and even typo fixes would need to go via pull
requests. This is unfortunate. There does not seem to be a solution
that would work for both use cases. Final decisions on this have not
been made, so if there are people who have good suggestions or prior
experience with this, please speak up.

There are many other ways to help, too. Thus far I have done a large
fraction of, well, everything, ranging from bug reports to review to
designing new features and helping people on the mailing list. All of
these activities are fun and interesting, but there is only so much
time and I feel I'm already spreading myself too thin. This causes
problems such as the above mentioned backlog of merge requests.

We already have a wonderful group of people doing great work on IRC
and elsewhere. Thank you all for your efforts. But the fact of the
matter is that if the project continues seeing more uptake, we are
going to need more helpers, especially reviewers. Currently this task
rests almost entirely on me and Nirbheek. Any help on this would be
greatly appreciated. Just going over the code and saying "based on a
cursory glance this seems ok" would be helpful.

The one elephant in the room is that currently only I have merge
access to master. This is an obvious bottleneck, but if we manage to
fix the other issues listed above it should not be a limiting issue.
Still, if usage and contributions keep growing, we might consider a
multi-maintainer mode. The project is now large enough that we can
have subsystem maintainers, much in the same way as in the Linux
kernel. Example of these could include Windows maintainer, Gnome tools
maintainer, Qt maintainer, Android maintainer, documentation
maintainer and so on. A different approach would be to just have more
people with commit access and allowing them to move around the code as
they have interest. If you have experience or knowledge on how to best
handle this, please let us know.

Thank you all, and keep on compiling!

tiistai 4. huhtikuuta 2017

Inspector Gadget as an allegory of software engineering profession in the modern age

Many people think that Inspector Gadget is a quirky 80s animation series about a bumbling, but well meaning cyborg police inspector fighting an evil organization called MAD. This is not true. In reality the show is an allegory about the day to day work life in a modern software development company. It tells this story with foresight and accuracy that rivals, and sometimes even exceeds, Silicon Valley and its ilk.

This may seem like a bold claim, but let us examine a typical episode and we shall see how the facts stand firmly by this assertion.

The rockstar coder

The supposed hero of this series is Inspector Gadget himself. At the beginning of every episode he is enjoying his free time when he is unexpectedly called into work. Some sort of disaster has struck and he is the only one who can fix it. The Inspector then says Don't worry, chief, I'm always on duty and heads off.

From this simple exchange we know that the Inspector is the hotshot rockstar developer. The Ninja! The Cowboy Coder! The man who has no concept of work-life balance! He drops whatever it is he is doing and goes off to solve the problem.

Once at work we quickly discover that he is more than a rockstar. He is, in fact, an incompetent fool who thinks he is the greatest thing ever. During the course of the entire show, Inspector Gadget does not manage to make a single thing better. All he does is run around from one place to another and try arbitrary things that are not in any way helpful or even rational. Examples include swinging a mallet around randomly (while breaking expensive pieces of art), inflating his coat for no rhyme or reason (usually knocking innocent people over) or even helping the bad guys perpetrate their crimes because they just ask for his help.

At no point does anyone call Inspector Gadget out for clearly being a useless idiot who only does harm. The entire organisation would work more efficiently if he was kicked out. But yet he remains, having reached a high enough level on the police force's organisation chart that he can't be fired, or even challenged.

The firefighter

The second person in our group of people is not actually a person, but a dog. Brain is the pet of Inspector Gadget. Whenever the inspector heads off to solve a case, Brain is given the same task: follow Inspector Gadget around and fix everything he breaks.

Here we find the only major disparity between Inspector Gadget the series and real life. In the latter, the incompetent rockstar (also known as the Senior Systems Architect) breaks so many things that it takes more than one person to fix them. In extreme cases the group can consist of every other developer in the company. Perhaps this is the reason why Brain is portrayed as a dog. He does not represent a single person, but all of them.

If we observe how Brain fixes the Inspector's bumblings, a common pattern arises. He always has to fix things in such a way that the Inspector does not notice anything. Presumably if he were to notice these fixes, he would go back and change things back to the way he left them. That is, completely broken.

Brain never does anything else, his life is a continuous stream of fighting fires that should never have been set alight. An illuminating example of this is the "pineapple incident" in episode Volcano Island, between 13m 54s and 14m 45s.

The fixer

The task of actually solving all problems is left to Penny, Gadget's niece. She does not fool around with pointless Rube Goldberg gadgets. She is highly competent and efficient in solving any problem. In an average episode Penny spends less than five minutes looking at the problem, finding the root cause, fixing it and wrapping everything up. She is a competent and smart person with solid engineering and problem solving skills. She single handedly thwarts MAD and is the main reason the entire world does not fall into anarchy.

Penny is never given any credit for this. In fact whenever she tries to talk some sense to the other characters, she is actively dismissed and belittled. Once the case is solved everyone goes to Inspector Gadget and congratulates him for doing a good job. The Inspector replies with something like I solved the case? I guess I did. Like any incompetent rockstar he is willing to take all the praise without doing any of the work.

In light of this it should be obvious why the character of Penny is a woman.

The wild card

The final person in this gallery is Dr. Claw, the leader of MAD. During the course of the show he does only two things. One: he looks at live video streams on his computer while scratching a cat. Two: he complains that everything done by everyone else is wrong (while never doing anything himself).

There are two differing theories on what he is supposed to represent. The first claims that he represents the company founder/owner who struck gold but has since lost his grasp of the present and just lashes out at his underlings at random.

The other theory claims that the creators of the show were incredibly foresighted and created Dr Claw as a prototype of modern day internet trolls 25 years before they actually appeared.

sunnuntai 26. maaliskuuta 2017

Debian package dependency browser

The Debian package browser is great but it is somewhat limited. As an example, you can get the dependencies and build dependencies for any package, but not the reverse dependencies (i.e. a list of all packages that depend or build-depend on a given package).

I wrote a simple program that downloads the package repository file, parses it and creates an SQLite DB with all the link information and a simple Flask web app to browse it. This is what it looks like when showing the information page on Flex.


In addition you can do all sorts of analysis on the data. As an example here are the 20 packages that are depended on the most followed by the number of packages that depend on them:

libc6 19759
libstdc++6 6362
libgcc1 5814
libglib2.0-0 2906
python 2802
zlib1g 2620
libcairo2 1410
libgdk-pixbuf2.0-0 1385
libpango-1.0-0 1303
libpangocairo-1.0-0 1139
libqt5core5a 1122
libatk1.0-0 1075
libgtk2.0-0 1010
libxml2 979
libfreetype6 976
perl 880
libqt5gui5 845
libfontconfig1 834
python3 825
libqtcore4 810

And here is the same for build-dependencies:

debhelper 27189
cdbs 3130
dh-autoreconf 3098
dh-python 2959
pkg-config 2698
autotools-dev 2331
python-setuptools 2193
python-all 1888
cmake 1643
python3-setuptools 1539
dpkg-dev 1446
python3-all 1412
perl 1372
zlib1g-dev 1317
dh-buildinfo 1303
python 1302
libglib2.0-dev 1133
gem2deb 1104
default-jdk 1078
libx11-dev 973

The numbers are not exact because the parser is not perfect but as rough estimates these should be reliable. Just don't use them for anything requiring actual accuracy, ok?

The code is here. I probably won't work on it any more due to other obligations, but feel free to play around with it if you wish. 

torstai 16. maaliskuuta 2017

Is static linking the solution to all of our problems?

Almost all programming languages designed in the last couple of years have a strong emphasis on static linking. Their approach to dependencies is to have them all in source form, compiled for each project separately. This provides many benefits, such as binaries that can be deployed everywhere and not needing to have or maintain a stable ABI in the language. Since everything is always recompiled and linked from scratch (apart from the standard library), ABI is not an issue.

The proponents of static linking often claim that shared libraries are unnecessary: recompiling is fast and disks are big, thus it makes more sense to link statically than to define and maintain an ABI for shared libraries, which is a whole lot of thankless, hard work.

To see if this is the case, let's do an approximation experiment.

Enter the Dost!

Let's assume a new programming language called Dost. This language is special in that it provides code that is just as performant as the equivalent C code and takes the same amount of space (which is no small feat). It has every functionality anyone would ever need, does not require a garbage collector and has a syntax that is loved by all. The only thing it does not do is dynamic linking. Let us further imagine that, by magic, all open source projects in the world get rewritten in Dost overnight. How will this affect a typical Linux distro?

Take for example the executables in /usr/bin. They are all implemented in Dost, and thus are linked statically. They are probably a bit larger than their original C versions which were linked dynamically. But by how much? How would we find out?

Science to the rescue

Getting a rough ballpark estimate is simple. Running ldd /usr/bin/executable gives a list of all libraries the given executable links against. If it were linked statically, the executable would have a duplicate copy of all these libraries. Said another way, each executable grows by the size of its dependencies. Then it is a matter of writing a script that goes through all the executables, looks up their dependencies, removes language standard libraries (libc, libstdc++, a few others) and adds up how much extra space these duplicated libraries would take.

The script to do this can be downloaded from this Github repo. Feel free to run it on your own machines to verify the results.

Measurement results

Running that script on a Raspberry Pi with Raspbian used for running an IRC client and random compile tests says that statically linked binaries would take an extra 4 gigabytes of space.

Yes, really.

Four gigabytes is more space than many people have on their Raspi SD card. Wasting all that on duplicates of the exact same data does not seem like the best use of those bits. The original shared libraries take only about 5% of this; static linking expands them 20-fold. Running the measurement script on a VirtualBox Ubuntu install says that on that machine the duplicates would take over 10 gigabytes. You can fit an entire Ubuntu install in that space. Twice. Even if this were not an issue for disk space, it would be catastrophic for instruction caches.

A counterargument people often make is that static linking is more efficient than dynamic linking because the linker can throw away those parts of dependencies that are not used. If we assume that the linker did this perfectly, executables would need to use only 5% of the code in their dependencies for static linking to take less space than dynamic linking. This seems unlikely to be the case in practice.

In conclusion

Static linking is great for many use cases. These include embedded software, firmwares and end user applications. If your use case is running a single application in a container or VM, static linking is a great solution that simplifies deployment and increases performance.

On the other hand claiming that a systems programming language that does not provide a stable ABI and shared libraries can be used to build the entire userland of a Linux distribution is delusional. 

maanantai 13. maaliskuuta 2017

Dependencies and unity builds with Meson

Prebuilt dependencies provided by the distro are awesome. You just install them and start working on your own project. Unfortunately there are many cases where distro packages are either not available or too old. This is especially common when compiling on non-Linux platforms such as Windows but happens on Linux as well when using jhbuild, Flatpak, Snappy or one of the many other aggregators. Dependencies obtained via Meson subprojects also fall into this category.

Unity builds

There is a surprisingly simple way of compiling projects faster: unity builds. The basic principle is that if your target has files foo.cpp, bar.cpp and baz.cpp, you don't compile them. Instead you generate a file called target-unity.cpp, whose contents are this:

#include<foo.cpp>
#include<bar.cpp>
#include<baz.cpp>

Then you compile just this file. This makes the compilation go faster. A lot faster. As an example converting Qt Creator to compile as a Unity build made it compile 90% faster. Counterintuitively it is even faster to compile a unity file with one core than to use four cores to build the files individually. If this is the first time you have encountered unity builds, this probably feels like a sham, something that just can't be possible. Not only is this possible, but unity builds are used in production in many places, especially in game development. As a different kind of example SQLite ships as a single "amalgamation file", which is the same thing as a unity build. Unity builds also act as a sort of poor man's link time optimization, which works even on compilers that do not support LTO natively.

Unity builds have their own limitations and problems. Some of them are discussed at the end of this article.

Dependencies

Meson has had unity build support available for a long time. The unfortunate downside is that incremental builds take a lot more time. This makes the edit-compile-debug cycle slower which is annoying. There are now two merge requests outstanding (number one, number two) that aim to make the user experience better.

With these changes you can tell Meson to unity build only subprojects, not the main project. This means that all your deps build really fast but the master project is built incrementally. In most use cases subprojects are read only. Only the master project is edited. For most people dependencies are slower to build than projects using them, so this gives a nice productivity boost. The other merge request enables finer grained control by allowing the user to override unityness for each target separately.

Please try out the branches and write your feedback and comments to the merge requests.

Problems of unity builds

Not all projects can be built as unity builds out of the box. The most common problem is having static variables and functions with the same name in different source files. In a unity build these will clash and not compile. There are other problems of similar nature, especially for projects that do crazy things with the preprocessor. These problems are fixable but take some effort. Most open source projects probably won't compile as unity builds out of the box.

Any project that wants to provide for a unity build must, in practice, have gating CI that compiles the source as a unity build. Otherwise it will break every now and then.

maanantai 13. helmikuuta 2017

Looking into what Rust can do that other languages can't ... or can they

In a recent blog post, What Rust Can Do That Other Languages Can't, the following statement is presented:

struct X {
  y: Y
}
impl X {
  fn y(&self) -> &Y { &self.y }
}

This defines an aggregate type containing a field y of type Y directly (not as a separate heap object). Then we define a getter method that returns the field by reference. Crucially, the Rust compiler will verify that all callers of the getter prevent the returned reference from outliving the X object. It compiles down to what you'd expect a C compiler to produce (pointer addition) with no space or time overhead.

Let's look into this a bit more deeply. But, since this is The Internet, first a pre-emptive statement.

Things not claimed in this blog post

To make things easier for forum discussers who have strong opinions but a hard time reading entire articles, here is a list of things that are not claimed. I repeat: not claimed.
  • that [other language] is as safe as Rust (it is most likely not)
  • that you should use [other language] instead of Rust
  • that you should use Rust instead of [other language]
  • that [other language] is safe out of the box (it probably is not)
  • security feature X of language Y will prevent all bugs of type Z (it does not)

To business

Plain C code corresponding to the setup above is roughly the following:

struct foo {
  int x;
};

struct bar {
  struct foo f;
};

A simple function that demonstrates the bug is the following:

#include <stdlib.h>

void do_something(struct foo *f); /* defined elsewhere; dereferences its argument */

int main(int argc, char **argv) {
  struct foo *f;
  struct bar *expiring = malloc(sizeof(struct bar));
  f = &expiring->f;
  free(expiring);
  do_something(f);
  return 0;
}

This uses an obviously invalid pointer in a function call. Neither GCC nor Clang warns about this when compiling. Scan-build, however, does detect the bug:

expired.c:23:3: warning: Use of memory after it is freed
  do_something(f);
  ^~~~~~~~~~~~~~~~~
1 warning generated.
scan-build: 1 bug found.

Address sanitizer also detects the invalid access operation inside do_something. Things get more interesting if we change the function so that expiring is on the stack instead of the heap.

int main(int argc, char **argv) {
  struct foo *b;
  {
    struct bar expiring;
    b = &expiring.f;
  }
  do_something(b);
  return 0;
}

This one is not detected at all. None of GCC, Clang, Scan-build, Asan, CppCheck or Valgrind sees anything wrong with this. Yet this seems so simple and obvious that it should be detectable automatically.

Recently this has become possible with Address sanitizer. It grew a new argument -fsanitize-address-use-after-scope, which checks for exactly this and finds the bug effortlessly:

==10717==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffe5d0c1920 at pc 0x00000050767f bp 0x7ffe5d0c18a0 sp 0x7ffe5d0c1898

This requires, of course, a run of the program that executes this code path. It would be nice to have this in a static analyzer so it could be found in even more cases (there is already a bug report for this for scan-build).
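
For reference, a compiler invocation enabling the check could look something like this (exact flags and defaults may vary between compiler versions):

clang -g -fsanitize=address -fsanitize-address-use-after-scope expired.c -o expired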

In conclusion

The security features of Rust are great. However many of them (not all, but many) that were once unique to Rust are now available in other languages as well. If you are managing a project in C or C++ and you are not using asan + scan-build + other error checkers on your code, you should use them all and fix the issues detected. Do it even if you intend to rewrite your code in Rust, Go or any other language. Do it now! There is no excuse.

keskiviikko 8. helmikuuta 2017

Enabling HTTPS is easy

To see how easy it is, I looked at how well it is supported by a bunch of free software blog aggregator sites.

The cobbler's children have no shoes.