Monday, September 12, 2016

The unspeakable horror of Visual Studio PDB files

Disclaimer: everything said in the following blog post may be completely wrong. Since PDB files are (until very recently) completely undocumented everything below has been determined by trial and error. Doubt everything, believe nothing.

Debugging information, how hard can it be?

When compiling C-like languages, debug information is not a problem. It gets written in the object file along with code and when objects are linked to form an executable or shared library, each individual file's debug info is combined and written in the result file. If necessary, this information can then be stripped into a standalone file.

This seems like the simplest thing in the world. Which it is. Unless you are Microsoft.

Visual Studio stores debug information in standalone files with the extension .pdb. Whenever the compiler is invoked it writes an .obj file and then opens the pdb file to write the debug info in it (while removing the old one). At first glance this does not seem a bad design, and indeed it wasn't in the 80s when it was created (presumably).

Sadly multicore machines break this model completely. Individual compilation jobs should be executable in parallel (and in Unix they are) but with pdb files they can't be. Every compile process needs to obtain some sort of lock to be able to update the pdb file. Processes that could be perfectly isolated now spend a lot of time fighting with each other over access to the pdb file.

Fortunately you can tell VS to write each object file's pdb info in a standalone file which is then merged into the final pdb. It even works, unless you want to use precompiled headers.

VS writes a string inside the obj file pointing to the corresponding pdb file. However, if you use precompiled headers this fails, because the pdb strings differ between the source object file and the precompiled header object file, and VS wants to join these two when generating the main .obj file. VS will then note that the strings are different and refuse to create the source object file, because merging two files with serialised arrays is a known unsolvable problem in computer science. The merge would work if you compiled the precompiled header separately for each object file and gave them the same temporary pdb file name. In theory at least; this particular rat hole is not one I wish to get into, so I did not try.

This does work if you give both compilations the same target pdb file (the one for the final target). This gives you the choice between a slowdown caused by locking or a slowdown caused by lack of precompiled headers. Or, if you prefer, the choice of not having debug information.

But then it gets worse.

If you choose the locking slowdown then you can't take object files from one target and use them in other targets. The usual reasons for doing this are either to get just one object file for a unit test without needing a recompilation, or to emulate -Wl,--whole-archive (available natively only in VS2015 or later) by putting all object files in a different target. Trying gets you a linking error due to an incorrect pdb file name.

There was a blog post recently saying that Microsoft is having problems with their daily builds because compiling Windows is starting to take over 24 hours. I'm fairly certain this is one of the main reasons why.

But then it gets worse.

Debug information for foo.exe is written to a file called foo.pdb. The debug information for a shared library foo.dll is also written to a file called foo.pdb. That means you can't have a dll and an exe with the same name in the same directory. Unfortunately this is what you almost always want, because Windows does not have rpath, so you can't instruct an executable to look up its dependencies elsewhere (though you can fake it with PATH, yes, really).
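The PATH trick is worth spelling out, since it is what most cross-platform test runners end up doing. A minimal sketch in Python (the directory names are hypothetical, and on Unix this just works because PATH is also consulted when spawning the executable itself):

```python
import os
import subprocess

def env_with_dll_dir(dll_dir, base_env=None):
    """Return a copy of the environment with dll_dir prepended to PATH.

    On Windows the loader searches PATH for DLL dependencies, so this
    stands in for the rpath mechanism that Windows lacks.
    """
    env = dict(base_env if base_env is not None else os.environ)
    env['PATH'] = dll_dir + os.pathsep + env.get('PATH', '')
    return env

def run_with_dlls(exe, dll_dir):
    """Launch exe so that DLLs in dll_dir are found at load time."""
    return subprocess.run([exe], env=env_with_dll_dir(dll_dir))
```

A test harness would call something like `run_with_dlls('builddir/tests/foo_test.exe', 'builddir/lib')` so the freshly built foo.dll is picked up instead of whatever happens to be installed.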

Fortunately you can specify the name of the output pdb, so you can tell VS to generate foo_exe.pdb and foo_lib.pdb. Unfortunately VS will also generate a dozen other files besides the pdb, whose names come from the target basename and which you cannot change. No matter what you do, files will be clobbered. Even if they were not clobbered, the files would still be useless, because the VS IDE assumes the file is called foo.pdb and refuses to work if it is not.

All this because writing debug information inside object files, which is where it belongs, is not supported.

But wait, it gets even worse.

Putting debug info in obj files was supported but is now deprecated.

In conclusion

Meson recently landed support for generating PDB files. Thanks to Nirbheek for getting this ball rolling. If you know that the above is complete bull and that it is possible to do all the things discussed above then please file bugs with information or, preferably, patches to fix them.

Friday, September 9, 2016

How to convert an Autotools project to Meson

Converting a project using Autotools into Meson is usually not particularly difficult, just grunt work. The Meson wiki has a guide and lots of other documentation on the subject. Here is an informal rough outline of the steps commonly needed.

Autoconf

Autoconf scripts can often seem impenetrable. Creating a converter script for them is as difficult a task as reimplementing all of Autoconf (and shell and a bunch of other stuff), which is not really feasible. Fortunately there is a sneaky shortcut to eliminate most of the work. The config.h file autoconf produces is easy to parse and autoconvert.

Meson has a helper script for exactly this use case and applying it to a project is straightforward. Just copy config.h.in to config.h.meson, then replace all lines that look like this:

#undef HAVE_ASPRINTF

into this:

#mesondefine HAVE_ASPRINTF

This can't be done automatically, because some config headers use #undef for other purposes, so the lines can't be changed blindly. Then run the converter script on the resulting file. It will generate a skeleton project file that does the same checks in Meson. The checks it could not understand are left in the source as comments. This script usually needs some manual tweaking but should deal with most of the configure checks for you (unless you are doing something really low level like glib).
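The mechanical part of the replacement can still be scripted as long as you review the result by hand afterwards. A rough sketch (the function name is made up, and anything the regex does not recognise is left alone for manual review):

```python
import re

def undef_to_mesondefine(text):
    """Convert '#undef FOO' lines to '#mesondefine FOO'.

    This is a blunt line-by-line rewrite; config headers that use
    #undef for anything other than configure-time defines must be
    fixed up by hand afterwards.
    """
    out = []
    for line in text.splitlines():
        m = re.match(r'(\s*)#\s*undef\s+(\w+)\s*$', line)
        if m:
            out.append('%s#mesondefine %s' % (m.group(1), m.group(2)))
        else:
            out.append(line)  # leave unrecognised lines untouched
    return '\n'.join(out)
```

Running it over config.h.in and saving the output as config.h.meson gets you most of the way before the converter script takes over.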

Automake

Converting Automake is mostly just the manual work of translating the list of source files for each target from Automake syntax to Meson syntax.

Except.

In some projects the Automake setup houses the most baffling and incomprehensible parts of the entire build setup. In one project (which shall remain nameless) we had several project developers looking at the Make/shell pipeline/magic substitution/stuff declaration in the makefile just to understand what it was supposed to do (never mind how it was doing it).

Encountering one of these sucks. Fixing it will take a lot of effort, but the end result is a notable reduction in technical debt and would make sense even if the project were to keep using Autotools. The most important thing when converting these to Meson is not to write the logic inside the Meson declaration. Always put these kinds of large pipelines in standalone scripts (converted to Python if aiming to support Windows) and just invoke them from Meson. This makes the magic isolated and testable.
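As an illustration of what such a standalone script might look like: suppose the inscrutable Make rule was gluing several input fragments into one generated C file. The file names and behaviour here are hypothetical, but the shape is typical:

```python
#!/usr/bin/env python3
# concat_sources.py: stand-in for an inscrutable Make/shell pipeline
# that concatenated several fragments into one generated source file.
import sys

def concatenate(inputs, output):
    """Concatenate the input files into output with a provenance banner."""
    with open(output, 'w') as out:
        out.write('/* Generated file, do not edit. */\n')
        for path in inputs:
            with open(path) as f:
                out.write(f.read())

if __name__ == '__main__':
    # Usage: concat_sources.py output input1 input2 ...
    concatenate(sys.argv[2:], sys.argv[1])
```

Because it is a plain script taking explicit inputs and outputs, Meson can drive it from a custom_target, and you can run and test it on its own without a build system in sight.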

Finalisation

That should be most of it. The rest is polishing, such as checking for dependencies (with pkg-config) and fixing things that break due to the different environment. As an example, many Autotools projects are only compiled and run in-source. Meson does not permit in-source builds, so any part of the project that assumes an in-source build needs to be fixed. The usual culprit is unit tests.
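The typical unit-test fix is to resolve data files relative to the test's own source file instead of the current working directory. A sketch (path names hypothetical):

```python
import os

# Before: assumes the test is run from the source directory.
#     data = open('testdata/input.txt').read()

def data_path(relative, base=None):
    """Resolve a test data file relative to this source file, so the
    test works when invoked from any build directory."""
    if base is None:
        base = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(base, relative)

# After: works regardless of the current working directory.
#     data = open(data_path('testdata/input.txt')).read()
```

The `base` parameter is only there so the helper can be exercised in isolation; in a real test it is left to default to the test file's own directory.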

That should be it. If you encounter any problems feel free to report bugs or chat with us at #mesonbuild on Freenode.