Warning: much rantage about GCC and Linux linking - 'Twas brillig, and the slithy toves did gyre and gimble in the wabe
Thomas

Warning: much rantage about GCC and Linux linking [Monday 23rd January 2012 at 10:51 pm]
Thomas

boggyb
[Feeling |irritated]
[Playing |Cube 7 - MGS.ac3]

So after the epic fail that was the last attempt to build GCC, I did a bit of research into the whole mess. It looks like, to have a reasonable chance of success, I *do* need the exact version of GCC that was used.

Turns out that when you build GCC, you also build a library for your target (libgcc). This library contains a shedload of built-in functions that the compiler will generate calls to (anything from shifting bits to exception handling), and so will likely end up linked into anything you build. The GCC folks have chosen to dynamically link it, and to apply a versioning system not just to the library but also to the individual symbols exported from it. The upshot, as best as I can tell short of actually monkeying around with different libgcc versions, is that a program compiled with a new version of GCC is highly unlikely to run on a system where the libraries were built with an older version of GCC. This applies to minor as well as major version increments.

Compare with Windows and MS Visual C++, where there is no such thing. Compiler builtins are created and inserted directly into your code by the compiler. The nice upshot is your code will run on both older and newer versions of Windows, and the limiting factor is generally a combination of what your C runtime is compatible with and which system APIs you call. There's even some fancy side-by-side stuff to allow multiple copies of DLLs with the same name to co-exist, though it does result in some cryptic error messages when you don't have the correct version installed.

Another reason for needing the exact GCC version is that libstdc++ is most definitely not compatible across different GCC versions. That's more understandable, given that the way C++ libraries are linked is horrid and nasty and requires rebuilding everything for what would appear to be completely isolated changes (like adding a private field to your class - oops, need to rebuild everything because this changes how big your class objects are and that's relied on by anything that uses objects of your class). It looks like smartmontools uses C++, so I'd have run into this problem eventually without the whole libgcc issue. In fairness to GCC, the Windows situation is similar - I believe Visual C++ 6 was based on a pre-release version of the standard and their C++ ABI also changed with later versions. Then again, on Windows you have the option of COM which makes all these ABI issues disappear.

Anyway, the other part of the puzzle is crti.o and crtn.o. I'm still not entirely clear as to what they do, but a blog posting did reveal at least one purpose: to handle static initializers. On Linux, executables and libraries can contain a .init section. This section contains code that is run when the program is loaded. When you link your program, the linker merrily concatenates all of the init sections together. The linker then needs some additional code to place at the start and the end of the combined init section to turn the machine code into something that can be sensibly called by the loader (namely a function prologue and epilogue). This additional code is provided by crti.o for the start and crtn.o for the end, which are magic files that the linker always includes regardless of what you actually tell it to link with. These are used both when linking executable binaries and when linking libraries.

Compare with Windows and MSVC, where there's no such craziness or magic start/end libraries and instead your entry point is a single function specified in the executable. Usually this is something internal to the C runtime which will handle initialising itself, creating your static objects (by walking a table of function pointers in a special section of the executable), and eventually calling main or wmain or WinMain (depending on what you called it). For libraries the loader will again call the C runtime's entry point, although since the loader holds the loader lock there's little you can do without potentially deadlocking your program.

Rant aside (and I'm sure someone will point out how to them the Windows model is crazy and the Linux one makes perfect sense), it turns out that the thing that normally builds crti.o and crtn.o ... is glibc. Annoyingly, on some level this all makes perfect sense - glibc builds crti.o and crtn.o because they're conceptually part of the C library. gcc builds libgcc because (once you get over the daftness of an external library for internal compiler functions) when you're generating that many functions it makes sense to stick them in a library. And building libgcc needs crti.o and crtn.o because that's a not unreasonable way of dealing with the crazy way Linux handles initializers. Unfortunately it ceases to be even slightly sensible and becomes utterly braindead when you're trying to build a shiny new toolchain from scratch and hit the wonderful catch-22 situation: gcc needs files that glibc builds, and glibc needs gcc to build it. Hence all the horrible hacks required to be able to build gcc so that you can build glibc so that you can build gcc (I personally think the ideal solution is for gcc to not rely on or create any compiler-specific libraries for your target).

Fortunately this is a common problem when building cross-compiler toolchains, and there are existing workarounds that should do the trick. The two solutions I've seen are either to hack gcc so that it doesn't use libc (at least I think that's what inhibit_libc is supposed to do), or to hack gcc so that it (and not glibc) builds crti.o and crtn.o. In both cases, once I've hacked the source and built a semi-working compiler that can build glibc, I then need to un-hack the sources and rebuild gcc to get a real compiler.
