When building a user space binary, the -lc that gcc inserts into the
final link seems pretty straightforward: link the C library. As with
all things system-library related, there is more to investigate.
If you look at /usr/lib/libc.so, the "library" that gets linked when
you specify -lc, it is not a library as such but a linker script which
specifies the libraries to link, which also includes the dynamic
linker itself:
$ cat /usr/lib/libc.so
/* GNU ld script
Use the shared library, but some functions are only in
the static library, so try that secondarily. */
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib/libc.so.6 /usr/lib/libc_nonshared.a AS_NEEDED ( /lib/ld-linux-x86-64.so.2 ) )
On very old glibcs the AS_NEEDED didn't appear, so every binary really
did have a DT_NEEDED entry for the dynamic linker itself. This can be
somewhat confusing if you're ever doing forensics on an old binary
which seems to have these entries for no apparent reason. However, we
can see that /lib/libc.so.6 itself does actually require symbols from
the dynamic linker:
$ readelf --dynamic /lib/libc.so.6 | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [ld-linux-x86-64.so.2]
Although you're unlikely to encounter it, a broken toolchain can cause
havoc if it gets the link order wrong here, because the glibc dynamic
linker defines minimal versions of malloc and friends -- you've got a
chicken-and-egg problem using libc's malloc before you've loaded it!
You can simulate this havoc with something like:
$ cat foo.c
#include <stdio.h>
#include <syslog.h>
int main(void)
{
    syslog(LOG_DEBUG, "hello, world!");
    return 0;
}
$ cat libbroken.so
GROUP ( /lib/ld-linux-x86-64.so.2 )
$ gcc -Wall -Wl,-rpath=. -L. -lbroken -g -o foo foo.c
$ ./foo
Inconsistency detected by ld.so: dl-minimal.c: 138: realloc: Assertion `ptr == alloc_last_block' failed!
Depending on the versions of things involved, you might see that
assertion, or possibly just strange, corrupt output in your logs as
syslog calls the wrong malloc. You could debug something like this by
asking the dynamic linker to show you its bindings as it resolves
them:
$ LD_DEBUG_OUTPUT=foo.txt LD_DEBUG=bindings ./foo
Inconsistency detected by ld.so: dl-minimal.c: 138: realloc: Assertion `ptr == alloc_last_block' failed!
$ cat foo.txt.11360 | grep "\`malloc'"
11360: binding file /lib/libc.so.6 [0] to /lib64/ld-linux-x86-64.so.2 [0]: normal symbol `malloc' [GLIBC_2.2.5]
11360: binding file /lib64/ld-linux-x86-64.so.2 [0] to /lib64/ld-linux-x86-64.so.2 [0]: normal symbol `malloc' [GLIBC_2.2.5]
11360: binding file /lib/libc.so.6 [0] to /lib64/ld-linux-x86-64.so.2 [0]: normal symbol `malloc' [GLIBC_2.2.5]
Above, because the dynamic loader comes first in the link order,
libc.so.6's malloc has bound to the minimal implementation the loader
provides, rather than the full-featured one libc provides internally.
As an aside, AFAICT, there is really only one reason why a normal
library will link against the dynamic loader -- for the thread-local
storage support function __tls_get_addr. You can try this yourself:
$ cat tls.c
char __thread *foo;
char* moo(void)
{
    return foo;
}
$ gcc -fPIC -o libtls.so -shared tls.c
$ readelf -d ./libtls.so | grep NEED
0x00000001 (NEEDED) Shared library: [libc.so.6]
0x00000001 (NEEDED) Shared library: [ld-linux.so.2]
0x6ffffffe (VERNEED) 0x314
0x6fffffff (VERNEEDNUM) 2
Thread-local storage is worthy of a book of its own, but the gist is
that this support function says "hey, give me an address of foo in
libtls.so", the magic being that if the current thread has never
accessed foo then it may not actually have any storage for foo yet, so
the dynamic linker can allocate some memory for it lazily and then
return the right thing. Otherwise, every thread that started would
need to reserve room for foo "just in case", even if it never cares
about moo.
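To make the "lazily" part concrete, here is a toy model of that
behaviour. It is only a sketch and not how glibc implements
__tls_get_addr: the names model_tls_get_addr, model_blocks_allocated
and MAX_THREADS are made up for illustration, and "threads" are just
integer ids rather than real threads.

```c
#include <stdlib.h>

#define MAX_THREADS 8                  /* arbitrary limit for the model */

/* One slot per pretend thread; NULL means "no storage for foo yet". */
static char **foo_block[MAX_THREADS];
static int blocks_allocated;

/* Plays the role of the support function for the single variable foo:
 * returns this thread's copy, allocating it on first access. */
char **model_tls_get_addr(int thread)
{
    if (thread < 0 || thread >= MAX_THREADS)
        return NULL;

    if (foo_block[thread] == NULL) {
        foo_block[thread] = calloc(1, sizeof *foo_block[thread]);
        if (foo_block[thread] == NULL)
            abort();
        blocks_allocated++;
    }
    return foo_block[thread];
}

/* How many pretend threads have actually paid for storage so far. */
int model_blocks_allocated(void)
{
    return blocks_allocated;
}
```

A thread that never asks for foo never costs an allocation in this
model; two different ids get two different addresses, and asking twice
from the same id returns the same one.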
Looking a little closer at the symbols of libc.so is also interesting.
libc.so doesn't actually have many functions you can override. You can
see what it is possible to override by checking the relocations
against the procedure linkage table (PLT).
Relocation section '.rela.plt' at offset 0x1e770 contains 8 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000035b000 084600000007 R_X86_64_JUMP_SLO 00000000000a2100 sysconf + 0
00000035b008 02e000000007 R_X86_64_JUMP_SLO 0000000000075e50 calloc + 0
00000035b010 01dd00000007 R_X86_64_JUMP_SLO 0000000000077910 realloc + 0
00000035b018 029300000007 R_X86_64_JUMP_SLO 00000000000661c0 feof + 0
00000035b020 046f00000007 R_X86_64_JUMP_SLO 00000000000768c0 malloc + 0
00000035b028 000400000007 R_X86_64_JUMP_SLO 0000000000000000 __tls_get_addr + 0
00000035b030 01b400000007 R_X86_64_JUMP_SLO 0000000000076dd0 memalign + 0
00000035b038 086000000007 R_X86_64_JUMP_SLO 00000000000767e0 free + 0
That is, instead of jumping directly to the malloc defined in the libc
code section, any internal calls will jump to this stub which, the
first time, asks the dynamic linker to go out and find the address of
malloc (it then saves it, so the second time the stub just jumps to
the saved location).
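The resolve-once-then-jump behaviour can be modelled in plain C with a
function pointer. This is a rough sketch of the idea only; the real
stubs are a few instructions of assembly plus a GOT entry, and every
name below (got_slot, resolve_and_call, call_via_plt and so on) is
invented for the example.

```c
/* Toy model of a single PLT/GOT slot. */
typedef int (*fn_t)(int);

static int resolver_calls;             /* times the "dynamic linker" ran */

/* Stands in for the real function living in the library. */
static int real_double(int x)
{
    return 2 * x;
}

static int resolve_and_call(int x);

/* The slot starts out pointing at the resolver, like an unresolved
 * GOT entry. */
static fn_t got_slot = resolve_and_call;

/* First call lands here: look up the target, save it, then call it. */
static int resolve_and_call(int x)
{
    resolver_calls++;
    got_slot = real_double;
    return got_slot(x);
}

/* The "PLT stub": always an indirect call through the slot. */
int call_via_plt(int x)
{
    return got_slot(x);
}

int resolver_call_count(void)
{
    return resolver_calls;
}
```

Calling call_via_plt twice runs the resolver once; the second call goes
straight through the saved pointer, which is also why anything that
manages to change what the slot resolves to changes the behaviour of
every later call.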
This is an interesting list, seemingly without much order. feof, for
example, stands out as worth checking out a bit closer -- why would
that be there when fopen isn't, say?
We can track down where it comes from with a bit of detective work.
Knowing that the value of the symbol feof will be placed into
0x35b018, we can disassemble libc.so to see that this address is used
by the feof PLT stub at 0x1e870 (luckily, objdump has done the math to
offset from the rip for us; i.e. 0x1e876 + 0x33c7a2 = 0x35b018):
000000000001e870 <feof@plt>:
1e870: ff 25 a2 c7 33 00 jmpq *0x33c7a2(%rip) # 35b018 <_IO_file_jumps+0xb18>
1e876: 68 03 00 00 00 pushq $0x3
1e87b: e9 b0 ff ff ff jmpq 1e830 <h_errno+0x1e7dc>
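If you want to redo the arithmetic objdump did for us, the rule is
that the displacement is relative to the address of the next
instruction. A small helper, with a made-up name, might look like:

```c
#include <stdint.h>

/* Target of a rip-relative reference: address of the *next*
 * instruction (insn_addr + insn_len) plus the signed 32-bit
 * displacement encoded in the instruction. */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                             int32_t disp)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

For the first instruction above, the displacement bytes a2 c7 33 00
read little-endian as 0x33c7a2, so rip_relative_target(0x1e870, 6,
0x33c7a2) gives 0x35b018. The same sum works for the relative jmpq at
1e87b: its displacement b0 ff ff ff is -0x50, and 0x1e87b + 5 - 0x50 is
the 0x1e830 shown in the listing.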
From there we can search for anyone jumping to that address, and
find out the caller:
$ objdump --disassemble-all /lib/libc.so.6 | grep 1e870
000000000001e870 <feof@plt>:
1e870: ff 25 a2 c7 33 00 jmpq *0x33c7a2(%rip) # 35b018 <_IO_file_jumps+0xb18>
f19f7: e8 74 ce f2 ff callq 1e870 <feof@plt>
$ addr2line -e /lib/libc.so.6 f19f7
/home/aurel32/eglibc/eglibc-2.11.2/sunrpc/bindrsvprt.c:70
This turns out to be part of a local patch which probably gets things
a little wrong, as described below. The sysconf relocation is from a
similar add-on patch (being used to find the page size, it seems).
libc, like all sensible libraries, uses the hidden attribute on
symbols to restrict what is exported by the library. The benefit of
this is that when the linker knows you are referencing a hidden symbol
it knows that the value can never be overridden, and thus does not
need to emit extra code to do indirection just in case you ever wish
to redirect the symbols. In the above, it appears that feof has never
been marked as hidden, probably because no internal glibc functions
used it until that add-on patch, and since it is not considered an
internal function the linker must allow for the possibility that it
will be overridden at run time and provide a PLT slot for it. There
are consequences: if this were on a fast path then the extra jumps
required to go via the PLT may matter for performance, and it may also
cause strange behaviour if somebody had preloaded something that took
over feof.
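You can see the same effect in a library of your own. The following is
a minimal sketch using GCC's visibility attribute; the file name vis.c
and the function names are hypothetical, and exactly what the
toolchain emits will depend on compiler version and flags.

```c
/* vis.c: two identical helpers that differ only in visibility. */

/* Hidden: not exported, so calls from inside the library can be bound
 * directly at link time. */
__attribute__((visibility("hidden")))
int helper_hidden(int x)
{
    return x + 1;
}

/* Default visibility: exported and preemptible, so a call from inside
 * the library normally has to allow for being overridden. */
int helper_default(int x)
{
    return x + 1;
}

/* Exported entry point that calls both helpers. */
int api_entry(int x)
{
    return helper_hidden(x) + helper_default(x);
}
```

Built with something like gcc -fPIC -shared -o libvis.so vis.c, I would
expect readelf --relocs libvis.so to list a PLT relocation for
helper_default but none for helper_hidden, mirroring the feof
situation above.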
Note that this is different from saying that your library can override
symbols provided by libc.so, such as when you LD_PRELOAD a library to
wrap an open call. What you cannot override is the open call that,
say, the internal libc.so function getusershell does to read
/etc/shells.
Having the malloc-related calls preemptable seems intentional and
sane. Although I cannot find a comment to the effect of "we
deliberately leave these like this so that users may use alternative
malloc implementations", it makes sense so that libc.so is internally
using the same malloc as everything else if you choose to use
something such as tcmalloc.
tl;dr? Digging into your system libraries is always interesting.
Be careful with your link order when creating toolchains, and be
careful about symbol visibility when you're working with
libraries.