[Ltrace-devel] Getting prototypes from debug information

Thu May 1 21:21:46 UTC 2014

Dima Kogan <lists at dima.secretsauce.net> writes:

>   ==22679== 16 bytes in 1 blocks are still reachable in loss record 70 of 152
>   ==22679==    at 0x4C274A0: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>   ==22679==    by 0x56533A4: register_state (regex_internal.c:972)
>   ==22679==    by 0x5656790: re_acquire_state_context (regex_internal.c:1726)
>   ==22679==    by 0x55AE338: build_trtable (regexec.c:3439)
>   ==22679==    by 0x565C12F: re_search_internal (regexec.c:2307)
>   ==22679==    by 0x56609D4: regexec@@GLIBC_2.3.4 (regexec.c:253)
>   ==22679==    by 0x408F6F: re_match_or_error (filter.c:114)
>   ==22679==    by 0x409194: filter_matches_symbol (filter.c:183)
>   ==22679==    by 0x42B131: import_subprogram (dwarf_prototypes.c:887)
>   ==22679==    by 0x42B3D7: process_die_compileunit (dwarf_prototypes.c:930)
>   ==22679==    by 0x42B4EE: import (dwarf_prototypes.c:953)
>   ==22679==    by 0x42B67A: import_DWARF_prototypes (dwarf_prototypes.c:981)

This looks as if regexec itself is leaking memory.

> 2. The prototypes my DWARF parser produces leak a bit. Those are stored
> with protolib_add_prototype(). Is that sufficient? I.e. do protolibs
> eventually clean out their prototypes?
>
>   ==22679== 624 bytes in 13 blocks are definitely lost in loss record 142 of 152
>   ==22679==    at 0x4C29590: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>   ==22679==    by 0x42AA08: get_prototype (dwarf_prototypes.c:787)

I don't see a calloc call on line 787 of dwarf_prototypes.c.  Possibly
the code has been heavily optimized, but maybe you've just changed it
in the meantime.  Maybe try building with -O0 and post a hopefully more
accurate stack trace.  We might lack some own-flags somewhere or leak
in protolib, hard to tell.  In any case, this should be fixable in
ltrace itself.

By the way, looking into dwarf_prototypes.c, I see a number of style
points that will need addressing before this is accepted:

- Tabs should be 8 characters.  E.g.:
  > static struct arg_type_info* get_array( Dwarf_Die* parent, struct protolib* plib,
  >                                                                                 struct dict* type_dieoffset_hash)

- Lines should be no longer than 80 characters.  Same example as above.

- Pointer star belongs to the variable, not the type, i.e. Dwarf_Die
  *parent.  That's not quite as arbitrary as it may seem: two pointers
  can be defined by saying char *p, *q; but not by char* p, q; (which
  makes q a plain char).  Same example as above.

  The above example should thus look like this:
  > static struct arg_type_info *get_array(Dwarf_Die *parent, struct protolib *plib,
  >                                        struct dict *type_dieoffset_hash)
  or even this:
  > static struct arg_type_info *
  > get_array(Dwarf_Die *parent, struct protolib *plib,
  >           struct dict *type_dieoffset_hash)
  The latter is how most of ltrace is written, but I don't mind the
  former.

- Lines shouldn't be formatted into tables arbitrarily.  E.g. this:
  >    struct arg_type_info*       result                          = NULL;
  >    struct arg_type_info*       member_type                     = NULL;
  >    int                         newly_allocated_member_type     = 0;
  should really be formatted thus:
  >    struct arg_type_info *result = NULL;
  >    struct arg_type_info *member_type = NULL;
  >    int newly_allocated_member_type = 0;

- if and while should get a space before the paren.
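
  To illustrate the pointer-star point above, here is a minimal C11
  sketch.  The macro IS_CHAR_POINTER and the function
  count_char_pointers are made up for this example and are not part of
  ltrace; _Generic is used only to ask the compiler what type each
  variable actually got.

```c
/* Illustration only: IS_CHAR_POINTER and count_char_pointers are
 * hypothetical names, not ltrace code.  Needs a C11 compiler. */
#define IS_CHAR_POINTER(x) _Generic((x), char *: 1, default: 0)

char *p, *q;	/* star on each declarator: p and q are both char * */
char* r, s;	/* star "on the type": r is char *, but s is a plain char */

/* Count how many of the four variables are really pointers to char. */
int
count_char_pointers(void)
{
	return IS_CHAR_POINTER(p) + IS_CHAR_POINTER(q)
		+ IS_CHAR_POINTER(r) + IS_CHAR_POINTER(s);
}
```

  Calling count_char_pointers() yields 3 rather than 4, because s is
  not a pointer despite how the declaration reads.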

>   ==22679== 448 bytes in 8 blocks are definitely lost in loss record 140 of 152
>   ==22679==    at 0x4C29590: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>   ==22679==    by 0x42AB14: get_prototype (dwarf_prototypes.c:801)
>   ==22679==    by 0x42B281: import_subprogram (dwarf_prototypes.c:903)
>   ==22679==    by 0x42B3D7: process_die_compileunit (dwarf_prototypes.c:930)
>   ==22679==    by 0x42B4EE: import (dwarf_prototypes.c:953)
>   ==22679==    by 0x42B67A: import_DWARF_prototypes (dwarf_prototypes.c:981)
>   ==22679==    by 0x41CD04: library_get_prototype (output.c:222)
>   ==22679==    by 0x41CDC1: find_proto_cb (output.c:245)
>   ==22679==    by 0x407003: proc_each_library (proc.c:1005)
>   ==22679==    by 0x41CE84: lookup_symbol_prototype (output.c:261)
>   ==22679==    by 0x41D888: output_left (output.c:549)
>   ==22679==    by 0x41C0E1: handle_breakpoint (handle_event.c:756)
>
> I'm not sure how much we care, however. These should all be one-time
> allocations, and everything is cleared out when the process exits
> anyway.

We do care somewhat.  If it's not ridiculously hard to free things, we
should strive to free them to keep valgrind runs clean.  Ideally if the
test suite passes, it passes with --enable-valgrind as well (it used to
at some point, but I don't know anymore).

> Note that nanosleep didn't get its prototype parsed even though the
> DWARF is available. It turns out that the exported symbol is indeed
> called "nanosleep", but the DWARF definitions have a DW_AT_name of
> "__nanosleep" and a DW_AT_linkage_name "__GI___nanosleep". The DWARF has
> no mention at all of "nanosleep", which is the main issue. How can we
> infer a connection between those two?

I think what should work here is to look at DW_AT_low_pc of the
DW_TAG_subprogram (dwarf_lowpc in libdw seems to be handling this) and
cross-match it with ELF symbol tables, where each address will have a
number of alias symbols.  I guess you could just walk through ltrace's
own data structures, struct library and struct library_symbol, and look
into ::enter_addr of the latter to figure out what symbol name ltrace
assigned to this address.
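
A rough sketch of that lookup, under loud assumptions: struct
sketch_symbol and struct sketch_library below are simplified,
hypothetical stand-ins for ltrace's struct library_symbol and struct
library (the real ones have more fields and a different layout), and
symbol_name_at is a made-up helper.  The address passed in is assumed
to have already been obtained from the DIE (e.g. via dwarf_lowpc) and
adjusted for any load bias, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ltrace's struct library_symbol. */
struct sketch_symbol {
	const char *name;		/* name ltrace assigned */
	uintptr_t enter_addr;		/* address of the entry point */
	struct sketch_symbol *next;	/* next symbol in the library */
};

/* Hypothetical stand-in for ltrace's struct library. */
struct sketch_library {
	struct sketch_symbol *symbols;
};

/* Return the name of the first symbol in LIB whose enter_addr equals
 * ADDR, or NULL if no symbol sits at that address. */
const char *
symbol_name_at(const struct sketch_library *lib, uintptr_t addr)
{
	assert(lib != NULL);

	for (const struct sketch_symbol *sym = lib->symbols;
	     sym != NULL; sym = sym->next)
		if (sym->enter_addr == addr)
			return sym->name;

	return NULL;
}
```

With something like this, the low_pc of the "__nanosleep" DIE would
resolve to whatever name ltrace already uses for that address, which
is "nanosleep" if that is the alias ltrace picked.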

> Another issue. I have debug symbols for libjpeg installed, so I wanted
> to run a program that uses this library to see if the DWARF is parsed
> correctly. Not only is the DWARF not being loaded, the library isn't
> being traced at all:
>
>  dima@shorty:~/projects/ltrace$ ldd /usr/bin/geeqie | grep jpeg
>
>          libjpeg.so.8 => /usr/lib/x86_64-linux-gnu/libjpeg.so.8 (0x00007fc5cd25e000)
>
>  dima@shorty:~/projects/ltrace$ ./ltrace -l 'libjpeg.so*' /usr/bin/geeqie /tmp/001.jpg
>
>  Could not init LIRC support
>  +++ exited (status 0) +++
>
> I know that functions from this library are actually being called (and
> the DWARF can be loaded) because gdb breakpoints are hit. I haven't
> debugged this thoroughly. Anybody have any obvious ideas in the
> meantime?

Not sure what could be going wrong.  My geeqie isn't linked against
libjpeg, but for me the above works for other libraries that geeqie is
linked against.  Does it break for you for, say, libm as well?  That
gets a number of calls in my case.

Thanks,
PM