[Ltrace-devel] Getting prototypes from debug information

Sat Apr 19 00:41:21 UTC 2014

Dima Kogan <lists at dima.secretsauce.net> writes:

> I think it would be most useful to read in a combination of DWARF and
> .conf files, instead of just one of the two. For instance, the code I
> have can parse FILE structures, so when ltrace decodes an fprintf()
> call, you get the full dissection of the FILE argument. This is less
> than useful. In cases like this I can imagine a .conf file can define a
> lens that says how to display a FILE, and the DWARF parsing can then be
> used to pull out the function prototype. I'm thinking that the .conf
> files should be parsed first, and those should take precedence over any
> DWARF data. Any thoughts on this? Implementation suggestions?

Yes, such mixing and matching of prototypes would be ideal.  I'm not
sure you can reliably assume that the FILE type will be defined under
that name in a conf file.  But I guess, if it isn't, then that's just
too bad, there's nothing we can do...  Making this cooperation possible
might be very beneficial.

Currently you create a new protolib for each Dwarf.  For this to work
as you describe, the Dwarf reader would instead add prototypes to an
existing protolib that comes from a conf file (or to a default empty
protolib if there wasn't a conf file).
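
To make the intended precedence concrete, here is a minimal sketch in
Python rather than ltrace's C.  The Protolib class, load_conf() and
load_dwarf() are hypothetical names invented for illustration, and the
prototype strings are placeholders, not real conf or DWARF output.  It
only shows the rule "conf first, Dwarf fills in the gaps":

```python
from dataclasses import dataclass, field


@dataclass
class Protolib:
    """Toy stand-in for a prototype library: symbol name -> prototype."""
    prototypes: dict = field(default_factory=dict)

    def add_prototype(self, name, proto, override=False):
        # Keep an existing entry unless the caller asks to override it.
        if override or name not in self.prototypes:
            self.prototypes[name] = proto
            return True
        return False


def load_conf(plib, conf_entries):
    # Conf entries are loaded first and always win.
    for name, proto in conf_entries.items():
        plib.add_prototype(name, proto, override=True)


def load_dwarf(plib, dwarf_entries):
    # Dwarf only fills in what the conf file did not define.
    added = []
    for name, proto in dwarf_entries.items():
        if plib.add_prototype(name, proto):
            added.append(name)
    return added


plib = Protolib()  # default empty protolib when there is no conf file
load_conf(plib, {"fprintf": "int fprintf(file, format)"})
added = load_dwarf(plib, {
    "fprintf": "int fprintf(struct _IO_FILE*, char const*, ...)",
    "my_func": "int my_func(int, double)",
})
```

Here fprintf keeps the conf-provided prototype (and so whatever lens
the conf file chose for FILE), while my_func, which the conf file never
mentioned, comes from Dwarf.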

> Currently I parse the DWARF from all our DSOs, and there's a lot of
> overlapping DWARF data there. It's all imported into separate plibs.
> This sounds like it's probably fine (although inefficient). Thoughts?

So, extracting common parts of Dwarf and storing them separately into
helper protolibs, and importing those, would certainly be possible.  But
I'm not sure it's worth it.  I'm toying with dwgrep to get the numbers of
functions, parameters and types for a huge libsclx.so.debug, but am
hitting some missing features, and have to do certain things in a
roundabout way.  It seems there are 145896 subprograms that are not
declarations and have a name.  Slicing the same problem differently,
there are 22996 subprograms that have DW_AT_low_pc or DW_AT_high_pc.
Either of these sets is peanuts and not really worth optimizing in my
opinion.

But there are 3321848 DW_TAG_subprogram DIEs with DW_AT_name in total,
and it is suspicious that only some 150K of those would be actual
functions (the rest presumably being declarations brought in through
header files), so maybe I'm missing something.

If I take this full set of subprograms, we would need a bit over 700MB
to represent all the prototypes, their arguments, and types.  That's
quite a lot...  But it's for the biggest debuginfo file that I could
find, so for practical-sized problems, it won't be remotely as bad.
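
For a rough feel of what these numbers imply, the arithmetic can be
spelled out as below.  This is a back-of-the-envelope sketch only: it
takes "a bit over 700MB" as exactly 700 * 10^6 bytes, and it assumes the
cost scales linearly with the number of subprograms, which ignores that
types are shared between prototypes.

```python
total_named = 3321848     # DW_TAG_subprogram with DW_AT_name
defined_named = 145896    # named subprograms that are not declarations
with_pc = 22996           # subprograms with DW_AT_low_pc or DW_AT_high_pc

# Assumption: "a bit over 700MB" rounded down to 700 * 10^6 bytes.
estimated_bytes = 700 * 10**6

# Average cost per named subprogram, including its share of types.
bytes_per_subprogram = estimated_bytes / total_named

# Share of named subprograms that look like actual definitions.
defined_fraction = defined_named / total_named

# Crude linear estimate if only the definitions were kept.
defined_estimate = bytes_per_subprogram * defined_named

print(f"{bytes_per_subprogram:.0f} bytes per subprogram")
print(f"{defined_fraction:.1%} of named subprograms are definitions")
print(f"{defined_estimate / 10**6:.0f} MB for definitions only")
```

Under those assumptions the average comes to a couple of hundred bytes
per subprogram, and the non-declaration set is under 5% of the total,
which is why that set looks like peanuts next to the full one.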

> The code I have tries to consult the ltrace filters to parse the DWARF
> only from functions and libraries the user asked about. I can't tell if
> this works or not. The ltrace documentation is very unclear on the
> difference between -e, -x and -l, so I don't yet know if that code is
> working correctly. Does it look like it's correct? I'm talking about the
> filter_matches_symbol() and filter_matches_library() calls.

-e traces PLT calls (i.e. inter-library calls), -x traces symbol entry
points.  -l traces PLT calls made to a symbol defined by a library that
matches the -l pattern.

Filters are currently meant to be used for determining what to trace.
Prototype libraries are loaded en bloc, no filtering applied.  That's
not necessarily the best way, but it's simple and hasn't been a problem
so far.  Pre-filtering based on -e and -x filters is probably OK though.
If a library being loaded matches an -l filter, you need to keep at
least all the symbols from .dynsym, as a library that's loaded later may
have PLT slots that resolve to symbols from that library, and will need
that library's prototypes.
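
The pre-filtering rule above could be sketched roughly as follows.
This is Python, not ltrace code: symbols_to_keep() and its parameters
are hypothetical, and plain glob patterns stand in for the real filter
machinery (filter_matches_symbol() and filter_matches_library() handle
more than this).

```python
from fnmatch import fnmatchcase


def symbols_to_keep(lib_name, dwarf_symbols, dynsym,
                    sym_patterns, lib_patterns):
    """Return the names whose prototypes should be kept for one library.

    sym_patterns: glob patterns standing in for -e / -x symbol filters.
    lib_patterns: glob patterns standing in for -l library filters.
    """
    # Symbols the user asked about directly.
    keep = {s for s in dwarf_symbols
            if any(fnmatchcase(s, p) for p in sym_patterns)}
    if any(fnmatchcase(lib_name, p) for p in lib_patterns):
        # A library loaded later may call into this one through its
        # PLT, so every exported (.dynsym) symbol keeps its prototype.
        keep |= set(dwarf_symbols) & set(dynsym)
    return keep


dwarf_symbols = ["foo_init", "foo_helper", "bar"]
dynsym = ["foo_init", "bar"]

matched = symbols_to_keep("libfoo.so", dwarf_symbols, dynsym,
                          ["foo_h*"], ["libfoo*"])
unmatched = symbols_to_keep("libfoo.so", dwarf_symbols, dynsym,
                            ["foo_h*"], ["libbar*"])
```

When the library matches the -l stand-in, all exported symbols survive
in addition to the explicitly requested foo_helper; when it doesn't,
only foo_helper is kept.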

Thanks,
PM