[Ltrace-devel] Getting prototypes from debug information

Wed Apr 23 11:02:22 UTC 2014

All righty.

I have some newer code in the same git repo mentioned earlier. The main
changes are

- Style updated as you have requested. Maybe I missed some things, but
  mostly it should be fine

- added a disabled stub for complex float support. ltrace itself doesn't
  have complex lenses yet. I'd like to get the core of this new DWARF
  functionality working, then build extra things on it. Thus I'd like to
  punt on this one until later

- reworked attr_numeric() to report failures. This was clearly missing
  before. The previous tree was more of a proof of concept, but I'm now
  cleaning it up bit by bit. I'll go through the whole thing again when
  it stabilizes

- enums are no longer assumed to have type 'int'. See below

- I now import functions using their linkage name. This was needed to
  trap C++ calls correctly

- moved the DWARF-reading invocation to library_get_prototype()

More comments inline

Petr Machata <pmachata at redhat.com> writes:

>> - in getEnum, you assume that the underlying type is INT.  That may not
>>   be the case.  In C++11, you can actually choose the underlying enum
>>   type.  DW_TAG_enumeration_type's DW_AT_type (if present) will point to
>>   that underlying type.
>
> I looked into this, and GCC currently doesn't seem to emit DW_AT_type
> for enums with specified base.  Instead it denotes signedness by
> encoding enumerators with DW_FORM_udata or DW_FORM_sdata.  Size of the
> underlying type is encoded in DW_TAG_enumeration_type's DW_AT_byte_size,
> which might be important for enums in structs (I don't think it will
> matter in naked parameters, unless in weird cases like __int128, which
> ltrace doesn't handle anyway).

I ran some tests. Indeed, there's no DW_AT_type, but I always see
DW_FORM_sdata, even if it's supposedly unsigned. This is gcc 4.8.2. In
any case, the current code in my tree assumes signed. I think this is
fine for the first pass.

> Next, about the memory management.  That's rather spartan.  All the
> "own" values that you pass to all sorts of initializers determine
> whether the initialized structure owns the passed-in pointer.  Most of
> the time they will, because their lifetimes are generally different, and
> you end up cloning stuff anyway.  So the own == 0 case is really only
> useful for referencing static data, and is not terribly helpful.

OK. I haven't done this yet, but I'm going to simply flip all the 'own'
values to 1, since I'm already allocating on the heap. I suspect this
won't work well since objects are reused in ways that they don't know,
so they can easily deallocate themselves while something is using them.
I'll look to see if this is an issue and go from there.

> I'm wondering about this idiom, that you use in a couple places:
>
> +	case DW_TAG_pointer_type:
> [...]
> +		*info = calloc( 1, sizeof(struct arg_type_info) );
> [...]
> +		type_init_pointer(*info, NULL, 0);
> +
> +		complain(type_die, "Storing pointer type: %p", *info);
> +		dict_insert( &type_hash, &die_offset, info );
> +		return getType( &(*info)->u.ptr_info.info, &next_die );
>
> What you end up inserting is an incomplete type.  If getType later
> fails, it will never properly finish initializing info.  Later on, a
> DW_TAG_const_type referencing that type will pick it up, incomplete
> (presumably) and return a legitimately-looking arg_type_info that will
> kill ltrace.  I would prefer it written this way:
>
> +               struct arg_type_info *ptr;
> +               if (getType(&ptr, &next_die)) {
> +                       type_init_pointer(*info, ptr, 1);
> +       		complain(type_die, "Storing pointer type: %p", *info);
> +               	dict_insert( &type_hash, &die_offset, info );
> +                       return true;
> +               } else {
> +                       free(*info);
> +                       return false;
> +               }

This was done intentionally to support recursive types. Let's say you
have struct S { struct S* s; };. The code

- sees the struct definition,
- then looks at the contents, seeing a pointer
- looks at the pointee, sees the struct S
- It then finds the struct S type in the dict (since it was added BEFORE
  getType was called), and stops the recursion.

I THINK (correct me if I'm wrong) that if implemented the way you
suggest, the getType() call would never complete. The code as it stands
is a bit of a proof of concept, and various error chacking and cleanup
pieces such as this are missing. I will add them.

> Later in process_die_compileunit:
>
> +			const char* function_name = dwarf_diename(&die);
>
> This could concievably return NULL for DW_AT_name-less DIE.

As part of another change I'm no longer reading functions that have no
name. Makes sense?

> +static bool import( struct protolib* plib, struct library* lib, Dwfl* dwfl )
> +{
> +	dict_init(&type_hash, sizeof(Dwarf_Off), sizeof(struct arg_type_info*),
> +			  dwarf_die_hash, dwarf_die_eq, NULL );
>
> type_hash should be a local variable in import and the pointer should be
> passed to other functions.  Is there a reason for making it global?

Only that it makes the code simpler. It's a static global, so its scope
is limited to dwarf_prototypes.c. In my opinion this is an acceptable
tradeoff. If you really want it to be local to some function, and pass
the pointer around instead, tell me and I'll do it that way.

>
> +	if( dwfl != NULL &&
> +		( filter_matches_library(options.plt_filter,	lib ) ||
> +		  filter_matches_library(options.static_filter,	lib ) ||
> +		  filter_matches_library(options.export_filter,	lib ) ) )
> +	{
> +		import_DWARF_prototypes( lib->protolib, lib, dwfl );
>
> I'm not sure where to place this best.  It seems it should be together
> with the rest of this logic in library_get_prototype in output.c.  The
> Dwarf related to this module could be stored at struct library
> (contingent on availability of libdw), where library_get_prototype will
> get it.
>
> The integration with conf-based protolibs is kinda tricky.  What we
> currently do for somelib.so.1.2.3 is look for somelib.so.1.2.3.conf,
> then somelib.so.1.2.conf, then somelib.so.1.conf, and then
> somelib.so.conf.  I think it's only there that we should fall back on
> dwarf.  So in pseudocode:
>
> @@ -207,6 +207,10 @@ library_get_prototype(struct library *lib, const char *name)
>  			 && lib->type == LT_LIBTYPE_DSO
>  			 && snip_period(buf));
>  
> +		if (lib->protolib == NULL && lib->dw != NULL) {
> +			lib->protolib = import from Dwarf, cache, etc.
> +		}
> +
>  		if (lib->protolib == NULL)
>  			lib->protolib = protolib_cache_default(&g_protocache,
>  							       buf, 0);

OK. I did this, but I'm not certain it's right, so please check. In
library_get_prototype() I now do the import if needed. The
import_DWARF_prototypes() function itself calls protolib_cache_default()
to init the protolib, then reads the DWARF data on top of that. Is this
right? I DO want to read the config files first to have pieces take
precedence over the DWARF, so overall this should be good, but the
details are likely wrong.

> -e traces PLT calls (i.e. inter-library calls), -x traces symbol entry
> points.  -l traces PLT calls done to a symbol defined by a library in
> -l.

Can we add this to the manpage? I don't feel qualified to do this myself
yet. Relatedly, say I have the following 1-file program:

 int f(double x) { return (int)x; }
 int main(void)
 {
     return f(5.0);
 }

Can ltrace be used to see the f() call? I can't figure out the right
-l/-e/-x to make that work. This also related to static functions. If we
can't instrument this, than we probably can't do static functions
either.

OK. My near-term todo list:

- Try to get prototypes from both DWARF and conf files.

- Go through the code to make sure error checking, memory allocation is
  handled properly

dima