[Ltrace-devel] Tracing multi-threaded processes
Joe Damato
ice799 at gmail.com
Tue Jun 21 16:49:32 UTC 2011
On Tue, Jun 21, 2011 at 12:10 PM, <pmachata at redhat.com> wrote:
> Hi there,
>
> there is some support for multi-threaded processes in ltrace, but so far
> it has been incomplete. Everything works if the threads stay away from
> each other, but as soon as they end up in the same area of code, it all
> breaks.
>
> The problem is due to return breakpoints. When two threads make the
> same function call, ltrace places two breakpoints on top of each other,
> because it has no concept of a shared address space. There are many
> problems with this, and ltrace ends up seeing unexpected breakpoints,
> and SIGSEGVing the process.
>
> To solve this, ltrace must first learn that there is such a thing as a
> task and a thread group. Then it needs to store all the breakpoints in
> a structure shared by all the tasks in the thread group. To prevent
> races, before any breakpoint is temporarily disabled (for
> re-enablement, namely continue_after_breakpoint), all tasks in the
> thread group must be stopped.
>
> There is code on the branch pmachata/threads that implements this.
> Here's what the branch roughly does:
>
> - Process * leader; was added to struct Process. This points to the
> process that is the leader of the thread group that this process is a
> member of.
>
> - proper interfaces were added for handling the set of processes and
> their tasks (add_process, remove_process, each_process, each_task).
> The iteration interfaces (each_*) use call-backs to do the real work.
>
> - interfaces were added for accessing the information about the
> processes (process_leader, process_tasks, process_stopped,
> process_status).
>
> - a new interface task_kill is a wrapper for the SYS_tkill system call,
> which glibc does not wrap. We use this to stop or continue a
> single task.
>
> - when we need to stop tasks for breakpoint re-enablement, we send
> SIGSTOP. This SIGSTOP has to be caught and sunk. While we wait for
> the signal to be delivered, we pump all incoming events to an event
> queue that was created for this purpose (each_qd_event, enque_event).
> The interface next_event takes events from the queue if there are
> any.
>
> - all this, the event interception, sinking of SIGSTOP etc., is very
> platform specific. So a thread group can now have a registered event
> handler (install_event_handler, destroy_event_handler). If present,
> this is called at the beginning of handle_event. The registered
> handler can do whatever it wishes with the event in question, and
> return either NULL (if the event was handled or sunk) or the original
> (possibly modified) event that is then handled by the default handler
> as usual.
>
> - there have also been some small cleanups.
>
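The task_kill bullet above can be sketched as follows. This is a minimal
illustration, not the code on the branch: the signature and return
convention are assumptions, and only the use of SYS_tkill through the
generic syscall() entry point comes from the description. It is
Linux-specific.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch of a task_kill-style wrapper (signature assumed).
 * Sends signal `sig` to the single task `tid` through the raw tkill
 * system call, going via syscall() directly.
 * Returns 0 on success, -1 with errno set on failure. */
static int
task_kill(pid_t tid, int sig)
{
	return (int)syscall(SYS_tkill, tid, sig);
}
```

A tracer would call something like task_kill(tid, SIGSTOP) for each task
in the thread group that is not already stopped, then wait for each stop
to be reported before touching the shared breakpoint.
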
> For some reason, attaching to a running multi-threaded task doesn't work
> (this was one of the first things that I fixed, but apparently it got
> broken in the meantime), so that's what I'll be doing next.
>
> Then comes cleaning it all up and making the git history of my branch a
> bit less messy, at which point I'd ask some of you to review the (rather
> large) patch. I also need to verify that it works on non-x86
> architectures; so far I have only worked with x86_64. I'll keep you
> posted as my work progresses.
Sounds great, I look forward to taking a look at the code when it is ready.
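
As a reading aid for the event-handler bullet in the quoted message, here
is a rough sketch of the "return NULL if sunk, otherwise return the event"
contract. Every type and function name below (struct event,
struct thread_group, sink_sigstop, dispatch_event) is a hypothetical
stand-in chosen for illustration, not taken from the branch; the default
handler is reduced to a counter so the control flow stays visible.

```c
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-ins; not ltrace's real data structures. */
enum event_type { EVENT_SIGNAL, EVENT_BREAKPOINT };

struct event {
	enum event_type type;
	int tid;
	int signum;            /* meaningful for EVENT_SIGNAL only */
};

/* A handler returns NULL if it consumed (sank) the event, or the
 * event, possibly modified, to pass it on to default handling. */
typedef struct event *(*event_handler_fn)(struct event *ev, void *data);

struct thread_group {
	event_handler_fn handler;  /* NULL when nothing is installed */
	void *handler_data;
	int default_handled;       /* events that reached the default path */
};

/* Example handler: sink the SIGSTOPs we sent ourselves.  `data` points
 * to the number of stops still expected. */
static struct event *
sink_sigstop(struct event *ev, void *data)
{
	int *pending = data;

	if (ev->type == EVENT_SIGNAL && ev->signum == SIGSTOP && *pending > 0) {
		--*pending;
		return NULL;       /* sunk: the default handler never sees it */
	}
	return ev;                 /* not ours: hand it on unchanged */
}

/* Run the installed handler first, then fall through to the default
 * handling if the event survived. */
static void
dispatch_event(struct thread_group *group, struct event *ev)
{
	if (group->handler != NULL) {
		ev = group->handler(ev, group->handler_data);
		if (ev == NULL)
			return;
	}
	group->default_handled++;  /* stand-in for the default handler */
}
```

Counting the expected stops lets a SIGSTOP that did not originate from
the tracer still reach the default path, which is one way to avoid
swallowing signals the traced program should see.
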