[Pkg-openmpi-maintainers] Bug#598553: r-cran-rmpi: slave processes eat CPU when they have nothing to do

Manuel Prinz manuel at debian.org
Sat Oct 2 13:01:01 UTC 2010


> On 29 September 2010 at 18:22, Zack Weinberg wrote:
> | (on an 8-core machine), CPU utilization jumps *immediately* from 98% idle
> | to 20% user, 70% system, 12% idle.  strace reveals that each slave is
> | spinning through poll() calls with timeout zero, rather than blocking
> | until a message arrives, as the documentation for mpi.probe() suggests
> | should happen.
> | 
> | I suppose this might be a problem in libopenmpi instead of the R binding;
> | I haven't tried to reproduce it with anything lower-level.
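
For anyone who wants to take the R layer out of the picture: a minimal C
sketch along these lines (my own illustration, untested here, assuming a
standard MPI toolchain) should reproduce the same behavior. The slaves sit
in MPI_Probe while rank 0 sleeps; under Open MPI's default progress engine
each of them spins at ~100% CPU instead of blocking in the kernel:

    /* probe_spin.c -- slaves wait in MPI_Probe with nothing to do */
    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank, size, dst, msg;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            sleep(60);    /* slaves have nothing to do for a minute */
            msg = 42;
            for (dst = 1; dst < size; dst++)
                MPI_Send(&msg, 1, MPI_INT, dst, 0, MPI_COMM_WORLD);
        } else {
            /* "Blocks" until a message arrives -- but Open MPI's
               default progress engine busy-polls (poll() with a zero
               timeout), so top shows this process eating a full core
               the whole time it waits. */
            MPI_Probe(0, 0, MPI_COMM_WORLD, &status);
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("rank %d received %d\n", rank, msg);
        }

        MPI_Finalize();
        return 0;
    }

Build and run with e.g. "mpicc probe_spin.c -o probe_spin" and
"mpirun -np 8 ./probe_spin", then watch top or strace one of the
non-zero ranks during the first minute.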
 
On Wed, Sep 29, 2010 at 09:28:06PM -0500, Dirk Eddelbuettel wrote:
> Very much so. It is "permanent polling" in Open MPI that does that --- and
> Rmpi can do little about it.  So I think after some discussion we may want to
> reassign or close this.
> 
> Manuel, any idea if that happened?  Wasn't Open MPI 1.4 supposed to take care
> of this?  Is there a new option?

Well, no. Actually, this behavior is by design. I'm not sure about the exact
details but can check with Jeff if you're interested in those. This comes up
every now and then in the BTS and on the users list. Open MPI basically burns
every free cycle that is not used for computation (busy wait). There are no
immediate plans to change that, as far as I know. If your program is running
correctly but your load is high, that's not a bug. If Open MPI eats up cycles
that you need for computation, that's a bug in Open MPI. If you need MPI for a
program that just idles, that's clearly a bug in your application. It's HPC
after all, isn't it?! ;)
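
One knob that may be worth mentioning (from memory, so please check the name
against your Open MPI version): the MCA parameter mpi_yield_when_idle. It does
not make waiting processes block in the kernel -- they still poll -- but they
call sched_yield() between polls, so the busy wait is at least polite to other
runnable processes:

    mpirun --mca mpi_yield_when_idle 1 -np 8 ./probe_spin

That won't make the load numbers pretty, but it can help on oversubscribed or
shared machines.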

Hope I could shed some light on this!

Best regards,
Manuel



