[Pkg-openmpi-maintainers] Bug#598553: Bug#598553: r-cran-rmpi: slave processes eat CPU when they have nothing to do
Jeff Squyres
jsquyres at cisco.com
Mon Oct 11 19:26:59 UTC 2010
Sorry for the delay in answering. I'll try to address all points:
1. Yes, the busy-poll design is intentional in Open MPI. :-(
1a. Yes, it probably does cause some performance degradation when used with TCP.
1b. It quite definitely is a (major) performance win for non-TCP networks. That's (unfortunately) why it's there -- you can't poll/select/epoll/whatever on these non-TCP kinds of networks (e.g., OpenFabrics networks) without killing performance. So you have to busy-poll those networks with their native polling functions and then periodically select/poll/epoll/whatever all file descriptors. This unfortunately became a central point in the architecture of Open MPI's progression engine (because it's in the performance-critical code path).
2. The behavior you're seeing with yield_when_idle is also intentional. We're still busy polling, but we're yielding so that we play well with others. It does not in any way reduce CPU utilization; it just makes Open MPI share the CPU better. And even that effect got somewhat weakened when sched_yield() lost most of its meaning in recent kernels.
3. We do know how to make our progression engine switch between blocking and busy-polling -- we've had many discussions about it over the years (shared-memory message passing is the Big Problem). But no one has ever had the time / resources / motivation to implement it. If anyone has some time, I would love to explain what would need to be done (it's not rocket science, but it is a bit tricky and will require getting into some minutiae in the guts of Open MPI :-\ ).
Does that help at least explain why the code is the way it is?
On Oct 2, 2010, at 6:30 PM, Manuel Prinz wrote:
> On Sat, Oct 02, 2010 at 01:37:42PM -0700, Zack Weinberg wrote:
>> I wrote a test MPI program that just calls MPI_Probe() once - this
>> should block forever, since there are no sends happening. When run
>> with
>>
>> $ mpirun -np 2 ./a.out
>>
>> MPI_Probe never returns and the processes spin through poll(), which
>> is what I originally reported. So far so good. If I change the
>> invocation to
>>
>> $ mpirun -np 2 --mca mpi_yield_when_idle 1 ./a.out
>>
>> the behavior is the same, except that the processes alternate between
>> poll() and sched_yield(). This doesn't help anything; the scheduler
>> is still being thrashed, and the CPU is not allowed to go idle. [In
>> fact, my understanding of the Linux scheduler is that a zero-timeout
>> poll() counts as a yield, so "Aggressive" mode isn't even doing
>> anything constructive!]
>>
>> The desired behavior is for an idle cluster's processes to BLOCK in
>> poll(). So mpi_yield_when_idle does not do what I want.
>>
>> Also, putting "mpi_yield_when_idle = 1" into
>> ~/.openmpi/mca-params.conf has no effect, contra the documentation --
>> this perhaps ought to be its own bug. (I can set MCA parameters for R
>> with environment variables, but that's not nearly as convenient as the
>> host file.)
>
> I'm out of ideas here. Jeff, could you please comment on the issue?
> You can find the full log here:
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=598553
>
> Thanks in advance!
>
> Best regards,
> Manuel
--
Jeff Squyres
jsquyres at cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/