[Pkg-openmpi-maintainers] Bug#584699: Bug#584699: programs freeze on first MPI op. when run on multihomed IPv6 hosts
Ivan Shmakov
ivan at main.uusia.org
Mon Dec 20 18:27:48 UTC 2010
>>>>> Manuel Prinz <manuel at debian.org> writes:
> thanks for the report! I also took this upstream, but unfortunately
> neither upstream nor I can reproduce the bug since we do not have
> multi-homed IPv6 hosts available for testing.
“Fortunately,” it appears that you don't need one, as the
problem apparently arises on multi-IPv4-homed hosts as well.
Trying to work around the problem, I've tried both the
combination of options
--mca oob_tcp_disable_family 6 \
--mca btl_tcp_disable_family 6 \
and building the package without IPv6 support:
--- openmpi-1.4.2/debian/rules
+++ openmpi-1.4.2/debian/rules
@@ -57,6 +57,7 @@
--includedir=\$${prefix}/lib/openmpi/include \
--with-devel-headers \
--enable-heterogeneous \
+ --disable-ipv6 \
$(TORQUE)
# Thread support disabled because it's broken, see bug #435581
To my surprise, it didn't help!
Then, however, I observed that the system is IPv4-multihomed
just as well:
$ ip -4 addr show
…
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
inet 192.168.57.XX/24 scope global eth0
inet 192.168.57.ZZ/24 scope global eth0
…
$
As soon as I removed one of the addresses (with
# ip addr del), the problem was gone. (That holds only as long
as IPv6 is turned off: I cannot drop the extra IPv6 addresses on
that host without running into issues.)
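To check a host for this condition without reading the address listing by eye, something along the following lines may help. It is only a sketch: the function name multihomed_ifaces is made up for illustration, and it assumes the one-line-per-address output format of "ip -o" from iproute2.

```shell
# Hypothetical helper (name is illustrative, not part of any package).
# Reads the output of `ip -o -4 addr show scope global` on stdin and
# prints "<interface> <count>" for each interface that carries more
# than one global IPv4 address.
multihomed_ifaces() {
    awk '$3 == "inet" { n[$2]++ }
         END { for (i in n) if (n[i] > 1) print i, n[i] }'
}
```

It would be run as "ip -o -4 addr show scope global | multihomed_ifaces"; an empty result suggests that no interface has more than one global IPv4 address.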
To reproduce the problem, one can try, e.g. (assuming A.B.C.D
is an unused address in the network, MASK is the netmask, and
ethN is the network interface):
root# ip addr add A.B.C.D/MASK dev ethN
root#
$ mkdir -- test
$ cd test/
$ cp -- /usr/share/doc/hpcc/examples/_hpccinf.txt hpccinf.txt
$ rm -f -- hpccoutf.txt
$ mpirun.openmpi \
--mca btl_base_verbose 30 \
--mca oob_tcp_debug 1 \
--mca oob_tcp_disable_family 6 \
--mca btl_tcp_disable_family 6 \
hpcc \
< /dev/null
While normally this would create ‘hpccoutf.txt’ almost
immediately, with the problem being discussed ‘hpcc’ hangs
before it even tries to open (create) the file.
Removing the extra IP addresses should eliminate the problem.
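Since the symptom is a hang rather than an error, a time limit makes the test easier to script. The sketch below is an assumption on my part, not something from the package: the helper name hang_check is hypothetical, and it relies on timeout(1) from GNU coreutils, which exits with status 124 when the limit expires.

```shell
# Hypothetical helper: run a command under a time limit and report
# whether the expected output file appeared.
#   $1 = time limit in seconds, $2 = expected output file,
#   remaining arguments = the command to run.
hang_check() {
    limit=$1; outfile=$2; shift 2
    rm -f -- "$outfile"
    timeout "$limit" "$@" < /dev/null
    status=$?
    if [ -e "$outfile" ]; then
        echo "ok: $outfile was created"
    elif [ "$status" -eq 124 ]; then
        # 124 is what GNU timeout returns when the limit expires
        echo "hang: timed out after ${limit}s without creating $outfile"
    else
        echo "failed: exit status $status, $outfile was not created"
    fi
}
```

For the case above it might be invoked as "hang_check 60 hpccoutf.txt mpirun.openmpi --mca oob_tcp_disable_family 6 --mca btl_tcp_disable_family 6 hpcc", where the 60-second limit is an arbitrary guess. Afterwards the extra address can be dropped again with "ip addr del A.B.C.D/MASK dev ethN" as root.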
[…]
--
Long Happy Life.