[Pkg-openmpi-maintainers] Bug#754524: Bad atomic ops on Alpha cause segfaults and mpi4py FTBFS

Michael Cree mcree at orcon.net.nz
Sat Jul 12 02:49:29 UTC 2014


Source: openmpi
Version: 1.6.5-8
Severity: important
Tags: patch
User: debian-alpha at lists.debian.org
Usertags: alpha
Justification: Causes FTBFS in other packages that built in the past.

The atomic operations defined for Alpha in openmpi can cause weird
behaviour, including segfaults, leading to build failures in packages
that build-depend on openmpi binary packages. This has probably only
surfaced now because of a binNMU of openmpi and compilation with a
newer gcc that has tighter requirements on asm constructs.

The current definition of opal_atomic_cmpset_32() in
opal/include/opal/sys/alpha/atomic.h has the following statement:

   __asm __volatile__ (
                       "1:  ldl_l %0, %1        \n\t"
                       "cmpeq %0, %2, %0        \n\t"
                       "beq %0, 2f              \n\t"
                       "mov %3, %0              \n\t"
                       "stl_c %0, %1            \n\t"
                       "beq %0, 1b              \n\t"
                       "jmp 3f                  \n"
                       "2:  mov $31, %0         \n"
                       "3:                      \n"
                       : "=&r" (ret), "+m" (*addr)
                       : "r" (oldval), "r" (newval)
                       : "memory");

However, the "jmp 3f" instruction is an assembler macro and expands to
two CPU instructions that use CPU register t12 ($27) and the global
pointer ($29) to construct the address of label 3 and jump there. This
modification of t12 is not in the clobber list of the asm statement,
and the use of the global pointer is not listed in the asm inputs.

So, for example, in the function orte_plm_base_check_job_completed()
defined in the source file orte/mca/plm/base/plm_base_launch_support.c
a segfault results because the compiler sets up t12 as the pointer
to a struct, inserts the atomic asm code above, then accesses a field
in the structure via t12, which has been corrupted by the inserted
atomic asm.  This causes the resulting executable /usr/bin/orted to
segfault in the test suite of mpi4py, hence the build failure of
mpi4py on Alpha [1].

Further segfaults occur in functions such as opal_atomic_add_32()
defined in the source file opal/include/opal/sys/atomic_impl.h
because the compiler determines that they are leaf functions that
do not make use of the global pointer, and so does not initialise
the global pointer, but the inserted asm code does in fact use the
global pointer, so, kaboom!
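
The leaf-function case can be pictured with a minimal sketch, assuming
the add is built as a retry loop around the compare-and-set primitive.
The names add_32_sketch and cmpset_32_sketch are hypothetical, and a
GCC __atomic builtin stands in for the Alpha asm so that the sketch
compiles on any architecture; it is not the openmpi source.

```c
#include <stdint.h>

/* Hypothetical stand-in for the Alpha asm: returns nonzero if *addr
 * held oldval and was replaced by newval, zero otherwise. */
static inline int cmpset_32_sketch(volatile int32_t *addr,
                                   int32_t oldval, int32_t newval)
{
    return __atomic_compare_exchange_n(addr, &oldval, newval, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

/* Assumed shape of an add built on compare-and-set.  Once the cmpset
 * is inlined this function calls nothing else, so the compiler can
 * treat it as a leaf function and skip setting up the global pointer.
 * An asm body that silently needs the global pointer breaks that. */
static inline int32_t add_32_sketch(volatile int32_t *addr, int32_t delta)
{
    int32_t oldval;

    do {
        oldval = *addr;   /* snapshot the current value */
    } while (0 == cmpset_32_sketch(addr, oldval, oldval + delta));

    return oldval + delta;
}
```

The point of the sketch is only the call structure: nothing in the C
source tells the compiler that the inlined primitive depends on a
register the function never initialises.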

The asm code above is weird anyway.  The "mov $31, %0" instruction,
which clears the output %0 to zero, is superfluous as %0 must already
be zero because the only entry to label 2 is from the "beq %0, 2f"
and that instruction will only branch to label 2 if the output %0
is zero!  So I recommend the following more efficient version of
the asm code:

   __asm__ __volatile__ (
                       "1:  ldl_l %0, %1        \n\t"
                       "cmpeq %0, %2, %0        \n\t"
                       "beq %0, 2f              \n\t"
                       "mov %3, %0              \n\t"
                       "stl_c %0, %1            \n\t"
                       "beq %0, 1b              \n\t"
                       "2:                      \n"
                       : "=&r" (ret), "+m" (*addr)
                       : "r" (oldval), "r" (newval)
                       : "memory");
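
To make the control flow of the shortened sequence easier to check,
here is a rough C model of it, one statement per instruction.  The
function names are hypothetical, and the stl_c step is approximated
with a GCC __atomic compare-exchange (a real stl_c can also fail for
other reasons), so treat this as a sketch of the logic rather than a
replacement for the asm.

```c
#include <stdint.h>

/* Hypothetical model of stl_c: store newval only if *addr still holds
 * the value seen by the earlier load.  Returns 1 on success, else 0. */
static int store_conditional_model(volatile int32_t *addr,
                                   int32_t seen, int32_t newval)
{
    return __atomic_compare_exchange_n(addr, &seen, newval, 0,
                                       __ATOMIC_SEQ_CST,
                                       __ATOMIC_SEQ_CST) ? 1 : 0;
}

/* Model of the proposed asm; ret plays the role of operand %0. */
static int cmpset_32_model(volatile int32_t *addr,
                           int32_t oldval, int32_t newval)
{
    int32_t ret;

    for (;;) {                          /* 1:                 */
        ret = *addr;                    /* ldl_l %0, %1       */
        ret = (ret == oldval);          /* cmpeq %0, %2, %0   */
        if (ret == 0)                   /* beq %0, 2f         */
            break;                      /* ret is already 0   */
        ret = newval;                   /* mov %3, %0         */
        ret = store_conditional_model(addr, oldval, ret); /* stl_c */
        if (ret != 0)                   /* beq %0, 1b retries */
            break;                      /* on a failed store  */
    }
    return ret;                         /* 2:                 */
}
```

In the model the early exit is taken only when ret is zero, which is
why no extra clearing of %0 is needed at label 2, and a successful
store falls through to label 2 with ret equal to 1.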

I attach a patch that fixes both opal_atomic_cmpset_32() and
opal_atomic_cmpset_64() on Alpha.  With that, openmpi builds to
completion, and with the fixed openmpi, mpi4py also builds to
completion.

Cheers
Michael.

[1] http://buildd.debian-ports.org/status/package.php?p=mpi4py
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-alpha-atomic-cmpset.patch
Type: text/x-diff
Size: 1419 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/pkg-openmpi-maintainers/attachments/20140712/ad930bd4/attachment-0001.patch>