[Pkg-openmpi-maintainers] Too-generic SONAME for OpenMPI?

Nicholas Breen nbreen at ofb.net
Thu Jan 10 08:26:46 UTC 2008


I've been trying to track down one last linking error that crops up when
using gfortran and the various MPIs, and I believe I may have found a
root cause of the problem in #456721 and friends.  These are the SONAMEs
of the three libraries that each MPI package installs as an alternative
for /usr/lib/libmpi.so:

% for a in /usr/lib/liblam.so /usr/lib/mpich/lib/shared/libmpich.so \
/usr/lib/openmpi/lib/libmpi.so.0 ; do objdump -p $a|grep SONAME ; done
  SONAME      liblam.so.4
  SONAME      libmpich.so.1.0
  SONAME      libmpi.so.0

Both LAM and MPICH use distinguishable SONAMEs, so a binary being
compiled can link directly to them without worrying about alternatives.
However, OpenMPI's SONAME means that (as I understand it) the linker,
given "-lmpi", resolves to the first matching libmpi.so in the search
path, which will be /usr/lib/libmpi.so.  But since this filename is
controlled by update-alternatives, it can point to a non-OpenMPI
implementation!

% readlink -f /usr/lib/libmpi.so ; readlink -f /usr/lib/libmpi.so.0
/usr/lib/liblam.so.4.0
/usr/lib/openmpi/lib/libmpi.so.0.0.0
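
As a rough illustration of that lookup, the sketch below mimics how
"-lNAME" is resolved: it walks an ordered list of directories and prints
the real file behind the first libNAME.so it finds.  The function name
resolve_link_name is hypothetical, and this is a simplification (the
real linker also considers static archives and its built-in search
path), so treat it as a model of the behaviour described above rather
than what ld actually executes.

```shell
# Sketch only: resolve_link_name is a hypothetical helper, not part of
# any MPI package or of binutils.
# Usage: resolve_link_name NAME DIR [DIR...]
# Prints the symlink-resolved path of the first DIR/libNAME.so that
# exists, searching the directories in the order given.
resolve_link_name() {
    name=$1
    shift
    for dir in "$@"; do
        candidate="$dir/lib$name.so"
        if [ -e "$candidate" ]; then
            # Follow the update-alternatives symlink chain to the real file.
            readlink -f "$candidate"
            return 0
        fi
    done
    # No directory contained libNAME.so.
    return 1
}
```

Running "resolve_link_name mpi /usr/lib" on the system shown above
should give the same liblam path that readlink reported for
/usr/lib/libmpi.so, which is the file a plain "-lmpi" ends up linking
against.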

This manifests itself as an ld failure when compiling a program, since
of course liblam doesn't have the OpenMPI symbols.  Specifically, I'm
getting a recurrence of #451991 if GROMACS is compiled with FORTRAN
code enabled (by default that's only used on alpha, but it's
reproducible on other arches).  A portion of the build log:

------
/usr/bin/mpicc.openmpi -O3 -fomit-frame-pointer -finline-functions -Wall
-Wno-unused -malign-double -funroll-all-loops -o .libs/mdrun
glaasje.o gctio.o init_sh.o ionize.o do_gct.o relax_sh.o
repl_ex.o xutils.o compute_io.o md.o mdrun.o genalg.o
../mdlib/.libs/libmd_mpi_d_openmpi.so ../gmxlib/.libs/libgmx_mpi_d_openmpi.so
-lnsl /usr/lib/libfftw3.so -L/usr/lib/gcc/i486-linux-gnu/4.2.3
-L/usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib
-L/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/i486-linux-gnu/4.2.3/../../..
-lgfortranbegin -lgfortran -lm
../gmxlib/.libs/libgmx_mpi_d_openmpi.so: undefined reference to
`ompi_request_null'
../mdlib/.libs/libmd_mpi_d_openmpi.so: undefined reference to
`ompi_mpi_int'
../mdlib/.libs/libmd_mpi_d_openmpi.so: undefined reference to
`ompi_mpi_double'
../mdlib/.libs/libmd_mpi_d_openmpi.so: undefined reference to
`ompi_mpi_comm_world'
../gmxlib/.libs/libgmx_mpi_d_openmpi.so: undefined reference to
`ompi_mpi_byte'
collect2: ld returned 1 exit status
make[2]: *** [mdrun] Error 1
------

If the links on libmpi.so are set up to allow the compilation to finish,
the resulting binary will work fine - none of the different SONAMEs
overlap, so there's no shared-library conflict at runtime.  It's only
the overlap on the .so file that's troublesome, since the linker is
given nothing more than a "-lmpi" flag to identify the appropriate
library.


It's worth mentioning that, as of the most recent OpenMPI upload,
gromacs-openmpi compiles error-free on every arch _but_ alpha, and I
haven't identified why only the addition of FORTRAN (so far) causes
problems.  (Untested theory: GROMACS without FORTRAN may only directly
call functions from the other OpenMPI libs?  Those are all linked
properly to libmpi.so.0.)  I need to work up a test case that doesn't
generate 500kB build logs!

My knowledge of the intricacies of library linkages is not at all
complete, and I could easily be misunderstanding something.  With that
disclaimer in hand... I think the solution is to change OpenMPI's SONAME
to something unique (libopenmpi.* ?), but that's a major change, and
would have to be an upstream decision or Debian would end up
binary-incompatible with the rest of the world.  Have the OpenMPI
developers seen this before?  (Am I completely off-base?)


Please Cc: me on replies, I'm not subscribed to the list.  Thanks.

-- 
Nicholas Breen
nbreen at ofb.net


