[Pkg-openmpi-maintainers] Bug#592326: Bug#592326: Failure of AZTEC test case run.

Rachel Gordon rgordon at techunix.technion.ac.il
Thu Sep 2 12:06:31 UTC 2010


Dear Jeff,

The cluster has only the Open MPI version of MPI, and the mpi.h file is
installed in /shared/include/mpi.h.

Anyhow, I omitted the COMM size parameter (the -DCOMM_BUFF_SIZE define,
along with the other -D defines) and recompiled/linked the case using:

mpif77 -O   -I../lib  -c -o az_tutorial_with_MPI.o az_tutorial_with_MPI.f
mpif77 az_tutorial_with_MPI.o -O -L../lib -laztec      -o sample
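
As a sanity check, one can confirm which MPI library the new binary
resolves to at run time (this just repeats the ldd inspection from my
earlier mail; the grep pattern is only illustrative):

ldd sample | grep libmpi

It should still point at /shared/lib/libmpi.so.0.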

But when I try running 'sample' I get the same crash:

[cluster:00377] *** Process received signal ***
[cluster:00377] Signal: Segmentation fault (11)
[cluster:00377] Signal code: Address not mapped (1)
[cluster:00377] Failing at address: 0x100000098
[cluster:00377] [ 0] /lib/libpthread.so.0 [0x7f6b55040a80]
[cluster:00377] [ 1] /shared/lib/libmpi.so.0(MPI_Comm_size+0x6e) [0x7f6b564d834e]
[cluster:00377] [ 2] sample(parallel_info+0x24) [0x41d2ba]
[cluster:00377] [ 3] sample(AZ_set_proc_config+0x2d) [0x408417]
[cluster:00377] [ 4] sample(az_set_proc_config_+0xc) [0x407b85]
[cluster:00377] [ 5] sample(MAIN__+0x54) [0x407662]
[cluster:00377] [ 6] sample(main+0x2c) [0x44e8ec]
[cluster:00377] [ 7] /lib/libc.so.6(__libc_start_main+0xe6) [0x7f6b54cfd1a6]
[cluster:00377] [ 8] sample [0x407459]
[cluster:00377] *** End of error message ***
--------------------------------------------------------------------------
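
To tell whether AZTEC is involved at all, a next step could be a
minimal Fortran program that calls MPI_COMM_SIZE directly (a sketch
only; commtest.f and the program name are placeholders, not part of
AZTEC):

      program commtest
      implicit none
      include 'mpif.h'
      integer ierr, nprocs
C     initialize MPI, query the size of MPI_COMM_WORLD, shut down
      call MPI_INIT(ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
      print *, 'MPI_COMM_WORLD size =', nprocs
      call MPI_FINALIZE(ierr)
      end

built and run the same way:

mpif77 commtest.f -o commtest
mpirun -np 1 commtest

If this also dies in MPI_Comm_size, the MPI installation itself is at
fault; if it runs, the wrong mpi.h most likely went into libaztec.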

Rachel



On Thu, 2 Sep 2010, Jeff Squyres (jsquyres) wrote:

> If you're segv'ing in MPI_Comm_size, this usually means you are using the wrong mpi.h. Ensure you are using Open MPI's mpi.h so that you get the right values for all the MPI constants.
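>
> A quick way to see which include path -- and hence which mpi.h /
> mpif.h -- the wrappers pull in is --showme, which Open MPI's
> wrapper compilers support:
>
>   mpicc --showme:compile
>   mpif77 --showme:compile
>
> Since the crash is inside AZTEC's C routine parallel_info, the mpi.h
> that matters is the one that was used when libaztec itself was built.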
>
> Sent from my PDA. No type good.
>
> On Sep 2, 2010, at 7:35 AM, Rachel Gordon <rgordon at techunix.technion.ac.il> wrote:
>
>> Dear Manuel,
>>
>> Sorry, it didn't help.
>>
>> The cluster I am trying to run on has only the Open MPI version of MPI, so mpif77 is equivalent to mpif77.openmpi and mpicc to mpicc.openmpi.
>>
>> I changed the Makefile, replacing gfortran by mpif77 and gcc by mpicc.
>> The compilation and linkage stage ran with no problem:
>>
>>
>> mpif77 -O   -I../lib -DMAX_MEM_SIZE=16731136 -DCOMM_BUFF_SIZE=200000 -DMAX_CHUNK_SIZE=200000  -c -o az_tutorial_with_MPI.o az_tutorial_with_MPI.f
>> mpif77 az_tutorial_with_MPI.o -O -L../lib -laztec      -o sample
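>>
>> To rule out a stray wrapper earlier in the PATH, one can check which
>> mpif77 actually runs, e.g. (sketch only):
>>
>> readlink -f $(which mpif77)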
>>
>>
>> But again when I try to run 'sample' I get:
>>
>> mpirun -np 1 sample
>>
>>
>> [cluster:24989] *** Process received signal ***
>> [cluster:24989] Signal: Segmentation fault (11)
>> [cluster:24989] Signal code: Address not mapped (1)
>> [cluster:24989] Failing at address: 0x100000098
>> [cluster:24989] [ 0] /lib/libpthread.so.0 [0x7f5058036a80]
>> [cluster:24989] [ 1] /shared/lib/libmpi.so.0(MPI_Comm_size+0x6e) [0x7f50594ce34e]
>> [cluster:24989] [ 2] sample(parallel_info+0x24) [0x41d2ba]
>> [cluster:24989] [ 3] sample(AZ_set_proc_config+0x2d) [0x408417]
>> [cluster:24989] [ 4] sample(az_set_proc_config_+0xc) [0x407b85]
>> [cluster:24989] [ 5] sample(MAIN__+0x54) [0x407662]
>> [cluster:24989] [ 6] sample(main+0x2c) [0x44e8ec]
>> [cluster:24989] [ 7] /lib/libc.so.6(__libc_start_main+0xe6) [0x7f5057cf31a6]
>> [cluster:24989] [ 8] sample [0x407459]
>> [cluster:24989] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 24989 on node cluster exited on signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>>
>> Thanks for your help and cooperation,
>> Sincerely,
>> Rachel
>>
>>
>>
>> On Wed, 1 Sep 2010, Manuel Prinz wrote:
>>
>>> Hi Rachel,
>>>
>>> I'm not very familiar with Fortran, so I'm most likely not of much
>>> help here. I've added Jeff to CC; maybe he can shed some light on this.
>>>
>>> On Monday, 2010-08-09 at 12:59 +0300, Rachel Gordon wrote:
>>>> package:  openmpi
>>>>
>>>> dpkg --search openmpi
>>>> gromacs-openmpi: /usr/share/doc/gromacs-openmpi/copyright
>>>> gromacs-dev: /usr/lib/libmd_mpi_openmpi.la
>>>> gromacs-dev: /usr/lib/libgmx_mpi_d_openmpi.la
>>>> gromacs-openmpi: /usr/share/lintian/overrides/gromacs-openmpi
>>>> gromacs-openmpi: /usr/lib/libmd_mpi_openmpi.so.5
>>>> gromacs-openmpi: /usr/lib/libmd_mpi_d_openmpi.so.5.0.0
>>>> gromacs-dev: /usr/lib/libmd_mpi_openmpi.so
>>>> gromacs-dev: /usr/lib/libgmx_mpi_d_openmpi.so
>>>> gromacs-openmpi: /usr/lib/libmd_mpi_openmpi.so.5.0.0
>>>> gromacs-openmpi: /usr/bin/mdrun_mpi_d.openmpi
>>>> gromacs-openmpi: /usr/lib/libgmx_mpi_d_openmpi.so.5.0.0
>>>> gromacs-openmpi: /usr/share/doc/gromacs-openmpi/README.Debian
>>>> gromacs-dev: /usr/lib/libgmx_mpi_d_openmpi.a
>>>> gromacs-openmpi: /usr/bin/mdrun_mpi.openmpi
>>>> gromacs-openmpi: /usr/share/doc/gromacs-openmpi/changelog.Debian.gz
>>>> gromacs-dev: /usr/lib/libmd_mpi_d_openmpi.la
>>>> gromacs-openmpi: /usr/share/man/man1/mdrun_mpi_d.openmpi.1.gz
>>>> gromacs-dev: /usr/lib/libgmx_mpi_openmpi.a
>>>> gromacs-openmpi: /usr/lib/libgmx_mpi_openmpi.so.5.0.0
>>>> gromacs-dev: /usr/lib/libmd_mpi_d_openmpi.so
>>>> gromacs-openmpi: /usr/lib/libmd_mpi_d_openmpi.so.5
>>>> gromacs-dev: /usr/lib/libgmx_mpi_openmpi.la
>>>> gromacs-openmpi: /usr/share/man/man1/mdrun_mpi.openmpi.1.gz
>>>> gromacs-openmpi: /usr/share/doc/gromacs-openmpi
>>>> gromacs-dev: /usr/lib/libmd_mpi_openmpi.a
>>>> gromacs-dev: /usr/lib/libgmx_mpi_openmpi.so
>>>> gromacs-openmpi: /usr/lib/libgmx_mpi_openmpi.so.5
>>>> gromacs-openmpi: /usr/lib/libgmx_mpi_d_openmpi.so.5
>>>> gromacs-dev: /usr/lib/libmd_mpi_d_openmpi.a
>>>>
>>>>
>>>> Dear support,
>>>> I am trying to run a test case of the AZTEC library named
>>>> az_tutorial_with_MPI.f. The example uses gfortran + MPI. The
>>>> compilation and linkage stage goes O.K., generating an executable
>>>> 'sample'. But when I try to run 'sample' (on 1 or more
>>>> processors) the run crashes immediately.
>>>>
>>>> The compilation and linkage stage is done as follows:
>>>>
>>>> gfortran -O -I/shared/include -I/shared/include/openmpi/ompi/mpi/cxx -I../lib -DMAX_MEM_SIZE=16731136 -DCOMM_BUFF_SIZE=200000 -DMAX_CHUNK_SIZE=200000 -c -o az_tutorial_with_MPI.o az_tutorial_with_MPI.f
>>>> gfortran az_tutorial_with_MPI.o -O -L../lib -laztec -lm -L/shared/lib -lgfortran -lmpi -lmpi_f77 -o sample
>>>
>>> Generally, when compiling programs for use with MPI, you should use
>>> the compiler wrappers, which do all the magic. In Debian's case these
>>> are mpif77.openmpi and mpif90.openmpi, respectively. Could you give
>>> that a try?
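>>>
>>> For example, adapting your command lines to the wrapper (untested
>>> sketch; the wrapper adds Open MPI's -I/-L/-l flags itself, so the
>>> explicit -lmpi -lmpi_f77 and the /shared paths should no longer be
>>> needed):
>>>
>>> mpif77.openmpi -O -I../lib -DMAX_MEM_SIZE=16731136 -DCOMM_BUFF_SIZE=200000 -DMAX_CHUNK_SIZE=200000 -c -o az_tutorial_with_MPI.o az_tutorial_with_MPI.f
>>> mpif77.openmpi az_tutorial_with_MPI.o -O -L../lib -laztec -o sample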
>>>
>>>> The run:
>>>> /shared/home/gordon/Aztec_lib.dir/app>mpirun -np 1 sample
>>>>
>>>> [cluster:12046] *** Process received signal ***
>>>> [cluster:12046] Signal: Segmentation fault (11)
>>>> [cluster:12046] Signal code: Address not mapped (1)
>>>> [cluster:12046] Failing at address: 0x100000098
>>>> [cluster:12046] [ 0] /lib/libc.so.6 [0x7fd4a2fa8f60]
>>>> [cluster:12046] [ 1] /shared/lib/libmpi.so.0(MPI_Comm_size+0x6e) [0x7fd4a376c34e]
>>>> [cluster:12046] [ 2] sample [0x4178aa]
>>>> [cluster:12046] [ 3] sample [0x402a07]
>>>> [cluster:12046] [ 4] sample [0x402175]
>>>> [cluster:12046] [ 5] sample [0x401c52]
>>>> [cluster:12046] [ 6] sample [0x448edc]
>>>> [cluster:12046] [ 7] /lib/libc.so.6(__libc_start_main+0xe6) [0x7fd4a2f951a6]
>>>> [cluster:12046] [ 8] sample [0x401a49]
>>>> [cluster:12046] *** End of error message ***
>>>> --------------------------------------------------------------------------
>>>> mpirun noticed that process rank 0 with PID 12046 on node cluster exited
>>>> on signal 11 (Segmentation fault).
>>>>
>>>> Here is some information about the machine:
>>>>
>>>> uname -a
>>>> Linux cluster 2.6.26-2-amd64 #1 SMP Sun Jun 20 20:16:30 UTC 2010 x86_64
>>>> GNU/Linux
>>>>
>>>>
>>>> lsb_release -a
>>>> No LSB modules are available.
>>>> Distributor ID: Debian
>>>> Description:    Debian GNU/Linux 5.0.5 (lenny)
>>>> Release:        5.0.5
>>>> Codename:       lenny
>>>>
>>>> gcc --version
>>>> gcc (Debian 4.3.2-1.1) 4.3.2
>>>>
>>>> gfortran --version
>>>> GNU Fortran (Debian 4.3.2-1.1) 4.3.2
>>>>
>>>> ldd sample
>>>>         linux-vdso.so.1 =>  (0x00007fffffffe000)
>>>>         libgfortran.so.3 => /usr/lib/libgfortran.so.3 (0x00007fd29db16000)
>>>>         libm.so.6 => /lib/libm.so.6 (0x00007fd29d893000)
>>>>         libmpi.so.0 => /shared/lib/libmpi.so.0 (0x00007fd29d5e7000)
>>>>         libmpi_f77.so.0 => /shared/lib/libmpi_f77.so.0 (0x00007fd29d3af000)
>>>>         libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007fd29d198000)
>>>>         libc.so.6 => /lib/libc.so.6 (0x00007fd29ce45000)
>>>>         libopen-rte.so.0 => /shared/lib/libopen-rte.so.0 (0x00007fd29cbf8000)
>>>>         libopen-pal.so.0 => /shared/lib/libopen-pal.so.0 (0x00007fd29c9a2000)
>>>>         libdl.so.2 => /lib/libdl.so.2 (0x00007fd29c79e000)
>>>>         libnsl.so.1 => /lib/libnsl.so.1 (0x00007fd29c586000)
>>>>         libutil.so.1 => /lib/libutil.so.1 (0x00007fd29c383000)
>>>>         libpthread.so.0 => /lib/libpthread.so.0 (0x00007fd29c167000)
>>>>         /lib64/ld-linux-x86-64.so.2 (0x00007fd29ddf1000)
>>>>
>>>>
>>>> Let me just mention that the C+MPI test case of the AZTEC library,
>>>> 'az_tutorial.c', runs with no problem.
>>>> Also, az_tutorial_with_MPI.f runs O.K. on my 32-bit Linux cluster
>>>> running gcc, g77 and MPICH, and on my 16-processor SGI Itanium
>>>> 64-bit machine.
>>>
>>> The IA64 architecture is supported by Open MPI, so this should be OK.
>>>
>>>> Thank you for your help,
>>>
>>> Best regards,
>>> Manuel
>>>
>>>
>>>
>




