[Pkg-openmpi-maintainers] Bug#592326: Bug#592326: Failure of AZTEC test case run.

Rachel Gordon rgordon at techunix.technion.ac.il
Thu Sep 2 11:35:37 UTC 2010


Dear Manuel,

Sorry, it didn't help.

The cluster I am trying to run on has only the Open MPI implementation of 
MPI, so mpif77 is equivalent to mpif77.openmpi and mpicc is equivalent to 
mpicc.openmpi.

I changed the Makefile, replacing gfortran by mpif77 and gcc by mpicc.
The compilation and linkage stage ran with no problem:


mpif77 -O   -I../lib -DMAX_MEM_SIZE=16731136 -DCOMM_BUFF_SIZE=200000 
-DMAX_CHUNK_SIZE=200000  -c -o az_tutorial_with_MPI.o 
az_tutorial_with_MPI.f
mpif77 az_tutorial_with_MPI.o -O -L../lib -laztec      -o sample


But again when I try to run 'sample' I get:

mpirun -np 1 sample


[cluster:24989] *** Process received signal ***
[cluster:24989] Signal: Segmentation fault (11)
[cluster:24989] Signal code: Address not mapped (1)
[cluster:24989] Failing at address: 0x100000098
[cluster:24989] [ 0] /lib/libpthread.so.0 [0x7f5058036a80]
[cluster:24989] [ 1] /shared/lib/libmpi.so.0(MPI_Comm_size+0x6e) 
[0x7f50594ce34e]
[cluster:24989] [ 2] sample(parallel_info+0x24) [0x41d2ba]
[cluster:24989] [ 3] sample(AZ_set_proc_config+0x2d) [0x408417]
[cluster:24989] [ 4] sample(az_set_proc_config_+0xc) [0x407b85]
[cluster:24989] [ 5] sample(MAIN__+0x54) [0x407662]
[cluster:24989] [ 6] sample(main+0x2c) [0x44e8ec]
[cluster:24989] [ 7] /lib/libc.so.6(__libc_start_main+0xe6) 
[0x7f5057cf31a6]
[cluster:24989] [ 8] sample [0x407459]
[cluster:24989] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 24989 on node cluster exited 
on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Thanks for your help and cooperation,
Sincerely,
Rachel



On Wed, 1 Sep 2010, Manuel Prinz wrote:

> Hi Rachel,
>
> I'm not very familiar with Fortran, so I'm most likely not of much help
> here. I added Jeff to CC; maybe he can shed some light on this.
>
> On Monday, 09.08.2010, at 12:59 +0300, Rachel Gordon wrote:
>> package:  openmpi
>>
>> dpkg --search openmpi
>> gromacs-openmpi: /usr/share/doc/gromacs-openmpi/copyright
>> gromacs-dev: /usr/lib/libmd_mpi_openmpi.la
>> gromacs-dev: /usr/lib/libgmx_mpi_d_openmpi.la
>> gromacs-openmpi: /usr/share/lintian/overrides/gromacs-openmpi
>> gromacs-openmpi: /usr/lib/libmd_mpi_openmpi.so.5
>> gromacs-openmpi: /usr/lib/libmd_mpi_d_openmpi.so.5.0.0
>> gromacs-dev: /usr/lib/libmd_mpi_openmpi.so
>> gromacs-dev: /usr/lib/libgmx_mpi_d_openmpi.so
>> gromacs-openmpi: /usr/lib/libmd_mpi_openmpi.so.5.0.0
>> gromacs-openmpi: /usr/bin/mdrun_mpi_d.openmpi
>> gromacs-openmpi: /usr/lib/libgmx_mpi_d_openmpi.so.5.0.0
>> gromacs-openmpi: /usr/share/doc/gromacs-openmpi/README.Debian
>> gromacs-dev: /usr/lib/libgmx_mpi_d_openmpi.a
>> gromacs-openmpi: /usr/bin/mdrun_mpi.openmpi
>> gromacs-openmpi: /usr/share/doc/gromacs-openmpi/changelog.Debian.gz
>> gromacs-dev: /usr/lib/libmd_mpi_d_openmpi.la
>> gromacs-openmpi: /usr/share/man/man1/mdrun_mpi_d.openmpi.1.gz
>> gromacs-dev: /usr/lib/libgmx_mpi_openmpi.a
>> gromacs-openmpi: /usr/lib/libgmx_mpi_openmpi.so.5.0.0
>> gromacs-dev: /usr/lib/libmd_mpi_d_openmpi.so
>> gromacs-openmpi: /usr/lib/libmd_mpi_d_openmpi.so.5
>> gromacs-dev: /usr/lib/libgmx_mpi_openmpi.la
>> gromacs-openmpi: /usr/share/man/man1/mdrun_mpi.openmpi.1.gz
>> gromacs-openmpi: /usr/share/doc/gromacs-openmpi
>> gromacs-dev: /usr/lib/libmd_mpi_openmpi.a
>> gromacs-dev: /usr/lib/libgmx_mpi_openmpi.so
>> gromacs-openmpi: /usr/lib/libgmx_mpi_openmpi.so.5
>> gromacs-openmpi: /usr/lib/libgmx_mpi_d_openmpi.so.5
>> gromacs-dev: /usr/lib/libmd_mpi_d_openmpi.a
>>
>>
>> Dear support,
>> I am trying to run a test case of the AZTEC library named
>> az_tutorial_with_MPI.f. The example uses gfortran + MPI. The
>> compilation and linkage stage goes O.K., generating an executable
>> 'sample'. But when I try to run 'sample' (on 1 or more
>> processors), the run crashes immediately.
>>
>> The compilation and linkage stage is done as follows:
>>
>> gfortran -O  -I/shared/include -I/shared/include/openmpi/ompi/mpi/cxx
>> -I../lib -DMAX_MEM_SIZE=16731136
>> -DCOMM_BUFF_SIZE=200000 -DMAX_CHUNK_SIZE=200000  -c -o
>> az_tutorial_with_MPI.o az_tutorial_with_MPI.f
>> gfortran az_tutorial_with_MPI.o -O -L../lib -laztec  -lm -L/shared/lib
>> -lgfortran -lmpi -lmpi_f77 -o sample
>
> Generally, when compiling programs for use with MPI, you should use the
> compiler wrappers, which do all the magic. In Debian's case these are
> mpif77.openmpi and mpif90.openmpi, respectively. Could you give that a
> try?
>
>> The run:
>> /shared/home/gordon/Aztec_lib.dir/app>mpirun -np 1 sample
>>
>> [cluster:12046] *** Process received signal ***
>> [cluster:12046] Signal: Segmentation fault (11)
>> [cluster:12046] Signal code: Address not mapped (1)
>> [cluster:12046] Failing at address: 0x100000098
>> [cluster:12046] [ 0] /lib/libc.so.6 [0x7fd4a2fa8f60]
>> [cluster:12046] [ 1] /shared/lib/libmpi.so.0(MPI_Comm_size+0x6e)
>> [0x7fd4a376c34e]
>> [cluster:12046] [ 2] sample [0x4178aa]
>> [cluster:12046] [ 3] sample [0x402a07]
>> [cluster:12046] [ 4] sample [0x402175]
>> [cluster:12046] [ 5] sample [0x401c52]
>> [cluster:12046] [ 6] sample [0x448edc]
>> [cluster:12046] [ 7] /lib/libc.so.6(__libc_start_main+0xe6)
>> [0x7fd4a2f951a6]
>> [cluster:12046] [ 8] sample [0x401a49]
>> [cluster:12046] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 12046 on node cluster exited
>> on signal 11 (Segmentation fault).
>>
>> Here is some information about the machine:
>>
>> uname -a
>> Linux cluster 2.6.26-2-amd64 #1 SMP Sun Jun 20 20:16:30 UTC 2010 x86_64
>> GNU/Linux
>>
>>
>> lsb_release -a
>> No LSB modules are available.
>> Distributor ID: Debian
>> Description:    Debian GNU/Linux 5.0.5 (lenny)
>> Release:        5.0.5
>> Codename:       lenny
>>
>> gcc --version
>> gcc (Debian 4.3.2-1.1) 4.3.2
>>
>> gfortran --version
>> GNU Fortran (Debian 4.3.2-1.1) 4.3.2
>>
>> ldd sample
>>          linux-vdso.so.1 =>  (0x00007fffffffe000)
>>          libgfortran.so.3 => /usr/lib/libgfortran.so.3 (0x00007fd29db16000)
>>          libm.so.6 => /lib/libm.so.6 (0x00007fd29d893000)
>>          libmpi.so.0 => /shared/lib/libmpi.so.0 (0x00007fd29d5e7000)
>>          libmpi_f77.so.0 => /shared/lib/libmpi_f77.so.0
>> (0x00007fd29d3af000)
>>          libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007fd29d198000)
>>          libc.so.6 => /lib/libc.so.6 (0x00007fd29ce45000)
>>          libopen-rte.so.0 => /shared/lib/libopen-rte.so.0
>> (0x00007fd29cbf8000)
>>          libopen-pal.so.0 => /shared/lib/libopen-pal.so.0
>> (0x00007fd29c9a2000)
>>          libdl.so.2 => /lib/libdl.so.2 (0x00007fd29c79e000)
>>          libnsl.so.1 => /lib/libnsl.so.1 (0x00007fd29c586000)
>>          libutil.so.1 => /lib/libutil.so.1 (0x00007fd29c383000)
>>          libpthread.so.0 => /lib/libpthread.so.0 (0x00007fd29c167000)
>>          /lib64/ld-linux-x86-64.so.2 (0x00007fd29ddf1000)
>>
>>
>> Let me just mention that the C+MPI test case of the AZTEC library
>> 'az_tutorial.c' runs with no problem.
>> Also, az_tutorial_with_MPI.f runs O.K. on my 32-bit Linux cluster running
>> gcc, g77 and MPICH, and on my 16-processor SGI
>> Itanium 64-bit machine.
>
> The IA64 architecture is supported by Open MPI, so this should be OK.
>
>> Thank you for your help,
>
> Best regards,
> Manuel
More information about the Pkg-openmpi-maintainers mailing list