[Pkg-openmpi-maintainers] Bug#592326: Bug#592326: Failure of AZTEC test case run.
Rachel Gordon
rgordon at techunix.technion.ac.il
Fri Sep 3 07:05:17 UTC 2010
Dear Jeff, Ralf and Manuel,

Some good news first: I added -pthread to both the compilation and the
link of az_tutorial_with_MPI.f, and I also compiled AZTEC itself with
-pthread. The code now runs OK for np=1,2.
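For reference, the modified commands were roughly as follows (a sketch:
the flags are the ones from the Makefile quoted below, with -pthread
added to both steps):

```shell
# Compile the tutorial, now with -pthread (other flags as in the Makefile)
mpif77 -O -pthread -I../lib -DMAX_MEM_SIZE=16731136 -DCOMM_BUFF_SIZE=200000 \
       -DMAX_CHUNK_SIZE=200000 -c -o az_tutorial_with_MPI.o az_tutorial_with_MPI.f

# Link with -pthread as well
mpif77 az_tutorial_with_MPI.o -O -pthread -L../lib -laztec -o sample
```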
Now the bad news: when I try running with 3, 4 or more processes I get a
similar error message:
mpirun -np 3 sample
[cluster:25805] *** Process received signal ***
[cluster:25805] Signal: Segmentation fault (11)
[cluster:25805] Signal code: (128)
[cluster:25805] Failing at address: (nil)
[cluster:25805] [ 0] /lib/libpthread.so.0 [0x7fbe20cb5a80]
[cluster:25805] [ 1] /shared/lib/libmpi.so.0 [0x7fbe221325f7]
[cluster:25805] [ 2] /shared/lib/libmpi.so.0(PMPI_Wait+0x38)
[0x7fbe22160a48]
[cluster:25805] [ 3] sample(md_wrap_wait+0x17) [0x41ccba]
[cluster:25805] [ 4] sample(AZ_find_procs_for_externs+0x5bf) [0x4177e7]
[cluster:25805] [ 5] sample(AZ_transform+0x1c3) [0x418372]
[cluster:25805] [ 6] sample(az_transform_+0x84) [0x407943]
[cluster:25805] [ 7] sample(MAIN__+0x19a) [0x407708]
[cluster:25805] [ 8] sample(main+0x2c) [0x44e00c]
[cluster:25805] [ 9] /lib/libc.so.6(__libc_start_main+0xe6)
[0x7fbe209721a6]
[cluster:25805] [10] sample [0x4073b9]
[cluster:25805] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 25805 on node cluster exited
on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
When I try running on 4 processors I get the same message twice (from
two of the processes).
mpirun -np 4 sample
[cluster:25946] *** Process received signal ***
[cluster:25946] Signal: Segmentation fault (11)
[cluster:25946] Signal code: (128)
[cluster:25946] Failing at address: (nil)
[cluster:25947] *** Process received signal ***
[cluster:25947] Signal: Segmentation fault (11)
[cluster:25947] Signal code: (128)
[cluster:25947] Failing at address: (nil)
[cluster:25946] [ 0] /lib/libpthread.so.0 [0x7f4ae4c6ba80]
[cluster:25946] [ 1] /shared/lib/libmpi.so.0 [0x7f4ae60e85f7]
[cluster:25946] [ 2] /shared/lib/libmpi.so.0(PMPI_Wait+0x38)
[0x7f4ae6116a48]
[cluster:25946] [ 3] sample(md_wrap_wait+0x17) [0x41ccba]
[cluster:25946] [ 4] sample(AZ_find_procs_for_externs+0x5bf) [0x4177e7]
[cluster:25947] [ 0] /lib/libpthread.so.0 [0x7f7dc5350a80]
[cluster:25946] [ 5] sample(AZ_transform+0x1c3) [0x418372]
[cluster:25946] [ 6] sample(az_transform_+0x84) [0x407943]
[cluster:25946] [ 7] sample(MAIN__+0x19a) [0x407708]
[cluster:25946] [ 8] sample(main+0x2c) [0x44e00c]
[cluster:25946] [ 9] /lib/libc.so.6(__libc_start_main+0xe6)
[0x7f4ae49281a6]
[cluster:25946] [10] sample [0x4073b9]
[cluster:25946] *** End of error message ***
[cluster:25947] [ 1] /shared/lib/libmpi.so.0 [0x7f7dc67cd5f7]
[cluster:25947] [ 2] /shared/lib/libmpi.so.0(PMPI_Wait+0x38)
[0x7f7dc67fba48]
[cluster:25947] [ 3] sample(md_wrap_wait+0x17) [0x41ccba]
[cluster:25947] [ 4] sample(AZ_find_procs_for_externs+0x5bf) [0x4177e7]
[cluster:25947] [ 5] sample(AZ_transform+0x1c3) [0x418372]
[cluster:25947] [ 6] sample(az_transform_+0x84) [0x407943]
[cluster:25947] [ 7] sample(MAIN__+0x19a) [0x407708]
[cluster:25947] [ 8] sample(main+0x2c) [0x44e00c]
[cluster:25947] [ 9] /lib/libc.so.6(__libc_start_main+0xe6)
[0x7f7dc500d1a6]
[cluster:25947] [10] sample [0x4073b9]
[cluster:25947] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 25946 on node cluster exited
on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Attached is the AZTEC file md_wrap_mpi_c.c; it might give you a further
hint.
Rachel
Dr. Rachel Gordon
Senior Research Fellow Phone: +972-4-8293811
Dept. of Aerospace Eng. Fax: +972-4-8292030
The Technion, Haifa 32000, Israel email: rgordon at tx.technion.ac.il
On Thu, 2 Sep 2010, Ralf Wildenhues wrote:
> Hello Rachel, Jeff,
>
> * Rachel Gordon wrote on Thu, Sep 02, 2010 at 01:35:37PM CEST:
>> The cluster I am trying to run on has only the openmpi MPI version.
>> So, mpif77 is equivalent to mpif77.openmpi and mpicc is equivalent
>> to mpicc.openmpi
>>
>> I changed the Makefile, replacing gfortran by mpif77 and gcc by mpicc.
>> The compilation and linkage stage ran with no problem:
>>
>> mpif77 -O -I../lib -DMAX_MEM_SIZE=16731136 -DCOMM_BUFF_SIZE=200000
>> -DMAX_CHUNK_SIZE=200000 -c -o az_tutorial_with_MPI.o
>> az_tutorial_with_MPI.f
>> mpif77 az_tutorial_with_MPI.o -O -L../lib -laztec -o sample
>
> Can you retry but this time add -pthread to both compile and link
> command?
>
> There were other reports on the Open MPI devel list that some pthread
> flags have gone missing somewhere. That may have caused the libraries
> themselves to be built wrongly, or just the application; I'm not sure.
> But the segfault inside libpthread is suspicious.
>
> Thanks,
> Ralf
>
>> But again when I try to run 'sample' I get:
>>
>> mpirun -np 1 sample
>>
>>
>> [cluster:24989] *** Process received signal ***
>> [cluster:24989] Signal: Segmentation fault (11)
>> [cluster:24989] Signal code: Address not mapped (1)
>> [cluster:24989] Failing at address: 0x100000098
>> [cluster:24989] [ 0] /lib/libpthread.so.0 [0x7f5058036a80]
>> [cluster:24989] [ 1] /shared/lib/libmpi.so.0(MPI_Comm_size+0x6e)
>> [0x7f50594ce34e]
>> [cluster:24989] [ 2] sample(parallel_info+0x24) [0x41d2ba]
>> [cluster:24989] [ 3] sample(AZ_set_proc_config+0x2d) [0x408417]
>> [cluster:24989] [ 4] sample(az_set_proc_config_+0xc) [0x407b85]
>> [cluster:24989] [ 5] sample(MAIN__+0x54) [0x407662]
>> [cluster:24989] [ 6] sample(main+0x2c) [0x44e8ec]
>> [cluster:24989] [ 7] /lib/libc.so.6(__libc_start_main+0xe6)
>> [0x7f5057cf31a6]
>> [cluster:24989] [ 8] sample [0x407459]
>> [cluster:24989] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 24989 on node cluster
>> exited on signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>
>
-------------- next part --------------
/*====================================================================
* ------------------------
* | CVS File Information |
* ------------------------
*
* $RCSfile: md_wrap_mpi_c.c,v $
*
* $Author: tuminaro $
*
* $Date: 1998/12/21 19:36:24 $
*
* $Revision: 5.3 $
*
* $Name: $
*====================================================================*/
#ifndef lint
static char *cvs_wrapmpi_id =
"$Id: md_wrap_mpi_c.c,v 5.3 1998/12/21 19:36:24 tuminaro Exp $";
#endif
/*******************************************************************************
* Copyright 1995, Sandia Corporation. The United States Government retains a *
* nonexclusive license in this software as prescribed in AL 88-1 and AL 91-7. *
* Export of this program may require a license from the United States *
* Government. *
******************************************************************************/
#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>
int gl_rbuf = 3;
int gl_sbuf = 3;
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int the_proc_name = -1;
void get_parallel_info(int *proc, int *nprocs, int *dim)
{
MPI_Comm_size(MPI_COMM_WORLD, nprocs);
MPI_Comm_rank(MPI_COMM_WORLD, proc);
*dim = 0;
the_proc_name = *proc;
} /* get_parallel_info */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_read(char *buf, int bytes, int *source, int *type, int *flag)
{
int err, buffer = 1;
MPI_Status status;
if (*type == -1) *type = MPI_ANY_TAG;
if (*source == -1) *source = MPI_ANY_SOURCE;
if (bytes == 0) {
err = MPI_Recv(&gl_rbuf, 1, MPI_BYTE, *source, *type, MPI_COMM_WORLD,
&status);
}
else {
err = MPI_Recv(buf, bytes, MPI_BYTE, *source, *type, MPI_COMM_WORLD,
&status);
}
if (err != 0) (void) fprintf(stderr, "MPI_Recv error = %d\n", err);
MPI_Get_count(&status,MPI_BYTE,&buffer);
*source = status.MPI_SOURCE;
*type = status.MPI_TAG;
if (bytes != 0) bytes = buffer;
return bytes;
} /* md_read */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_write(char *buf, int bytes, int dest, int type, int *flag)
{
int err;
if (bytes == 0) {
err = MPI_Send(&gl_sbuf, 1, MPI_BYTE, dest, type, MPI_COMM_WORLD);
}
else {
err = MPI_Send(buf, bytes, MPI_BYTE, dest, type, MPI_COMM_WORLD);
}
if (err != 0) (void) fprintf(stderr, "MPI_Send error = %d\n", err);
return 0;
} /* md_write */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_wrap_iread(void *buf, int bytes, int *source, int *type,
MPI_Request *request)
/*******************************************************************************
Machine dependent wrapped message-reading communication routine for MPI.
Author: Scott A. Hutchinson, SNL, 9221
=======
Return code: int
============
Parameter list:
===============
buf: Beginning address of data to be sent.
bytes: Length of message in bytes.
source: Source processor number.
type: Message type
*******************************************************************************/
{
int err = 0;
if (*type == -1) *type = MPI_ANY_TAG;
if (*source == -1) *source = MPI_ANY_SOURCE;
if (bytes == 0) {
err = MPI_Irecv(&gl_rbuf, 1, MPI_BYTE, *source, *type, MPI_COMM_WORLD,
request);
}
else {
err = MPI_Irecv(buf, bytes, MPI_BYTE, *source, *type, MPI_COMM_WORLD,
request);
}
return err;
} /* md_wrap_iread */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_wrap_write(void *buf, int bytes, int dest, int type, int *flag)
/*******************************************************************************
Machine dependent wrapped message-sending communication routine for MPI.
Author: Scott A. Hutchinson, SNL, 9221
=======
Return code: int
============
Parameter list:
===============
buf: Beginning address of data to be sent.
bytes: Length of message in bytes.
dest: Destination processor number.
type: Message type
flag:
*******************************************************************************/
{
int err = 0;
if (bytes == 0) {
err = MPI_Send(&gl_sbuf, 1, MPI_BYTE, dest, type, MPI_COMM_WORLD);
}
else {
err = MPI_Send(buf, bytes, MPI_BYTE, dest, type, MPI_COMM_WORLD);
}
return err;
} /* md_wrap_write */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_wrap_wait(void *buf, int bytes, int *source, int *type, int *flag,
MPI_Request *request)
/*******************************************************************************
Machine dependent wrapped message-wait communication routine for MPI.
Author: Scott A. Hutchinson, SNL, 9221
=======
Return code: int
============
Parameter list:
===============
buf: Beginning address of data to be sent.
bytes: Length of message in bytes.
source: Source processor number (set on return).
type: Message type
flag:
*******************************************************************************/
{
int count;
MPI_Status status;
if ( MPI_Wait(request, &status) ) {
(void) fprintf(stderr, "MPI_Wait error\n");
exit(-1);
}
MPI_Get_count(&status, MPI_BYTE, &count);
*source = status.MPI_SOURCE;
*type = status.MPI_TAG;
/* return the count, which is in bytes */
return count;
} /* md_wrap_wait */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_wrap_iwrite(void *buf, int bytes, int dest, int type, int *flag,
MPI_Request *request)
/*******************************************************************************
Machine dependent wrapped message-sending (nonblocking) communication
routine for MPI.
Author: Scott A. Hutchinson, SNL, 9221
=======
Return code: int
============
Parameter list:
===============
buf: Beginning address of data to be sent.
bytes: Length of message in bytes.
dest: Destination processor number.
type: Message type
flag:
*******************************************************************************/
{
int err = 0;
if (bytes == 0) {
err = MPI_Isend(&gl_sbuf, 1, MPI_BYTE, dest, type, MPI_COMM_WORLD,
request);
}
else {
err = MPI_Isend(buf, bytes, MPI_BYTE, dest, type, MPI_COMM_WORLD,
request);
}
return err;
} /* md_wrap_iwrite */
/********************************************************************/
/* NEW WRAPPERS to handle MPI Communicators */
/********************************************************************/
void parallel_info(int *proc,int *nprocs,int *dim, MPI_Comm comm)
{
MPI_Comm_size(comm, nprocs);
MPI_Comm_rank(comm, proc);
*dim = 0;
the_proc_name = *proc;
} /* parallel_info */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_mpi_iread(void *buf, int bytes, int *source, int *type,
MPI_Request *request, int *icomm)
/*******************************************************************************
Machine dependent wrapped message-reading communication routine for MPI.
Author: Scott A. Hutchinson, SNL, 9221
=======
Return code: int
============
Parameter list:
===============
buf: Beginning address of data to be sent.
bytes: Length of message in bytes.
source: Source processor number.
type: Message type
icomm: MPI Communicator
*******************************************************************************/
{
int err = 0;
MPI_Comm *comm;
comm = (MPI_Comm *) icomm;
if (*type == -1) *type = MPI_ANY_TAG;
if (*source == -1) *source = MPI_ANY_SOURCE;
if (bytes == 0) {
err = MPI_Irecv(&gl_rbuf, 1, MPI_BYTE, *source, *type, *comm,
request);
}
else {
err = MPI_Irecv(buf, bytes, MPI_BYTE, *source, *type, *comm,
request);
}
return err;
} /* md_mpi_iread */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_mpi_write(void *buf, int bytes, int dest, int type, int *flag,
int *icomm)
/*******************************************************************************
Machine dependent wrapped message-sending communication routine for MPI.
Author: Scott A. Hutchinson, SNL, 9221
=======
Return code: int
============
Parameter list:
===============
buf: Beginning address of data to be sent.
bytes: Length of message in bytes.
dest: Destination processor number.
type: Message type
flag:
icomm: MPI Communicator
*******************************************************************************/
{
int err = 0;
MPI_Comm *comm;
comm = (MPI_Comm *) icomm;
if (bytes == 0) {
err = MPI_Send(&gl_sbuf, 1, MPI_BYTE, dest, type, *comm);
}
else {
err = MPI_Send(buf, bytes, MPI_BYTE, dest, type, *comm);
}
return err;
} /* md_mpi_write */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_mpi_wait(void *buf, int bytes, int *source, int *type, int *flag,
MPI_Request *request, int *icomm)
/*******************************************************************************
Machine dependent wrapped message-wait communication routine for MPI.
Author: Scott A. Hutchinson, SNL, 9221
=======
Return code: int
============
Parameter list:
===============
buf: Beginning address of data to be sent.
bytes: Length of message in bytes.
source: Source processor number (set on return).
type: Message type
flag:
icomm: MPI Communicator
*******************************************************************************/
{
int count;
MPI_Status status;
if ( MPI_Wait(request, &status) ) {
(void) fprintf(stderr, "MPI_Wait error\n");
exit(-1);
}
MPI_Get_count(&status, MPI_BYTE, &count);
*source = status.MPI_SOURCE;
*type = status.MPI_TAG;
/* return the count, which is in bytes */
return count;
} /* md_mpi_wait */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_mpi_iwrite(void *buf, int bytes, int dest, int type, int *flag,
MPI_Request *request, int *icomm)
/*******************************************************************************
Machine dependent wrapped message-sending (nonblocking) communication
routine for MPI.
Author: Scott A. Hutchinson, SNL, 9221
=======
Return code: int
============
Parameter list:
===============
buf: Beginning address of data to be sent.
bytes: Length of message in bytes.
dest: Destination processor number.
type: Message type
flag:
icomm: MPI Communicator
*******************************************************************************/
{
int err = 0;
MPI_Comm *comm;
comm = (MPI_Comm *) icomm;
if (bytes == 0)
err = MPI_Isend(&gl_sbuf, 1, MPI_BYTE, dest, type, *comm, request);
else
err = MPI_Isend(buf, bytes, MPI_BYTE, dest, type, *comm, request);
return err;
} /* md_mpi_iwrite */