[Pkg-openmpi-maintainers] Bug#592326: Bug#592326: Failure of AZTEC test case run.

Rachel Gordon rgordon at techunix.technion.ac.il
Fri Sep 3 07:05:17 UTC 2010


Dear Jeff, Ralf and Manuel,

There is some good news:
I added -pthread to both the compile and link commands for
az_tutorial_with_MPI.f, and I also rebuilt the AZTEC library with -pthread.
The code now runs OK for np=1,2.

Now the bad news: when I run with 3, 4 or more processors I get a
similar error message:

mpirun -np 3 sample

[cluster:25805] *** Process received signal ***
[cluster:25805] Signal: Segmentation fault (11)
[cluster:25805] Signal code:  (128)
[cluster:25805] Failing at address: (nil)
[cluster:25805] [ 0] /lib/libpthread.so.0 [0x7fbe20cb5a80]
[cluster:25805] [ 1] /shared/lib/libmpi.so.0 [0x7fbe221325f7]
[cluster:25805] [ 2] /shared/lib/libmpi.so.0(PMPI_Wait+0x38) 
[0x7fbe22160a48]
[cluster:25805] [ 3] sample(md_wrap_wait+0x17) [0x41ccba]
[cluster:25805] [ 4] sample(AZ_find_procs_for_externs+0x5bf) [0x4177e7]
[cluster:25805] [ 5] sample(AZ_transform+0x1c3) [0x418372]
[cluster:25805] [ 6] sample(az_transform_+0x84) [0x407943]
[cluster:25805] [ 7] sample(MAIN__+0x19a) [0x407708]
[cluster:25805] [ 8] sample(main+0x2c) [0x44e00c]
[cluster:25805] [ 9] /lib/libc.so.6(__libc_start_main+0xe6) 
[0x7fbe209721a6]
[cluster:25805] [10] sample [0x4073b9]
[cluster:25805] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 25805 on node cluster exited 
on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

When I run on 4 processors I get the error message twice (from 2
processes):

mpirun -np 4 sample

[cluster:25946] *** Process received signal ***
[cluster:25946] Signal: Segmentation fault (11)
[cluster:25946] Signal code:  (128)
[cluster:25946] Failing at address: (nil)
[cluster:25947] *** Process received signal ***
[cluster:25947] Signal: Segmentation fault (11)
[cluster:25947] Signal code:  (128)
[cluster:25947] Failing at address: (nil)
[cluster:25946] [ 0] /lib/libpthread.so.0 [0x7f4ae4c6ba80]
[cluster:25946] [ 1] /shared/lib/libmpi.so.0 [0x7f4ae60e85f7]
[cluster:25946] [ 2] /shared/lib/libmpi.so.0(PMPI_Wait+0x38) 
[0x7f4ae6116a48]
[cluster:25946] [ 3] sample(md_wrap_wait+0x17) [0x41ccba]
[cluster:25946] [ 4] sample(AZ_find_procs_for_externs+0x5bf) [0x4177e7]
[cluster:25947] [ 0] /lib/libpthread.so.0 [0x7f7dc5350a80]
[cluster:25946] [ 5] sample(AZ_transform+0x1c3) [0x418372]
[cluster:25946] [ 6] sample(az_transform_+0x84) [0x407943]
[cluster:25946] [ 7] sample(MAIN__+0x19a) [0x407708]
[cluster:25946] [ 8] sample(main+0x2c) [0x44e00c]
[cluster:25946] [ 9] /lib/libc.so.6(__libc_start_main+0xe6) 
[0x7f4ae49281a6]
[cluster:25946] [10] sample [0x4073b9]
[cluster:25946] *** End of error message ***
[cluster:25947] [ 1] /shared/lib/libmpi.so.0 [0x7f7dc67cd5f7]
[cluster:25947] [ 2] /shared/lib/libmpi.so.0(PMPI_Wait+0x38) 
[0x7f7dc67fba48]
[cluster:25947] [ 3] sample(md_wrap_wait+0x17) [0x41ccba]
[cluster:25947] [ 4] sample(AZ_find_procs_for_externs+0x5bf) [0x4177e7]
[cluster:25947] [ 5] sample(AZ_transform+0x1c3) [0x418372]
[cluster:25947] [ 6] sample(az_transform_+0x84) [0x407943]
[cluster:25947] [ 7] sample(MAIN__+0x19a) [0x407708]
[cluster:25947] [ 8] sample(main+0x2c) [0x44e00c]
[cluster:25947] [ 9] /lib/libc.so.6(__libc_start_main+0xe6) 
[0x7f7dc500d1a6]
[cluster:25947] [10] sample [0x4073b9]
[cluster:25947] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 25946 on node cluster exited 
on signal 11 (Segmentation fault).
--------------------------------------------------------------------------




Attached is the AZTEC source file md_wrap_mpi_c.c; it might give you a
further hint.
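
In case it helps to narrow down whether the problem is inside AZTEC or below
it, here is a minimal stand-alone test I could try (a sketch of my own, not
part of AZTEC; the file name ring_test.c is just made up). It exercises the
same MPI_Irecv / MPI_Send / MPI_Wait pattern that md_wrap_iread,
md_wrap_write and md_wrap_wait in the attached file use, passing the data as
MPI_BYTE:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  int rank, nprocs, left, right, sendval, recvval, count;
  MPI_Request request;
  MPI_Status  status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

  /* each rank posts a nonblocking receive from its left neighbour and
     sends its rank to its right neighbour, then waits on the receive --
     the same iread/write/wait sequence AZTEC's wrappers perform */
  left    = (rank + nprocs - 1) % nprocs;
  right   = (rank + 1) % nprocs;
  sendval = rank;

  MPI_Irecv(&recvval, sizeof(int), MPI_BYTE, left, 0, MPI_COMM_WORLD, &request);
  MPI_Send(&sendval, sizeof(int), MPI_BYTE, right, 0, MPI_COMM_WORLD);

  if (MPI_Wait(&request, &status)) {
    fprintf(stderr, "MPI_Wait error on rank %d\n", rank);
    exit(-1);
  }
  MPI_Get_count(&status, MPI_BYTE, &count);
  printf("rank %d got %d bytes (value %d) from rank %d\n",
         rank, count, recvval, status.MPI_SOURCE);

  MPI_Finalize();
  return 0;
}

I would compile and run it the same way as the tutorial, e.g.
mpicc -pthread ring_test.c -o ring_test, then mpirun -np 4 ring_test.
If this also crashes with 3 or more processes, the problem is probably not
in AZTEC itself.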



Rachel

   Dr.  Rachel Gordon
   Senior Research Fellow   		Phone: +972-4-8293811
   Dept. of Aerospace Eng.		Fax:   +972 - 4 - 8292030
   The Technion, Haifa 32000, Israel     email: rgordon at tx.technion.ac.il


On Thu, 2 Sep 2010, Ralf Wildenhues wrote:

> Hello Rachel, Jeff,
>
> * Rachel Gordon wrote on Thu, Sep 02, 2010 at 01:35:37PM CEST:
>> The cluster I am trying to run on has only the openmpi MPI version.
>> So, mpif77 is equivalent to mpif77.openmpi and mpicc is equivalent
>> to mpicc.openmpi
>>
>> I changed the Makefile, replacing gfortran by mpif77 and gcc by mpicc.
>> The compilation and linkage stage ran with no problem:
>>
>> mpif77 -O   -I../lib -DMAX_MEM_SIZE=16731136 -DCOMM_BUFF_SIZE=200000
>> -DMAX_CHUNK_SIZE=200000  -c -o az_tutorial_with_MPI.o
>> az_tutorial_with_MPI.f
>> mpif77 az_tutorial_with_MPI.o -O -L../lib -laztec      -o sample
>
> Can you retry, but this time add -pthread to both the compile and link
> commands?
>
> There were other reports on the OpenMPI devel list that some pthread
> flags have gone missing somewhere.  It might well be that that caused
> its libraries to already be built wrongly, or just the application,
> I'm not sure.  But the segfault inside libpthread is suspicious.
>
> Thanks,
> Ralf
>
>> But again when I try to run 'sample' I get:
>>
>> mpirun -np 1 sample
>>
>>
>> [cluster:24989] *** Process received signal ***
>> [cluster:24989] Signal: Segmentation fault (11)
>> [cluster:24989] Signal code: Address not mapped (1)
>> [cluster:24989] Failing at address: 0x100000098
>> [cluster:24989] [ 0] /lib/libpthread.so.0 [0x7f5058036a80]
>> [cluster:24989] [ 1] /shared/lib/libmpi.so.0(MPI_Comm_size+0x6e)
>> [0x7f50594ce34e]
>> [cluster:24989] [ 2] sample(parallel_info+0x24) [0x41d2ba]
>> [cluster:24989] [ 3] sample(AZ_set_proc_config+0x2d) [0x408417]
>> [cluster:24989] [ 4] sample(az_set_proc_config_+0xc) [0x407b85]
>> [cluster:24989] [ 5] sample(MAIN__+0x54) [0x407662]
>> [cluster:24989] [ 6] sample(main+0x2c) [0x44e8ec]
>> [cluster:24989] [ 7] /lib/libc.so.6(__libc_start_main+0xe6)
>> [0x7f5057cf31a6]
>> [cluster:24989] [ 8] sample [0x407459]
>> [cluster:24989] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 24989 on node cluster
>> exited on signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>
>
-------------- next part --------------
/*====================================================================
 * ------------------------
 * | CVS File Information |
 * ------------------------
 *
 * $RCSfile: md_wrap_mpi_c.c,v $
 *
 * $Author: tuminaro $
 *
 * $Date: 1998/12/21 19:36:24 $
 *
 * $Revision: 5.3 $
 *
 * $Name:  $
 *====================================================================*/
#ifndef lint
static char *cvs_wrapmpi_id =
  "$Id: md_wrap_mpi_c.c,v 5.3 1998/12/21 19:36:24 tuminaro Exp $";
#endif


/*******************************************************************************
 * Copyright 1995, Sandia Corporation.  The United States Government retains a *
 * nonexclusive license in this software as prescribed in AL 88-1 and AL 91-7. *
 * Export of this program may require a license from the United States         *
 * Government.                                                                 *
 ******************************************************************************/


#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>

int gl_rbuf = 3;
int gl_sbuf = 3;
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int the_proc_name = -1;

void get_parallel_info(int *proc, int *nprocs, int *dim)

{

  /* local variables */

  int i;

  MPI_Comm_size(MPI_COMM_WORLD, nprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, proc);
  *dim = 0;
  the_proc_name = *proc;

} /* get_parallel_info */

/******************************************************************************/
/******************************************************************************/
/******************************************************************************/

/******************************************************************************/
/******************************************************************************/
/******************************************************************************/

int md_read(char *buf, int bytes, int *source, int *type, int *flag)

{

  int        err, buffer = 1;
  MPI_Status status;

  if (*type   == -1) *type   = MPI_ANY_TAG;
  if (*source == -1) *source = MPI_ANY_SOURCE;

  if (bytes == 0) {
    err = MPI_Recv(&gl_rbuf, 1, MPI_BYTE, *source, *type, MPI_COMM_WORLD,
                   &status);
  }
  else {
    err = MPI_Recv(buf, bytes, MPI_BYTE, *source, *type, MPI_COMM_WORLD,
                   &status);
  }

  if (err != 0) (void) fprintf(stderr, "MPI_Recv error = %d\n", err);
  MPI_Get_count(&status,MPI_BYTE,&buffer);
  *source = status.MPI_SOURCE;
  *type   = status.MPI_TAG;
  if (bytes != 0) bytes = buffer;

  return bytes;

} /* md_read */


/******************************************************************************/
/******************************************************************************/
/******************************************************************************/

int md_write(char *buf, int bytes, int dest, int type, int *flag)

{

  int err;

  if (bytes == 0) {
    err = MPI_Send(&gl_sbuf, 1, MPI_BYTE, dest, type, MPI_COMM_WORLD);
  }
  else {
    err = MPI_Send(buf, bytes, MPI_BYTE, dest, type, MPI_COMM_WORLD);
  }

  if (err != 0) (void) fprintf(stderr, "MPI_Send error = %d\n", err);

  return 0;

} /* md_write */



/******************************************************************************/
/******************************************************************************/
/******************************************************************************/

int md_wrap_iread(void *buf, int bytes, int *source, int *type,
                  MPI_Request *request)


/*******************************************************************************

  Machine dependent wrapped message-reading communication routine for MPI.

  Author:          Scott A. Hutchinson, SNL, 9221
  =======

  Return code:     int
  ============

  Parameter list:
  ===============

  buf:             Beginning address of data to be sent.

  bytes:           Length of message in bytes.

  source:          Source processor number.

  type:            Message type

*******************************************************************************/

{

  int err = 0;

  if (*type   == -1) *type   = MPI_ANY_TAG;
  if (*source == -1) *source = MPI_ANY_SOURCE;

  if (bytes == 0) {
    err = MPI_Irecv(&gl_rbuf, 1, MPI_BYTE, *source, *type, MPI_COMM_WORLD,
                    request);
  }
  else {
    err = MPI_Irecv(buf, bytes, MPI_BYTE, *source, *type, MPI_COMM_WORLD,
                    request);
  }

  return err;

} /* md_wrap_iread */


/******************************************************************************/
/******************************************************************************/
/******************************************************************************/

int md_wrap_write(void *buf, int bytes, int dest, int type, int *flag)

/*******************************************************************************

  Machine dependent wrapped message-sending communication routine for MPI.

  Author:          Scott A. Hutchinson, SNL, 9221
  =======

  Return code:     int
  ============

  Parameter list:
  ===============

  buf:             Beginning address of data to be sent.

  bytes:           Length of message in bytes.

  dest:            Destination processor number.

  type:            Message type

  flag:

*******************************************************************************/

{

  int err = 0;

  if (bytes == 0) {
    err = MPI_Send(&gl_sbuf, 1, MPI_BYTE, dest, type, MPI_COMM_WORLD);
  }
  else {
    err = MPI_Send(buf, bytes, MPI_BYTE, dest, type, MPI_COMM_WORLD);
  }

  return err;

} /* md_wrap_write */



/******************************************************************************/
/******************************************************************************/
/******************************************************************************/

int md_wrap_wait(void *buf, int bytes, int *source, int *type, int *flag,
                 MPI_Request *request)

/*******************************************************************************

  Machine dependent wrapped message-wait communication routine for MPI.

  Author:          Scott A. Hutchinson, SNL, 9221
  =======

  Return code:     int
  ============

  Parameter list:
  ===============

  buf:             Beginning address of data to be sent.

  bytes:           Length of message in bytes.

  source:          Source processor number.

  type:            Message type

  flag:

*******************************************************************************/

{

  int        err, count;
  MPI_Status status;

  if ( MPI_Wait(request, &status) ) {
    (void) fprintf(stderr, "MPI_Wait error\n");
    exit(-1);
  }

  MPI_Get_count(&status, MPI_BYTE, &count);
  *source = status.MPI_SOURCE;
  *type   = status.MPI_TAG;

  /* return the count, which is in bytes */

  return count;

} /* md_wrap_wait */

/******************************************************************************/
/******************************************************************************/
/******************************************************************************/

int md_wrap_iwrite(void *buf, int bytes, int dest, int type, int *flag,
                  MPI_Request *request)

/*******************************************************************************

  Machine dependent wrapped message-sending (nonblocking) communication 
  routine for MPI.

  Author:          Scott A. Hutchinson, SNL, 9221
  =======

  Return code:     int
  ============

  Parameter list:
  ===============

  buf:             Beginning address of data to be sent.

  bytes:           Length of message in bytes.

  dest:            Destination processor number.

  type:            Message type

  flag:

*******************************************************************************/

{

  int err = 0;

  if (bytes == 0) {
    err = MPI_Isend(&gl_sbuf, 1, MPI_BYTE, dest, type, MPI_COMM_WORLD,
                  request);
  }
  else {
    err = MPI_Isend(buf, bytes, MPI_BYTE, dest, type, MPI_COMM_WORLD,
                  request);
  }

  return err;

} /* md_wrap_iwrite */


/********************************************************************/
/*     NEW WRAPPERS to handle MPI Communicators                     */
/********************************************************************/

void parallel_info(int *proc,int *nprocs,int *dim, MPI_Comm comm)
{

  /* local variables */

  int i;

  MPI_Comm_size(comm, nprocs);
  MPI_Comm_rank(comm, proc);
  *dim = 0;
  the_proc_name = *proc;

} /* parallel_info */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/

int md_mpi_iread(void *buf, int bytes, int *source, int *type,
                  MPI_Request *request, int *icomm)


/*******************************************************************************

  Machine dependent wrapped message-reading communication routine for MPI.

  Author:          Scott A. Hutchinson, SNL, 9221
  =======

  Return code:     int
  ============

  Parameter list:
  ===============

  buf:             Beginning address of data to be sent.

  bytes:           Length of message in bytes.

  source:          Source processor number.

  type:            Message type

  icomm:           MPI Communicator
*******************************************************************************/

{

  int err = 0;
  MPI_Comm *comm;

  comm = (MPI_Comm *) icomm;

  if (*type   == -1) *type   = MPI_ANY_TAG;
  if (*source == -1) *source = MPI_ANY_SOURCE;

  if (bytes == 0) {
    err = MPI_Irecv(&gl_rbuf, 1, MPI_BYTE, *source, *type, *comm,
                    request);
  }
  else {
    err = MPI_Irecv(buf, bytes, MPI_BYTE, *source, *type, *comm,
                    request);
  }

  return err;

} /* md_mpi_iread */


/******************************************************************************/
/******************************************************************************/
/******************************************************************************/

int md_mpi_write(void *buf, int bytes, int dest, int type, int *flag,
                  int *icomm)

/*******************************************************************************

  Machine dependent wrapped message-sending communication routine for MPI.

  Author:          Scott A. Hutchinson, SNL, 9221
  =======

  Return code:     int
  ============

  Parameter list:
  ===============

  buf:             Beginning address of data to be sent.

  bytes:           Length of message in bytes.

  dest:            Destination processor number.

  type:            Message type

  flag:

  icomm:           MPI Communicator

*******************************************************************************/

{

  int err = 0;
  MPI_Comm *comm;

  comm = (MPI_Comm *) icomm;

  if (bytes == 0) {
    err = MPI_Send(&gl_sbuf, 1, MPI_BYTE, dest, type, *comm);
  }
  else {
    err = MPI_Send(buf, bytes, MPI_BYTE, dest, type, *comm);
  }

  return err;

} /* md_mpi_write */

/******************************************************************************/
/******************************************************************************/
/******************************************************************************/

int md_mpi_wait(void *buf, int bytes, int *source, int *type, int *flag,
                 MPI_Request *request, int *icomm)

/*******************************************************************************

  Machine dependent wrapped message-wait communication routine for MPI.

  Author:          Scott A. Hutchinson, SNL, 9221
  =======

  Return code:     int
  ============

  Parameter list:
  ===============

  buf:             Beginning address of data to be sent.

  bytes:           Length of message in bytes.

  source:          Source processor number.

  type:            Message type

  flag:

  icomm:           MPI Communicator

*******************************************************************************/

{

  int        err, count;
  MPI_Status status;

  if ( MPI_Wait(request, &status) ) {
    (void) fprintf(stderr, "MPI_Wait error\n");
    exit(-1);
  }

  MPI_Get_count(&status, MPI_BYTE, &count);
  *source = status.MPI_SOURCE;
  *type   = status.MPI_TAG;

  /* return the count, which is in bytes */

  return count;

} /* md_mpi_wait */

/******************************************************************************/
/******************************************************************************/
/******************************************************************************/

int md_mpi_iwrite(void *buf, int bytes, int dest, int type, int *flag,
                  MPI_Request *request, int *icomm)

/*******************************************************************************

  Machine dependent wrapped message-sending (nonblocking) communication
  routine for MPI.

  Author:          Scott A. Hutchinson, SNL, 9221
  =======

  Return code:     int
  ============

  Parameter list:
  ===============

  buf:             Beginning address of data to be sent.

  bytes:           Length of message in bytes.

  dest:            Destination processor number.

  type:            Message type

  flag:

  icomm:           MPI Communicator

*******************************************************************************/
{

  int err = 0;
  MPI_Comm *comm;

  comm = (MPI_Comm *) icomm ;
  if (bytes == 0)
    err = MPI_Isend(&gl_sbuf, 1, MPI_BYTE, dest, type, *comm, request);
  else
    err = MPI_Isend(buf, bytes, MPI_BYTE, dest, type, *comm, request);

  return err;

} /* md_mpi_iwrite */

