[Debian-ha-svn-commits] [SCM] corosync Debian packaging branch, master, updated. debian/1.4.1-1-5-gdf79c62

Martin Loschwitz madkiss at debian.org
Wed Oct 19 16:12:55 UTC 2011


The following commit has been merged in the master branch:
commit c814a3e7da1b32cea672046cb7658acc96cc42ae
Author: Martin Loschwitz <madkiss at debian.org>
Date:   Wed Oct 19 14:32:10 2011 +0000

    Imported Upstream version 1.4.2

diff --git a/.tarball-version b/.tarball-version
index 347f583..9df886c 100644
--- a/.tarball-version
+++ b/.tarball-version
@@ -1 +1 @@
-1.4.1
+1.4.2
diff --git a/.version b/.version
index 347f583..9df886c 100644
--- a/.version
+++ b/.version
@@ -1 +1 @@
-1.4.1
+1.4.2
diff --git a/ChangeLog b/ChangeLog
index 4318c05..a81e149 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,160 @@
+2011-09-22  Steven Dake  <sdake at redhat.com>
+
+	Deliver all messages from my_high_seq_received to the last gap
+	This patch passes two test cases:
+
+	-------
+	Test #1
+	-------
+	Two node cluster - run cpgbench on each node
+
+	modify totemsrp with the following defines:
+	Two test cases:
+
+	-------
+	Test #2
+	-------
+	5 node cluster
+
+	Start 5 nodes randomly at about the same time, wait 10 seconds, and
+	attempt to send a message.  If the message blocks on "TRY_AGAIN", a
+	message loss has likely occurred.  Wait a few minutes without cycling
+	the nodes and see if the TRY_AGAIN state becomes unblocked.
+
+	If it doesn't, the test case has failed.
+
+	Reviewed-by: Jan Friesse <jfriesse at redhat.com>
+	(cherry picked from commit 2ec4ddb039b310b308a8748c88332155afd62608)
+
+2011-07-14  Russell Bryant  <russell at russellbryant.net>
+
+	Resolve a deadlock between the timer and serialize locks.
+	This patch resolves a deadlock between the serialize lock (in
+	exec/main.c) and the timer lock (in exec/timer.c).  I observed this
+	deadlock happening fairly quickly on a cluster using the EVT service
+	from OpenAIS.  (OpenAIS 1.1.4, Corosync 1.4.1)
+
+	In prioritized_timer_thread(), it was grabbing:
+	    1) timer lock
+	    2) serialize lock
+
+	In another thread, you have:
+	    1) grab the serialize lock in deliver_fn() of exec/main.c
+	    2) grab the timer lock in corosync_timer_add_duration().
+
+	The patch just swaps the locking order in the timer thread.
+
+	Reviewed-by: Jan Friesse <jfriesse at redhat.com>
+
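	The fix described above is the standard cure for an AB-BA lock-order
	inversion: make every code path acquire the two locks in the same
	order.  A minimal sketch of the corrected ordering (hypothetical
	function names, not the corosync sources):

```c
#include <pthread.h>

/* Two locks standing in for corosync's serialize and timer locks. */
static pthread_mutex_t serialize_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t timer_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Before the fix the timer thread locked timer_lock then serialize_lock,
 * while the deliver path locked serialize_lock then timer_lock -- a
 * classic AB-BA inversion.  After the fix both paths use the same order.
 */

int timer_thread_iteration(void)
{
	pthread_mutex_lock(&serialize_lock);
	pthread_mutex_lock(&timer_lock);
	/* timerlist_expire() would run here */
	pthread_mutex_unlock(&timer_lock);
	pthread_mutex_unlock(&serialize_lock);
	return 0;
}

int deliver_path(void)
{
	pthread_mutex_lock(&serialize_lock);
	pthread_mutex_lock(&timer_lock);
	/* corosync_timer_add_duration() would run here */
	pthread_mutex_unlock(&timer_lock);
	pthread_mutex_unlock(&serialize_lock);
	return 0;
}
```

	With a single global order, no thread can hold one lock while
	waiting for the other in the reverse direction.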
+2011-09-08  Jan Friesse  <jfriesse at redhat.com>
+
+	totemconfig: change minimum RRP threshold
+	The RRP threshold can now be set to a value lower than 5.
+
+	Reviewed-by: Fabio M. Di Nitto <fdinitto at redhat.com>
+	(cherry picked from commit f6c2a8dab786c50ece36dd3424e258e93a1000d3)
+
+2011-09-05  Steven Dake  <sdake at redhat.com>
+
+	Ignore memb_join messages during flush operations
+	A memb_join operation that occurs during flushing can result in an
+	entry into the GATHER state from the RECOVERY state.  This causes the
+	regular sort queue to be used instead of the recovery sort queue,
+	resulting in a segfault.
+
+	Reviewed-by: Jan Friesse <jfriesse at redhat.com>
+	(cherry picked from commit 48ffa8892daac18935d96ae46a72aebe2fb70430)
+
+2011-09-01  Jan Friesse  <jfriesse at redhat.com>
+
+	rrp: Higher threshold in passive mode for mcast
+	There were too many false positives with passive-mode rrp when a high
+	number of messages was received.
+
+	The patch adds a new configurable variable, rrp_problem_count_mcast_threshold,
+	which defaults to 10 times rrp_problem_count_threshold and is used as
+	the threshold for multicast packets in passive mode. The variable is
+	unused in active mode.
+
+	Reviewed by: Steven Dake <sdake at redhat.com>
+	(cherry picked from commit 752239eaa1edd68695a6e40bcde60471f34a02fd)
+
+	rrp: Handle endless loop if all ifaces are faulty
+	If all interfaces were faulty, passive_mcast_flush_send and related
+	functions ended in an endless loop. This is now handled: if there is no
+	live interface, the message is dropped.
+
+	Reviewed by: Steven Dake <sdake at redhat.com>
+	(cherry picked from commit 0eade8de79b6e5b28e91604d4d460627c7a61ddd)
+
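	The endless-loop fix bounds the round-robin interface scan with a
	counter so that a fully-faulty ring drops the message instead of
	spinning.  The same logic as a standalone helper (hypothetical name,
	mirroring the totemrrp.c hunks later in this diff):

```c
/*
 * Pick the next non-faulty interface round-robin, or return -1 when
 * every interface is faulty.  Bounding the scan by interface_count is
 * what breaks the endless loop.
 */
int pick_next_iface(const int *faulty, int interface_count, int last_iface)
{
	int iface = last_iface;
	int i = 0;

	do {
		iface = (iface + 1) % interface_count;
		i++;
	} while ((i <= interface_count) && (faulty[iface] == 1));

	if (i > interface_count) {
		return -1; /* all interfaces faulty: caller drops the message */
	}
	return iface;
}
```

	A caller then sends only when the result is non-negative, exactly as
	the patched passive_*_send functions guard their totemnet calls.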
+2011-08-18  Tim Beale  <tim.beale at alliedtelesis.co.nz>
+
+	A CPG client can sometimes lock up if the local node is in the downlist
+	In a 10-node cluster where all nodes are booting up and starting corosync
+	at the same time, sometimes during this process corosync detects a node as
+	leaving and rejoining the cluster.
+
+	Occasionally the downlist that gets picked contains the local node. When the
+	local node sends leave events for the downlist (including itself), it sets
+	its cpd state to CPD_STATE_UNJOINED and clears the cpd->group_name. This
+	means it no longer sends CPG events to the CPG client.
+
+	Reviewed-by: Jan Friesse <jfriesse at redhat.com>
+	(cherry picked from commit 08f07be323b777118264eb37413393065b360f8e)
+
+	Display ring-ID consistently in debug
+	Ring ID was being displayed both as hex and decimal in places. Update so
+	it's displayed consistently (I chose hex) to make debugging easier.
+
+	Reviewed-by: Angus Salkeld <asalkeld at redhat.com>
+	(cherry picked from commit 370d9bcecf2716e52c8f729a53e9600fe6cc6aa4)
+
+	Add code comment mapping for message handler defines
+	For a corosync newbie it can be hard to bridge the gap between where a
+	particular message is sent and where the receive handler processes it,
+	and vice versa.
+
+	Reviewed-by: Angus Salkeld <asalkeld at redhat.com>
+	(cherry picked from commit 5a724a9c39465f7e63888f33375261506f69bd02)
+
+2011-08-17  Jan Friesse  <jfriesse at redhat.com>
+
+	cfg: Handle errors from totem_mcast
+	The totem_mcast function can return -1 if corosync is overloaded.
+	Sadly, in many calls of this function the error code was either not
+	handled at all or handled by an assert.
+
+	The commit changes the behaviour to either return CS_ERR_TRY_AGAIN or
+	pass the error code to later layers to handle.
+
+	Reviewed-by: Steven Dake <sdake at redhat.com>
+
+	cpg: Handle errors from totem_mcast
+	The totem_mcast function can return -1 if corosync is overloaded.
+	Sadly, in many calls of this function the error code was either not
+	handled at all or handled by an assert.
+
+	The commit changes the behaviour to either return CS_ERR_TRY_AGAIN or
+	pass the error code to later layers to handle.
+
+	Reviewed-by: Steven Dake <sdake at redhat.com>
+
+	coroipcc: use malloc for path in service_connect
+	Coroipcc appropriately uses PATH_MAX sized variables for various data
+	structures handling files in the initialization of the client.  Due to
+	the use of 12 of these structures declared as stack variables, the
+	application stack balloons to over 12*4k. This is especially problematic
+	if threads are used by long running daemons to restart the connection
+	to corosync so as to be resilient in the face of system services
+	restarting (service corosync restart).
+
+	A simple alternative is to allocate temporary memory to avoid
+	requirements of large thread stacks.
+
+	Original patch by Dan Clark <2clarkd at gmail.com>
+
+	Reviewed-by: Steven Dake <sdake at redhat.com>
+
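	The coroipcc change is an instance of a general pattern: group large,
	short-lived PATH_MAX buffers into a single heap allocation so the
	connect path no longer needs a multi-page stack frame.  A minimal
	sketch (hypothetical function, simplified from the real
	service_connect):

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/*
 * Four PATH_MAX arrays on the stack would cost ~16 KiB per call frame;
 * one malloc keeps the frame small, which matters for threads created
 * with small stacks by long-running daemons.
 */
struct path_data {
	char control_map_path[PATH_MAX];
	char request_map_path[PATH_MAX];
	char response_map_path[PATH_MAX];
	char dispatch_map_path[PATH_MAX];
};

int connect_with_heap_paths(void)
{
	struct path_data *pd = malloc(sizeof(*pd));

	if (pd == NULL) {
		return -1;
	}
	memset(pd, 0, sizeof(*pd));

	/* ... fill in the map paths and perform the connection setup ... */

	free(pd);
	return 0;
}
```

	The allocation is freed on every exit path, as the patch below does
	in both the success and error_connect paths.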
 2011-07-26  Jan Friesse  <jfriesse at redhat.com>
 
 	main: let poll really stop before totempg_finalize
diff --git a/configure b/configure
index 67f09e8..3cfc2e9 100755
--- a/configure
+++ b/configure
@@ -1,6 +1,6 @@
 #! /bin/sh
 # Guess values for system-dependent variables and create Makefiles.
-# Generated by GNU Autoconf 2.63 for corosync 1.4.1.
+# Generated by GNU Autoconf 2.63 for corosync 1.4.2.
 #
 # Report bugs to <openais at lists.osdl.org>.
 #
@@ -596,8 +596,8 @@ SHELL=${CONFIG_SHELL-/bin/sh}
 # Identity of this package.
 PACKAGE_NAME='corosync'
 PACKAGE_TARNAME='corosync'
-PACKAGE_VERSION='1.4.1'
-PACKAGE_STRING='corosync 1.4.1'
+PACKAGE_VERSION='1.4.2'
+PACKAGE_STRING='corosync 1.4.2'
 PACKAGE_BUGREPORT='openais at lists.osdl.org'
 
 ac_unique_file="lib/coroipcc.c"
@@ -1373,7 +1373,7 @@ if test "$ac_init_help" = "long"; then
   # Omit some internal or obsolete options to make the list less imposing.
   # This message is too long to be a string in the A/UX 3.1 sh.
   cat <<_ACEOF
-\`configure' configures corosync 1.4.1 to adapt to many kinds of systems.
+\`configure' configures corosync 1.4.2 to adapt to many kinds of systems.
 
 Usage: $0 [OPTION]... [VAR=VALUE]...
 
@@ -1443,7 +1443,7 @@ fi
 
 if test -n "$ac_init_help"; then
   case $ac_init_help in
-     short | recursive ) echo "Configuration of corosync 1.4.1:";;
+     short | recursive ) echo "Configuration of corosync 1.4.2:";;
    esac
   cat <<\_ACEOF
 
@@ -1560,7 +1560,7 @@ fi
 test -n "$ac_init_help" && exit $ac_status
 if $ac_init_version; then
   cat <<\_ACEOF
-corosync configure 1.4.1
+corosync configure 1.4.2
 generated by GNU Autoconf 2.63
 
 Copyright (C) 1992, 1993, 1994, 1995, 1996, 1998, 1999, 2000, 2001,
@@ -1574,7 +1574,7 @@ cat >config.log <<_ACEOF
 This file contains any messages produced by compilers while
 running configure, to aid debugging if configure makes a mistake.
 
-It was created by corosync $as_me 1.4.1, which was
+It was created by corosync $as_me 1.4.2, which was
 generated by GNU Autoconf 2.63.  Invocation command line was
 
   $ $0 $@
@@ -2424,7 +2424,7 @@ fi
 
 # Define the identity of the package.
  PACKAGE='corosync'
- VERSION='1.4.1'
+ VERSION='1.4.2'
 
 
 cat >>confdefs.h <<_ACEOF
@@ -12354,7 +12354,7 @@ exec 6>&1
 # report actual input values of CONFIG_FILES etc. instead of their
 # values after options handling.
 ac_log="
-This file was extended by corosync $as_me 1.4.1, which was
+This file was extended by corosync $as_me 1.4.2, which was
 generated by GNU Autoconf 2.63.  Invocation command line was
 
   CONFIG_FILES    = $CONFIG_FILES
@@ -12421,7 +12421,7 @@ Report bugs to <bug-autoconf at gnu.org>."
 _ACEOF
 cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
 ac_cs_version="\\
-corosync config.status 1.4.1
+corosync config.status 1.4.2
 configured by $0, generated by GNU Autoconf 2.63,
   with options \\"`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`\\"
 
diff --git a/exec/timer.c b/exec/timer.c
index 69f9a95..02ca51b 100644
--- a/exec/timer.c
+++ b/exec/timer.c
@@ -131,13 +131,13 @@ static void *prioritized_timer_thread (void *data)
 		if (fds < 0) {
 			return NULL;
 		}
-		pthread_mutex_lock (&timer_mutex);
 		timer_serialize_lock_fn ();
+		pthread_mutex_lock (&timer_mutex);
 
 		timerlist_expire (&timers_timerlist);
 
-		timer_serialize_unlock_fn ();
 		pthread_mutex_unlock (&timer_mutex);
+		timer_serialize_unlock_fn ();
 	}
 }
 
diff --git a/exec/totemconfig.c b/exec/totemconfig.c
index 80ca182..a475bb3 100644
--- a/exec/totemconfig.c
+++ b/exec/totemconfig.c
@@ -82,7 +82,7 @@
 #define MISS_COUNT_CONST			5
 #define RRP_PROBLEM_COUNT_TIMEOUT		2000
 #define RRP_PROBLEM_COUNT_THRESHOLD_DEFAULT	10
-#define RRP_PROBLEM_COUNT_THRESHOLD_MIN		5
+#define RRP_PROBLEM_COUNT_THRESHOLD_MIN		2
 #define RRP_AUTORECOVERY_CHECK_TIMEOUT		1000
 
 static char error_string_response[512];
@@ -213,6 +213,8 @@ static void totem_volatile_config_read (
 
 	objdb_get_int (objdb,object_totem_handle, "rrp_problem_count_threshold", &totem_config->rrp_problem_count_threshold);
 
+	objdb_get_int (objdb,object_totem_handle, "rrp_problem_count_mcast_threshold", &totem_config->rrp_problem_count_mcast_threshold);
+
 	objdb_get_int (objdb,object_totem_handle, "rrp_autorecovery_check_timeout", &totem_config->rrp_autorecovery_check_timeout);
 
 	objdb_get_int (objdb,object_totem_handle, "heartbeat_failures_allowed", &totem_config->heartbeat_failures_allowed);
@@ -667,12 +669,21 @@ int totem_config_validate (
 	if (totem_config->rrp_problem_count_threshold == 0) {
 		totem_config->rrp_problem_count_threshold = RRP_PROBLEM_COUNT_THRESHOLD_DEFAULT;
 	}
+	if (totem_config->rrp_problem_count_mcast_threshold == 0) {
+		totem_config->rrp_problem_count_mcast_threshold = totem_config->rrp_problem_count_threshold * 10;
+	}
 	if (totem_config->rrp_problem_count_threshold < RRP_PROBLEM_COUNT_THRESHOLD_MIN) {
 		snprintf (local_error_reason, sizeof(local_error_reason),
 			"The RRP problem count threshold (%d problem count) may not be less then (%d problem count).",
 			totem_config->rrp_problem_count_threshold, RRP_PROBLEM_COUNT_THRESHOLD_MIN);
 		goto parse_error;
 	}
+	if (totem_config->rrp_problem_count_mcast_threshold < RRP_PROBLEM_COUNT_THRESHOLD_MIN) {
+		snprintf (local_error_reason, sizeof(local_error_reason),
+			"The RRP multicast problem count threshold (%d problem count) may not be less then (%d problem count).",
+			totem_config->rrp_problem_count_mcast_threshold, RRP_PROBLEM_COUNT_THRESHOLD_MIN);
+		goto parse_error;
+	}
 	if (totem_config->rrp_token_expired_timeout == 0) {
 		totem_config->rrp_token_expired_timeout =
 			totem_config->token_retransmit_timeout;
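	The validation hunk above treats zero as "unset" and derives the
	multicast threshold from the base threshold.  The defaulting rule in
	isolation (hypothetical helper):

```c
/*
 * Default rrp_problem_count_mcast_threshold to 10x the base threshold
 * when the administrator left it unset (zero), matching the man page:
 * "The default is 10 times rrp_problem_count_threshold."
 */
unsigned int mcast_threshold_default(unsigned int threshold,
				     unsigned int mcast_threshold)
{
	if (mcast_threshold == 0) {
		mcast_threshold = threshold * 10;
	}
	return mcast_threshold;
}
```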
diff --git a/exec/totemrrp.c b/exec/totemrrp.c
index 83292ad..616d0d5 100644
--- a/exec/totemrrp.c
+++ b/exec/totemrrp.c
@@ -890,14 +890,17 @@ static void passive_monitor (
 	unsigned int max;
 	unsigned int i;
 	unsigned int min_all, min_active;
+	unsigned int threshold;
 
 	/*
 	 * Monitor for failures
 	 */
 	if (is_token_recv_count) {
 		recv_count = passive_instance->token_recv_count;
+		threshold = rrp_instance->totem_config->rrp_problem_count_threshold;
 	} else {
 		recv_count = passive_instance->mcast_recv_count;
+		threshold = rrp_instance->totem_config->rrp_problem_count_mcast_threshold;
 	}
 
 	recv_count[iface_no] += 1;
@@ -959,8 +962,7 @@ static void passive_monitor (
 
 	for (i = 0; i < rrp_instance->interface_count; i++) {
 		if ((passive_instance->faulty[i] == 0) &&
-			(max - recv_count[i] >
-			rrp_instance->totem_config->rrp_problem_count_threshold)) {
+		    (max - recv_count[i] > threshold)) {
 			passive_instance->faulty[i] = 1;
 			poll_timer_add (rrp_instance->poll_handle,
 				rrp_instance->totem_config->rrp_autorecovery_check_timeout,
@@ -1015,12 +1017,16 @@ static void passive_mcast_flush_send (
 	unsigned int msg_len)
 {
 	struct passive_instance *passive_instance = (struct passive_instance *)instance->rrp_algo_instance;
+	int i = 0;
 
 	do {
 		passive_instance->msg_xmit_iface = (passive_instance->msg_xmit_iface + 1) % instance->interface_count;
-	} while (passive_instance->faulty[passive_instance->msg_xmit_iface] == 1);
+		i++;
+	} while ((i <= instance->interface_count) && (passive_instance->faulty[passive_instance->msg_xmit_iface] == 1));
 
-	totemnet_mcast_flush_send (instance->net_handles[passive_instance->msg_xmit_iface], msg, msg_len);
+	if (i <= instance->interface_count) {
+		totemnet_mcast_flush_send (instance->net_handles[passive_instance->msg_xmit_iface], msg, msg_len);
+	}
 }
 
 static void passive_mcast_noflush_send (
@@ -1029,13 +1035,16 @@ static void passive_mcast_noflush_send (
 	unsigned int msg_len)
 {
 	struct passive_instance *passive_instance = (struct passive_instance *)instance->rrp_algo_instance;
+	int i = 0;
 
 	do {
 		passive_instance->msg_xmit_iface = (passive_instance->msg_xmit_iface + 1) % instance->interface_count;
-	} while (passive_instance->faulty[passive_instance->msg_xmit_iface] == 1);
-
+		i++;
+	} while ((i <= instance->interface_count) && (passive_instance->faulty[passive_instance->msg_xmit_iface] == 1));
 
-	totemnet_mcast_noflush_send (instance->net_handles[passive_instance->msg_xmit_iface], msg, msg_len);
+	if (i <= instance->interface_count) {
+		totemnet_mcast_noflush_send (instance->net_handles[passive_instance->msg_xmit_iface], msg, msg_len);
+	}
 }
 
 static void passive_token_recv (
@@ -1070,14 +1079,18 @@ static void passive_token_send (
 	unsigned int msg_len)
 {
 	struct passive_instance *passive_instance = (struct passive_instance *)instance->rrp_algo_instance;
+	int i = 0;
 
 	do {
 		passive_instance->token_xmit_iface = (passive_instance->token_xmit_iface + 1) % instance->interface_count;
-	} while (passive_instance->faulty[passive_instance->token_xmit_iface] == 1);
+		i++;
+	} while ((i <= instance->interface_count) && (passive_instance->faulty[passive_instance->token_xmit_iface] == 1));
 
-	totemnet_token_send (
-		instance->net_handles[passive_instance->token_xmit_iface],
-		msg, msg_len);
+	if (i <= instance->interface_count) {
+		totemnet_token_send (
+		    instance->net_handles[passive_instance->token_xmit_iface],
+		    msg, msg_len);
+	}
 
 }
 
diff --git a/exec/totemsrp.c b/exec/totemsrp.c
index fda271c..5b61f3b 100644
--- a/exec/totemsrp.c
+++ b/exec/totemsrp.c
@@ -110,10 +110,6 @@
  * SEQNO_START_TOKEN is the starting sequence number after a new configuration
  *	for a token.  This should remain zero, unless testing overflow in which
  *	case 07fffff00 or 0xffffff00 are good starting values.
- *
- * SEQNO_START_MSG is the starting sequence number after a new configuration
- *	This should remain zero, unless testing overflow in which case
- *	0x7ffff000 and 0xfffff000 are good values to start with
  */
 #define SEQNO_START_MSG 0x0
 #define SEQNO_START_TOKEN 0x0
@@ -625,12 +621,12 @@ void main_iface_change_fn (
 struct message_handlers totemsrp_message_handlers = {
 	6,
 	{
-		message_handler_orf_token,
-		message_handler_mcast,
-		message_handler_memb_merge_detect,
-		message_handler_memb_join,
-		message_handler_memb_commit_token,
-		message_handler_token_hold_cancel
+		message_handler_orf_token,            /* MESSAGE_TYPE_ORF_TOKEN */
+		message_handler_mcast,                /* MESSAGE_TYPE_MCAST */
+		message_handler_memb_merge_detect,    /* MESSAGE_TYPE_MEMB_MERGE_DETECT */
+		message_handler_memb_join,            /* MESSAGE_TYPE_MEMB_JOIN */
+		message_handler_memb_commit_token,    /* MESSAGE_TYPE_MEMB_COMMIT_TOKEN */
+		message_handler_token_hold_cancel     /* MESSAGE_TYPE_TOKEN_HOLD_CANCEL */
 	}
 };
 
@@ -862,6 +858,9 @@ int totemsrp_initialize (
 		"RRP threshold (%d problem count)\n",
 		totem_config->rrp_problem_count_threshold);
 	log_printf (instance->totemsrp_log_level_debug,
+		"RRP multicast threshold (%d problem count)\n",
+		totem_config->rrp_problem_count_mcast_threshold);
+	log_printf (instance->totemsrp_log_level_debug,
 		"RRP automatic recovery check timeout (%d ms)\n",
 		totem_config->rrp_autorecovery_check_timeout);
 	log_printf (instance->totemsrp_log_level_debug,
@@ -1795,7 +1794,36 @@ static void memb_state_operational_enter (struct totemsrp_instance *instance)
 		sizeof (struct srp_addr) * instance->my_memb_entries);
 
 	instance->my_failed_list_entries = 0;
-	instance->my_high_delivered = instance->my_high_seq_received;
+	/*
+	 * TODO Not exactly to spec
+	 *
+	 * At entry to this function all messages without a gap are
+	 * delivered.
+	 *
+	 * This code throws away messages from the last gap in the sort queue
+	 * to my_high_seq_received.
+	 *
+	 * What should really happen is: deliver all messages up to a gap,
+	 * then deliver the transitional configuration, then deliver the
+	 * messages between the first gap and my_high_seq_received, and
+	 * then deliver the regular configuration.
+	 *
+	 * Unfortunately totempg doesn't appear to like this operating mode,
+	 * which needs more inspection.
+	 */
+	i = instance->my_high_seq_received + 1;
+	do {
+		void *ptr;
+
+		i -= 1;
+		res = sq_item_get (&instance->regular_sort_queue, i, &ptr);
+		if (i == 0) {
+			break;
+		}
+	} while (res);
+
+	instance->my_high_delivered = i;
 
 	for (i = 0; i <= instance->my_high_delivered; i++) {
 		void *ptr;
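	The loop added above walks backward from my_high_seq_received and
	stops at the last gap in the regular sort queue.  The same scan over
	a plain presence array (a hypothetical stand-in for sq_item_get):

```c
/*
 * present[i] is nonzero when sequence number i is held in the sort
 * queue.  Walk down from high_seq and return the first sequence at or
 * below it whose item is missing; return 0 if the scan reaches the
 * bottom of the queue without finding a gap.
 */
unsigned int find_last_gap(const int *present, unsigned int high_seq)
{
	unsigned int i = high_seq + 1;
	int res;

	do {
		i -= 1;
		res = present[i]; /* sq_item_get() succeeding in the patch */
		if (i == 0) {
			break;
		}
	} while (res);

	return i;
}
```

	The patch then sets my_high_delivered to this value instead of
	jumping straight to my_high_seq_received.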
@@ -2023,7 +2051,7 @@ static void memb_state_recovery_enter (
 		log_printf (instance->totemsrp_log_level_debug,
 			"position [%d] member %s:\n", i, totemip_print (&addr[i].addr[0]));
 		log_printf (instance->totemsrp_log_level_debug,
-			"previous ring seq %lld rep %s\n",
+			"previous ring seq %llx rep %s\n",
 			memb_list[i].ring_id.seq,
 			totemip_print (&memb_list[i].ring_id.rep));
 
@@ -4418,7 +4446,7 @@ void main_iface_change_fn (
 		memb_ring_id_create_or_load (instance, &instance->my_ring_id);
 		log_printf (
 			instance->totemsrp_log_level_debug,
-			"Created or loaded sequence id %lld.%s for this ring.\n",
+			"Created or loaded sequence id %llx.%s for this ring.\n",
 			instance->my_ring_id.seq,
 			totemip_print (&instance->my_ring_id.rep));
 
diff --git a/exec/totemudp.c b/exec/totemudp.c
index 96849b7..0c12b56 100644
--- a/exec/totemudp.c
+++ b/exec/totemudp.c
@@ -90,6 +90,8 @@
 #define BIND_STATE_REGULAR	1
 #define BIND_STATE_LOOPBACK	2
 
+#define MESSAGE_TYPE_MCAST	1
+
 #define HMAC_HASH_SIZE 20
 struct security_header {
 	unsigned char hash_digest[HMAC_HASH_SIZE]; /* The hash *MUST* be first in the data structure */
@@ -1172,6 +1174,7 @@ static int net_deliver_fn (
 	int res = 0;
 	unsigned char *msg_offset;
 	unsigned int size_delv;
+	char *message_type;
 
 	if (instance->flushing == 1) {
 		iovec = &instance->totemudp_iov_recv_flush;
@@ -1234,6 +1237,16 @@ static int net_deliver_fn (
 	}
 
 	/*
+	 * Drop all non-mcast messages (more specifically join
+	 * messages should be dropped)
+	 */
+	message_type = (char *)msg_offset;
+	if (instance->flushing == 1 && *message_type != MESSAGE_TYPE_MCAST) {
+		iovec->iov_len = FRAME_SIZE_MAX;
+		return (0);
+	}
+	
+	/*
 	 * Handle incoming message
 	 */
 	instance->totemudp_deliver_fn (
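	The new check peeks at the first byte of the received frame to decide
	whether to drop it while flushing.  The predicate in isolation
	(hypothetical helper; the real code also resets iov_len before
	returning):

```c
#define MESSAGE_TYPE_MCAST 1

/*
 * During a flush only multicast messages are processed; anything else
 * (notably memb_join, whose handling during flush caused the RECOVERY
 * -> GATHER segfault) is dropped by inspecting the message type byte.
 */
int should_drop_during_flush(int flushing, const char *msg)
{
	return (flushing == 1) && (msg[0] != MESSAGE_TYPE_MCAST);
}
```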
diff --git a/include/corosync/totem/totem.h b/include/corosync/totem/totem.h
index f3ac9cc..4dce3b3 100644
--- a/include/corosync/totem/totem.h
+++ b/include/corosync/totem/totem.h
@@ -143,6 +143,8 @@ struct totem_config {
 
 	unsigned int rrp_problem_count_threshold;
 
+	unsigned int rrp_problem_count_mcast_threshold;
+
 	unsigned int rrp_autorecovery_check_timeout;
 
 	char rrp_mode[TOTEM_RRP_MODE_BYTES];
diff --git a/lib/coroipcc.c b/lib/coroipcc.c
index b7a3db9..de1af53 100644
--- a/lib/coroipcc.c
+++ b/lib/coroipcc.c
@@ -86,6 +86,15 @@ struct ipc_instance {
 	pthread_mutex_t mutex;
 };
 
+struct ipc_path_data {
+	mar_req_setup_t req_setup;
+	mar_res_setup_t res_setup;
+	char control_map_path[PATH_MAX];
+	char request_map_path[PATH_MAX];
+	char response_map_path[PATH_MAX];
+	char dispatch_map_path[PATH_MAX];
+};
+
 void ipc_hdb_destructor (void *context);
 
 DECLARE_HDB_DATABASE(ipc_hdb,ipc_hdb_destructor);
@@ -581,12 +590,7 @@ coroipcc_service_connect (
 	union semun semun;
 #endif
 	int sys_res;
-	mar_req_setup_t req_setup;
-	mar_res_setup_t res_setup;
-	char control_map_path[PATH_MAX];
-	char request_map_path[PATH_MAX];
-	char response_map_path[PATH_MAX];
-	char dispatch_map_path[PATH_MAX];
+	struct ipc_path_data *path_data;
 
 	res = hdb_error_to_cs (hdb_handle_create (&ipc_hdb,
 		sizeof (struct ipc_instance), handle));
@@ -599,8 +603,6 @@ coroipcc_service_connect (
 		return (res);
 	}
 
-	res_setup.error = CS_ERR_LIBRARY;
-
 #if defined(COROSYNC_SOLARIS)
 	request_fd = socket (PF_UNIX, SOCK_STREAM, 0);
 #else
@@ -613,6 +615,14 @@ coroipcc_service_connect (
 	socket_nosigpipe (request_fd);
 #endif
 
+	path_data = malloc (sizeof(*path_data));
+	if (path_data == NULL) {
+		goto error_connect;
+	}
+	memset(path_data, 0, sizeof(*path_data));
+
+	path_data->res_setup.error = CS_ERR_LIBRARY;
+
 	memset (&address, 0, sizeof (struct sockaddr_un));
 	address.sun_family = AF_UNIX;
 #if defined(COROSYNC_BSD) || defined(COROSYNC_DARWIN)
@@ -632,7 +642,7 @@ coroipcc_service_connect (
 	}
 
 	sys_res = memory_map (
-		control_map_path,
+		path_data->control_map_path,
 		"control_buffer-XXXXXX",
 		(void *)&ipc_instance->control_buffer,
 		8192);
@@ -642,7 +652,7 @@ coroipcc_service_connect (
 	}
 
 	sys_res = memory_map (
-		request_map_path,
+		path_data->request_map_path,
 		"request_buffer-XXXXXX",
 		(void *)&ipc_instance->request_buffer,
 		request_size);
@@ -652,7 +662,7 @@ coroipcc_service_connect (
 	}
 
 	sys_res = memory_map (
-		response_map_path,
+		path_data->response_map_path,
 		"response_buffer-XXXXXX",
 		(void *)&ipc_instance->response_buffer,
 		response_size);
@@ -662,7 +672,7 @@ coroipcc_service_connect (
 	}
 
 	sys_res = circular_memory_map (
-		dispatch_map_path,
+		path_data->dispatch_map_path,
 		"dispatch_buffer-XXXXXX",
 		(void *)&ipc_instance->dispatch_buffer,
 		dispatch_size);
@@ -717,33 +727,33 @@ coroipcc_service_connect (
 	/*
 	 * Initialize IPC setup message
 	 */
-	req_setup.service = service;
-	strcpy (req_setup.control_file, control_map_path);
-	strcpy (req_setup.request_file, request_map_path);
-	strcpy (req_setup.response_file, response_map_path);
-	strcpy (req_setup.dispatch_file, dispatch_map_path);
-	req_setup.control_size = 8192;
-	req_setup.request_size = request_size;
-	req_setup.response_size = response_size;
-	req_setup.dispatch_size = dispatch_size;
+	path_data->req_setup.service = service;
+	strcpy (path_data->req_setup.control_file, path_data->control_map_path);
+	strcpy (path_data->req_setup.request_file, path_data->request_map_path);
+	strcpy (path_data->req_setup.response_file, path_data->response_map_path);
+	strcpy (path_data->req_setup.dispatch_file, path_data->dispatch_map_path);
+	path_data->req_setup.control_size = 8192;
+	path_data->req_setup.request_size = request_size;
+	path_data->req_setup.response_size = response_size;
+	path_data->req_setup.dispatch_size = dispatch_size;
 
 #if _POSIX_THREAD_PROCESS_SHARED < 1
-	req_setup.semkey = semkey;
+	path_data->req_setup.semkey = semkey;
 #endif
 
-	res = socket_send (request_fd, &req_setup, sizeof (mar_req_setup_t));
+	res = socket_send (request_fd, &path_data->req_setup, sizeof (mar_req_setup_t));
 	if (res != CS_OK) {
 		goto error_exit;
 	}
-	res = socket_recv (request_fd, &res_setup, sizeof (mar_res_setup_t));
+	res = socket_recv (request_fd, &path_data->res_setup, sizeof (mar_res_setup_t));
 	if (res != CS_OK) {
 		goto error_exit;
 	}
 
 	ipc_instance->fd = request_fd;
 
-	if (res_setup.error == CS_ERR_TRY_AGAIN) {
-		res = res_setup.error;
+	if (path_data->res_setup.error == CS_ERR_TRY_AGAIN) {
+		res = path_data->res_setup.error;
 		goto error_exit;
 	}
 
@@ -756,7 +766,9 @@ coroipcc_service_connect (
 
 	hdb_handle_put (&ipc_hdb, *handle);
 
-	return (res_setup.error);
+	res = path_data->res_setup.error;
+	free(path_data);
+	return (res);
 
 error_exit:
 #if _POSIX_THREAD_PROCESS_SHARED < 1
@@ -775,6 +787,7 @@ error_connect:
 
 	hdb_handle_destroy (&ipc_hdb, *handle);
 	hdb_handle_put (&ipc_hdb, *handle);
+	free(path_data);
 
 	return (res);
 }
diff --git a/man/corosync.conf.5 b/man/corosync.conf.5
index b6f769e..78eb2bb 100644
--- a/man/corosync.conf.5
+++ b/man/corosync.conf.5
@@ -472,6 +472,14 @@ may occur.
 The default is 10 problem counts.
 
 .TP
+rrp_problem_count_mcast_threshold
+This specifies the number of times a problem is detected with multicast before
+setting the link faulty for passive rrp mode. This variable is unused in active
+rrp mode.
+
+The default is 10 times rrp_problem_count_threshold.
+
+.TP
 rrp_token_expired_timeout
 This specifies the time in milliseconds to increment the problem counter for
 the redundant ring protocol after not having received a token from all rings
diff --git a/services/cfg.c b/services/cfg.c
index 0419847..24f19f2 100644
--- a/services/cfg.c
+++ b/services/cfg.c
@@ -379,6 +379,7 @@ static int send_shutdown(void)
 {
 	struct req_exec_cfg_shutdown req_exec_cfg_shutdown;
 	struct iovec iovec;
+	int result;
 
 	ENTER();
 	req_exec_cfg_shutdown.header.size =
@@ -389,10 +390,10 @@ static int send_shutdown(void)
 	iovec.iov_base = (char *)&req_exec_cfg_shutdown;
 	iovec.iov_len = sizeof (struct req_exec_cfg_shutdown);
 
-	assert (api->totem_mcast (&iovec, 1, TOTEM_SAFE) == 0);
+	result = api->totem_mcast (&iovec, 1, TOTEM_SAFE);
 
 	LEAVE();
-	return 0;
+	return (result);
 }
 
 static void send_test_shutdown(void *only_conn, void *exclude_conn, int status)
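	The pattern applied throughout services/cfg.c replaces an assert() on
	totem_mcast's return value with explicit propagation of a retryable
	error.  A self-contained sketch (the CS_* values and function names
	are illustrative stand-ins, not the real corosync API):

```c
#define CS_OK            1
#define CS_ERR_TRY_AGAIN 6

/* Stand-in for api->totem_mcast: 0 on success, -1 when overloaded. */
static int fake_totem_mcast(int overloaded)
{
	return overloaded ? -1 : 0;
}

/*
 * Propagate the failure as a retryable error instead of asserting:
 * an overloaded totem layer becomes CS_ERR_TRY_AGAIN for the client
 * rather than a daemon crash.
 */
int send_request(int overloaded)
{
	int result = fake_totem_mcast(overloaded);

	if (result == -1) {
		return CS_ERR_TRY_AGAIN; /* caller may retry later */
	}
	return CS_OK;
}
```

	The handlers below follow the same shape: capture the result, map -1
	to CS_ERR_TRY_AGAIN, and place it in the response header's error
	field.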
@@ -426,6 +427,9 @@ static void send_test_shutdown(void *only_conn, void *exclude_conn, int status)
 
 static void check_shutdown_status(void)
 {
+	int result;
+	cs_error_t error = CS_OK;
+
 	ENTER();
 
 	/*
@@ -448,9 +452,17 @@ static void check_shutdown_status(void)
 		    shutdown_flags == CFG_SHUTDOWN_FLAG_REGARDLESS) {
 			TRACE1("shutdown confirmed");
 
+			/*
+			 * Tell other nodes we are going down
+			 */
+			result = send_shutdown();
+			if (result == -1) {
+				error = CS_ERR_TRY_AGAIN;
+			}
+
 			res_lib_cfg_tryshutdown.header.size = sizeof(struct res_lib_cfg_tryshutdown);
 			res_lib_cfg_tryshutdown.header.id = MESSAGE_RES_CFG_TRYSHUTDOWN;
-			res_lib_cfg_tryshutdown.header.error = CS_OK;
+			res_lib_cfg_tryshutdown.header.error = error;
 
 			/*
 			 * Tell originator that shutdown was confirmed
@@ -459,10 +471,6 @@ static void check_shutdown_status(void)
 						    sizeof(res_lib_cfg_tryshutdown));
 			shutdown_con = NULL;
 
-			/*
-			 * Tell other nodes we are going down
-			 */
-			send_shutdown();
 
 		}
 		else {
@@ -698,7 +706,9 @@ static void message_handler_req_lib_cfg_ringreenable (
 	const void *msg)
 {
 	struct req_exec_cfg_ringreenable req_exec_cfg_ringreenable;
+	struct res_lib_cfg_ringreenable res_lib_cfg_ringreenable;
 	struct iovec iovec;
+	int result;
 
 	ENTER();
 	req_exec_cfg_ringreenable.header.size =
@@ -711,7 +721,19 @@ static void message_handler_req_lib_cfg_ringreenable (
 	iovec.iov_base = (char *)&req_exec_cfg_ringreenable;
 	iovec.iov_len = sizeof (struct req_exec_cfg_ringreenable);
 
-	assert (api->totem_mcast (&iovec, 1, TOTEM_SAFE) == 0);
+	result = api->totem_mcast (&iovec, 1, TOTEM_SAFE);
+
+	if (result == -1) {
+		res_lib_cfg_ringreenable.header.id = MESSAGE_RES_CFG_RINGREENABLE;
+		res_lib_cfg_ringreenable.header.size = sizeof (struct res_lib_cfg_ringreenable);
+		res_lib_cfg_ringreenable.header.error = CS_ERR_TRY_AGAIN;
+		api->ipc_response_send (
+			conn,
+			&res_lib_cfg_ringreenable,
+			sizeof (struct res_lib_cfg_ringreenable));
+
+		api->ipc_refcnt_dec(conn);
+	}
 
 	LEAVE();
 }
@@ -836,7 +858,8 @@ static void message_handler_req_lib_cfg_killnode (
 	struct res_lib_cfg_killnode res_lib_cfg_killnode;
 	struct req_exec_cfg_killnode req_exec_cfg_killnode;
 	struct iovec iovec;
-	int res;
+	int result;
+	cs_error_t error = CS_OK;
 
 	ENTER();
 	req_exec_cfg_killnode.header.size =
@@ -849,11 +872,14 @@ static void message_handler_req_lib_cfg_killnode (
 	iovec.iov_base = (char *)&req_exec_cfg_killnode;
 	iovec.iov_len = sizeof (struct req_exec_cfg_killnode);
 
-	res = api->totem_mcast (&iovec, 1, TOTEM_SAFE);
+	result = api->totem_mcast (&iovec, 1, TOTEM_SAFE);
+	if (result == -1) {
+		error = CS_ERR_TRY_AGAIN;
+	}
 
 	res_lib_cfg_killnode.header.size = sizeof(struct res_lib_cfg_killnode);
 	res_lib_cfg_killnode.header.id = MESSAGE_RES_CFG_KILLNODE;
-	res_lib_cfg_killnode.header.error = CS_OK;
+	res_lib_cfg_killnode.header.error = error;
 
 	api->ipc_response_send(conn, &res_lib_cfg_killnode,
 				    sizeof(res_lib_cfg_killnode));
@@ -869,6 +895,8 @@ static void message_handler_req_lib_cfg_tryshutdown (
 	struct cfg_info *ci = (struct cfg_info *)api->ipc_private_data_get (conn);
 	const struct req_lib_cfg_tryshutdown *req_lib_cfg_tryshutdown = msg;
 	struct list_head *iter;
+	int result;
+	cs_error_t error = CS_OK;
 
 	ENTER();
 
@@ -878,11 +906,14 @@ static void message_handler_req_lib_cfg_tryshutdown (
 		/*
 		 * Tell other nodes
 		 */
-		send_shutdown();
+		result = send_shutdown();
+		if (result == -1) {
+			error = CS_ERR_TRY_AGAIN;
+		}
 
 		res_lib_cfg_tryshutdown.header.size = sizeof(struct res_lib_cfg_tryshutdown);
 		res_lib_cfg_tryshutdown.header.id = MESSAGE_RES_CFG_TRYSHUTDOWN;
-		res_lib_cfg_tryshutdown.header.error = CS_OK;
+		res_lib_cfg_tryshutdown.header.error = error;
 		api->ipc_response_send(conn, &res_lib_cfg_tryshutdown,
 					    sizeof(res_lib_cfg_tryshutdown));
 
@@ -937,9 +968,14 @@ static void message_handler_req_lib_cfg_tryshutdown (
 	if (shutdown_expected == 0) {
 		struct res_lib_cfg_tryshutdown res_lib_cfg_tryshutdown;
 
+		result = send_shutdown();
+		if (result == -1) {
+			error = CS_ERR_TRY_AGAIN;
+		}
+
 		res_lib_cfg_tryshutdown.header.size = sizeof(struct res_lib_cfg_tryshutdown);
 		res_lib_cfg_tryshutdown.header.id = MESSAGE_RES_CFG_TRYSHUTDOWN;
-		res_lib_cfg_tryshutdown.header.error = CS_OK;
+		res_lib_cfg_tryshutdown.header.error = error;
 
 		/*
 		 * Tell originator that shutdown was confirmed
@@ -947,7 +983,6 @@ static void message_handler_req_lib_cfg_tryshutdown (
 		api->ipc_response_send(conn, &res_lib_cfg_tryshutdown,
 				       sizeof(res_lib_cfg_tryshutdown));
 
-		send_shutdown();
 		LEAVE();
 		return;
 	}
@@ -1089,7 +1124,8 @@ static void message_handler_req_lib_cfg_crypto_set (
 	struct res_lib_cfg_crypto_set res_lib_cfg_crypto_set;
 	struct req_exec_cfg_crypto_set req_exec_cfg_crypto_set;
 	struct iovec iovec;
-	int ret = CS_ERR_INVALID_PARAM;
+	cs_error_t error = CS_ERR_INVALID_PARAM;
+	int result;
 
 	req_exec_cfg_crypto_set.header.size =
 		sizeof (struct req_exec_cfg_crypto_set);
@@ -1105,13 +1141,17 @@ static void message_handler_req_lib_cfg_crypto_set (
 
 		iovec.iov_base = (char *)&req_exec_cfg_crypto_set;
 		iovec.iov_len = sizeof (struct req_exec_cfg_crypto_set);
-		assert (api->totem_mcast (&iovec, 1, TOTEM_SAFE) == 0);
-		ret = CS_OK;
+		result = api->totem_mcast (&iovec, 1, TOTEM_SAFE);
+		if (result == -1) {
+			error = CS_ERR_TRY_AGAIN;
+		} else {
+			error = CS_OK;
+		}
 	}
 
 	res_lib_cfg_crypto_set.header.size = sizeof(res_lib_cfg_crypto_set);
 	res_lib_cfg_crypto_set.header.id = MESSAGE_RES_CFG_CRYPTO_SET;
-	res_lib_cfg_crypto_set.header.error = ret;
+	res_lib_cfg_crypto_set.header.error = error;
 
 	api->ipc_response_send(conn, &res_lib_cfg_crypto_set,
 		sizeof(res_lib_cfg_crypto_set));
diff --git a/services/cpg.c b/services/cpg.c
index 6669fbd..d7a26f2 100644
--- a/services/cpg.c
+++ b/services/cpg.c
@@ -287,39 +287,39 @@ static int notify_lib_totem_membership (
  */
 static struct corosync_lib_handler cpg_lib_engine[] =
 {
-	{ /* 0 */
+	{ /* 0 - MESSAGE_REQ_CPG_JOIN */
 		.lib_handler_fn				= message_handler_req_lib_cpg_join,
 		.flow_control				= CS_LIB_FLOW_CONTROL_REQUIRED
 	},
-	{ /* 1 */
+	{ /* 1 - MESSAGE_REQ_CPG_LEAVE */
 		.lib_handler_fn				= message_handler_req_lib_cpg_leave,
 		.flow_control				= CS_LIB_FLOW_CONTROL_REQUIRED
 	},
-	{ /* 2 */
+	{ /* 2 - MESSAGE_REQ_CPG_MCAST */
 		.lib_handler_fn				= message_handler_req_lib_cpg_mcast,
 		.flow_control				= CS_LIB_FLOW_CONTROL_REQUIRED
 	},
-	{ /* 3 */
+	{ /* 3 - MESSAGE_REQ_CPG_MEMBERSHIP */
 		.lib_handler_fn				= message_handler_req_lib_cpg_membership,
 		.flow_control				= CS_LIB_FLOW_CONTROL_NOT_REQUIRED
 	},
-	{ /* 4 */
+	{ /* 4 - MESSAGE_REQ_CPG_LOCAL_GET */
 		.lib_handler_fn				= message_handler_req_lib_cpg_local_get,
 		.flow_control				= CS_LIB_FLOW_CONTROL_NOT_REQUIRED
 	},
-	{ /* 5 */
+	{ /* 5 - MESSAGE_REQ_CPG_ITERATIONINITIALIZE */
 		.lib_handler_fn				= message_handler_req_lib_cpg_iteration_initialize,
 		.flow_control				= CS_LIB_FLOW_CONTROL_NOT_REQUIRED
 	},
-	{ /* 6 */
+	{ /* 6 - MESSAGE_REQ_CPG_ITERATIONNEXT */
 		.lib_handler_fn				= message_handler_req_lib_cpg_iteration_next,
 		.flow_control				= CS_LIB_FLOW_CONTROL_NOT_REQUIRED
 	},
-	{ /* 7 */
+	{ /* 7 - MESSAGE_REQ_CPG_ITERATIONFINALIZE */
 		.lib_handler_fn				= message_handler_req_lib_cpg_iteration_finalize,
 		.flow_control				= CS_LIB_FLOW_CONTROL_NOT_REQUIRED
 	},
-	{ /* 8 */
+	{ /* 8 - MESSAGE_REQ_CPG_FINALIZE */
 		.lib_handler_fn				= message_handler_req_lib_cpg_finalize,
 		.flow_control				= CS_LIB_FLOW_CONTROL_REQUIRED
 	},
@@ -327,27 +327,27 @@ static struct corosync_lib_handler cpg_lib_engine[] =
 
 static struct corosync_exec_handler cpg_exec_engine[] =
 {
-	{ /* 0 */
+	{ /* 0 - MESSAGE_REQ_EXEC_CPG_PROCJOIN */
 		.exec_handler_fn	= message_handler_req_exec_cpg_procjoin,
 		.exec_endian_convert_fn	= exec_cpg_procjoin_endian_convert
 	},
-	{ /* 1 */
+	{ /* 1 - MESSAGE_REQ_EXEC_CPG_PROCLEAVE */
 		.exec_handler_fn	= message_handler_req_exec_cpg_procleave,
 		.exec_endian_convert_fn	= exec_cpg_procjoin_endian_convert
 	},
-	{ /* 2 */
+	{ /* 2 - MESSAGE_REQ_EXEC_CPG_JOINLIST */
 		.exec_handler_fn	= message_handler_req_exec_cpg_joinlist,
 		.exec_endian_convert_fn	= exec_cpg_joinlist_endian_convert
 	},
-	{ /* 3 */
+	{ /* 3 - MESSAGE_REQ_EXEC_CPG_MCAST */
 		.exec_handler_fn	= message_handler_req_exec_cpg_mcast,
 		.exec_endian_convert_fn	= exec_cpg_mcast_endian_convert
 	},
-	{ /* 4 */
+	{ /* 4 - MESSAGE_REQ_EXEC_CPG_DOWNLIST_OLD */
 		.exec_handler_fn	= message_handler_req_exec_cpg_downlist_old,
 		.exec_endian_convert_fn	= exec_cpg_downlist_endian_convert_old
 	},
-	{ /* 5 */
+	{ /* 5 - MESSAGE_REQ_EXEC_CPG_DOWNLIST */
 		.exec_handler_fn	= message_handler_req_exec_cpg_downlist,
 		.exec_endian_convert_fn	= exec_cpg_downlist_endian_convert
 	},
@@ -683,7 +683,8 @@ static int notify_lib_joinlist(
 				}
 				if (left_list_entries) {
 					if (left_list[0].pid == cpd->pid &&
-						left_list[0].nodeid == api->totem_nodeid_get()) {
+						left_list[0].nodeid == api->totem_nodeid_get() &&
+						left_list[0].reason == CONFCHG_CPG_REASON_LEAVE) {
 
 						cpd->pid = 0;
 						memset (&cpd->group_name, 0, sizeof(cpd->group_name));
@@ -865,12 +866,19 @@ static void cpg_pd_finalize (struct cpg_pd *cpd)
 static int cpg_lib_exit_fn (void *conn)
 {
 	struct cpg_pd *cpd = (struct cpg_pd *)api->ipc_private_data_get (conn);
+	int result;
 
 	log_printf(LOGSYS_LEVEL_DEBUG, "exit_fn for conn=%p\n", conn);
 
 	if (cpd->group_name.length > 0) {
-		cpg_node_joinleave_send (cpd->pid, &cpd->group_name,
+		result = cpg_node_joinleave_send (cpd->pid, &cpd->group_name,
 				MESSAGE_REQ_EXEC_CPG_PROCLEAVE, CONFCHG_CPG_REASON_PROCDOWN);
+		if (result == -1) {
+			/*
+			 * Call this function again later
+			 */
+			return (result);
+		}
 	}
 
 	cpg_pd_finalize (cpd);
@@ -1286,9 +1294,11 @@ static void message_handler_req_lib_cpg_join (void *conn, const void *message)
 {
 	const struct req_lib_cpg_join *req_lib_cpg_join = message;
 	struct cpg_pd *cpd = (struct cpg_pd *)api->ipc_private_data_get (conn);
+	struct cpg_pd tmp_cpd;
 	struct res_lib_cpg_join res_lib_cpg_join;
 	cs_error_t error = CPG_OK;
 	struct list_head *iter;
+	int result;
 
 	/* Test, if we don't have same pid and group name joined */
 	for (iter = cpg_pd_list_head.next; iter != &cpg_pd_list_head; iter = iter->next) {
@@ -1321,15 +1331,28 @@ static void message_handler_req_lib_cpg_join (void *conn, const void *message)
 	switch (cpd->cpd_state) {
 	case CPD_STATE_UNJOINED:
 		error = CPG_OK;
+		/*
+		 * Make copy of cpd, to restore if cpg_node_joinleave_send fails
+		 */
+		memcpy (&tmp_cpd, cpd, sizeof(tmp_cpd));
 		cpd->cpd_state = CPD_STATE_JOIN_STARTED;
 		cpd->pid = req_lib_cpg_join->pid;
 		cpd->flags = req_lib_cpg_join->flags;
 		memcpy (&cpd->group_name, &req_lib_cpg_join->group_name,
 			sizeof (cpd->group_name));
 
-		cpg_node_joinleave_send (req_lib_cpg_join->pid,
+		result = cpg_node_joinleave_send (req_lib_cpg_join->pid,
 			&req_lib_cpg_join->group_name,
 			MESSAGE_REQ_EXEC_CPG_PROCJOIN, CONFCHG_CPG_REASON_JOIN);
+
+		if (result == -1) {
+			error = CPG_ERR_TRY_AGAIN;
+			/*
+			 * Restore cpd
+			 */
+			memcpy (cpd, &tmp_cpd, sizeof(tmp_cpd));
+			goto response_send;
+		}
 		break;
 	case CPD_STATE_LEAVE_STARTED:
 		error = CPG_ERR_BUSY;
@@ -1356,6 +1379,7 @@ static void message_handler_req_lib_cpg_leave (void *conn, const void *message)
 	cs_error_t error = CPG_OK;
 	struct req_lib_cpg_leave  *req_lib_cpg_leave = (struct req_lib_cpg_leave *)message;
 	struct cpg_pd *cpd = (struct cpg_pd *)api->ipc_private_data_get (conn);
+	int result;
 
 	log_printf(LOGSYS_LEVEL_DEBUG, "got leave request on %p\n", conn);
 
@@ -1372,10 +1396,14 @@ static void message_handler_req_lib_cpg_leave (void *conn, const void *message)
 	case CPD_STATE_JOIN_COMPLETED:
 		error = CPG_OK;
 		cpd->cpd_state = CPD_STATE_LEAVE_STARTED;
-		cpg_node_joinleave_send (req_lib_cpg_leave->pid,
+		result = cpg_node_joinleave_send (req_lib_cpg_leave->pid,
 			&req_lib_cpg_leave->group_name,
 			MESSAGE_REQ_EXEC_CPG_PROCLEAVE,
 			CONFCHG_CPG_REASON_LEAVE);
+		if (result == -1) {
+			error = CPG_ERR_TRY_AGAIN;
+			cpd->cpd_state = CPD_STATE_JOIN_COMPLETED;
+		}
 		break;
 	}
 
@@ -1458,8 +1486,10 @@ static void message_handler_req_lib_cpg_mcast (void *conn, const void *message)
  		req_exec_cpg_iovec[1].iov_base = (char *)&req_lib_cpg_mcast->message;
  		req_exec_cpg_iovec[1].iov_len = msglen;
 
- 		result = api->totem_mcast (req_exec_cpg_iovec, 2, TOTEM_AGREED);
- 		assert(result == 0);
+		result = api->totem_mcast (req_exec_cpg_iovec, 2, TOTEM_AGREED);
+		if (result == -1) {
+			error = CPG_ERR_TRY_AGAIN;
+		}
  	}
 
 	res_lib_cpg_mcast.header.size = sizeof(res_lib_cpg_mcast);

-- 
corosync Debian packaging


