Bug#799781: device lock race condition between udev and multipathd may cause systemd to abort system boot
    Baumgartner Niels, Bedag 
    Niels.Baumgartner at bedag.ch
       
    Tue Sep 22 14:03:10 UTC 2015
    
    
  
Package: multipath-tools
Version: 0.5.0-6+deb8u1
Severity: critical
Tags: patch
Configuration:
I have the following setup: 
Dell PowerEdge M620 + QLogic ISP2532-based 8Gb Fibre Channel to PCI Express HBA attached to our SAN with multipath.
The OS is Debian Jessie 8.1.
The server's root file system resides on an LVM logical volume.
The packages multipath-tools and multipath-tools-boot were installed.
Symptom:
Approximately 50% of the time the server fails to boot correctly, depending on the outcome of the race condition between udev and multipathd (see below).
Instead, the password prompt for entering single user mode (rescue.target) appears.
Problem:
The problem seems to be the same one that Will Aoki already reported against upgrade-reports in bug report 788295.
He was using open-iscsi, while I'm using a FC-HBA with the qla2xxx module. I'm guessing other combinations are affected too.
Bug 788295 has a very detailed analysis of the problem. The provided logs correlate with mine.
Since 788295 was filed against upgrade-reports, it'll probably not get fixed, hence this report.
Further Information:
Existing Debian bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=788295
Ubuntu fixed the issue. See https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1431650
Ubuntu Package with fix: http://packages.ubuntu.com/trusty-updates/multipath-tools
See also the comment of the patch taken from Ubuntu for more technical details.
Solution:
The following patch, taken from the Ubuntu package, solved the problem for both me and Will Aoki.
Could you please add this patch to the official Debian package and, if possible, get the fixed package into jessie-updates and the next jessie point release?
------------------- START OF PATCH -----------------
From 841977fc9c3432702c296d6239e4a54291a6007a Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <hare at suse.de>
Date: Tue, 24 Jun 2014 08:49:15 +0200
Subject: [PATCH] libmultipath: use a shared lock to co-operate with udev
udev since v214 is placing a shared lock on the device node
whenever it's processing the event. This introduces a race
condition with multipathd, as multipathd is processing the
event for the block device at the same time as udev is
processing the events for the partitions.
And a lock on the partitions will also be visible on the
block device itself, hence multipathd won't be able to
lock the device.
When multipath manages to take a lock on the device,
udev will fail, and consequently ignore this entire event.
Which in turn might cause the system to malfunction as it
might have been a crucial event like 'remove' or 'link down'.
So we should better use LOCK_SH here; with that the flock
call in multipathd _and_ udev will succeed and the events
can be processed.
References: bnc#883878
Signed-off-by: Hannes Reinecke <hare at suse.de>
---
 libmultipath/configure.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libmultipath/configure.c b/libmultipath/configure.c
index 0ddd3d5..dc2ebf0 100644
--- a/libmultipath/configure.c
+++ b/libmultipath/configure.c
@@ -529,7 +529,7 @@ lock_multipath (struct multipath * mpp, int lock)
 		if (!pgp->paths)
 			continue;
 		vector_foreach_slot(pgp->paths, pp, j) {
-			if (lock && flock(pp->fd, LOCK_EX | LOCK_NB) &&
+			if (lock && flock(pp->fd, LOCK_SH | LOCK_NB) &&
 			    errno == EWOULDBLOCK)
 				goto fail;
 			else if (!lock)
------------------- END OF PATCH -----------------
Additional comments:
Why I rated this critical: (1) The Ubuntu bug is rated critical. (2) I think the "makes unrelated software on the system (or the whole system) break" clause applies when a system does not reliably boot anymore.
I can provide journal entries of a failed boot attempt if necessary. Since such logs already exist in bug 788295 and a tested patch is available, I did not think they were needed.
Kind Regards
Niels Baumgartner
    
    