Bug#601198: mdadm: Does all that can be expected ...

NeilBrown neilb at suse.de
Mon Aug 1 02:30:07 UTC 2011


On Sun, 31 Jul 2011 15:49:48 -0400 Scott Schaefer
<saschaefer at neurodiverse.org> wrote:

> I am glad that you phrased your request "It would better if it managed
> to say it failed doing the requested operation.".
> 
> Because it indeed did successfully perform the operation, exactly as
> the output indicated.  That is, it DID indeed set the MD_DISK_FAULTY
> attribute on the /dev/sdb2 device of the /dev/md0 array.
> 
> To be more precise, it set the attribute via ioctl() call to the
> kernel 'md' driver. (~ lines 980-995 of Manage.c).
> 
> Unfortunately (or rather, fortunately, for your data as well as
> your blood pressure), the kernel 'md' driver, when receiving this
> request, sets a flag to initiate a recovery or, if a recovery is
> already in progress (as in your case), sets the
> MD_RECOVERY_RECOVER flag.
> 
> I have not attempted to understand all the possibilities in the
> kernel driver.  However, it appears that, at least for RAID-1,
> the FAULTY flag on the (sdb2) device is cleared when the recovery
> completes, and the 'RECOVERY_RECOVER' finds nothing more to do.
> 
> At this point, I believe this is a "won't fix" issue; one could
> potentially ask for mdadm to do some before/after status-check
> magic and "handle" this and other potential cases in some
> "better" way.  Asking it to do so raises a great many more
> problems than it solves.

I've just queued the following kernel patch, which will be in 3.1 and
which I believe is the best way to address this issue.

Thanks,
NeilBrown

From 70792a4e8fc486ab82449cb3165268131875b7c1 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb at suse.de>
Date: Mon, 1 Aug 2011 12:28:41 +1000
Subject: [PATCH] md: report failure if a 'set faulty' request doesn't.

Sometimes a device will refuse to be set faulty.  e.g. RAID1 will
never let the last working device become faulty.

So check if "md_error()" did manage to set the faulty flag and fail
with EBUSY if it didn't.

Resolves-Debian-Bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=601198
Reported-by: Mike Hommey <mh+reportbug at glandium.org>
Signed-off-by: NeilBrown <neilb at suse.de>

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8e221a2..1cd9bfb 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2561,7 +2561,10 @@ state_store(mdk_rdev_t *rdev, const char *buf, size_t len)
 	int err = -EINVAL;
 	if (cmd_match(buf, "faulty") && rdev->mddev->pers) {
 		md_error(rdev->mddev, rdev);
-		err = 0;
+		if (test_bit(Faulty, &rdev->flags))
+			err = 0;
+		else
+			err = -EBUSY;
 	} else if (cmd_match(buf, "remove")) {
 		if (rdev->raid_disk >= 0)
 			err = -EBUSY;
@@ -5983,6 +5986,8 @@ static int set_disk_faulty(mddev_t *mddev, dev_t dev)
 		return -ENODEV;
 
 	md_error(mddev, rdev);
+	if (!test_bit(Faulty, &rdev->flags))
+		return -EBUSY;
 	return 0;
 }
 

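To illustrate the behaviour change, here is a minimal userspace sketch
of the idea behind the patch.  It is not kernel code: the toy_* types
and functions are hypothetical stand-ins for rdev, mddev, md_error()
and set_disk_faulty(), and the "refuse to fail the last working
device" rule is modelled only in the simplified RAID1-like form
described in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a member device; not the kernel's rdev. */
struct toy_rdev {
	bool in_sync;	/* device holds a complete, current copy */
	bool faulty;	/* device has been marked faulty */
};

/* Hypothetical stand-in for an array; not the kernel's mddev. */
struct toy_array {
	struct toy_rdev *devs;
	size_t ndevs;
};

/* Count devices that are in sync and not faulty. */
static size_t toy_working_devices(const struct toy_array *a)
{
	size_t i, n = 0;

	for (i = 0; i < a->ndevs; i++)
		if (a->devs[i].in_sync && !a->devs[i].faulty)
			n++;
	return n;
}

/*
 * Simplified model of a RAID1-style error handler: it silently
 * refuses to fail the last working device, otherwise marks the
 * device faulty.
 */
void toy_md_error(struct toy_array *a, struct toy_rdev *rdev)
{
	assert(a != NULL && rdev != NULL);
	if (rdev->in_sync && !rdev->faulty && toy_working_devices(a) == 1)
		return;		/* refused: nothing changes */
	rdev->faulty = true;
	rdev->in_sync = false;
}

/* Model of the old behaviour: success is reported unconditionally. */
int toy_set_disk_faulty_old(struct toy_array *a, struct toy_rdev *rdev)
{
	toy_md_error(a, rdev);
	return 0;
}

/* Model of the patched behaviour: report -EBUSY if the flag was not set. */
int toy_set_disk_faulty(struct toy_array *a, struct toy_rdev *rdev)
{
	toy_md_error(a, rdev);
	if (!rdev->faulty)
		return -EBUSY;
	return 0;
}
```

With a two-device array where only the first device is in sync (the
other still recovering), the old variant returns 0 for the first device
even though it stays non-faulty, while the patched variant returns
-EBUSY.  Failing the recovering device succeeds in both variants.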



