Bug#601198: mdadm: Does all that can be expected ...
NeilBrown
neilb at suse.de
Mon Aug 1 02:30:07 UTC 2011
On Sun, 31 Jul 2011 15:49:48 -0400 Scott Schaefer
<saschaefer at neurodiverse.org> wrote:
> I am glad that you phrased your request "It would better if it managed
> to say it failed doing the requested operation.".
>
> Because it indeed did successfully perform the operation, exactly as
> the output indicated. That is, it DID indeed set the MD_DISK_FAULTY
> attribute on the /dev/sdb2 device of the /dev/md0 array.
>
> To be more precise, it set the attribute via ioctl() call to the
> kernel 'md' driver. (~ lines 980-995 of Manage.c).
>
> Unfortunately (or rather, fortunately, for your data as well as
> your blood pressure), the kernel 'md' driver, on receiving this
> request, sets a flag to initiate a recovery or, if a recovery is
> already in progress (as in your case), sets the
> MD_RECOVERY_RECOVER flag.
>
> I have not attempted to understand all the possibilities in the
> kernel driver. However, it appears that, at least for RAID-1,
> the FAULTY flag on the (sdb2) device is cleared when the recovery
> completes, and the 'RECOVERY_RECOVER' finds nothing more to do.
>
> At this point, I believe this is a "won't fix" issue; one could
> potentially ask for mdadm to do some before/after status-check
> magic and "handle" this and other potential cases in some
> "better" way. Asking it to do so raises a great many more
> problems than it solves.
I've just queued the following kernel patch, which will be in 3.1. I
believe it is the best way to address this issue.
Thanks,
NeilBrown
From 70792a4e8fc486ab82449cb3165268131875b7c1 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb at suse.de>
Date: Mon, 1 Aug 2011 12:28:41 +1000
Subject: [PATCH] md: report failure if a 'set faulty' request doesn't.
Sometimes a device will refuse to be set faulty. e.g. RAID1 will
never let the last working device become faulty.
So check if "md_error()" did manage to set the faulty flag and fail
with EBUSY if it didn't.
Resolves-Debian-Bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=601198
Reported-by: Mike Hommey <mh+reportbug at glandium.org>
Signed-off-by: NeilBrown <neilb at suse.de>
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8e221a2..1cd9bfb 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2561,7 +2561,10 @@ state_store(mdk_rdev_t *rdev, const char *buf, size_t len)
 	int err = -EINVAL;
 	if (cmd_match(buf, "faulty") && rdev->mddev->pers) {
 		md_error(rdev->mddev, rdev);
-		err = 0;
+		if (test_bit(Faulty, &rdev->flags))
+			err = 0;
+		else
+			err = -EBUSY;
 	} else if (cmd_match(buf, "remove")) {
 		if (rdev->raid_disk >= 0)
 			err = -EBUSY;
@@ -5983,6 +5986,8 @@ static int set_disk_faulty(mddev_t *mddev, dev_t dev)
 		return -ENODEV;
 	md_error(mddev, rdev);
+	if (!test_bit(Faulty, &rdev->flags))
+		return -EBUSY;
 	return 0;
 }
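
For anyone who wants to see the effect of the check without a kernel
tree, here is a rough user-space model of it. Nothing in it is real
kernel code: the struct names, the flag constants and the model_*
functions are all made up for illustration, and the "refuse to fail the
last working device" rule is only an approximation of what the RAID1
personality does.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel's per-device flag bits. */
enum { MODEL_IN_SYNC = 1u << 0, MODEL_FAULTY = 1u << 1 };

/* Hypothetical, heavily simplified models of an rdev and an array. */
struct model_rdev {
	unsigned int flags;
};

struct model_array {
	struct model_rdev *devs;
	int ndevs;
};

/* Count devices that are in sync and not marked faulty. */
static int model_working_devices(const struct model_array *a)
{
	int i, n = 0;

	for (i = 0; i < a->ndevs; i++)
		if ((a->devs[i].flags & MODEL_IN_SYNC) &&
		    !(a->devs[i].flags & MODEL_FAULTY))
			n++;
	return n;
}

/*
 * Models a RAID1-style error handler: it silently refuses to fail
 * the last working device, otherwise it marks the device faulty.
 */
static void model_md_error(struct model_array *a, struct model_rdev *rdev)
{
	if ((rdev->flags & MODEL_IN_SYNC) &&
	    !(rdev->flags & MODEL_FAULTY) &&
	    model_working_devices(a) == 1)
		return;		/* refused, no flag is set */
	rdev->flags |= MODEL_FAULTY;
	rdev->flags &= ~MODEL_IN_SYNC;
}

/*
 * Models the patched request path: ask for the device to be failed,
 * then check whether the faulty flag was really set and report
 * -EBUSY if it was not.
 */
int model_set_disk_faulty(struct model_array *a, int idx)
{
	if (idx < 0 || idx >= a->ndevs)
		return -ENODEV;
	model_md_error(a, &a->devs[idx]);
	if (!(a->devs[idx].flags & MODEL_FAULTY))
		return -EBUSY;
	return 0;
}
```

With two in-sync devices in the model, failing the second one returns 0,
and a following attempt to fail the first (now the last working device)
returns -EBUSY rather than 0. That is the difference the caller is meant
to see with the patch applied.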