Bug#416512: removed disk && md-device

Neil Brown neilb at suse.de
Fri May 11 01:36:53 UTC 2007


On Thursday May 10, david at dgreaves.com wrote:
> Neil Brown wrote:
> > On Wednesday May 9, bs at q-leap.de wrote:
> >> Neil Brown <neilb at suse.de> [2007.04.02.0953 +0200]:
> >>> Hmmm... this is somewhat awkward.  You could argue that udev should be
> >>> taught to remove the device from the array before removing the device
> >>> from /dev.  But I'm not convinced that you always want to 'fail' the
> >>> device.   It is possible in this case that the array is quiescent and
> >>> you might like to shut it down without registering a device failure...
> >> Hmm, the kernel advised hotplug to remove the device from /dev, but you
> >> don't want to remove it from md? Do you have an example for that case?
> > 
> > Until there is known to be an inconsistency among the devices in an
> > array, you don't want to record that there is.
> > 
> > Suppose I have two USB drives with a mounted but quiescent filesystem
> > on a raid1 across them.
> > I pull them both out, one after the other, to take them to my friend's
> > place.
> > 
> > I plug them both in and find that the array is degraded, because as
> > soon as I unplugged one, the other was told that it was now the only
> > one.
> And, in truth, so it was.

So what was?
It is true that now one drive is "the only one plugged in", but is
that relevant?
Is it true that the one drive is "the only drive in the array"??
That depends on what you mean by "the array".  If I am moving "the
array" to another computer, then the one drive still plugged into the
first computer is not "the only drive in the array" from my
perspective.

If there is a write request, and it can only be written to one drive
(because the other is unplugged), then it becomes appropriate to tell
the still-present drive that it is the only drive in the array.

> 
> Who updated the event count though?

Sorry, not enough words.  I don't know what you are asking.

> 
> > Not good.  Best to wait for an IO request that actually returns an
> > error.
> Ah, now would that be a good time to update the event count?

Yes.  Of course.  It is an event (IO failed).  That makes it a good
time to update the event count...... am I missing something here?
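
A rough sketch of that rule, as a toy model rather than the md driver's
code.  The Member and Mirror classes and their fields are invented for
illustration; the only things taken from the discussion are that
unplugging a drive records nothing, and that a failed IO marks the
device faulty and bumps the event count.

```python
# Toy model only; the names here are invented and are not md's real structures.
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    present: bool = True   # physically plugged in
    faulty: bool = False   # recorded as failed in the array metadata
    events: int = 0        # event count last written to this member


@dataclass
class Mirror:
    members: list
    events: int = 0        # the array's current event count

    def unplug(self, name):
        # Pulling a drive records nothing: no failure and no new event.
        for m in self.members:
            if m.name == name:
                m.present = False

    def write(self):
        # Try every member not yet marked faulty; an absent one gives an IO error.
        active = [m for m in self.members if not m.faulty]
        failed = [m for m in active if not m.present]
        if failed:
            for m in failed:
                m.faulty = True
            self.events += 1  # the IO failure is the event
            for m in active:
                if m.present:
                    m.events = self.events  # survivors learn of the failure
        return [m.name for m in failed]
```

In this model, pulling both drives from a quiescent mirror leaves both
event counts equal and neither drive faulty.  Only a write that arrives
while one drive is missing makes the two copies disagree.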

> 
> 
> Maybe you should allow drives to be removed even if they aren't faulty or spare?
> A write to a removed device would mark it faulty in the other devices without
> waiting for a timeout.

Maybe, but I'm not sure what the real gain would be.

> 
> But joggling a usb stick (similar to your use case) would probably be OK since
> it would be hot-removed and then hot-added.

This still needs user-space interaction.
If the USB layer detects a removal and a re-insert, sdb may well come
back as something different (sdp?) - though I'm not completely familiar
with how USB storage works.

In any case, it should really be a user-space decision what happens
then.  A hot re-add may well be appropriate, but I wouldn't want to
have the kernel make that decision.
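
A minimal sketch of what such a user-space policy could look like.  The
function, its parameters and the auto_readd switch are hypothetical and
not part of mdadm or udev; the only real piece is mdadm's manage-mode
--re-add option.  How the caller learns the UUIDs (for example from
mdadm --examine output) is left out.

```python
# Hypothetical user-space policy; nothing here is part of mdadm or udev.
def readd_command(array, device, device_uuid, array_uuid, auto_readd):
    """Return an mdadm argv to re-add `device`, or None to leave it alone."""
    if device_uuid != array_uuid:
        return None            # not a member of this array
    if not auto_readd:
        return None            # the admin wants to decide by hand
    # `mdadm <array> --re-add <device>` is mdadm's manage-mode re-add.
    return ["mdadm", array, "--re-add", device]
```

The point of returning a command instead of running it is that the
decision stays visible and overridable: the device may have come back
under a new name, and the admin may not want it re-added at all.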

NeilBrown




