[PATCH] Do not assemble a duplicate of a running array

NeilBrown neilb at suse.de
Wed Aug 24 23:28:26 UTC 2011


On Wed, 24 Aug 2011 19:05:31 -0400 Yury Polyanskiy <yp at mit.edu> wrote:

> On Thu, 14 Jul 2011 17:30:22 +1000
> NeilBrown <neilb at suse.de> wrote:
> 
> > On Sun, 10 Jul 2011 21:08:53 -0400 Yury Polyanskiy <yp at mit.edu> wrote:
> > 
> > > Dear mdadm developers,
> > > 
> > > Consider the following scenario (based on a true story):
> > 
> > Hi,
> >  thanks for the bug report and the patch.
> > [...]
> > 
> 
> Dear Neil,
> 
> I was wondering if you have anything new on this bug/patch?
> 

I was just thinking about this - maybe half an hour ago.  Someone else had a
vaguely similar problem and I knew I had seen it before but couldn't remember
where.  You have reminded me - thanks.

I've added this to my list of things to fix in mdadm.  I probably won't get
to it for a few weeks but you can be sure it won't get forgotten now.

Thanks,
NeilBrown


> Best,
> Yury
> 
> > > SETUP:
> > > * a raid1 MD array is configured with the name /dev/md/data (metadata=1.0).
> > > * /dev/md/data is identified in /etc/mdadm/mdadm.conf by its UUID.
> > > * /dev/md/data is assembled from the initramfs as /dev/md127 (minor 127).
> > > 
> > > CRASH:
> > > * [GLITCH] at some point one of the disks (/dev/sda) comprising /dev/md/data
> > > experiences a bus failure and is correctly removed from the array.
> > > * the array runs for some time in a degraded state; data is written to and
> > > read from the remaining disks (/dev/sdb, /dev/sdc, etc.)
> > > * [REBOOT] after some time, the server is rebooted
> > > * at boot time the initramfs script starts /dev/md/data, but the kernel
> > > correctly kicks /dev/sda out of the array because its event counter is too
> > > old ('kicking non-fresh sda from array').
> > > * when udev starts, "mdadm --detail --export /dev/md127" correctly
> > > identifies /dev/md127 and udev creates the correct symbolic link:
> > >    /dev/md/data -> /dev/md127
> > > 
> > > * startup proceeds to execute /etc/init.d/mdadm-raid which runs "mdadm
> > > --assemble --scan".
> > > * mdadm --assemble --scan discovers the description of /dev/md/data in
> > > mdadm.conf and finds a lonely /dev/sda with a matching UUID; /dev/md126 is
> > > created out of a single /dev/sda
> > > * now udev receives an ADD event for /sys/block/md126; again "mdadm --detail
> > > --export /dev/md126" says that this device must be named /dev/md/data. udev
> > > then re-creates the symbolic link, which now points to /dev/md126:
> > >    /dev/md/data -> /dev/md126
> > > 
> > > * startup proceeds and /dev/md/data is mounted as a /home partition.
> > > 
> > > CONCLUSION:
> > > * data written to /dev/md/data between GLITCH and REBOOT now appears to be
> > > lost (it is on the array at /dev/md127, which is no longer mounted at /home).
> > > * data written to /dev/md/data after REBOOT is stored on the faulty /dev/sda
> > > instead of the healthy array (/dev/sdb, /dev/sdc)
> > > 
> > > 
> > > PATCH:
> > > * attached is a patch against the master of http://neil.brown.name/git/mdadm
> > > 
> > > 
> > > Best wishes,
> > > Yury Polyanskiy
> > 
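
In the sequence quoted above, a second array was started for a UUID that
was already running. A minimal Python sketch of a check that would catch
this is given below. It is not the attached patch (which is not reproduced
in this message) and not mdadm's own code. The function names and the
sample UUID are made up for illustration, and the sketch assumes only that
"mdadm --detail --export" prints KEY=VALUE lines that include MD_UUID.

```python
def parse_export(text):
    """Parse KEY=VALUE lines as printed by `mdadm --detail --export`."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that are not KEY=VALUE
            fields[key.strip()] = value.strip()
    return fields


def is_duplicate_of_running(candidate_uuid, running_exports):
    """Return the running device whose MD_UUID equals candidate_uuid, else None.

    running_exports maps a device node (e.g. "/dev/md127") to the text that
    `mdadm --detail --export` printed for that device.
    """
    for device, text in running_exports.items():
        if parse_export(text).get("MD_UUID") == candidate_uuid:
            return device
    return None


# Hypothetical data: the UUID below is invented for this example.
running = {
    "/dev/md127": "MD_LEVEL=raid1\nMD_METADATA=1.0\n"
                  "MD_UUID=01234567:89abcdef:01234567:89abcdef\n",
}
stale_member_uuid = "01234567:89abcdef:01234567:89abcdef"

clash = is_duplicate_of_running(stale_member_uuid, running)
if clash is not None:
    print("skip assembly: UUID already running as", clash)
```

With a check of this kind, the lone /dev/sda would be recognised as
belonging to the array already running as /dev/md127, so no /dev/md126
would be created and the /dev/md/data link would stay where it was.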




More information about the pkg-mdadm-devel mailing list