Bug#396582: infinite loop on assembly of degraded array
martin f krafft
madduck at debian.org
Wed Nov 1 17:52:43 CET 2006
Package: mdadm
Version: 2.5.4-1
Severity: important
Owner: dan at ag-projects.com
Tags: patch
----- Forwarded message from Dan Pascu <dan at ag-projects.com> -----
[...]
The second patch fixes a more serious problem. If the system boots with a
degraded array, it locks in the booting process forever. I traced the
problem to be with mdadm --assemble --scan --auto=yes
If an array is degraded, it is assembled, but for some reason it is not
properly recognized as assembled, so when mdadm scans the available
devices it tries to reassemble the array again and again.
The output of trying to assemble 2 RAID1 arrays (one being degraded) is
this:
debian:~# mdadm --assemble --scan --auto=yes
mdadm: /dev/md0 has been started with 2 drives.
mdadm: /dev/md1 has been started with 1 drive (out of 2).
mdadm: /dev/md/1 is already active.
mdadm: /dev/md/1 is already active.
mdadm: /dev/md/1 is already active.
mdadm: /dev/md/1 is already active.
mdadm: /dev/md/1 is already active.
mdadm: /dev/md/1 is already active.
[... repeated forever or until ^C ...]
The issue (as far as I was able to understand it) is this:
1. mdadm tries to first assemble all the arrays it finds in the mdadm.conf
file.
2. it then tries to find other arrays by scanning all the available
devices in the system and repeatedly assembling the ones found until none
is left. Now if at least 1 array is degraded, for some reason it won't
see it as assembled and it tries to assemble it again (this doesn't
happen when the arrays are not degraded).
At the 2nd step it calls the Assemble() function repeatedly, but it
doesn't pass a device list to it. This has the consequence that
Assemble() will request its own device list and while parsing it to
detect unassembled arrays it will mark the devices that are already used
by other arrays. However, since it gets a new device list every time it
is called, the next call won't know which devices are already used and
has to start over marking them.
My patch tries to fix the issue by passing a device list to Assemble()
while autoselecting, which means that over multiple calls Assemble()
will use the same device list and will know which devices were already
used by other arrays without redetecting this. This fixes the problem and
now mdadm no longer locks in an infinite loop when assembling degraded
arrays, and booting in such a condition no longer locks the machine. Also
arrays are assembled correctly.
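
To illustrate the mechanism described above, here is a toy model in Python. It is not mdadm's code: Device, fresh_device_list, assemble_once and count_passes are hypothetical names made up for this sketch, and the pass limit exists only so the example terminates. It shows only how marks are lost when the list is rebuilt on every call; it does not model why only degraded arrays trigger the loop.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    used: bool = False  # set once the device is claimed by an assembled array


def fresh_device_list():
    """Hypothetical stand-in for building the device list from scratch."""
    return [Device("sda1"), Device("sdb1")]


def assemble_once(devices=None):
    """Claim the first unused device and return its name, or None if none is left.

    When no list is passed in, a new one is built, so marks from earlier
    calls are gone.
    """
    if devices is None:
        devices = fresh_device_list()
    for dev in devices:
        if not dev.used:
            dev.used = True
            return dev.name
    return None


def count_passes(share_list, limit=10):
    """Call assemble_once until nothing is left, or until `limit` passes."""
    devices = fresh_device_list() if share_list else None
    passes = 0
    while passes < limit and assemble_once(devices) is not None:
        passes += 1
    return passes
```

With share_list=False every call sees an unmarked list and the loop only stops at the artificial limit. With share_list=True the marks persist across calls and the loop ends once each device has been claimed.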
Now I have only studied the code for a couple of hours and I'm sure I
didn't get all its subtleties, so this patch may not be the best way to
fix the issue. However it makes sense to me to pass the same list of
devices while autoassembling, so it knows which ones are already used in
other arrays and doesn't try to reuse them.
I would recommend that you take this to the upstream author and he should
decide if this is the right fix, or there is a better way to fix the
problem.
The patch applies cleanly on both mdadm 2.5.4 and 2.5.5.
--
Dan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mdadm-fix-infinite-loop.diff
Type: text/x-diff
Size: 359 bytes
Desc: not available
Url : http://lists.alioth.debian.org/pipermail/pkg-mdadm-devel/attachments/20061101/664a4c4b/mdadm-fix-infinite-loop.bin