Bug#396582: infinite loop on assembly of degraded array

martin f krafft madduck at debian.org
Wed Nov 1 17:52:43 CET 2006


Package: mdadm
Version: 2.5.4-1
Severity: important
Owner: dan at ag-projects.com
Tags: patch

----- Forwarded message from Dan Pascu <dan at ag-projects.com> -----

[...]

The second patch fixes a more serious problem. If the system boots with a
degraded array, it hangs forever during boot. I traced the problem to
mdadm --assemble --scan --auto=yes
A degraded array is assembled, but for some reason it is not properly
recognized as assembled afterwards, so when mdadm scans the available
devices it tries to reassemble that array again and again.

The output of trying to assemble 2 RAID1 arrays (one being degraded) is 
this:

debian:~# mdadm --assemble --scan --auto=yes
mdadm: /dev/md0 has been started with 2 drives.
mdadm: /dev/md1 has been started with 1 drive (out of 2).
mdadm: /dev/md/1 is already active.
mdadm: /dev/md/1 is already active.
mdadm: /dev/md/1 is already active.
mdadm: /dev/md/1 is already active.
mdadm: /dev/md/1 is already active.
mdadm: /dev/md/1 is already active.
[... repeated forever or until ^C ...]

The issue (as far as I was able to understand it) is this:

1. mdadm tries to first assemble all the arrays it finds in the mdadm.conf 
file.

2. it then tries to find other arrays by scanning all the available 
devices in the system and repeatedly assembling the ones found until none 
is left. Now if at least 1 array is degraded, for some reason it won't 
see it as assembled and it tries to assemble it again (this doesn't 
happen when the arrays are not degraded).

At the 2nd step it calls the Assemble() function repeatedly, but it
doesn't pass a device list to it. As a consequence, Assemble() builds
its own device list, and while parsing it to detect unassembled arrays
it marks the devices that are already used by other arrays. However,
since it builds this device list anew every time it is called, on the
next call it doesn't know which devices are already used and has to
start marking them all over again.
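
The loss of state between calls can be illustrated with a toy model. The
sketch below is Python, not mdadm's C code, and every name in it (Device,
fresh_device_list, assemble_once, count_calls) is made up for
illustration. It only models the "marks are lost with the list" effect
described above; it does not attempt to explain why only degraded arrays
trigger it. The call limit exists solely so that the example terminates.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    used: bool = False  # set once a call has dealt with this device


def fresh_device_list():
    # Hypothetical stand-in for the list Assemble() builds for itself.
    # Every call gets new objects, so every 'used' flag starts out False.
    return [Device("sda2")]


def assemble_once():
    """Toy model of one Assemble() call that is given no device list."""
    devices = fresh_device_list()
    for dev in devices:
        if not dev.used:
            dev.used = True  # this mark is discarded together with the list
            return True      # "found a device to work on"
    return False             # "nothing left to assemble"


def count_calls(limit):
    """Call assemble_once() until it reports nothing left, or stop at limit."""
    calls = 0
    while calls < limit and assemble_once():
        calls += 1
    return calls


# The loop never sees "nothing left"; only the limit stops it.
print(count_calls(limit=5))
```

Whatever limit is chosen, the loop runs up to it, because each call
starts from a list on which nothing has been marked yet.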

My patch tries to fix the issue by passing a device list to Assemble()
while autoselecting, which means that over multiple calls Assemble()
uses the same device list and knows which devices were already used by
other arrays without redetecting this. This fixes the problem: mdadm no
longer enters an infinite loop when assembling degraded arrays, and
booting in such a condition no longer hangs the machine. The arrays are
also assembled correctly.
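
The idea of sharing one device list across calls can be sketched as
follows. This is again a Python toy model under stated assumptions, not
the actual patch: the device names, the scan_devices() helper, the
message strings and the return convention (None meaning "no unused
device left") are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    array: str
    used: bool = False


def scan_devices():
    # Hypothetical scan result: md0 has both members, md1 has only one.
    return [
        Device("sda1", "md0"),
        Device("sdb1", "md0"),
        Device("sda2", "md1"),
    ]


def assemble(active, devices):
    """Toy model of one Assemble() call working on a caller-supplied list.

    Returns a message, or None when no unused device is left.
    """
    for dev in devices:
        if dev.used:
            continue
        # Mark every member of this array on the shared list, so later
        # calls skip them.
        for other in devices:
            if other.array == dev.array:
                other.used = True
        if dev.array in active:
            return f"{dev.array} is already active"
        active.add(dev.array)
        return f"{dev.array} started"
    return None


def auto_assemble(limit=10):
    devices = scan_devices()  # built once by the caller ...
    active = set()
    log = []
    while len(log) < limit:
        msg = assemble(active, devices)  # ... and passed to every call
        if msg is None:
            break
        log.append(msg)
    return log


print(auto_assemble())
```

Because the marks live on a list owned by the caller, the third call
finds no unused device and the loop ends after each array has been
handled once.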

Now I have only studied the code for a couple of hours and I'm sure I
didn't get all its subtleties, so this patch may not be the best way to
fix the issue. However, it makes sense to me to pass the same list of
devices while autoassembling, so that it knows which ones are already
used in other arrays and doesn't try to reuse them.
I would recommend that you take this to the upstream author and let him
decide whether this is the right fix or whether there is a better way to
fix the problem.

The patch applies cleanly to both mdadm 2.5.4 and 2.5.5.

-- 
Dan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mdadm-fix-infinite-loop.diff
Type: text/x-diff
Size: 359 bytes
Desc: not available
Url : http://lists.alioth.debian.org/pipermail/pkg-mdadm-devel/attachments/20061101/664a4c4b/mdadm-fix-infinite-loop.bin

