Bug#703315: mdadm: core dump with bad disk
NeilBrown
neilb at suse.de
Wed Mar 27 23:54:20 UTC 2013
On Fri, 22 Mar 2013 09:26:00 +0400 Michael Tokarev <mjt at tls.msk.ru> wrote:
> [Replying back to the bug report AND to NeilB.
> Hope you're okay with that.
> NeilB: This is just the (excellent!) analysis of the problem;
> I can send a patch if you like.
> ]
>
> 20.03.2013 22:19, Francesco Potortì wrote:
> >> And also (as root) apt-get install build-essential to
> >> install build dependencies
> >
> > > The build is missing ansidecl.h. I installed gcc-4.7-plugin-dev and copied it from
> > > the standard place to the build directory to satisfy it.
>
> Interesting. Thank you for letting us know!
>
> >> Use this executable to manage your arrays and to get a
> >> coredump, and run gdb on it. Or run this executable under
> >> gdb and get a sigsegv.
> >
> > By mistake, I first built version 3.1.4 which does not crash but rather
> > gives the correct error message:
> >
> > mdadm: failed to write superblock to /dev/sdb1
> >
> >> When you hit the issue, run `bt' in gdb to see where it
> >> is failing. This will show you a stack trace, and the
> >> current line of code where it fails. We may want to
> >> examine variables around there, using `p' command.
> >
> > > Okay, that was easy, but I do not understand it. The problem is in
> > > write_init_super1 in super1.c:
> >
> > > 	for (di = st->info; di && ! rv ; di = di->next) {
> > > 		if (di->disk.state == 1)
> > > 			continue;
> > > 		if (di->fd < 0)
> > > 			continue;
> >
> > > This reads the structure, the second test passes, so the loop continues,
> > > but next is null and the loop ends. After this, di is null. But in
> > > this case:
> >
> > }
> > error_out:
> > if (rv)
> > fprintf(stderr, Name ": Failed to write metadata to %s\n",
> > di->devname);
> >
> > > Which segfaults because rv is 4. The fact is, I cannot imagine why it
> > > would ever be 4. It should be 0.
> >
> > Today I have not had the time to change the disk, so I could do some
> > other test. Maybe this evening. If you write to me, I'll try something
> > else.
>
> Well. This should be all that's needed, actually even more than that!
> Your analysis is excellent; you did very good work! Thank you very much
> for helping, Francesco!
>
> Obviously these code paths are difficult to hit in real life...
>
> Only the error reporting is broken; mdadm will not eat
> your data with this bug. So there's nothing to worry about on your
> system anymore, except, of course, the bad disk, which needs to be
> replaced to restore redundancy, as you already know. Hopefully
> the next upload of the mdadm package will fix this issue, but it is
> not very urgent: a SIGSEGV isn't nice, but it isn't harmful either.
>
> Thank you for the good work!
>
> /mjt
>
> > tucano:/usr/local/src/mdadm/mdadm-3.2.5# gdb --args ./mdadm /dev/md3 --add /dev/sdb1
> > GNU gdb (GDB) 7.4.1-debian
> > Copyright (C) 2012 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "x86_64-linux-gnu".
> > For bug reporting instructions, please see:
> > <http://www.gnu.org/software/gdb/bugs/>...
> > Reading symbols from /usr/local/src/mdadm/mdadm-3.2.5/mdadm...done.
> > (gdb) run
> > Starting program: /usr/local/src/mdadm/mdadm-3.2.5/mdadm /dev/md3 --add /dev/sdb1
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > write_init_super1 (st=0x69b630) at super1.c:1248
> > 1248 fprintf(stderr, Name ": Failed to write metadata to %s\n",
> > (gdb) bt
> > #0 write_init_super1 (st=0x69b630) at super1.c:1248
> > #1 0x0000000000414600 in Manage_subdevs (devname=0x7fffffffebf3 "/dev/md3", fd=8, devlist=0x698030, verbose=0,
> > test=0, update=0x0, force=0) at Manage.c:952
> > #2 0x00000000004060fb in main (argc=4, argv=0x7fffffffe8f8) at mdadm.c:1245
> > (gdb) p di
> > $1 = (struct devinfo *) 0x0
> > (gdb) p rv
> > $2 = 4
> > (gdb) p st
> > $3 = (struct supertype *) 0x69b630
> > (gdb) p *st
> > $4 = {ss = 0x683cc0, minor_version = 2, max_devs = 1920, container_dev = 8388608, sb = 0x6ae000, info = 0x6ad7f0,
> > ignore_hw_compat = 0, updates = 0x0, update_tail = 0x0, arrays = 0x0, sock = 0, devnum = 3, devname = 0x0,
> > devcnt = 0, retry_soon = 0, devs = 0x0}
> > (gdb) p st->info
> > $5 = (void *) 0x6ad7f0
> > (gdb) p *(struct devinfo *)(st->info)
> > $6 = {fd = -1, devname = 0x7fffffffec02 "/dev/sdb1", disk = {number = 2, major = 8, minor = 17, raid_disk = -1,
> > state = 0}, next = 0x0}
> > (gdb) quit
> > A debugging session is active.
> >
> > Inferior 1 [process 6824] will be killed.
> >
> > Quit anyway? (y or n) y
> > tucano:/usr/local/src/mdadm/mdadm-3.2.5#
> >
Thanks for the report.
This is fixed by commit 4687f160276a8f7815675ca758c598d881f04fd7 in mainline
and by commit 0d478e243a90a48fe4da581c7302771f0d66fb3b in the mdadm-3.2.x
branch and thus in mdadm-3.2.6.
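
For readers following the trace above, here is a minimal sketch of the failure pattern and one way to avoid it. It is not the code from either commit. In a C for loop the increment (di = di->next) runs before the condition (di && !rv) is tested again, so if the write fails on the last device in the list, di is already NULL by the time the error message dereferences it. That matches the gdb output (rv = 4, di = 0x0, next = 0x0). The struct below is a cut-down stand-in for mdadm's struct devinfo, and fake_store_super and write_all are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Cut-down stand-in for mdadm's struct devinfo (illustration only). */
struct devinfo {
	int fd;
	const char *devname;
	int state;
	struct devinfo *next;
};

/* Hypothetical stand-in for the real superblock write:
 * pretends that fd 99 is the bad disk and fails with 4. */
static int fake_store_super(int fd)
{
	return fd == 99 ? 4 : 0;
}

/* Walk the device list and stop at the first failure.
 * Using break leaves the loop before di = di->next runs, and the
 * failing entry is recorded in *failed so the error path never
 * has to dereference a pointer that may have advanced to NULL. */
static int write_all(struct devinfo *list, struct devinfo **failed)
{
	struct devinfo *di;
	int rv = 0;

	assert(failed != NULL);
	*failed = NULL;

	for (di = list; di; di = di->next) {
		if (di->state == 1 || di->fd < 0)
			continue;
		rv = fake_store_super(di->fd);
		/* Assumption: the fd is marked closed after the write
		 * attempt, which would explain fd = -1 in the gdb dump. */
		di->fd = -1;
		if (rv) {
			*failed = di;
			break;
		}
	}

	if (rv)
		fprintf(stderr, "Failed to write metadata to %s\n",
			(*failed)->devname);
	return rv;
}
```

Calling write_all on a one-element list whose fd is 99 returns 4 and reports /dev/sdb1 on stderr instead of crashing; the same list walked with the original `di && !rv` loop condition would reach the error path with di == NULL.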
NeilBrown