Bug#703315: mdadm: core dump with bad disk

NeilBrown neilb at suse.de
Wed Mar 27 23:54:20 UTC 2013


On Fri, 22 Mar 2013 09:26:00 +0400 Michael Tokarev <mjt at tls.msk.ru> wrote:

> [Replying back to the bugreport AND to NeilB.
>  Hope you're okay with that
>  Neilb: This is just the (excellent!) analysis of the problem,
>   I can send a patch if you like
> ]
> 
> 20.03.2013 22:19, Francesco Potortì wrote:
> >> And also (as root) apt-get install build-essential to
> >> install build dependencies 
> > 
> > It misses ansidecl.h.  I installed gcc-4.7-plugin-dev and copied it from
> > the standard place to the build directory to satisfy it.
> 
> Interesting.  Thank you for letting us know!
> 
> >> Use this executable to manage your arrays and to get a
> >> coredump, and run gdb on it.  Or run this executable under
> >> gdb and get a sigsegv.
> > 
> > By mistake, I first built version 3.1.4 which does not crash but rather
> > gives the correct error message: 
> > 
> >   mdadm: failed to write superblock to /dev/sdb1
> > 
> >> When you hit the issue, run `bt' in gdb to see where it
> >> is failing.  This will show you a stack trace, and the
> >> current line of code where it fails.  We may want to
> >> examine variables around there, using `p' command.
> > 
> > Okay, that was easy, but I do not understand it.  The problem is in
> > write_init_super1 in super1.c:
> > 
> > 	for (di = st->info; di && ! rv ; di = di->next) {
> > 		if (di->disk.state == 1)
> > 			continue;
> > 		if (di->fd < 0)
> > 			continue;
> > 
> > This reads the structure, the second test passes, so the loop continues,
> > but next is null and the loop ends.  After this, di is null.  But in
> > this case:
> > 
> > 	}
> > error_out:
> > 	if (rv)
> > 		fprintf(stderr,	Name ": Failed to write metadata to %s\n",
> > 			di->devname);
> > 
> > Which segfaults because rv is 4.  The fact is, I cannot imagine why ever
> > it is 4.  It should be 0.
> > 
> > Today I have not had the time to change the disk, so I could do some
> > other test.  Maybe this evening.  If you write to me, I'll try something
> > else.
> 
> Well.  This should be all that's needed, actually even more than that!
> Your analysis is excellent, you did very good work! Thank you very much
> for helping, Francesco!
> 
> Obviously these are places difficult to hit in real life...
> 
> This is only the error reporting which is broken, -- mdadm will not eat
> your data with this bug.  So there's nothing to worry about on your
> system anymore, except, of course, the bad disk which needs to be
> replaced to restore redundancy, as you already know.  Hopefully
> the next upload of mdadm package will fix this issue, but it is
> not very urgent - SIGSEGV'ing isn't nice but it isn't harmful either.
> 
> Thank you for the good work!
> 
> /mjt
> 
> > tucano:/usr/local/src/mdadm/mdadm-3.2.5# gdb --args ./mdadm /dev/md3 --add /dev/sdb1
> > GNU gdb (GDB) 7.4.1-debian
> > Copyright (C) 2012 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "x86_64-linux-gnu".
> > For bug reporting instructions, please see:
> > <http://www.gnu.org/software/gdb/bugs/>...
> > Reading symbols from /usr/local/src/mdadm/mdadm-3.2.5/mdadm...done.
> > (gdb) run
> > Starting program: /usr/local/src/mdadm/mdadm-3.2.5/mdadm /dev/md3 --add /dev/sdb1
> > 
> > Program received signal SIGSEGV, Segmentation fault.
> > write_init_super1 (st=0x69b630) at super1.c:1248
> > 1248			fprintf(stderr,	Name ": Failed to write metadata to %s\n",
> > (gdb) bt
> > #0  write_init_super1 (st=0x69b630) at super1.c:1248
> > #1  0x0000000000414600 in Manage_subdevs (devname=0x7fffffffebf3 "/dev/md3", fd=8, devlist=0x698030, verbose=0, 
> >     test=0, update=0x0, force=0) at Manage.c:952
> > #2  0x00000000004060fb in main (argc=4, argv=0x7fffffffe8f8) at mdadm.c:1245
> > (gdb) p di
> > $1 = (struct devinfo *) 0x0
> > (gdb) p rv
> > $2 = 4
> > (gdb) p st
> > $3 = (struct supertype *) 0x69b630
> > (gdb) p *st
> > $4 = {ss = 0x683cc0, minor_version = 2, max_devs = 1920, container_dev = 8388608, sb = 0x6ae000, info = 0x6ad7f0, 
> >   ignore_hw_compat = 0, updates = 0x0, update_tail = 0x0, arrays = 0x0, sock = 0, devnum = 3, devname = 0x0, 
> >   devcnt = 0, retry_soon = 0, devs = 0x0}
> > (gdb) p st->info
> > $5 = (void *) 0x6ad7f0
> > (gdb) p *(struct devinfo *)(st->info)
> > $6 = {fd = -1, devname = 0x7fffffffec02 "/dev/sdb1", disk = {number = 2, major = 8, minor = 17, raid_disk = -1, 
> >     state = 0}, next = 0x0}
> > (gdb) quit
> > A debugging session is active.
> > 
> > 	Inferior 1 [process 6824] will be killed.
> > 
> > Quit anyway? (y or n) y
> > tucano:/usr/local/src/mdadm/mdadm-3.2.5# 
> > 


Thanks for the report.

This is fixed by commit 4687f160276a8f7815675ca758c598d881f04fd7 in mainline
and by commit 0d478e243a90a48fe4da581c7302771f0d66fb3b in the mdadm-3.2.x
branch and thus in mdadm-3.2.6.
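
For anyone reading this later, the general C pitfall behind a crash of this
shape can be sketched outside mdadm.  In a "for (init; cond; step)" loop the
step expression runs before the condition is tested again, so if the body
sets rv on the last list element and falls through, the cursor is already
NULL when the error path runs.  The code below is only an illustration under
that assumption: it is neither the mdadm source nor the committed fix, the
will_fail field and the fake_write() and write_all() helpers are hypothetical,
and whether 3.2.5 reached the error path in exactly this way is not
established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for mdadm's struct devinfo.
 * will_fail is hypothetical and only drives the fake write below. */
struct devinfo {
	int fd;
	const char *devname;
	int will_fail;
	struct devinfo *next;
};

/* Hypothetical replacement for the real superblock write:
 * returns 4 on failure, 0 on success. */
int fake_write(const struct devinfo *di)
{
	return di->will_fail ? 4 : 0;
}

/* Walk the list like the quoted loop, but remember the device that
 * failed instead of relying on the loop cursor afterwards. */
int write_all(struct devinfo *head, const struct devinfo **failed_out)
{
	struct devinfo *di;
	int rv = 0;

	assert(failed_out != NULL);
	*failed_out = NULL;

	for (di = head; di && !rv; di = di->next) {
		if (di->fd < 0)
			continue;
		rv = fake_write(di);
		if (rv)
			*failed_out = di; /* di = di->next still runs after this */
	}
	/* Here di is the element after the failing one, or NULL. */
	return rv;
}

/* Error report that never dereferences a NULL pointer. */
void report(int rv, const struct devinfo *failed)
{
	if (rv)
		fprintf(stderr, "Failed to write metadata to %s\n",
			failed && failed->devname ? failed->devname
						  : "(unknown device)");
}
```

Calling write_all() on a list whose last entry fails returns 4 and leaves
the out-pointer on that entry, so report() prints its name rather than
reading through the exhausted cursor.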

NeilBrown