Bug#703315: mdadm: core dump with bad disk

Michael Tokarev mjt at tls.msk.ru
Fri Mar 22 05:26:00 UTC 2013


[Replying back to the bug report AND to NeilB.
 Hope you're okay with that.
 NeilB: This is just the (excellent!) analysis of the problem;
  I can send a patch if you like.
]

20.03.2013 22:19, Francesco Potortì wrote:
>> And also (as root) apt-get install build-essential to
>> install build dependencies 
> 
> It misses ansidecl.h.  I installed gcc-4.7-plugin-dev and copied it from
> the standard place to the build directory to satisfy it.

Interesting.  Thank you for letting us know!

>> Use this executable to manage your arrays and to get a
>> coredump, and run gdb on it.  Or run this executable under
>> gdb and get a sigsegv.
> 
> By mistake, I first built version 3.1.4 which does not crash but rather
> gives the correct error message: 
> 
>   mdadm: failed to write superblock to /dev/sdb1
> 
>> When you hit the issue, run `bt' in gdb to see where it
>> is failing.  This will show you a stack trace, and the
>> current line of code where it fails.  We may want to
>> examine variables around there, using `p' command.
> 
> Okay, that was easy, but I do not understand it.  The problem is in
> write_init_super1 in super1.c:
> 
> 	for (di = st->info; di && ! rv ; di = di->next) {
> 		if (di->disk.state == 1)
> 			continue;
> 		if (di->fd < 0)
> 			continue;
> 
> This reads the structure, the second test passes, so the loop continues,
> but next is null and the loop ends.  After this, di is null.  But in
> this case:
> 
> 	}
> error_out:
> 	if (rv)
> 		fprintf(stderr,	Name ": Failed to write metadata to %s\n",
> 			di->devname);
> 
> Which segfaults because rv is 4.  The fact is, I cannot imagine why it
> would ever be 4.  It should be 0.
> 
> Today I have not had the time to change the disk, so I could do some
> other test.  Maybe this evening.  If you write to me, I'll try something
> else.

Well.  This should be all that's needed, actually even more than that!
Your analysis is excellent, you did very good work! Thank you very much
for helping, Francesco!

Obviously these are code paths that are difficult to hit in real life...

Only the error reporting is broken here: mdadm will not eat your data
because of this bug.  So there's nothing to worry about on your
system anymore, except, of course, the bad disk, which needs to be
replaced to restore redundancy, as you already know.  Hopefully
the next upload of the mdadm package will fix this issue, but it is
not very urgent: SIGSEGV'ing isn't nice, but it isn't harmful either.
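
To illustrate the kind of failure seen in the gdb session (di == NULL
together with a nonzero rv), here is a minimal sketch.  It is NOT the
mdadm source and NOT the actual patch: struct dev, write_all() and
format_error() are hypothetical names made up for this example, and the
"write" is simulated.  It only shows one way such a state can arise (a
loop of the form "di && !rv" sets rv on the last list element and then
still advances di), and one possible way to report the error without
dereferencing the loop cursor afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a device list entry; only the fields
 * needed for the illustration are kept. */
struct dev {
    const char *devname;
    int fd;             /* < 0 means "skip this device" */
    int write_result;   /* simulated outcome of the write: 0 = ok */
    struct dev *next;
};

/* Walks the list and "writes" to each usable device.  Returns 0 on
 * success, or the first nonzero write result.  On failure, *failed
 * points at the device that failed, so the caller never needs the
 * loop cursor (which may already be NULL) once the loop has ended. */
static int write_all(struct dev *head, const struct dev **failed)
{
    int rv = 0;

    assert(failed != NULL);
    *failed = NULL;

    for (struct dev *di = head; di && !rv; di = di->next) {
        if (di->fd < 0)
            continue;
        rv = di->write_result;  /* simulated write */
        if (rv)
            *failed = di;       /* remember it before di advances */
    }
    return rv;
}

/* Formats the error message without assuming a device is known.
 * Returns 0 when there is nothing to report. */
static int format_error(char *buf, size_t len, int rv,
                        const struct dev *failed)
{
    if (!rv)
        return 0;
    return snprintf(buf, len, "Failed to write metadata to %s (rv=%d)",
                    failed ? failed->devname : "(unknown device)", rv);
}
```

With a single-entry list whose simulated write fails, write_all()
returns the error code and hands back the failing entry, so the message
can name the device even though the loop cursor has run off the end of
the list.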

Thank you for the good work!

/mjt

> tucano:/usr/local/src/mdadm/mdadm-3.2.5# gdb --args ./mdadm /dev/md3 --add /dev/sdb1
> GNU gdb (GDB) 7.4.1-debian
> Copyright (C) 2012 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/local/src/mdadm/mdadm-3.2.5/mdadm...done.
> (gdb) run
> Starting program: /usr/local/src/mdadm/mdadm-3.2.5/mdadm /dev/md3 --add /dev/sdb1
> 
> Program received signal SIGSEGV, Segmentation fault.
> write_init_super1 (st=0x69b630) at super1.c:1248
> 1248			fprintf(stderr,	Name ": Failed to write metadata to %s\n",
> (gdb) bt
> #0  write_init_super1 (st=0x69b630) at super1.c:1248
> #1  0x0000000000414600 in Manage_subdevs (devname=0x7fffffffebf3 "/dev/md3", fd=8, devlist=0x698030, verbose=0, 
>     test=0, update=0x0, force=0) at Manage.c:952
> #2  0x00000000004060fb in main (argc=4, argv=0x7fffffffe8f8) at mdadm.c:1245
> (gdb) p di
> $1 = (struct devinfo *) 0x0
> (gdb) p rv
> $2 = 4
> (gdb) p st
> $3 = (struct supertype *) 0x69b630
> (gdb) p *st
> $4 = {ss = 0x683cc0, minor_version = 2, max_devs = 1920, container_dev = 8388608, sb = 0x6ae000, info = 0x6ad7f0, 
>   ignore_hw_compat = 0, updates = 0x0, update_tail = 0x0, arrays = 0x0, sock = 0, devnum = 3, devname = 0x0, 
>   devcnt = 0, retry_soon = 0, devs = 0x0}
> (gdb) p st->info
> $5 = (void *) 0x6ad7f0
> (gdb) p *(struct devinfo *)(st->info)
> $6 = {fd = -1, devname = 0x7fffffffec02 "/dev/sdb1", disk = {number = 2, major = 8, minor = 17, raid_disk = -1, 
>     state = 0}, next = 0x0}
> (gdb) quit
> A debugging session is active.
> 
> 	Inferior 1 [process 6824] will be killed.
> 
> Quit anyway? (y or n) y
> tucano:/usr/local/src/mdadm/mdadm-3.2.5# 
> 


