Bug#496334: mdadm segfault on --assemble --force with raid10

Török Edwin edwintorok at gmail.com
Sun Aug 24 14:57:24 UTC 2008


Package: mdadm
Version: 2.6.7-3
Severity: critical
Justification: breaks the whole system


My raid10 array got marked faulty, and mdadm refused to assemble it
(it reported that it only found 2 devices but needs 6).
I tried running mdadm --assemble --force /dev/md4, but it segfaulted.
I could not reassemble the array with mdadm 2.6.7 in any way.

However, I found a workaround:
I booted my laptop, downloaded the latest mdadm from
git://neil.brown.name/mdadm, and compiled it there (the laptop is 32-bit),
then transferred the binary to my (64-bit) box using netcat:
On the 64-bit box with the failed raid array: nc -l -p 1000 >x
On the laptop: nc 192.168.0.199 1000 <mdadm
Then chmod +x x and ./x --assemble --force /dev/md4, and it WORKED!
The array is up now, and a resync is in progress.
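
For reference, here is a condensed sketch of the workaround steps (the
192.168.0.199 address, port 1000, and the file name "x" are just what I
used on my setup; adjust as needed):

    # On the failed 64-bit box: listen for the incoming binary
    nc -l -p 1000 > x

    # On the 32-bit laptop: fetch, build, and send the current mdadm
    git clone git://neil.brown.name/mdadm
    cd mdadm && make
    nc 192.168.0.199 1000 < mdadm

    # Back on the 64-bit box: make it executable and force-assemble
    chmod +x x
    ./x --assemble --force /dev/md4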

dmesg:
[ 1306.867850] md: bind<sdb3>
[ 1306.868912] md: bind<sdd3>
[ 1306.871850] md: bind<sde3>
[ 1306.871850] md: bind<sdc3>
[ 1306.867850] md: bind<sdf3>
[ 1306.871850] md: bind<sda3>
[ 1322.306694] md: kicking non-fresh sdc3 from array!
[ 1322.306694] md: unbind<sdc3>
[ 1322.306694] md: export_rdev(sdc3)
[ 1322.306694] md: kicking non-fresh sde3 from array!
[ 1322.306694] md: unbind<sde3>
[ 1322.306694] md: export_rdev(sde3)
[ 1322.306694] md: kicking non-fresh sdd3 from array!
[ 1322.306694] md: unbind<sdd3>
[ 1322.306694] md: export_rdev(sdd3)
[ 1322.306694] md: kicking non-fresh sdb3 from array!
[ 1322.306694] md: unbind<sdb3>
[ 1322.306694] md: export_rdev(sdb3)
[ 1322.306694] md: md4: raid array is not clean -- starting background reconstruction
[ 1322.838707] raid10: not enough operational mirrors for md4
[ 1322.838707] md: pers->run() failed ...
[ 1410.510714] md: md4 stopped.
[ 1410.510714] md: unbind<sda3>
[ 1410.510714] md: export_rdev(sda3)
[ 1410.518705] md: unbind<sdf3>
[ 1410.518705] md: export_rdev(sdf3)
[ 1623.820925] md: md4 stopped.
[ 1624.015384] md: bind<sdb3>
[ 1624.015384] md: bind<sdd3>
[ 1624.015511] md: bind<sde3>
[ 1624.015632] md: bind<sdc3>
[ 1624.016165] md: bind<sdf3>
[ 1624.016165] md: bind<sda3>
[ 1866.359643] md: md4 stopped.
[ 1866.359643] md: unbind<sda3>
[ 1866.359643] md: export_rdev(sda3)
[ 1866.359643] md: unbind<sdf3>
[ 1866.359643] md: export_rdev(sdf3)
[ 1866.359643] md: unbind<sdc3>
[ 1866.359643] md: export_rdev(sdc3)
[ 1866.359643] md: unbind<sde3>
[ 1866.359643] md: export_rdev(sde3)
[ 1866.359643] md: unbind<sdd3>
[ 1866.359643] md: export_rdev(sdd3)
[ 1866.359643] md: unbind<sdb3>
[ 1866.359643] md: export_rdev(sdb3)
[ 1866.599690] mdadm[3205]: segfault at a0 ip 417db5 sp 7fff5f3a6580 error 4 in mdadm[400000+29000]
^^^^^ Here is where I ran mdadm --assemble --force /dev/md4 (Debian's 2.6.7-3)

[ 3117.017725] md: md4 stopped.
[ 3117.185350] md: bind<sdb3>
[ 3117.186526] md: bind<sdd3>
[ 3117.186526] md: bind<sde3>
[ 3117.189911] md: bind<sdc3>
[ 3117.185350] md: bind<sdf3>
[ 3117.186526] md: bind<sda3>
[ 3122.470735] md: md4 stopped.
[ 3122.470735] md: unbind<sda3>
[ 3122.470735] md: export_rdev(sda3)
[ 3122.470735] md: unbind<sdf3>
[ 3122.470735] md: export_rdev(sdf3)
[ 3122.470735] md: unbind<sdc3>
[ 3122.470735] md: export_rdev(sdc3)
[ 3122.470735] md: unbind<sde3>
[ 3122.470735] md: export_rdev(sde3)
[ 3122.470735] md: unbind<sdd3>
[ 3122.470735] md: export_rdev(sdd3)
[ 3122.470735] md: unbind<sdb3>
[ 3122.470735] md: export_rdev(sdb3)
[ 3122.624492] md: bind<sdb3>
[ 3122.625244] md: bind<sdd3>
[ 3122.626323] md: bind<sde3>
[ 3122.626323] md: bind<sdc3>
[ 3122.626323] md: bind<sdf3>
[ 3122.626833] md: bind<sda3>
[ 3122.626898] md: md4: raid array is not clean -- starting background reconstruction

Here is where I ran the new mdadm from git; it was able to start the
array:
[ 3122.655983] raid10: raid set md4 active with 6 out of 6 devices
[ 3161.721339] md: resync of RAID array md4
[ 3161.721339] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 3161.721339] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[ 3161.721339] md: using 128k window, over a total of 2159617728 blocks.

I don't know whether mdadm from git worked because the bug in 2.6.7 has
been fixed upstream, or simply because it was a 32-bit build, but I suggest
updating mdadm in Debian to include the upstream git patches relevant to
this problem.

-- Package-specific info:
--- mount output
/dev/md3 on / type ext3 (rw,noatime,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
procbususb on /proc/bus/usb type usbfs (rw)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
tmpfs on /tmp type tmpfs (rw)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
/dev/mapper/vg--all-lv--usr on /usr type xfs (rw)
/dev/mapper/vg--all-lv--var on /var type xfs (rw)
/dev/mapper/vg--all-lv--home on /home type reiserfs (rw,noatime,notail)

--- mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays
ARRAY /dev/md3 level=raid1 num-devices=6 UUID=8b4b5bca:f8e579d9:318cfef2:eb9ed6d4
ARRAY /dev/md4 level=raid10 num-devices=6 UUID=1156c578:5138c7ee:318cfef2:eb9ed6d4
	devices=/dev/sdb3,/dev/sdc3,/dev/sdd3,/dev/sde3,/dev/sdf3,/dev/sda3
# This file was auto-generated on Sat, 23 Aug 2008 13:25:32 +0000
# by mkconf $Id$

--- /proc/mdstat:
Personalities : [raid1] [raid10] 
md4 : active raid10 sda3[0] sdf3[5] sdc3[4] sde3[3] sdd3[2] sdb3[1]
      2159617728 blocks 64K chunks 2 near-copies [6/6] [UUUUUU]
      [>....................]  resync =  1.4% (31960768/2159617728) finish=174.9min speed=202727K/sec
      
md3 : active raid1 sda1[0] sdf1[5] sdc1[4] sde1[3] sdd1[2] sdb1[1]
      9767424 blocks [6/6] [UUUUUU]
      
unused devices: <none>

--- /proc/partitions:
major minor  #blocks  name

   8     0  732574584 sda
   8     1    9767488 sda1
   8     2    2931862 sda2
   8     3  719872650 sda3
   8    16  732574584 sdb
   8    17    9767488 sdb1
   8    18    2931862 sdb2
   8    19  719872650 sdb3
   8    32  732574584 sdc
   8    33    9767488 sdc1
   8    34    2931862 sdc2
   8    35  719872650 sdc3
   8    48  732574584 sdd
   8    49    9767488 sdd1
   8    50    2931862 sdd2
   8    51  719872650 sdd3
   8    64  732574584 sde
   8    65    9767488 sde1
   8    66    2931862 sde2
   8    67  719872650 sde3
   8    80  732574584 sdf
   8    81    9767488 sdf1
   8    82    2931862 sdf2
   8    83  719872650 sdf3
   9     3    9767424 md3
   9     4 2159617728 md4
 253     0  104857600 dm-0
 253     1 1363148800 dm-1
 253     2  629145600 dm-2
 253     3   10485760 dm-3

--- initrd.img-2.6.26-1-amd64:
45881 blocks
etc/mdadm
etc/mdadm/mdadm.conf
scripts/local-top/mdadm
sbin/mdadm
lib/modules/2.6.26-1-amd64/kernel/drivers/md/multipath.ko
lib/modules/2.6.26-1-amd64/kernel/drivers/md/linear.ko
lib/modules/2.6.26-1-amd64/kernel/drivers/md/dm-log.ko
lib/modules/2.6.26-1-amd64/kernel/drivers/md/raid456.ko
lib/modules/2.6.26-1-amd64/kernel/drivers/md/dm-snapshot.ko
lib/modules/2.6.26-1-amd64/kernel/drivers/md/dm-mirror.ko
lib/modules/2.6.26-1-amd64/kernel/drivers/md/md-mod.ko
lib/modules/2.6.26-1-amd64/kernel/drivers/md/dm-mod.ko
lib/modules/2.6.26-1-amd64/kernel/drivers/md/raid10.ko
lib/modules/2.6.26-1-amd64/kernel/drivers/md/raid1.ko
lib/modules/2.6.26-1-amd64/kernel/drivers/md/raid0.ko

--- /proc/modules:
dm_mirror 20608 0 - Live 0xffffffffa0165000
dm_log 13956 1 dm_mirror, Live 0xffffffffa0160000
dm_snapshot 19400 0 - Live 0xffffffffa015a000
dm_mod 58864 11 dm_mirror,dm_log,dm_snapshot, Live 0xffffffffa014a000
raid10 23680 1 - Live 0xffffffffa0143000
raid1 24192 1 - Live 0xffffffffa013c000
md_mod 80164 4 raid10,raid1, Live 0xffffffffa0127000

--- volume detail:

--- /proc/cmdline
root=/dev/md3 ro single

--- grub:
kernel		/boot/vmlinuz-2.6.27-rc4 root=/dev/md3 ro 
kernel		/boot/vmlinuz-2.6.27-rc4 root=/dev/md3 ro single
kernel		/boot/vmlinuz-2.6.26-1-amd64 root=/dev/md3 ro 
kernel		/boot/vmlinuz-2.6.26-1-amd64 root=/dev/md3 ro single


-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.26-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages mdadm depends on:
ii  debconf                       1.5.23     Debian configuration management sy
ii  libc6                         2.7-13     GNU C Library: Shared libraries
ii  lsb-base                      3.2-20     Linux Standard Base 3.2 init scrip
ii  makedev                       2.3.1-88   creates device files in /dev
ii  udev                          0.125-5    /dev/ and hotplug management daemo

Versions of packages mdadm recommends:
ii  module-init-tools             3.4-1      tools for managing Linux kernel mo
ii  postfix [mail-transport-agent 2.5.4-1    High-performance mail transport ag

mdadm suggests no packages.

-- debconf information:
  mdadm/autostart: true
* mdadm/mail_to: root
  mdadm/initrdstart_msg_errmd:
* mdadm/initrdstart: all
  mdadm/initrdstart_msg_errconf:
  mdadm/initrdstart_notinconf: false
  mdadm/initrdstart_msg_errexist:
  mdadm/initrdstart_msg_intro:
* mdadm/autocheck: true
  mdadm/initrdstart_msg_errblock:
* mdadm/start_daemon: true





More information about the pkg-mdadm-devel mailing list