Bug#664088: mdadm fails to initialize components for bitmap
Markus Hochholdinger
Markus at hochholdinger.net
Fri Apr 13 12:39:14 UTC 2012
Hello Michael,
I can reproduce this with Linux 3.2.0-2-amd64 (3.2.12-1) and mdadm (3.2.3-2):
mdadm --grow /dev/md0 --bitmap=none
mdadm --grow /dev/md0 --bitmap=internal
A few seconds later the kernel crashes and the machine reboots:
[ 75.119802] md0: bitmap file is out of date (0 < 147) -- forcing full
recovery
[ 75.119817] created bitmap (1 pages) for device md0
[ 80.797978] BUG: unable to handle kernel NULL pointer dereference at
0000000000000010
[ 80.797996] IP: [<ffffffffa00272c1>] bitmap_endwrite+0x131/0x18f [md_mod]
[ 80.798013] PGD 0
[ 80.798020] Oops: 0000 [#1] SMP
[ 80.798028] CPU 0
[ 80.798032] Modules linked in: fuse evdev snd_pcm snd_page_alloc snd_timer
snd soundcore pcspkr ext3 mbcache jbd raid1 md_mod xen_netfront xen_blkfront
[ 80.798070]
[ 80.798075] Pid: 0, comm: swapper/0 Not tainted 3.2.0-2-amd64 #1
[ 80.798086] RIP: e030:[<ffffffffa00272c1>] [<ffffffffa00272c1>]
bitmap_endwrite+0x131/0x18f [md_mod]
[ 80.798102] RSP: e02b:ffff8800ffe5ec88 EFLAGS: 00010046
[ 80.798109] RAX: 0000000000000000 RBX: ffff8800031835c0 RCX:
0000000000000888
[ 80.798117] RDX: 0000000000000000 RSI: 0000000000000088 RDI:
ffff8800031835c0
[ 80.798124] RBP: 0000000001103d50 R08: 0000000000000000 R09:
0000000000000000
[ 80.798132] R10: 0000000000000246 R11: 0000000000000246 R12:
0000000000000008
[ 80.798140] R13: ffff8800031835fc R14: ffff880003180078 R15:
0000000000000001
[ 80.798154] FS: 00007f9fe4ac57c0(0000) GS:ffff8800ffe5b000(0000)
knlGS:0000000000000000
[ 80.798165] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 80.798172] CR2: 0000000000000010 CR3: 00000000f48cc000 CR4:
0000000000002660
[ 80.798181] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 80.798189] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[ 80.798198] Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task
ffffffff8160d020)
[ 80.798207] Stack:
[ 80.798211] 0000000000000246 ffff880003183678 0000000000000000
000000000001c2b0
[ 80.798227] ffff8800032f0040 ffff8800f57c00c0 ffff88000309c3c0
ffff8800032f2e48
[ 80.798243] 000000000000000b ffff8800030a54d0 0000000000000000
ffffffffa0006808
[ 80.798258] Call Trace:
[ 80.798263] <IRQ>
[ 80.798273] [<ffffffffa0006808>] ? close_write+0x71/0x7d [raid1]
[ 80.798284] [<ffffffffa0009677>] ? r1_bio_write_done+0x1e/0x37 [raid1]
[ 80.798295] [<ffffffffa00097a8>] ? raid1_end_write_request+0x118/0x134
[raid1]
[ 80.798309] [<ffffffff81006670>] ? xen_force_evtchn_callback+0x9/0xa
[ 80.798320] [<ffffffff81006c52>] ? check_events+0x12/0x20
[ 80.798331] [<ffffffff811965e8>] ? blk_update_request+0x18c/0x30a
[ 80.798341] [<ffffffff8119677e>] ? blk_update_bidi_request+0x18/0x63
[ 80.798351] [<ffffffff81197a0c>] ? __blk_end_bidi_request+0xe/0x27
[ 80.798361] [<ffffffff81197a3f>] ? __blk_end_request_all+0x1a/0x23
[ 80.798371] [<ffffffffa0000794>] ? blkif_interrupt+0x23f/0x2ae
[xen_blkfront]
[ 80.798384] [<ffffffff81090581>] ? handle_irq_event_percpu+0x50/0x180
[ 80.798394] [<ffffffff81070733>] ? arch_local_irq_restore+0x7/0x8
[ 80.798405] [<ffffffff81062441>] ? hrtimer_get_next_event+0x79/0x8f
[ 80.798414] [<ffffffff810906e5>] ? handle_irq_event+0x34/0x53
[ 80.798425] [<ffffffff81219206>] ? ack_dynirq+0x17/0x2e
[ 80.798435] [<ffffffff81092b21>] ? handle_edge_irq+0xa2/0xc9
[ 80.798446] [<ffffffff81218f25>] ? __xen_evtchn_do_upcall+0x157/0x1f2
[ 80.798457] [<ffffffff8106b8b3>] ? arch_local_irq_restore+0x7/0x8
[ 80.798468] [<ffffffff8121a548>] ? xen_evtchn_do_upcall+0x22/0x32
[ 80.798480] [<ffffffff813502be>] ? xen_do_hypervisor_callback+0x1e/0x30
[ 80.798488] <EOI>
[ 80.798496] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[ 80.798505] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[ 80.798515] [<ffffffff8100663a>] ? xen_safe_halt+0xc/0x13
[ 80.798525] [<ffffffff810144fc>] ? default_idle+0x47/0x7f
[ 80.798535] [<ffffffff8100d252>] ? cpu_idle+0xaf/0xf2
[ 80.798545] [<ffffffff816aab3d>] ? start_kernel+0x3bd/0x3c8
[ 80.798554] [<ffffffff816ac64a>] ? xen_start_kernel+0x590/0x596
[ 80.798561] Code: 77 0a 01 e1 48 8b 04 24 66 8b 10 ff ca 66 83 fa 02 66 89
10 77 2e 48 8b 4b 20 48 89 ee 48 89 df 83 e9 09 48 d3 ee e8 f9 f4 ff ff <48>
8b 40 10 48 8b 53 58 8d 04 85 01 00 00 00 0f ab 02 c7 43 78
[ 80.798665] RIP [<ffffffffa00272c1>] bitmap_endwrite+0x131/0x18f [md_mod]
[ 80.798678] RSP <ffff8800ffe5ec88>
[ 80.798683] CR2: 0000000000000010
[ 80.798691] ---[ end trace 96c25711b3dbe8e9 ]---
[ 80.798697] Kernel panic - not syncing: Fatal exception in interrupt
[ 80.798705] Pid: 0, comm: swapper/0 Tainted: G D 3.2.0-2-amd64 #1
[ 80.798712] Call Trace:
[ 80.798716] <IRQ> [<ffffffff81342930>] ? panic+0x95/0x1a5
[ 80.798731] [<ffffffff81070733>] ? arch_local_irq_restore+0x7/0x8
[ 80.798741] [<ffffffff81349e86>] ? oops_end+0xa9/0xb6
[ 80.798750] [<ffffffff8134227c>] ? no_context+0x1ff/0x20e
[ 80.798760] [<ffffffff8102bac8>] ? pvclock_clocksource_read+0x42/0xb2
[ 80.798770] [<ffffffff8134be99>] ? do_page_fault+0x1a8/0x337
[ 80.798779] [<ffffffff81006c3f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[ 80.798789] [<ffffffff810b7227>] ? arch_local_irq_restore+0x7/0x8
[ 80.798800] [<ffffffff810359f1>] ? set_task_rq+0x23/0x35
[ 80.798809] [<ffffffff8104098c>] ? select_task_rq_fair+0x39f/0x67e
[ 80.798819] [<ffffffff8102bac8>] ? pvclock_clocksource_read+0x42/0xb2
[ 80.798828] [<ffffffff8102bac8>] ? pvclock_clocksource_read+0x42/0xb2
[ 80.798838] [<ffffffff81006670>] ? xen_force_evtchn_callback+0x9/0xa
[ 80.798848] [<ffffffff81006c52>] ? check_events+0x12/0x20
[ 80.798857] [<ffffffff813495f5>] ? page_fault+0x25/0x30
[ 80.798870] [<ffffffffa00272c1>] ? bitmap_endwrite+0x131/0x18f [md_mod]
[ 80.798882] [<ffffffffa00272c1>] ? bitmap_endwrite+0x131/0x18f [md_mod]
[ 80.798893] [<ffffffffa0006808>] ? close_write+0x71/0x7d [raid1]
[ 80.798903] [<ffffffffa0009677>] ? r1_bio_write_done+0x1e/0x37 [raid1]
[ 80.798914] [<ffffffffa00097a8>] ? raid1_end_write_request+0x118/0x134
[raid1]
[ 80.798925] [<ffffffff81006670>] ? xen_force_evtchn_callback+0x9/0xa
[ 80.798935] [<ffffffff81006c52>] ? check_events+0x12/0x20
[ 80.801968] [<ffffffff811965e8>] ? blk_update_request+0x18c/0x30a
[ 80.801968] [<ffffffff8119677e>] ? blk_update_bidi_request+0x18/0x63
[ 80.801968] [<ffffffff81197a0c>] ? __blk_end_bidi_request+0xe/0x27
[ 80.801968] [<ffffffff81197a3f>] ? __blk_end_request_all+0x1a/0x23
[ 80.801968] [<ffffffffa0000794>] ? blkif_interrupt+0x23f/0x2ae
[xen_blkfront]
[ 80.801968] [<ffffffff81090581>] ? handle_irq_event_percpu+0x50/0x180
[ 80.801968] [<ffffffff81070733>] ? arch_local_irq_restore+0x7/0x8
[ 80.801968] [<ffffffff81062441>] ? hrtimer_get_next_event+0x79/0x8f
[ 80.801968] [<ffffffff810906e5>] ? handle_irq_event+0x34/0x53
[ 80.801968] [<ffffffff81219206>] ? ack_dynirq+0x17/0x2e
[ 80.801968] [<ffffffff81092b21>] ? handle_edge_irq+0xa2/0xc9
[ 80.801968] [<ffffffff81218f25>] ? __xen_evtchn_do_upcall+0x157/0x1f2
[ 80.801968] [<ffffffff8106b8b3>] ? arch_local_irq_restore+0x7/0x8
[ 80.801968] [<ffffffff8121a548>] ? xen_evtchn_do_upcall+0x22/0x32
[ 80.801968] [<ffffffff813502be>] ? xen_do_hypervisor_callback+0x1e/0x30
[ 80.801968] <EOI> [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[ 80.801968] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[ 80.801968] [<ffffffff8100663a>] ? xen_safe_halt+0xc/0x13
[ 80.801968] [<ffffffff810144fc>] ? default_idle+0x47/0x7f
[ 80.801968] [<ffffffff8100d252>] ? cpu_idle+0xaf/0xf2
[ 80.801968] [<ffffffff816aab3d>] ? start_kernel+0x3bd/0x3c8
[ 80.801968] [<ffffffff816ac64a>] ? xen_start_kernel+0x590/0x596
With a statically linked mdadm (v2.6.7 - 6th June 2008) everything works fine:
/tmp/mdadm.static64 --grow /dev/md0 --bitmap=none
/tmp/mdadm.static64 --grow /dev/md0 --bitmap=internal
[ 281.355160] md: md0: resync done.
[ 345.359062] md0: bitmap file is out of date (0 < 210) -- forcing full
recovery
[ 345.359077] created bitmap (160 pages) for device md0
[ 345.359156] md0: bitmap file is out of date, doing full recovery
[ 345.392291] md0: bitmap initialized from disk: read 11/11 pages, set 327679
of 327679 bits
The kernel keeps running without interruption.
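One visible difference between the two logs is the bitmap size the kernel reports: "created bitmap (1 pages)" with mdadm 3.2.3-2 versus "created bitmap (160 pages)" with mdadm 2.6.7. Whether that is related to the crash is not established here. To compare runs, the page count can be pulled out of the kernel log with something like the following sketch. The function name bitmap_pages is made up for illustration, and it assumes the message format shown in the logs above.

```shell
# Print the page count from every "created bitmap (N pages)" line read on stdin.
# Assumes the kernel message format seen in the logs in this mail.
bitmap_pages() {
    sed -n 's/.*created bitmap (\([0-9][0-9]*\) pages) for device.*/\1/p'
}

# Example with the two lines quoted in this mail:
printf '%s\n' \
    '[ 75.119817] created bitmap (1 pages) for device md0' \
    '[ 345.359077] created bitmap (160 pages) for device md0' | bitmap_pages
```

On a live system the input would come from the kernel log instead, for example "dmesg | bitmap_pages".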
I'm puzzled because the older mdadm doesn't trigger this bug. Hope this helps.
--
greetings
eMHa