Bug#664088: mdadm fails to initialize components for bitmap

Fri Apr 13 12:39:14 UTC 2012

Hello Michael,

I can reproduce this with Linux 3.2.0-2-amd64 (3.2.12-1) and mdadm (3.2.3-2):
mdadm --grow /dev/md0 --bitmap=none
mdadm --grow /dev/md0 --bitmap=internal

And a few seconds later the kernel panics and the machine reboots:
[   75.119802] md0: bitmap file is out of date (0 < 147) -- forcing full recovery
[   75.119817] created bitmap (1 pages) for device md0
[   80.797978] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[   80.797996] IP: [<ffffffffa00272c1>] bitmap_endwrite+0x131/0x18f [md_mod]
[   80.798013] PGD 0 
[   80.798020] Oops: 0000 [#1] SMP 
[   80.798028] CPU 0 
[   80.798032] Modules linked in: fuse evdev snd_pcm snd_page_alloc snd_timer snd soundcore pcspkr ext3 mbcache jbd raid1 md_mod xen_netfront xen_blkfront
[   80.798070] 
[   80.798075] Pid: 0, comm: swapper/0 Not tainted 3.2.0-2-amd64 #1  
[   80.798086] RIP: e030:[<ffffffffa00272c1>]  [<ffffffffa00272c1>] bitmap_endwrite+0x131/0x18f [md_mod]
[   80.798102] RSP: e02b:ffff8800ffe5ec88  EFLAGS: 00010046
[   80.798109] RAX: 0000000000000000 RBX: ffff8800031835c0 RCX: 0000000000000888
[   80.798117] RDX: 0000000000000000 RSI: 0000000000000088 RDI: ffff8800031835c0
[   80.798124] RBP: 0000000001103d50 R08: 0000000000000000 R09: 0000000000000000
[   80.798132] R10: 0000000000000246 R11: 0000000000000246 R12: 0000000000000008
[   80.798140] R13: ffff8800031835fc R14: ffff880003180078 R15: 0000000000000001
[   80.798154] FS:  00007f9fe4ac57c0(0000) GS:ffff8800ffe5b000(0000) knlGS:0000000000000000
[   80.798165] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[   80.798172] CR2: 0000000000000010 CR3: 00000000f48cc000 CR4: 0000000000002660
[   80.798181] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   80.798189] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   80.798198] Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff8160d020)
[   80.798207] Stack:
[   80.798211]  0000000000000246 ffff880003183678 0000000000000000 000000000001c2b0
[   80.798227]  ffff8800032f0040 ffff8800f57c00c0 ffff88000309c3c0 ffff8800032f2e48
[   80.798243]  000000000000000b ffff8800030a54d0 0000000000000000 ffffffffa0006808
[   80.798258] Call Trace:
[   80.798263]  <IRQ> 
[   80.798273]  [<ffffffffa0006808>] ? close_write+0x71/0x7d [raid1]
[   80.798284]  [<ffffffffa0009677>] ? r1_bio_write_done+0x1e/0x37 [raid1]
[   80.798295]  [<ffffffffa00097a8>] ? raid1_end_write_request+0x118/0x134 [raid1]
[   80.798309]  [<ffffffff81006670>] ? xen_force_evtchn_callback+0x9/0xa
[   80.798320]  [<ffffffff81006c52>] ? check_events+0x12/0x20
[   80.798331]  [<ffffffff811965e8>] ? blk_update_request+0x18c/0x30a
[   80.798341]  [<ffffffff8119677e>] ? blk_update_bidi_request+0x18/0x63
[   80.798351]  [<ffffffff81197a0c>] ? __blk_end_bidi_request+0xe/0x27
[   80.798361]  [<ffffffff81197a3f>] ? __blk_end_request_all+0x1a/0x23
[   80.798371]  [<ffffffffa0000794>] ? blkif_interrupt+0x23f/0x2ae [xen_blkfront]
[   80.798384]  [<ffffffff81090581>] ? handle_irq_event_percpu+0x50/0x180
[   80.798394]  [<ffffffff81070733>] ? arch_local_irq_restore+0x7/0x8
[   80.798405]  [<ffffffff81062441>] ? hrtimer_get_next_event+0x79/0x8f
[   80.798414]  [<ffffffff810906e5>] ? handle_irq_event+0x34/0x53
[   80.798425]  [<ffffffff81219206>] ? ack_dynirq+0x17/0x2e
[   80.798435]  [<ffffffff81092b21>] ? handle_edge_irq+0xa2/0xc9
[   80.798446]  [<ffffffff81218f25>] ? __xen_evtchn_do_upcall+0x157/0x1f2
[   80.798457]  [<ffffffff8106b8b3>] ? arch_local_irq_restore+0x7/0x8
[   80.798468]  [<ffffffff8121a548>] ? xen_evtchn_do_upcall+0x22/0x32
[   80.798480]  [<ffffffff813502be>] ? xen_do_hypervisor_callback+0x1e/0x30
[   80.798488]  <EOI> 
[   80.798496]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[   80.798505]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[   80.798515]  [<ffffffff8100663a>] ? xen_safe_halt+0xc/0x13
[   80.798525]  [<ffffffff810144fc>] ? default_idle+0x47/0x7f
[   80.798535]  [<ffffffff8100d252>] ? cpu_idle+0xaf/0xf2
[   80.798545]  [<ffffffff816aab3d>] ? start_kernel+0x3bd/0x3c8
[   80.798554]  [<ffffffff816ac64a>] ? xen_start_kernel+0x590/0x596
[   80.798561] Code: 77 0a 01 e1 48 8b 04 24 66 8b 10 ff ca 66 83 fa 02 66 89 10 77 2e 48 8b 4b 20 48 89 ee 48 89 df 83 e9 09 48 d3 ee e8 f9 f4 ff ff <48> 8b 40 10 48 8b 53 58 8d 04 85 01 00 00 00 0f ab 02 c7 43 78 
[   80.798665] RIP  [<ffffffffa00272c1>] bitmap_endwrite+0x131/0x18f [md_mod]
[   80.798678]  RSP <ffff8800ffe5ec88>
[   80.798683] CR2: 0000000000000010
[   80.798691] ---[ end trace 96c25711b3dbe8e9 ]---
[   80.798697] Kernel panic - not syncing: Fatal exception in interrupt
[   80.798705] Pid: 0, comm: swapper/0 Tainted: G      D      3.2.0-2-amd64 #1
[   80.798712] Call Trace:
[   80.798716]  <IRQ>  [<ffffffff81342930>] ? panic+0x95/0x1a5
[   80.798731]  [<ffffffff81070733>] ? arch_local_irq_restore+0x7/0x8
[   80.798741]  [<ffffffff81349e86>] ? oops_end+0xa9/0xb6
[   80.798750]  [<ffffffff8134227c>] ? no_context+0x1ff/0x20e
[   80.798760]  [<ffffffff8102bac8>] ? pvclock_clocksource_read+0x42/0xb2
[   80.798770]  [<ffffffff8134be99>] ? do_page_fault+0x1a8/0x337
[   80.798779]  [<ffffffff81006c3f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[   80.798789]  [<ffffffff810b7227>] ? arch_local_irq_restore+0x7/0x8
[   80.798800]  [<ffffffff810359f1>] ? set_task_rq+0x23/0x35
[   80.798809]  [<ffffffff8104098c>] ? select_task_rq_fair+0x39f/0x67e
[   80.798819]  [<ffffffff8102bac8>] ? pvclock_clocksource_read+0x42/0xb2
[   80.798828]  [<ffffffff8102bac8>] ? pvclock_clocksource_read+0x42/0xb2
[   80.798838]  [<ffffffff81006670>] ? xen_force_evtchn_callback+0x9/0xa
[   80.798848]  [<ffffffff81006c52>] ? check_events+0x12/0x20
[   80.798857]  [<ffffffff813495f5>] ? page_fault+0x25/0x30
[   80.798870]  [<ffffffffa00272c1>] ? bitmap_endwrite+0x131/0x18f [md_mod]
[   80.798882]  [<ffffffffa00272c1>] ? bitmap_endwrite+0x131/0x18f [md_mod]
[   80.798893]  [<ffffffffa0006808>] ? close_write+0x71/0x7d [raid1]
[   80.798903]  [<ffffffffa0009677>] ? r1_bio_write_done+0x1e/0x37 [raid1]
[   80.798914]  [<ffffffffa00097a8>] ? raid1_end_write_request+0x118/0x134 [raid1]
[   80.798925]  [<ffffffff81006670>] ? xen_force_evtchn_callback+0x9/0xa
[   80.798935]  [<ffffffff81006c52>] ? check_events+0x12/0x20
[   80.801968]  [<ffffffff811965e8>] ? blk_update_request+0x18c/0x30a
[   80.801968]  [<ffffffff8119677e>] ? blk_update_bidi_request+0x18/0x63
[   80.801968]  [<ffffffff81197a0c>] ? __blk_end_bidi_request+0xe/0x27
[   80.801968]  [<ffffffff81197a3f>] ? __blk_end_request_all+0x1a/0x23
[   80.801968]  [<ffffffffa0000794>] ? blkif_interrupt+0x23f/0x2ae [xen_blkfront]
[   80.801968]  [<ffffffff81090581>] ? handle_irq_event_percpu+0x50/0x180
[   80.801968]  [<ffffffff81070733>] ? arch_local_irq_restore+0x7/0x8
[   80.801968]  [<ffffffff81062441>] ? hrtimer_get_next_event+0x79/0x8f
[   80.801968]  [<ffffffff810906e5>] ? handle_irq_event+0x34/0x53
[   80.801968]  [<ffffffff81219206>] ? ack_dynirq+0x17/0x2e
[   80.801968]  [<ffffffff81092b21>] ? handle_edge_irq+0xa2/0xc9
[   80.801968]  [<ffffffff81218f25>] ? __xen_evtchn_do_upcall+0x157/0x1f2
[   80.801968]  [<ffffffff8106b8b3>] ? arch_local_irq_restore+0x7/0x8
[   80.801968]  [<ffffffff8121a548>] ? xen_evtchn_do_upcall+0x22/0x32
[   80.801968]  [<ffffffff813502be>] ? xen_do_hypervisor_callback+0x1e/0x30
[   80.801968]  <EOI>  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[   80.801968]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[   80.801968]  [<ffffffff8100663a>] ? xen_safe_halt+0xc/0x13
[   80.801968]  [<ffffffff810144fc>] ? default_idle+0x47/0x7f
[   80.801968]  [<ffffffff8100d252>] ? cpu_idle+0xaf/0xf2
[   80.801968]  [<ffffffff816aab3d>] ? start_kernel+0x3bd/0x3c8
[   80.801968]  [<ffffffff816ac64a>] ? xen_start_kernel+0x590/0x596

With a statically linked mdadm (v2.6.7 - 6th June 2008) everything works fine:
/tmp/mdadm.static64 --grow /dev/md0 --bitmap=none
/tmp/mdadm.static64 --grow /dev/md0 --bitmap=internal

[  281.355160] md: md0: resync done.
[  345.359062] md0: bitmap file is out of date (0 < 210) -- forcing full recovery
[  345.359077] created bitmap (160 pages) for device md0
[  345.359156] md0: bitmap file is out of date, doing full recovery
[  345.392291] md0: bitmap initialized from disk: read 11/11 pages, set 327679 of 327679 bits

And the kernel keeps running without interruption.

I'm puzzled because the older mdadm doesn't trigger this bug. Hope this helps.
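
To make the comparison between the two mdadm binaries repeatable, the steps above could be wrapped in a small script along these lines. This is only a sketch: the names toggle_bitmap, MDADM, MD_DEV and RUN are made up for illustration and are not part of mdadm. By default it only prints the commands; running it for real needs root and, with the affected versions, may panic the kernel as shown above.

```shell
#!/bin/sh
# Sketch: remove and re-add the internal bitmap with a chosen mdadm binary.
# MDADM, MD_DEV, RUN and toggle_bitmap are illustrative names (assumptions).
MDADM="${MDADM:-mdadm}"        # e.g. MDADM=/tmp/mdadm.static64
MD_DEV="${MD_DEV:-/dev/md0}"   # array to operate on
RUN="${RUN-echo}"              # dry run by default; RUN= (empty) executes

toggle_bitmap() {
    # Drop the existing bitmap, then create a fresh internal one.
    $RUN "$MDADM" --grow "$MD_DEV" --bitmap=none &&
    $RUN "$MDADM" --grow "$MD_DEV" --bitmap=internal
}

toggle_bitmap
```

With the defaults this just prints the two "mdadm --grow /dev/md0 ..." command lines. Setting MDADM=/tmp/mdadm.static64 switches to the static binary, and setting RUN to an empty value actually executes the commands.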

-- 
greetings

eMHa