Bug#580312: multipath-tools: multipathd segfaults when restarted

Guido Günther agx at sigxcpu.org
Mon May 10 16:23:10 UTC 2010


On Mon, May 10, 2010 at 11:09:41AM +1000, Vincent.McIntyre at csiro.au wrote:
> 
> I changed the setup slightly, connecting the storage unit and the host
> to a FC switch. There are still two LUNs and now there are 4 paths to
> each.
> 
> # multipath -l
> mpath1 (2227300015530e20d) dm-1 Promise ,VTrak E610f
> [size=13T][features=1 queue_if_no_path][hwhandler=0]
> \_ round-robin 0 [prio=0][active]
>  \_ 1:0:0:1 sdf 8:80  [active][undef]
>  \_ 1:0:1:1 sdh 8:112 [active][undef]
>  \_ 1:0:2:1 sdj 8:144 [active][undef]
>  \_ 1:0:3:1 sdl 8:176 [active][undef]
> mpath0 (2228f000155e2acda) dm-0 Promise ,VTrak E610f
> [size=13T][features=1 queue_if_no_path][hwhandler=0]
> \_ round-robin 0 [prio=0][active]
>  \_ 1:0:0:0 sde 8:64  [active][undef]
>  \_ 1:0:1:0 sdg 8:96  [active][undef]
>  \_ 1:0:2:0 sdi 8:128 [active][undef]
>  \_ 1:0:3:0 sdk 8:160 [active][undef]
> 
> I could go back to the other configuration briefly if you wish.
> 
> 
> > Could you run multipathd under valgrind please?
> 
> I ran it, then tried to stop with the init script (which didn't seem
> to work) and then tried with 'kill -HUP' and then just 'kill'.
> Somehow the last command caused my shell to attach to the process,
> I then hit ^C.
> Details below.
> 
> I tried this twice, with and without the filesystems mounted.
> Results were similar. Between mounting and running the second time
> I briefly exercised the filesystems by copying a bit of data from one to
> the other. I didn't try to stop/start under load.
> 
> # /etc/init.d/multipath-tools stop
> # ps -fade|grep multi
> root     12124 12098  0 10:45 pts/2    00:00:00 grep multi
> 
> # valgrind multipathd
> ==12134== Memcheck, a memory error detector.
> ==12134== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
> ==12134== Using LibVEX rev 1854, a library for dynamic binary translation.
> ==12134== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
> ==12134== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation framework.
> ==12134== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
> ==12134== For more details, rerun with: -v
> ==12134==
> ==12135==
> ==12135== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 8 from 1)
> ==12135== malloc/free: in use at exit: 1,120 bytes in 23 blocks.
> ==12135== malloc/free: 48 allocs, 25 frees, 4,716 bytes allocated.
> ==12135== For counts of detected errors, rerun with: -v
> ==12135== searching for pointers to 23 not-freed blocks.
> ==12135== checked 208,864 bytes.
> ==12135==
> ==12135== LEAK SUMMARY:
> ==12135==    definitely lost: 0 bytes in 0 blocks.
> ==12135==      possibly lost: 0 bytes in 0 blocks.
> ==12135==    still reachable: 1,120 bytes in 23 blocks.
> ==12135==         suppressed: 0 bytes in 0 blocks.
> ==12135== Rerun with --leak-check=full to see details of leaked memory.
> ==12134==
> ==12134== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 8 from 1)
> ==12134== malloc/free: in use at exit: 216 bytes in 1 blocks.
> ==12134== malloc/free: 48 allocs, 47 frees, 4,716 bytes allocated.
> ==12134== For counts of detected errors, rerun with: -v
> ==12134== searching for pointers to 1 not-freed blocks.
> ==12134== checked 208,048 bytes.
> ==12134==
> ==12134== LEAK SUMMARY:
> ==12134==    definitely lost: 0 bytes in 0 blocks.
> ==12134==      possibly lost: 0 bytes in 0 blocks.
> ==12134==    still reachable: 216 bytes in 1 blocks.
> ==12134==         suppressed: 0 bytes in 0 blocks.
> ==12134== Rerun with --leak-check=full to see details of leaked memory.
> 
> # ps -fade|grep multip
> root     12136     1  5 10:45 ?        00:00:00 /usr/bin/valgrind.bin multipathd
> root     12166 12098  0 10:45 pts/2    00:00:00 grep multip
> 
> # /etc/init.d/multipath-tools stop
> Stopping multipath daemon: multipathd.
> 
> # ps -fade|grep multip
> root     12136     1  2 10:45 ?        00:00:00 /usr/bin/valgrind.bin multipathd
> root     12172 12098  0 10:45 pts/2    00:00:00 grep multip
> 
> # kill -HUP 12136
> # ps -fade|grep multip
> root     12136     1  2 10:45 ?        00:00:00 /usr/bin/valgrind.bin multipathd
> root     12190 12098  0 10:45 pts/2    00:00:00 grep multip
> 
> # kill 12136
> ==12136== Thread 9:
> ==12136== Invalid read of size 4
> ==12136==    at 0x4E2E4FE: pthread_mutex_lock (in /lib/libpthread-2.7.so)
> ==12136==    by 0x42B596: (within /sbin/multipathd)
> ==12136==    by 0x42BC83: (within /sbin/multipathd)
> ==12136==    by 0x4E2CFC6: start_thread (in /lib/libpthread-2.7.so)
> ==12136==    by 0x59A959C: clone (in /lib/libc-2.7.so)
> ==12136==  Address 0x6068788 is 8 bytes inside a block of size 40 free'd
> ==12136==    at 0x4C2130F: free (vg_replace_malloc.c:323)
> ==12136==    by 0x415770: xfree (in /sbin/multipathd)
> ==12136==    by 0x4069F8: (within /sbin/multipathd)
> ==12136==    by 0x406D98: (within /sbin/multipathd)
> ==12136==    by 0x58F81A5: (below main) (in /lib/libc-2.7.so)
> ==12136==
> ==12136== Invalid read of size 4
> ==12136==    at 0x4E2E509: pthread_mutex_lock (in /lib/libpthread-2.7.so)
> ==12136==    by 0x42B596: (within /sbin/multipathd)
> ==12136==    by 0x42BC83: (within /sbin/multipathd)
> ==12136==    by 0x4E2CFC6: start_thread (in /lib/libpthread-2.7.so)
> ==12136==    by 0x59A959C: clone (in /lib/libc-2.7.so)
> ==12136==  Address 0x606878c is 12 bytes inside a block of size 40 free'd
> ==12136==    at 0x4C2130F: free (vg_replace_malloc.c:323)
> ==12136==    by 0x415770: xfree (in /sbin/multipathd)
> ==12136==    by 0x4069F8: (within /sbin/multipathd)
> ==12136==    by 0x406D98: (within /sbin/multipathd)
> ==12136==    by 0x58F81A5: (below main) (in /lib/libc-2.7.so)
> ==12136==
> ==12136== Invalid write of size 4
> ==12136==    at 0x4E2E50D: pthread_mutex_lock (in /lib/libpthread-2.7.so)
> ==12136==    by 0x42B596: (within /sbin/multipathd)
> ==12136==    by 0x42BC83: (within /sbin/multipathd)
> ==12136==    by 0x4E2CFC6: start_thread (in /lib/libpthread-2.7.so)
> ==12136==    by 0x59A959C: clone (in /lib/libc-2.7.so)
> ==12136==  Address 0x6068788 is 8 bytes inside a block of size 40 free'd
> ==12136==    at 0x4C2130F: free (vg_replace_malloc.c:323)
> ==12136==    by 0x415770: xfree (in /sbin/multipathd)
> ==12136==    by 0x4069F8: (within /sbin/multipathd)
> ==12136==    by 0x406D98: (within /sbin/multipathd)
> ==12136==    by 0x58F81A5: (below main) (in /lib/libc-2.7.so)
> ==12136==
> ==12136== Invalid read of size 8
> ==12136==    at 0x42B5E8: (within /sbin/multipathd)
> ==12136==    by 0x42BC83: (within /sbin/multipathd)
> ==12136==    by 0x4E2CFC6: start_thread (in /lib/libpthread-2.7.so)
> ==12136==    by 0x59A959C: clone (in /lib/libc-2.7.so)
> ==12136==  Address 0x6068738 is 0 bytes inside a block of size 24 free'd
> ==12136==    at 0x4C2130F: free (vg_replace_malloc.c:323)
> ==12136==    by 0x415770: xfree (in /sbin/multipathd)
> ==12136==    by 0x406A0C: (within /sbin/multipathd)
> ==12136==    by 0x406D98: (within /sbin/multipathd)
> ==12136==    by 0x58F81A5: (below main) (in /lib/libc-2.7.so)
> ==12136==
> ==12136== Invalid read of size 4
> ==12136==    at 0x4E2FBF5: __pthread_mutex_unlock_usercnt (in /lib/libpthread-2.7.so)
> ==12136==    by 0x42B5EF: (within /sbin/multipathd)
> ==12136==    by 0x42BC83: (within /sbin/multipathd)
> ==12136==    by 0x4E2CFC6: start_thread (in /lib/libpthread-2.7.so)
> ==12136==    by 0x59A959C: clone (in /lib/libc-2.7.so)
> ==12136==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
> ==12136==
> ==12136==
> ==12136== Process terminating with default action of signal 11 (SIGSEGV)
> ==12136==  Access not within mapped region at address 0x10
> ==12136==    at 0x4E2FBF5: __pthread_mutex_unlock_usercnt (in /lib/libpthread-2.7.so)
> ==12136==    by 0x42B5EF: (within /sbin/multipathd)
> ==12136==    by 0x42BC83: (within /sbin/multipathd)
> ==12136==    by 0x4E2CFC6: start_thread (in /lib/libpthread-2.7.so)
> ==12136==    by 0x59A959C: clone (in /lib/libc-2.7.so)
> ==12136==
> ==12136== ERROR SUMMARY: 6 errors from 5 contexts (suppressed: 9 from 2)
> ==12136== malloc/free: in use at exit: 13,258 bytes in 44 blocks.
> ==12136== malloc/free: 3,990 allocs, 3,946 frees, 3,016,169 bytes allocated.
> ==12136== For counts of detected errors, rerun with: -v
> ==12136== searching for pointers to 44 not-freed blocks.
> ==12136== checked 399,480 bytes.
> ==12136==
> ==12136== LEAK SUMMARY:
> ==12136==    definitely lost: 0 bytes in 0 blocks.
> ==12136==      possibly lost: 1,152 bytes in 4 blocks.
> ==12136==    still reachable: 12,106 bytes in 40 blocks.
> ==12136==         suppressed: 0 bytes in 0 blocks.
> ==12136== Rerun with --leak-check=full to see details of leaked memory.
> ^C
Hmm... no idea what causes this, and I can't reproduce it here. Could you
rebuild multipath-tools with DEB_BUILD_OPTIONS=debug and run it in gdb?
Or is there a chance I could log on to the box and check myself?
 -- Guido
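
For reference, the rebuild-and-gdb request above could look roughly like
the sketch below. It is an illustration under assumptions, not a tested
procedure for this bug: the `step` helper, the package file globs and
the use of `multipathd -d` (stay in the foreground) are not taken from
the thread, and only DEB_BUILD_OPTIONS=debug comes from the message
itself. The script prints each command by default and executes it only
when run as root with DRY_RUN=0.

```shell
#!/bin/sh
# Dry run by default: every step is only printed.
# Run with DRY_RUN=0 (as root) to actually execute the steps.
DRY_RUN="${DRY_RUN:-1}"

# step is a hypothetical helper: it prints a command line and,
# when DRY_RUN=0, runs it through sh and stops on the first failure.
step() {
    echo "+ $1"
    if [ "$DRY_RUN" = "0" ]; then
        sh -c "$1" || exit 1
    fi
}

# Fetch the build dependencies and the package source.
step 'apt-get build-dep multipath-tools'
step 'apt-get source multipath-tools'

# Rebuild with the build option requested in the message above.
# The directory and .deb globs are assumptions about the file names.
step 'cd multipath-tools-*/ && DEB_BUILD_OPTIONS=debug dpkg-buildpackage -us -uc -b'
step 'dpkg -i multipath-tools_*.deb'

# Stop the packaged daemon, then start multipathd in the foreground under gdb.
step '/etc/init.d/multipath-tools stop'
step 'gdb --args /sbin/multipathd -d'

# Inside gdb (typed by hand, not run by this script):
#   (gdb) run
#   ... from a second shell: kill <pid of multipathd> ...
#   (gdb) continue                  # deliver the SIGTERM that gdb intercepted
#   (gdb) thread apply all bt full  # after the SIGSEGV, backtraces of all threads
```

With an unstripped binary, the frames shown as "(within /sbin/multipathd)"
in the valgrind output should resolve to function names in the gdb
backtrace, which is the information missing from the trace quoted above.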





