Bug#859157: multipath-tools: after bootup multipathd timeout on commands - requires daemon restart

Alban Browaeys prahal at yahoo.com
Fri Mar 31 15:33:36 UTC 2017


On Friday, 31 March 2017 at 14:24 +0530, Ritesh Raj Sarraf wrote:
> Control: tag -1 +moreinfo
> 
> 
> Hello,
> 
> I cannot reproduce this locally in my setup.
> My guess is, from your other bug report, that you may have other
> components
> interfering with multipath but I cannot confirm that right now with
> the limited
> information available.

Mind that this is always the case when udev triggers an add (at boot).
Commit
http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=commitdiff;h=5ed6122c0e5d8326bc0f3beaf1f5de7303986952
introduced the lock in uev_add_path, but at that time uev_update_path
called uev_add_path only after it had unlocked vecs->lock.
The bug appeared after commit
http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=commitdiff;h=c6a18f4541d0a161e2f5fed8c67d9732bf512b37
moved the calls to uev_add_path inside the already locked area.




> 
> On Fri, 2017-03-31 at 06:05 +0200, Alban Browaeys wrote:
> > Package: multipath-tools
> > Version: 0.6.4-5.1
> > Severity: normal
> > 
> > Dear Maintainer,
> > multipathd does not respond to commands : list paths or list maps
> > returns "timed out".
> > 
> > The multipathd daemon, when triggered by udev, locks up in
> > uev_add_path
> > because this lock is already held by its caller uev_update_path.
> > 
> > Here in thread 5 uev_update_path and uev_add_path chain
> >  (note that in thread 4 checkerloop also waits for this lock).
> 
> rrs at learner:~$ ssh 172.16.230.133
> The authenticity of host '172.16.230.133 (172.16.230.133)' can't be
> established.
> ECDSA key fingerprint is
> SHA256:BrBphJSYS/93xpb/GxHgDPLTlHWGheloG7wlTepHQYk.
> Are you sure you want to continue connecting (yes/no)? yes
> Warning: Permanently added '172.16.230.133' (ECDSA) to the list of
> known hosts.
> rrs at 172.16.230.133's password: 
> Permission denied, please try again.
> rrs at 172.16.230.133's password: 
> 
> The programs included with the Debian GNU/Linux system are free
> software;
> the exact distribution terms for each program are described in the
> individual files in /usr/share/doc/*/copyright.
> 
> Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
> permitted by applicable law.
> Last login: Sat Nov 19 23:17:46 2016 from 172.16.20.1
> rrs at debian-btrfs:~$ su -
> Password: 
> su: Authentication failure
> rrs at debian-btrfs:~$ su -
> Password: 
> root at debian-btrfs:~# multipathd -k
> multipathd> list paths
> hcil    dev dev_t pri dm_st  chk_st dev_st  next_check      
> 2:0:0:0 sda 8:0   50  active ready  running XXXXX..... 20/40
> 3:0:0:0 sdb 8:16  50  active ready  running XXXX...... 19/40
> 3:0:0:1 sdc 8:32  50  active ready  running XXXX...... 19/40
> 2:0:0:1 sdd 8:48  50  active ready  running XXXXX..... 20/40
> 4:0:0:0 sde 8:64  50  active ready  running XXXXX..... 20/40
> 5:0:0:0 sdf 8:80  50  active ready  running XXXXX..... 20/40
> 5:0:0:1 sdg 8:96  50  active ready  running XXXXX..... 20/40
> 4:0:0:1 sdh 8:112 50  active ready  running XXXXX..... 20/40
> multipathd> list maps
> name   sysfs uuid                             
> mpatha dm-0  36001405c2d2d9a03751406691395e741
> mpathb dm-1  36001405226c2409d98a4e35ba427b274
> multipathd> 
> 
> root at debian-btrfs:~# apt policy multipath-tools
> multipath-tools:
>   Installed: 0.6.4-5
>   Candidate: 0.6.4-5
>   Version table:
>  *** 0.6.4-5 500
>         500 http://httpredir.debian.org/debian unstable/main amd64
> Packages
>         100 /var/lib/dpkg/status
>      0.6.4-3 500
>         500 http://httpredir.debian.org/debian testing/main amd64
> Packages
> 

Has this box been rebooted in between?
The broken commit "fix INIT_REQUESTED_UDEV code" is in upstream 0.6.4
http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=shortlog;h=refs/tags/0.6.4
but not in 0.6.3
http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=shortlog;h=refs/tags/0.6.3
The deadlock only shows up at boot, because only then does udev trigger
the add events for the devices. It might be that this code path is only
triggered by virtual devices.
(If I boot without the fix, multipathd deadlocks. If I then kill and
start multipathd, /dev/sda and /dev/sdb are available -- but the
bcache0 virtual device only becomes available with a fixed multipathd
started at boot.
Likely no udev add events are triggered when I kill and restart
multipathd.)

The test to verify that multipathd has deadlocked is: "multipathd list
paths" returns a timeout.

The bcache0 device does nothing special to hang multipath. It merely
triggers the udev add code path.

By the way, the deadlock shown in the backtrace is short to follow:
uev_update_path takes the initial lock
http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=blob;f=multipathd/main.c;h=283d81dd97608fed4705ac18af96a7f4dee7e785;hb=HEAD#l985
then uev_update_path calls uev_add_path
http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=blob;f=multipathd/main.c;h=283d81dd97608fed4705ac18af96a7f4dee7e785;hb=HEAD#l1012
then uev_add_path deadlocks trying to take the same lock in
http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=blob;f=multipathd/main.c;h=283d81dd97608fed4705ac18af96a7f4dee7e785;hb=HEAD#l631
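
To illustrate the pattern (this is not the multipathd code), here is a
minimal sketch of a caller and a helper that both take the same
non-recursive mutex. All demo_* names are hypothetical stand-ins:
demo_lock for vecs->lock, demo_add_path for uev_add_path, and the two
demo_update_path_* functions for the two call orderings described
above. I assume an error-checking mutex here so that the relock is
reported as EDEADLK; with a normal mutex the second lock would simply
block forever, which is the hang seen in the backtrace.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for vecs->lock. */
pthread_mutex_t demo_lock;

/* Initialise demo_lock as an error-checking mutex, so that a relock by
 * the owning thread returns EDEADLK instead of hanging. */
void demo_lock_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&demo_lock, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);
}

/* Stand-in for uev_add_path: takes the lock itself. */
int demo_add_path(void)
{
    int rc = pthread_mutex_lock(&demo_lock);
    if (rc != 0)
        return rc; /* EDEADLK when the calling thread already owns it */
    /* ... the path would be added here ... */
    pthread_mutex_unlock(&demo_lock);
    return 0;
}

/* Earlier ordering: release the lock, then call the helper. */
int demo_update_path_unlock_first(void)
{
    pthread_mutex_lock(&demo_lock);
    /* ... update done under the lock ... */
    pthread_mutex_unlock(&demo_lock);
    return demo_add_path();
}

/* Later ordering: call the helper while still holding the lock. */
int demo_update_path_call_locked(void)
{
    int rc;

    pthread_mutex_lock(&demo_lock);
    rc = demo_add_path(); /* relock attempt by the same thread */
    pthread_mutex_unlock(&demo_lock);
    return rc;
}
```

After demo_lock_init(), demo_update_path_unlock_first() returns 0,
while demo_update_path_call_locked() returns EDEADLK.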

Best regards
Alban


