[Pkg-iscsi-maintainers] Bug#775778: open-iscsi: Boot with systemd hangs (ordering of init script w.r.t. remote filesystems)
Ritesh Raj Sarraf
rrs at debian.org
Tue Jan 20 09:58:58 UTC 2015
Hello Christian,
On 01/20/2015 01:02 AM, Christian Seiler wrote:
> Dear Maintainer,
>
> tl;dr: systemd + open-iscsi = 90s hang at boot in some cases,
> and umountiscsi.sh is not called on shutdown. Attached a
> debdiff that fixes that without being too invasive.
>
> Longer explanation: if you have the following configuration:
>
> - Jessie
> - systemd as init
> - open-iscsi configured to automatically log in to some iSCSI target,
> iSCSI disk /dev/sdb is then available
> - /etc/fstab containing an entry like
> /dev/sdb1 /data ext4 rw,_netdev 0 0
> or (when using LVM)
> /dev/vg_.../lv_... /data ext4 rw,_netdev 0 0
>
> the system boot will hang for 90s because of systemd's default timeout
> when devices are not available.
Actually, from what I know so far, systemd aggressively backgrounds any
process that is taking time. Only the processes that depend on it are
put on hold, again in the background.
> The reason behind this is that open-iscsi contains the following LSB
> headers:
> Required-Start: $network $remote_fs
> Required-Stop: $network $remote_fs sendsigs
> Here, $network maps to network-online.target in systemd, that's fine,
> but $remote_fs maps to remote-fs.target in systemd, that is the problem.
> This is because
>
> a) systemd treats file systems that couldn't be mounted as hard
> failures.
> and
> b) systemd's logic of mounting all remote filesystems is to mount
> all filesystems in /etc/fstab that are marked _netdev (and not
> marked noauto)
>
> Therefore, systemd waits for the iSCSI device to appear for 90s before
> timing out and proceeding with boot. Only then remote-fs.target is
> reached and systemd starts the open-iscsi init script.
I think you may be missing something here. I believe devices marked
_netdev are always backgrounded, at least in sysvinit, and it is highly
unlikely that systemd handles them differently.
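To make rule (b) above concrete, here is a minimal Python sketch of the
selection as Christian describes it: pick every /etc/fstab entry whose
options include _netdev and do not include noauto. It is an illustration
only, not systemd's actual code (systemd also treats network filesystem
types such as NFS as remote, which the sketch ignores). The function name
remote_mounts and the /dev/sda1 and /dev/sdc1 entries are made up for the
example; the /dev/sdb1 line is the one from the report.

```python
def remote_mounts(fstab_text):
    """Return mount points of entries marked _netdev and not noauto.

    Hypothetical helper for illustration; it only models the rule
    quoted in the bug report, not systemd's full fstab handling.
    """
    selected = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # An fstab entry needs at least: device, mount point, type, options.
        if len(fields) < 4:
            continue
        options = fields[3].split(",")
        if "_netdev" in options and "noauto" not in options:
            selected.append(fields[1])
    return selected


# Only the /dev/sdb1 line comes from the report; the others are invented.
EXAMPLE_FSTAB = """\
/dev/sda1 / ext4 errors=remount-ro 0 1
/dev/sdb1 /data ext4 rw,_netdev 0 0
/dev/sdc1 /backup ext4 rw,_netdev,noauto 0 0
"""

print(remote_mounts(EXAMPLE_FSTAB))  # ['/data']
```

Under that rule, /data is the mount systemd would wait for before
reaching remote-fs.target.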
>
> That in turn will then make the devices appear. The init script will
> then call a "mount -a -O _netdev" and "swapon -a -e" in its start()
> routine, that will then cause the mount points to be activated.
>
> So in the end, the boot is kind-of successful in the sense that
> everything kind of works at the end of boot, with the following two caveats:
>
> - there is this needless 90s delay (or whatever other delay the admin
> has configured) in waiting on the iSCSI targets
Have you had any luck root-causing why there is the 90 sec delay?
> - if I want to use systemd's features to order a specific
> service after remote-fs.target to make sure that the remote file
> systems I have are mounted, maybe because the service needs the
> data on them, then this won't work consistently, because the
> file systems will only be mounted after open-iscsi is started,
> which will then be in parallel to any services I have ordered
> after remote-fs.target, for example:
> - exporting a subdirectory of an iSCSI filesystem via NFS; if
> nfs-kernel-server gets started too early, this might fail
> because the directory that is exported doesn't exist
>
> If I modify the init script to remove $remote_fs from its LSB headers,
> then booting works as expected. However, this causes two problems:
>
> 1. I assume that $remote_fs is in there because you want to support
> NFS-based separate /usr. Removing $remote_fs from LSB headers
> would break such a configuration under sysvinit, since the
> open-iscsi tools wouldn't be able to be called.
>
> However, systemd in Debian currently doesn't really support a
> separate /usr that's not mounted from initrd anyway.
>
> 2. Shutting down is racy.
>
> Shutting down is racy because you then have the following constellation:
>
> - systemd tracks services' states. And while bug #732793 does not occur
> anymore because invoke-rc.d strips the .sh from umountiscsi.sh, the
> call to umountiscsi.sh stop doesn't really do anything, because
> systemd already thinks it is stopped, since it was never started.
>
> - OTOH, systemd will tear down remote filesystems on its own. But
> because open-iscsi is only ordered after network-online.target then,
> tearing down the remote filesystems will be done in parallel (!) to
> stopping open-iscsi.
>
> This has the unfortunate effect that it could be the case that the
> umount call to the filesystem is made after open-iscsi has been
> stopped. This will then cause the kernel to hang trying to umount
> the filesystem.
>
> I haven't been able to reproduce this race yet, i.e. I have gotten
> lucky so far, in that umount was typically faster on my system than
> stopping open-iscsi - BUT I am really not comfortable with having
> such a flimsy race in place, especially since umount will sync
> stuff to the filesystem and stopping open-iscsi too early could
> easily cause severe data loss.
>
> So much for my analysis. How do we proceed from here?
>
> - it is quite clear that you probably don't want to change the sysvinit
> logic now, especially so late in the Jessie freeze
>
> - however, this bug w.r.t. systemd should definitely be fixed in my
> eyes
>
> Therefore, I suggest that you provide a unit file specifically for
> systemd. In order to be as minimally invasive as possible (especially this
> late in the freeze), the unit file should ideally call the original init
> script.
I am willing to accept a systemd unit. But it is too late for Jessie
right now. If you have the unit ready and tested, for now, we can put it
into experimental.
I would not want to ship something for Jessie now. Ideally, systemd's
logic on handling init scripts should take care of it. It has worked for
other sysvinit scripts so far.
Introducing the systemd unit in Jessie now is too late, because it
would not get enough test cycles.
> After Jessie one should consider redoing the entire logic for
> systemd-based systems, there are a lot more features of systemd that
> one can leverage to make things work better. But to fix this immediate
> bug, the changes I mentioned are sufficient.
>
> I have created a debdiff for a test package that changes the following:
>
> - add systemd unit that just calls the init script but has adjusted
> dependencies:
> - no more After=remote-fs.target
> - new Before=remote-fs-pre.target
> - add dh-systemd as build-dep and use dh_systemd in debian/rules
> - move #DEBHELPER# around in postinst, to make sure package upgrades
> don't break the system (dh_systemd_enable code has to come before
> the unit is first started, otherwise weird things occur)
> - do the equivalent of umountiscsi.sh start so that systemd will
> track that service as 'running' - then at shutdown the open-iscsi
> init script will be able to call the stop action of that script
>
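The ordering proposed in the debdiff can be sketched as a small
dependency graph. This is an illustration in Python (3.9 or later, for
graphlib), not how systemd is implemented. It assumes that the /data
mount from the fstab example becomes data.mount, that systemd orders
_netdev mounts after remote-fs-pre.target and before remote-fs.target,
and that units are stopped in the reverse of their start order.

```python
from graphlib import TopologicalSorter

# Maps each unit to the set of units that must be started before it.
# The edges mirror the proposed unit: After=network-online.target and
# Before=remote-fs-pre.target for open-iscsi. The mount edges are the
# assumed default ordering for a _netdev mount such as /data.
starts_after = {
    "open-iscsi.service": {"network-online.target"},
    "remote-fs-pre.target": {"open-iscsi.service"},
    "data.mount": {"remote-fs-pre.target"},
    "remote-fs.target": {"data.mount"},
}

# static_order() yields every unit after all of its predecessors.
start_order = list(TopologicalSorter(starts_after).static_order())

# Assumption: shutdown walks the same ordering backwards.
stop_order = list(reversed(start_order))

print("start:", " -> ".join(start_order))
print("stop: ", " -> ".join(stop_order))
```

With this ordering, open-iscsi.service starts before data.mount is
attempted, and data.mount is stopped (unmounted) before
open-iscsi.service, which is the property the shutdown race discussion
asks for.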
> I have now tested this under systemd with Jessie, in two different
> configurations of Jessie running systemd:
>
> 1. root on normal device, separate iSCSI devices mounted
> 2. root on iSCSI, boot via PXE
>
> In both cases, iSCSI now seems to work as expected. There are a couple
> of caveats though:
>
> - as discussed before, non-initrd-mounted separate /usr on NFS
> won't work together with this constellation
> - unlikely to work well with systemd anyway, regardless of
> iSCSI, and I don't think this is something that could be
> fixed without a major redesign of the remote-fs*.target
> logic across the board
>
> - irrespective of systemd, while looking at it I noticed that
> umountiscsi.sh's logic is incomplete, it doesn't try to umount
> filesystems on LVM on top of iSCSI, unless they were marked with
> _netdev (it only detects direct devices).
Can you please elaborate here? Or perhaps just file a separate bug
report. The current init scripts are designed to support LVM + iSCSI.
> OTOH, this has been the case since at least Squeeze, so it can't
> be that critical.
>
> - the current design of using umountiscsi.sh doesn't integrate well
> with systemd's dependency logic. I don't think this is a huge issue,
> as far as I can see, stuff works as well under systemd with my patch
> as under sysvinit (except for the /usr-NFS thing), but I do think
> that you could make the whole thing a lot more robust if this is
> redesigned a bit - but I don't think that is something that should
> go to Jessie.
I agree. We need to switch to systemd. But I haven't had the time to do
it, and right now, your patch is too late. :-(
This is one reason why I keep telling most Debian (Enterprise) users to
at least keep track of testing. Because they usually end up reporting
bugs too late in the cycle.
> -- System Information:
> Debian Release: 8.0
> APT prefers testing
> APT policy: (500, 'testing')
> Architecture: amd64 (x86_64)
>
> Kernel: Linux 3.16.0-4-amd64 (SMP w/1 CPU core)
> Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/bash
> Init: systemd (via /run/systemd/system)
>
> Versions of packages open-iscsi depends on:
> ii libc6 2.19-13
> ii udev 215-8.0+~tfptkm1
>
> open-iscsi recommends no packages.
>
> open-iscsi suggests no packages.
>
> -- no debconf information
>
> Btw. I selected severity 'important' because I don't think this bug is
> 'grave', but I do think that it could be categorized as 'serious', since
> in my eyes it is unwritten policy that packages should properly support
> the default init system unless there's a really good reason against it.
> Unfortunately for me, current policy doesn't mention multiple init
> systems at all, therefore the severity 'important', because I can't
> point to a specific part of the text. Nevertheless, I think this bug
> would qualify as RC.
>
> Regards,
> Christian
>
--
Ritesh Raj Sarraf | http://people.debian.org/~rrs
Debian - The Universal Operating System