[Pkg-iscsi-maintainers] Bug#775778: open-iscsi: Boot with systemd hangs (ordering of init script w.r.t. remote filesystems)

Christian Seiler christian at iwakd.de
Mon Jan 19 19:32:11 UTC 2015


Package: open-iscsi
Version: 2.0.873+git0.3b4b4500-4
Severity: important
Tags: patch

Dear Maintainer,

tl;dr: systemd + open-iscsi = 90s hang at boot in some cases,
       and umountiscsi.sh is not called on shutdown. Attached a
       debdiff that fixes that without being too invasive.

Longer explanation: if you have the following configuration:

 - Jessie
 - systemd as init
 - open-iscsi configured to automatically log in to some iSCSI target,
   iSCSI disk /dev/sdb is then available
 - /etc/fstab containing an entry like
     /dev/sdb1           /data ext4 rw,_netdev 0 0
   or (when using LVM)
     /dev/vg_.../lv_...  /data ext4 rw,_netdev 0 0

the system boot will hang for 90s because of systemd's default timeout
when devices are not available.

The reason behind this is that open-iscsi contains the following LSB
headers:
      Required-Start:    $network $remote_fs
      Required-Stop:     $network $remote_fs sendsigs
Here, $network maps to network-online.target in systemd, that's fine,
but $remote_fs maps to remote-fs.target in systemd, that is the problem.
This is because

 a) systemd treats file systems that couldn't be mounted as hard
    failures.
and
 b) systemd's logic of mounting all remote filesystems is to mount
    all filesystems in /etc/fstab that are marked _netdev (and not
    marked noauto).

Therefore, systemd waits for the iSCSI device to appear for 90s before
timing out and proceeding with boot. Only then is remote-fs.target
reached, and only then does systemd start the open-iscsi init script.
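
Rule b) above can be approximated with a few lines of shell, to see which
fstab entries are affected on a given machine. This is only a sketch of the
rule as described in this report: the function name list_netdev_mounts is
made up for illustration, and as far as I know systemd itself also counts
network filesystem types such as NFS as remote, which this sketch ignores.

```shell
# Print the mount points of fstab entries that are marked _netdev and
# are not marked noauto (a simplification of what systemd considers
# part of remote-fs.target).
# Usage: list_netdev_mounts [FSTAB]   (FSTAB defaults to /etc/fstab)
list_netdev_mounts() {
    awk '
        $1 ~ /^#/ { next }              # skip comment lines
        NF >= 4 {                       # blank/short lines are ignored
            n = split($4, opts, ",")    # 4th field holds the mount options
            netdev = 0; noauto = 0
            for (i = 1; i <= n; i++) {
                if (opts[i] == "_netdev") netdev = 1
                if (opts[i] == "noauto")  noauto = 1
            }
            if (netdev && !noauto) print $2
        }
    ' "${1:-/etc/fstab}"
}
```

With the fstab entry from the configuration above, the function prints
/data, i.e. the mount point systemd waits on before open-iscsi is started.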

That in turn makes the devices appear. The init script then calls
"mount -a -O _netdev" and "swapon -a -e" in its start() routine, which
causes the mount points to be activated.

So in the end, the boot is kind of successful, in the sense that
everything works once boot has finished, with the following two caveats:

 - there is this needless 90s delay (or whatever other delay the admin
   has configured) in waiting on the iSCSI targets

 - if I want to use systemd's features to order a specific service
   after remote-fs.target to make sure that the remote file systems
   I have are mounted, maybe because the service needs the data on
   them, then this won't work consistently, because the file systems
   will only be mounted after open-iscsi is started, which will then
   be in parallel to any services I have ordered after
   remote-fs.target, for example:
      - exporting a subdirectory of an iSCSI filesystem via NFS; if
        nfs-kernel-server gets started too early, this might fail
        because the directory that is exported doesn't exist
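
To make the second caveat concrete, this is roughly what such an ordering
looks like when written as a drop-in. A minimal sketch: the function name
print_remote_fs_dropin is hypothetical, and RequiresMountsFor= is an extra
added here for illustration, not something the setup above relies on.

```shell
# Print a drop-in fragment that orders a unit after remote-fs.target.
# If a mount point is given as the first argument, also tie the unit to
# that mount via RequiresMountsFor= (an illustrative addition).
print_remote_fs_dropin() {
    printf '[Unit]\n'
    printf 'After=remote-fs.target\n'
    if [ -n "${1:-}" ]; then
        printf 'RequiresMountsFor=%s\n' "$1"
    fi
}
```

The output could be written to a drop-in file for the service in question
(for example below /etc/systemd/system/nfs-kernel-server.service.d/, a
path assumed here), followed by "systemctl daemon-reload". With the
current open-iscsi ordering, After=remote-fs.target alone does not
guarantee that /data is mounted when the service starts, which is exactly
the inconsistency described above.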

If I modify the init script to remove $remote_fs from its LSB headers,
then booting works as expected. However, this causes two problems:

 1. I assume that $remote_fs is in there because you want to support
    an NFS-based separate /usr. Removing $remote_fs from the LSB
    headers would break such a configuration under sysvinit, since
    the open-iscsi tools couldn't be called.

    However, systemd in Debian currently doesn't really support a
    separate /usr that's not mounted from initrd anyway.

 2. Shutting down is racy.

Shutting down is racy because you then have the following situation:

 - systemd tracks services' states. And while bug #732793 does not occur
   anymore because invoke-rc.d strips the .sh from umountiscsi.sh, the
   call to umountiscsi.sh stop doesn't really do anything, because
   systemd already thinks it is stopped, since it was never started.

 - OTOH, systemd will tear down remote filesystems on its own. But
   because open-iscsi is then only ordered after network-online.target,
   tearing down the remote filesystems will be done in parallel (!) to
   stopping open-iscsi.

   This has the unfortunate effect that it could be the case that the
   umount call to the filesystem is made after open-iscsi has been
   stopped. This will then cause the kernel to hang trying to umount
   the filesystem.

   I haven't been able to reproduce this race yet, i.e. I have gotten
   lucky so far, in that umount was typically faster on my system than
   stopping open-iscsi - BUT I am really not comfortable with having
   such a flimsy race in place, especially since umount will sync
   stuff to the filesystem and stopping open-iscsi too early could
   easily cause severe data loss.

So much for my analysis. How do we proceed from here?

 - it is quite clear that you probably don't want to change the sysvinit
   logic now, especially so late in the Jessie freeze

 - however, this bug w.r.t. systemd should definitely be fixed in my
   eyes

Therefore, I suggest that you provide a unit file specifically for
systemd. In order to be as minimally invasive as possible (especially
this late in the freeze), the unit file should ideally call the original
init script.

After Jessie one should consider redoing the entire logic for
systemd-based systems; there are a lot more features of systemd that
one can leverage to make things work better. But to fix this immediate
bug, the changes I mentioned are sufficient.

I have created a debdiff for a test package that changes the following:

 - add systemd unit that just calls the init script but has adjusted
   dependencies:
     - no more After=remote-fs.target
     - new Before=remote-fs-pre.target
 - add dh-systemd as build-dep and use dh_systemd in debian/rules
 - move #DEBHELPER# around in postinst, to make sure package upgrades
   don't break the system (dh_systemd_enable code has to come before
   the unit is first started, otherwise weird things occur)
 - do the equivalent of umountiscsi.sh start so that systemd will
   track that service as 'running' - then at shutdown the open-iscsi
   init script will be able to call the stop action of that script

I have now tested this on Jessie running systemd, in two different
configurations:

 1. root on normal device, separate iSCSI devices mounted
 2. root on iSCSI, boot via PXE

In both cases, iSCSI now seems to work as expected. There are a couple
of caveats though:

 - as discussed before, non-initrd-mounted separate /usr on NFS
   won't work together with this constellation
       - unlikely to work well with systemd anyway, regardless of
         iSCSI, and I don't think this is something that could be
         fixed without a major redesign of the remote-fs*.target
         logic across the board

 - irrespective of systemd, while looking at it I noticed that
   umountiscsi.sh's logic is incomplete, it doesn't try to umount
   filesystems on LVM on top of iSCSI, unless they were marked with
   _netdev (it only detects direct devices).

   OTOH, this has been the case since at least Squeeze, so it can't
   be that critical.

 - the current design of using umountiscsi.sh doesn't integrate well
   with systemd's dependency logic. I don't think this is a huge issue,
   as far as I can see, stuff works as well under systemd with my patch
   as under sysvinit (except for the /usr-NFS thing), but I do think
   that you could make the whole thing a lot more robust if this is
   redesigned a bit - but I don't think that is something that should
   go to Jessie.
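
Regarding the LVM caveat: devices stacked on top of an iSCSI disk can be
found through the holders/ directories in sysfs. The following is only a
sketch of that idea, not what umountiscsi.sh does today; the function
name list_stacked_devices and the SYSFS_ROOT override are hypothetical,
and it only looks one level up (it does not recurse into holders of
holders).

```shell
# List a disk, its partitions, and the block devices holding them
# (e.g. device-mapper devices backing LVM logical volumes).
# SYSFS_ROOT is a hypothetical override for trying this against a
# fake tree; it defaults to /sys.
list_stacked_devices() {
    disk=$1
    root=${SYSFS_ROOT:-/sys}
    for dir in "$root/block/$disk" "$root/block/$disk"/"$disk"*; do
        [ -d "$dir" ] || continue
        echo "${dir##*/}"
        for holder in "$dir"/holders/*; do
            # An unmatched glob stays literal; skip it.
            [ -e "$holder" ] || continue
            echo "${holder##*/}"
        done
    done
}
```

For a disk sdb whose partition sdb1 is an LVM physical volume, this would
list sdb, sdb1 and the dm-N device(s) on top, which could then be matched
against the mounted filesystems.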

-- System Information:
Debian Release: 8.0
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Init: systemd (via /run/systemd/system)

Versions of packages open-iscsi depends on:
ii  libc6  2.19-13
ii  udev   215-8.0+~tfptkm1

open-iscsi recommends no packages.

open-iscsi suggests no packages.

-- no debconf information

Btw. I selected severity 'important' because I don't think this bug is
'grave', but I do think that it could be categorized as 'serious', since
in my eyes it is unwritten policy that packages should properly support
the default init system unless there's a really good reason against it.
Unfortunately for me, current policy doesn't mention multiple init
systems at all, therefore the severity 'important', because I can't
point to a specific part of the text. Nevertheless, I think this bug
would qualify as RC.

Regards,
Christian

-------------- next part --------------
diffstat for open-iscsi-2.0.873+git0.3b4b4500 open-iscsi-2.0.873+git0.3b4b4500

 changelog           |   14 ++++++++++++++
 control             |    2 +-
 open-iscsi.init     |   10 ++++++++++
 open-iscsi.postinst |    4 ++--
 open-iscsi.service  |   22 ++++++++++++++++++++++
 rules               |    2 ++
 6 files changed, 51 insertions(+), 3 deletions(-)

diff -Nru open-iscsi-2.0.873+git0.3b4b4500/debian/changelog open-iscsi-2.0.873+git0.3b4b4500/debian/changelog
--- open-iscsi-2.0.873+git0.3b4b4500/debian/changelog	2014-09-01 11:03:23.000000000 +0200
+++ open-iscsi-2.0.873+git0.3b4b4500/debian/changelog	2015-01-19 19:45:36.000000000 +0100
@@ -1,3 +1,17 @@
+open-iscsi (2.0.873+git0.3b4b4500-4.0+systemd1) UNRELEASED; urgency=medium
+
+  * Test package
+  * Create systemd unit, make it order before remote-fs-pre.target to
+    fix hang at boot. Unit currently only starts init script.
+  * Manually start umountiscsi.sh in open-iscsi init script to make
+    the stop action on shutdown not be a noop. (systemd tracks service
+    state)
+  * Add dh-systemd to build-deps.
+  * Reorder #DEBHELPER# in postinst to not break upgrades (dh-systemd's
+    code has to be there before invoke-rc.d is called).
+
+ -- Christian Seiler <christian at iwakd.de>  Mon, 19 Jan 2015 19:45:05 +0100
+
 open-iscsi (2.0.873+git0.3b4b4500-4) unstable; urgency=medium
 
   * [41c7eca] Introduce new architectures based on current build
diff -Nru open-iscsi-2.0.873+git0.3b4b4500/debian/control open-iscsi-2.0.873+git0.3b4b4500/debian/control
--- open-iscsi-2.0.873+git0.3b4b4500/debian/control	2014-09-01 11:02:01.000000000 +0200
+++ open-iscsi-2.0.873+git0.3b4b4500/debian/control	2015-01-19 18:59:21.000000000 +0100
@@ -3,7 +3,7 @@
 Priority: optional
 Maintainer: Debian iSCSI Maintainers <pkg-iscsi-maintainers at lists.alioth.debian.org>
 Uploaders: Andrew Moise <chops at demiurgestudios.com>, Philipp Hug <debian at hug.cx>, Guido Günther <agx at sigxcpu.org>, Ritesh Raj Sarraf <rrs at debian.org>
-Build-Depends: debhelper (>= 7.0.0), bzip2, bison, flex, autotools-dev, dh-autoreconf, dpkg-dev (>= 1.16.1~)
+Build-Depends: debhelper (>= 7.0.0), bzip2, bison, flex, autotools-dev, dh-autoreconf, dpkg-dev (>= 1.16.1~), dh-systemd
 Standards-Version: 3.9.2
 Vcs-Git: git://anonscm.debian.org/pkg-iscsi/open-iscsi.git
 Vcs-Browser: http://anonscm.debian.org/gitweb/?p=pkg-iscsi/open-iscsi.git
diff -Nru open-iscsi-2.0.873+git0.3b4b4500/debian/open-iscsi.init open-iscsi-2.0.873+git0.3b4b4500/debian/open-iscsi.init
--- open-iscsi-2.0.873+git0.3b4b4500/debian/open-iscsi.init	2014-08-20 15:53:55.000000000 +0200
+++ open-iscsi-2.0.873+git0.3b4b4500/debian/open-iscsi.init	2015-01-19 19:45:04.000000000 +0100
@@ -107,6 +107,16 @@
 
 	udevadm settle || true;
 
+	# If we are under systemd, make sure we start umountiscsi.sh.
+	# This is a no-op, but systemd tracks each unit's status, so
+	# unless we start it here, the invoke-rc.d during stop()
+	# doesn't do anything.
+	if [ -d /run/systemd/system ] ; then
+		# we don't want to deadlock here, so ignore deps,
+		# the start of the service is a no-op anyway
+		systemctl --job-mode=ignore-dependencies start umountiscsi.service
+	fi
+
 
 	# Handle iSCSI LVM devices
 	if [ ! -x "/sbin/vgchange" -a -n "$LVMGROUPS" ]; then
diff -Nru open-iscsi-2.0.873+git0.3b4b4500/debian/open-iscsi.postinst open-iscsi-2.0.873+git0.3b4b4500/debian/open-iscsi.postinst
--- open-iscsi-2.0.873+git0.3b4b4500/debian/open-iscsi.postinst	2014-08-20 15:53:55.000000000 +0200
+++ open-iscsi-2.0.873+git0.3b4b4500/debian/open-iscsi.postinst	2015-01-19 19:04:03.000000000 +0100
@@ -11,6 +11,8 @@
     fi
 }
 
+#DEBHELPER#
+
 case "$1" in
     configure)
 	# Move old configuration from /etc/ into /etc/iscsi/
@@ -43,6 +45,4 @@
     ;;
 esac
 
-#DEBHELPER#
-
 exit 0
diff -Nru open-iscsi-2.0.873+git0.3b4b4500/debian/open-iscsi.service open-iscsi-2.0.873+git0.3b4b4500/debian/open-iscsi.service
--- open-iscsi-2.0.873+git0.3b4b4500/debian/open-iscsi.service	1970-01-01 01:00:00.000000000 +0100
+++ open-iscsi-2.0.873+git0.3b4b4500/debian/open-iscsi.service	2015-01-19 19:14:53.000000000 +0100
@@ -0,0 +1,22 @@
+[Unit]
+Description=iSCSI initiator
+DefaultDependencies=no
+Before=sysinit.target shutdown.target remote-fs-pre.target
+After=network-online.target
+Wants=network-online.target
+Conflicts=shutdown.target
+
+[Service]
+Type=forking
+Restart=no
+TimeoutSec=0
+IgnoreSIGPIPE=no
+KillMode=process
+GuessMainPID=no
+RemainAfterExit=yes
+SysVStartPriority=20
+ExecStart=/etc/init.d/open-iscsi start
+ExecStop=/etc/init.d/open-iscsi stop
+
+[Install]
+WantedBy=multi-user.target
diff -Nru open-iscsi-2.0.873+git0.3b4b4500/debian/rules open-iscsi-2.0.873+git0.3b4b4500/debian/rules
--- open-iscsi-2.0.873+git0.3b4b4500/debian/rules	2014-08-20 15:53:55.000000000 +0200
+++ open-iscsi-2.0.873+git0.3b4b4500/debian/rules	2015-01-19 19:18:24.000000000 +0100
@@ -127,8 +127,10 @@
 	dh_installchangelogs 
 	dh_installdocs
 	dh_installexamples
+	dh_systemd_enable
 	dh_installinit -u 'start 45 S . stop 81 0 1 6 .' --no-start
 	dh_installinit -u 'stop 80 0 1 6 .' --no-start --name=umountiscsi.sh
+	dh_systemd_start --no-start
 	dh_installman
 	dh_link
 	dh_strip

