[Pkg-iscsi-maintainers] Bug#850211: open-iscsi: Boot with systemd hangs for iSCSI+LUKS+LVM volume

Abhijit Menon-Sen iehieshi at toroid.org
Thu Jan 5 03:19:43 UTC 2017


Package: open-iscsi
Version: 2.0.874-1~bpo8+1
Severity: normal

Dear Maintainer,

I have a NAS that exports an iSCSI target. I have created a LUKS volume
on a partition on this target, used that volume as a PV, and created an
LVM VG/LV on top of it. On boot, systemd waits for 90s for the iSCSI
device and its subsidiary devices to appear before bringing up the
network. This is similar to bug #775778.

I am not sure that this is a bug in open-iscsi, but I'm filing it here
on the basis of the strong similarity with #775778. (I'm running jessie
with updates from jessie-backports, so I have the patches from that bug
already.)

This is how I setup my volume:

    # The iSCSI part works fine.
    iscsiadm … --login

    # Device shows up as /dev/sdb; I create a /dev/sdb1 partition using
    # fdisk, of type 8e.

    # Create encrypted LUKS volume on /dev/sdb1, open and map as
    # /dev/mapper/sdb1_crypt.
    cryptsetup luksFormat /dev/sdb1
    cryptsetup luksOpen /dev/sdb1 sdb1_crypt \
        --key-file /root/blackbird-ullu

    # Set up LVM PV, VG, and LV mapped to /dev/mapper/blackbird-ullu,
    # with an ext4 filesystem on top.
    pvcreate /dev/mapper/sdb1_crypt
    vgcreate blackbird /dev/mapper/sdb1_crypt
    lvcreate -n ullu -l 100%VG blackbird
    mkfs.ext4 /dev/mapper/blackbird-ullu

    mount /dev/mapper/blackbird-ullu /media/nas

I have an entry in /etc/crypttab like this:

    sdb1_crypt UUID=ae6b9263-d63c-4515-b7ce-51e5cc4caa9f /root/blackbird-ullu luks

And an entry in /etc/fstab like this (I've tried various variants here,
see below):

    /dev/mapper/blackbird-ullu /media/nas ext4 defaults,nofail,_netdev 0 6

There are three devices involved:

    - /dev/disk/by-uuid/<x>: is the iSCSI target (/dev/sdb)
    - /dev/mapper/sdb1_crypt: result of cryptsetup luksOpen /dev/sdb1 
    - /dev/mapper/blackbird-ullu: LV built on top of sdb1_crypt

Now I suffer from the 90s wait on startup (before network-online), where
systemd waits for the dev-mapper-blackbird\x2dullu.device to become
available, along with dev-disk-by\x2duuid-<xxx>.device and
dev-mapper-sdb1_crypt.device.

After the timeout, the boot proceeds, brings up the network, starts
iscsid, and runs /lib/open-iscsi/activate-storage.sh once /dev/sdb is
available, but the "vgchange -aay" command fails because the PV is not
yet mapped with cryptsetup:

    activate-storage.sh[2618]:   Volume group "blackbird" not found
    activate-storage.sh[2618]:   Skipping volume group blackbird
    activate-storage.sh[2618]: Warning: could not activate all LVM groups.

This is the first time I have used iSCSI or systemd, so my attempts to
resolve this are pretty amateurish, but here's an overview:

(a) I shortened the 90s timeout for all three devices by creating
drop-in overrides for the corresponding .device units under
/etc/systemd/system:

    dev-disk-by\x2duuid-ae6b9263\x2dd63c\x2d4515\x2db7ce\x2d51e5cc4caa9f.device.d/nowait.conf
    dev-mapper-sdb1_crypt.device.d/nowait.conf
    dev-mapper-blackbird\x2dullu.device.d/nowait.conf
    
    [Unit]
    JobTimeoutSec=1

(I later removed the third file and instead added the
x-systemd.device-timeout=1s mount flag in /etc/fstab for the
/dev/mapper/blackbird-ullu volume, but this is functionally equivalent
to the above.)

After this, the boot proceeded with only a 1s wait, with messages like this:

    systemd[1]: dev-mapper-sdb1_crypt.device: Job dev-mapper-sdb1_crypt.device/start timed out.
    systemd[1]: Timed out waiting for device dev-mapper-sdb1_crypt.device.
    systemd[1]: Dependency failed for Cryptography Setup for sdb1_crypt.
    systemd[1]: Dependency failed for Encrypted Volumes.
    systemd[1]: cryptsetup.target: Job cryptsetup.target/start failed with result 'dependency'.
    systemd[1]: systemd-cryptsetup at sdb1_crypt.service: Job systemd-cryptsetup at sdb1_crypt.service/start failed with result 'dependency'.
    systemd[1]: dev-mapper-sdb1_crypt.device: Job dev-mapper-sdb1_crypt.device/start failed with result 'timeout'.
    systemd[1]: dev-disk-by\x2duuid-ae6b9263\x2dd63c\x2d4515\x2db7ce\x2d51e5cc4caa9f.device: Job dev-disk-by\x2duuid-ae6b9263\x2dd63c\x2d4515\x2db7ce\x2d51e5cc4caa
    systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-ae6b9263\x2dd63c\x2d4515\x2db7ce\x2d51e5cc4caa9f.device.
    systemd[1]: dev-disk-by\x2duuid-ae6b9263\x2dd63c\x2d4515\x2db7ce\x2d51e5cc4caa9f.device: Job dev-disk-by\x2duuid-ae6b9263\x2dd63c\x2d4515\x2db7ce\x2d51e5cc4caa
    systemd[1]: dev-mapper-blackbird\x2dullu.device: Job dev-mapper-blackbird\x2dullu.device/start timed out.
    systemd[1]: Timed out waiting for device dev-mapper-blackbird\x2dullu.device.
    systemd[1]: Dependency failed for /media/nas.
    systemd[1]: media-nas.mount: Job media-nas.mount/start failed with result 'dependency'.
    systemd[1]: Dependency failed for File System Check on /dev/mapper/blackbird-ullu.
    systemd[1]: systemd-fsck at dev-mapper-blackbird\x2dullu.service: Job systemd-fsck at dev-mapper-blackbird\x2dullu.service/start failed with result 'dependency'.
    systemd[1]: dev-mapper-blackbird\x2dullu.device: Job dev-mapper-blackbird\x2dullu.device/start failed with result 'timeout'.

(b) To address the "vgchange -aay" problem, I enabled global/use_lvmetad
in lvm.conf. Now activate-storage.sh still fails to activate the volume
group, but lvmetad activates it once the PV is mapped. I've edited the
following log excerpt to remove unrelated interleaved output:

    systemd[1]: Starting iSCSI initiator daemon (iscsid)...
    iscsid[2544]: iSCSI logger with pid=2548 started!
    systemd[1]: Started iSCSI initiator daemon (iscsid).
    systemd[1]: Starting Login to default iSCSI targets...
    kernel: scsi host4: iSCSI Initiator over TCP/IP
    kernel: scsi 4:0:0:0: Direct-Access     SYNOLOGY iSCSI Storage    3.1  PQ: 0 ANSI: 5
    kernel: sd 4:0:0:0: Attached scsi generic sg1 type 0
    kernel: sd 4:0:0:0: [sdb] 3897335808 512-byte logical blocks: (2.00 TB/1.81 TiB)
    kernel: sd 4:0:0:0: [sdb] Write Protect is off
    kernel: sd 4:0:0:0: [sdb] Mode Sense: 3b 00 00 00
    kernel: sd 4:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
    iscsiadm[2558]: Logging in to [iface: default, target: iqn.2017-01.in.lweshal:blackbird.Target-1.6f8f6dc754, portal: 192.168.11.2,3260] (multiple)
    iscsiadm[2558]: Login to [iface: default, target: iqn.2017-01.in.lweshal:blackbird.Target-1.6f8f6dc754, portal: 192.168.11.2,3260] successful.
    kernel:  sdb: sdb1
    kernel: sd 4:0:0:0: [sdb] Attached SCSI disk
    …
    systemd[1]: Reached target Bluetooth.
    systemd[1]: Starting Cryptography Setup for sdb1_crypt...
    systemd-cryptsetup[2729]: Set cipher aes, mode xts-plain64, key size 256 bits for device /dev/disk/by-uuid/ae6b9263-d63c-4515-b7ce-51e5cc4caa9f.
    systemd[1]: Reached target Multi-User System.
    systemd[1]: Reached target Graphical Interface.
    iscsid[2548]: iSCSI daemon with pid=2549 started!
    iscsid[2548]: Connection1:0 to [target: iqn.2017-01.in.lweshal:blackbird.Target-1.6f8f6dc754, portal: 192.168.11.2,3260] through [iface: default] is operational now
    activate-storage.sh[2618]:   Volume group "blackbird" not found
    activate-storage.sh[2618]:   Skipping volume group blackbird
    activate-storage.sh[2618]: Warning: could not activate all LVM groups.
    systemd[1]: Started Login to default iSCSI targets.
    systemd[1]: dev-mapper-sdb1_crypt.device: Job dev-mapper-sdb1_crypt.device/start timed out.
    systemd[1]: Timed out waiting for device dev-mapper-sdb1_crypt.device.
    systemd[1]: Dependency failed for Cryptography Setup for sdb1_crypt.
    systemd[1]: systemd-cryptsetup at sdb1_crypt.service: Job systemd-cryptsetup at sdb1_crypt.service/start failed with result 'dependency'.
    systemd[1]: dev-mapper-sdb1_crypt.device: Job dev-mapper-sdb1_crypt.device/start failed with result 'timeout'.
    systemd[1]: Reached target Remote File Systems (Pre).
    systemd[1]: Startup finished in 12.294s (kernel) + 14.191s (userspace) = 26.485s.
    systemd[1]: Starting LVM2 PV scan on device 254:4...
    lvm[3268]:   1 logical volume(s) in volume group "blackbird" now active
    systemd[1]: Started LVM2 PV scan on device 254:4.
    systemd[1]: Starting File System Check on /dev/mapper/blackbird-ullu...
    systemd-fsck[3282]: /dev/mapper/blackbird-ullu: clean, 11/121798656 files, 7928205/487165952 blocks
    systemd[1]: Started File System Check on /dev/mapper/blackbird-ullu.
    systemd[1]: Mounting /media/nas...
    systemd[1]: Mounted /media/nas.

Note that although systemd-cryptsetup at sdb1_crypt service is considered
to have failed, it actually does its job and maps the encrypted volume:

● systemd-cryptsetup at sdb1_crypt.service - Cryptography Setup for sdb1_crypt
   Loaded: loaded (/etc/crypttab; generated; vendor preset: enabled)
   Active: active (exited) since Thu 2017-01-05 07:24:32 IST; 1h 6min ago
     Docs: man:crypttab(5)
           man:systemd-cryptsetup-generator(8)
           man:systemd-cryptsetup at .service(8)
  Process: 2729 ExecStart=/lib/systemd/systemd-cryptsetup attach sdb1_crypt /dev/disk/by-uuid/ae6b9263-d63c-4515-b7ce-51e5cc4caa9f /root/blackbird-ullu luks (code=exited, status=0/SUCCESS)
 Main PID: 2729 (code=exited, status=0/SUCCESS)

Jan 05 07:24:31 ullu.lweshal.in systemd[1]: Starting Cryptography Setup for sdb1_crypt...
Jan 05 07:24:31 ullu.lweshal.in systemd-cryptsetup[2729]: Set cipher aes, mode xts-plain64, key size 256 bits for device /dev/disk/by-uuid/ae6b9263-d63c-4515-b7ce-51e5cc4caa9f.
Jan 05 07:24:32 ullu.lweshal.in systemd[1]: Dependency failed for Cryptography Setup for sdb1_crypt.
Jan 05 07:24:32 ullu.lweshal.in systemd[1]: systemd-cryptsetup at sdb1_crypt.service: Job systemd-cryptsetup at sdb1_crypt.service/start failed with result 'dependency'.

So now the volume is correctly mounted and everything works, the only
problem is the spurious wait and the timeout/dependency errors.

(c) I tried to set correct dependencies for the devices in my drop-in
override files. For example,

dev-disk-by\x2duuid-ae6b9263\x2dd63c\x2d4515\x2db7ce\x2d51e5cc4caa9f.device:

    [Unit]
    Requires=iscsi.service
    After=iscsi.service

dev-mapper-sdb1_crypt.device:

    [Unit]
    Requires=iscsi.service dev-disk-by\x2duuid-ae6b9263\x2dd63c\x2d4515\x2db7ce\x2d51e5cc4caa9f.device
    After=iscsi.service dev-disk-by\x2duuid-ae6b9263\x2dd63c\x2d4515\x2db7ce\x2d51e5cc4caa9f.device

dev-mapper-blackbird\x2dullu.device:

    [Unit]
    Requires=iscsi.service dev-mapper-sdb1_crypt.device
    After=iscsi.service dev-mapper-sdb1_crypt.device

I later removed this last file and added a x-systemd.requires=foo.device
mount flag in /etc/fstab; I tried many combinations: one .device, both
.devices, requires=/dev/mapper/sdb1_crypt, requires=/dev/disk/by-uuid/…
and so on.

All of these attempts resulted in systemd detecting ordering cycles and
breaking them randomly—some of these situations worked, while others did
not. Just as an example, here's one such loop-breaking:

open-iscsi.service: Found ordering cycle on open-iscsi.service/start
open-iscsi.service: Found dependency on network-online.target/start
open-iscsi.service: Found dependency on networking.service/start
open-iscsi.service: Found dependency on local-fs.target/start
open-iscsi.service: Found dependency on lvm2-activation.service/start
open-iscsi.service: Found dependency on cryptsetup.target/start
open-iscsi.service: Found dependency on systemd-cryptsetup at sdb1_crypt.service/start
open-iscsi.service: Found dependency on dev-disk-by\x2duuid-ae6b9263\x2dd63c\x2d4515\x2db7ce\x2d51e5cc4caa9f.device/start
open-iscsi.service: Found dependency on open-iscsi.service/start
open-iscsi.service: Breaking ordering cycle by deleting job network-online.target/start
network-online.target: Job network-online.target/start deleted to break ordering cycle starting with open-iscsi.service/start
open-iscsi.service: Found ordering cycle on open-iscsi.service/start
open-iscsi.service: Found dependency on network-online.target/start
open-iscsi.service: Found dependency on networking.service/start
open-iscsi.service: Found dependency on local-fs.target/start
open-iscsi.service: Found dependency on lvm2-activation.service/start
open-iscsi.service: Found dependency on cryptsetup.target/start
open-iscsi.service: Found dependency on systemd-cryptsetup at sdb1_crypt.service/start
open-iscsi.service: Found dependency on dev-disk-by\x2duuid-ae6b9263\x2dd63c\x2d4515\x2db7ce\x2d51e5cc4caa9f.device/start
open-iscsi.service: Found dependency on open-iscsi.service/start
open-iscsi.service: Breaking ordering cycle by deleting job local-fs.target/start
local-fs.target: Job local-fs.target/start deleted to break ordering cycle starting with open-iscsi.service/start
open-iscsi.service: Found ordering cycle on open-iscsi.service/start
open-iscsi.service: Found dependency on network-online.target/start
open-iscsi.service: Found dependency on networking.service/start
open-iscsi.service: Found dependency on local-fs.target/start
open-iscsi.service: Found dependency on lvm2-activation.service/start
open-iscsi.service: Found dependency on cryptsetup.target/start
open-iscsi.service: Found dependency on systemd-cryptsetup at sdb1_crypt.service/start
open-iscsi.service: Found dependency on dev-disk-by\x2duuid-ae6b9263\x2dd63c\x2d4515\x2db7ce\x2d51e5cc4caa9f.device/start
open-iscsi.service: Found dependency on open-iscsi.service/start
open-iscsi.service: Breaking ordering cycle by deleting job cryptsetup.target/start
cryptsetup.target: Job cryptsetup.target/start deleted to break ordering cycle starting with open-iscsi.service/start

I tried various other things, including masking lvm2-activation.service
(safe because of lvmetad), overriding cryptsetup.target and explicitly
listing all the other systemd-cryptsetup@… services but the iscsi one,
overriding systemd-cryptsetup at sdb1_crypt.service altogether and removing
all references to cryptsetup.target, and so on. Nothing really worked.

Summary: I reduced the timeout and enabled lvmetad, and the volume was
correctly mounted after boot. I was not able to specify dependencies in
such a way as to remove the spurious timeout/dependency errors.

(I have not included open-iscsi configuration details here. The use of
lvmetad solved the "vgchange -aay" problem, and everything else related
to iscsi works fine. But I'm happy to provide more details if required,
please let me know.)

Thank you for your attention.

-- System Information:
Debian Release: 8.6
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.8.0-0.bpo.2-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages open-iscsi depends on:
ii  debconf [debconf-2.0]  1.5.56
ii  init-system-helpers    1.22
ii  libc6                  2.19-18+deb8u6
ii  libisns0               0.96-5~bpo8+1
ii  libmount1              2.25.2-6
ii  lsb-base               4.1+Debian13+nmu1
ii  udev                   230-7~bpo8+2

Versions of packages open-iscsi recommends:
ii  busybox  1:1.22.0-9+deb8u1

open-iscsi suggests no packages.



More information about the Pkg-iscsi-maintainers mailing list