Bug#833528: multipath-tools: Multipathd reports failed paths as active, but with increased error count.

Panos Gotsis pgotsis at noc.grnet.gr
Fri Aug 5 13:13:50 UTC 2016


Package: multipath-tools
Version: 0.5.0-6+deb8u2
Severity: grave
Tags: newcomer
Justification: renders package unusable

Dear Maintainer,

Our nodes run the latest patched jessie, which ships multipath-tools 0.5.0-6+deb8u2.
The systems use iSCSI to connect to a NetApp SAN storage system.

The multipathd daemon starts properly and assembles the multipath devices for the iSCSI LUNs.
However, when testing how it handles failures, we see that for LUNs we have set offline or
deleted completely on our storage system, neither multipath -l nor, more to the point,
dmsetup status shows these paths as failed. They are still shown as active, and only their
fail_count counter is incremented.

Here is example output from one of our systems, where one of the three LUNs has been
disabled through the storage management interface.


# dmsetup status --target multipath
3600a0980383036522d5d48687465724c: 0 10485760 multipath 2 0 0 0 1 1 A 0 2 0 8:32 A 0 8:80 A 0 
3600a0980383036522d5d48687465724a: 0 10485760 multipath 2 0 0 0 1 1 A 0 2 0 8:16 A 1 8:64 A 1 
3600a0980383036522d5d486874657252: 0 10485760 multipath 2 0 0 0 1 1 A 0 2 0 8:48 A 0 8:96 A 0 
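To make the status lines above easier to read: each path appears as <major:minor> <A|F> <fail_count>, preceded by per-map and per-group fields. The Python sketch below decodes one line. The field layout it uses (feature, hardware-handler and selector argument counts, a group state of A/E/D, then the paths) is an assumption based on the kernel's dm-multipath status format, not something stated in this report, and the names parse_status_line and active_but_failed are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PathStatus:
    dev: str                  # "major:minor", e.g. "8:16"
    active: bool              # kernel path state: "A" -> True, "F" -> False
    fail_count: int           # how many times the kernel has failed this path
    selector_info: List[str] = field(default_factory=list)


@dataclass
class PathGroup:
    state: str                # "A" active, "E" enabled, "D" disabled
    paths: List[PathStatus]


@dataclass
class MultipathStatus:
    features: List[str]       # assumed to be [queue_io, pg_init_count] on this kernel
    next_pg: int
    groups: List[PathGroup]


def parse_status_line(line: str) -> Tuple[str, MultipathStatus]:
    """Parse one line of `dmsetup status --target multipath` output (hypothetical helper)."""
    # Assumption: the map name is followed by ": " and does not itself contain ": ".
    name, rest = line.split(": ", 1)
    tokens = iter(rest.split())

    def take(n: int) -> List[str]:
        return [next(tokens) for _ in range(n)]

    _start, _length, target = take(3)
    if target != "multipath":
        raise ValueError(f"not a multipath target: {target}")

    features = take(int(next(tokens)))     # <#features> <features...>
    take(int(next(tokens)))                # <#hw_handler args> <args...> (ignored)
    nr_groups = int(next(tokens))
    next_pg = int(next(tokens))

    groups = []
    for _ in range(nr_groups):
        state = next(tokens)
        take(int(next(tokens)))            # <#selector status args> <args...> (ignored)
        nr_paths = int(next(tokens))
        info_args = int(next(tokens))      # selector info args that follow each path
        paths = []
        for _ in range(nr_paths):
            dev, path_state, fails = take(3)
            paths.append(PathStatus(dev, path_state == "A", int(fails), take(info_args)))
        groups.append(PathGroup(state, paths))
    return name, MultipathStatus(features, next_pg, groups)


def active_but_failed(status: MultipathStatus) -> List[PathStatus]:
    """Paths still marked active whose fail_count is non-zero.

    This matches the symptom described here, but fail_count appears to be
    cumulative, so a path that failed once and was reinstated also shows up.
    """
    return [p for g in status.groups for p in g.paths if p.active and p.fail_count > 0]


if __name__ == "__main__":
    line = ("3600a0980383036522d5d48687465724a: 0 10485760 multipath "
            "2 0 0 0 1 1 A 0 2 0 8:16 A 1 8:64 A 1 ")
    name, status = parse_status_line(line)
    print(name, [(p.dev, p.active, p.fail_count) for p in status.groups[0].paths])
    # prints: 3600a0980383036522d5d48687465724a [('8:16', True, 1), ('8:64', True, 1)]
```

Applied to the 0.6.1-3 output further down, the same parser reports group state E and both paths with active == False.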

The device with dm name 3600a0980383036522d5d48687465724a should be reported as failed. Running
the multipath daemon in the foreground with increased verbosity, we see the following
messages:

Aug 04 17:35:46 | sdg: get_state
Aug 04 17:35:46 | 8:96: tur checker starting up
Aug 04 17:35:46 | 8:96: tur checker finished, state down
Aug 04 17:35:46 | sdg: state = down
Aug 04 17:35:46 | sdg: checker msg is "tur checker reports path is down"
Aug 04 17:35:46 | 3600a0980383036522d5d48687465724a: disassemble map [1 retain_attached_hw_handler 0 1 1 round-robin 0 2 1 8:16 1 8:64 1 ]
Aug 04 17:35:46 | 3600a0980383036522d5d48687465724a: disassemble status [2 0 0 0 1 1 A 0 2 0 8:16 A 1 8:64 A 1 ]
Aug 04 17:35:46 | 3600a0980383036522d5d48687465724a: sdg - tur checker reports path is down
Aug 04 17:35:46 | sdg: mask = 0x8
Aug 04 17:35:46 | sdg: path state = running

It seems odd that the path state is reported as running when the tur checker marks the path as
down. Maybe I am missing the semantics of these messages, but in any case the dmsetup status
output is wrong.
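The log contains two separate state lines for sdg: the checker verdict (state = down) and a later "path state = running" line. If, as the multipath-tools sources appear to indicate, the latter is the SCSI device state read from sysfs rather than the path checker result, the two need not agree; treat that as an interpretation, not a confirmed explanation. The Python sketch below pulls both values out of a verbose log per device so they can be compared side by side. The regular expressions and the collect_states name are hypothetical and were written against the sample lines in this report only.

```python
import re
from typing import Dict

# Patterns written against the sample log lines in this report only; other
# multipath-tools versions may word these messages differently.
CHECKER_STATE = re.compile(r"\| (\w+): state = (\w+)$")
PATH_STATE = re.compile(r"\| (\w+): path state = (\w+)$")


def collect_states(log: str) -> Dict[str, Dict[str, str]]:
    """Map each device name to the checker and path states logged for it."""
    states: Dict[str, Dict[str, str]] = {}
    for line in log.splitlines():
        line = line.rstrip()
        for key, pattern in (("checker", CHECKER_STATE), ("path_state", PATH_STATE)):
            m = pattern.search(line)
            if m:
                # Later lines for the same device overwrite earlier values.
                states.setdefault(m.group(1), {})[key] = m.group(2)
    return states


if __name__ == "__main__":
    sample = """\
Aug 04 17:35:46 | sdg: state = down
Aug 04 17:35:46 | sdg: mask = 0x8
Aug 04 17:35:46 | sdg: path state = running"""
    print(collect_states(sample))
    # prints: {'sdg': {'checker': 'down', 'path_state': 'running'}}
```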

We then took the sources from stretch and built package version 0.6.1-3 for jessie. After
installing the packages and rebooting, we ran the same test and got the following output:


# dmsetup status --target multipath
3600a0980383036522d5d48687465724c: 0 10485760 multipath 2 0 1 0 1 1 A 0 2 0 8:32 A 0 8:80 A 0 
3600a0980383036522d5d48687465724a: 0 10485760 multipath 2 0 1 0 1 1 E 0 2 0 8:16 F 1 8:64 F 1 
3600a0980383036522d5d486874657252: 0 10485760 multipath 2 0 1 0 1 1 A 0 2 0 8:48 A 0 8:96 A 0 

This is what we would expect.

I am unsure whether the issue appears only with iSCSI LUNs or affects FC LUNs as well. For
good measure, we lowered the iSCSI daemon's timeouts considerably, to rule out the possibility
that some queued command was simply not being processed promptly. With the upgrade, however,
the output of the status command is as it should be.

We do not know which exact upstream multipath-tools patch fixed this.

The 0.6.1-3 package was built as-is on jessie, without changes to the control file, since its
dependencies are already satisfied there. However, the -k command-line option does not work.
This is not a major problem, as commands are still processed properly, but we are also unsure
whether this build has other bugs.


-- Package-specific info:
Contents of /etc/multipath.conf:
defaults {
    user_friendly_names         no
    max_fds                     8192
    flush_on_last_del           yes
}
blacklist {
      devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
      devnode "^sda[0-9]*$"
      devnode "^hd[a-z][0-9]*"
      devnode "^cciss.c[0-9]d[0-9].*"
}
devices {
    device {
        vendor                  "NETAPP"
        product                 "LUN"
        path_checker            tur
        path_grouping_policy    group_by_prio
        path_selector           "round-robin 0"
        features                "1 queue_if_no_path"
        prio                    ontap
        hardware_handler        "0"
        failback                immediate
        rr_weight               uniform
        rr_min_io               128
        getuid_callout          "/lib/udev/scsi_id -g -u -d /dev/%n"
    }
}
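As a sanity check on the blacklist section above, the devnode patterns can be tried against the path devices seen in this report. Block major 8 gives 16 minors per SCSI disk, so 8:16 through 8:96 correspond to sdb through sdg (the log above confirms 8:96 is sdg). This Python sketch uses re.search as a stand-in for the POSIX extended regular expressions multipath uses; for these simple patterns the two should behave the same, but that is an assumption. The helper names is_blacklisted and sd_name_major8 are hypothetical.

```python
import re
from typing import Sequence

# devnode patterns copied from the blacklist section of /etc/multipath.conf above.
BLACKLIST_DEVNODE = (
    r"^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*",
    r"^sda[0-9]*$",
    r"^hd[a-z][0-9]*",
    r"^cciss.c[0-9]d[0-9].*",
)


def is_blacklisted(devname: str, patterns: Sequence[str] = BLACKLIST_DEVNODE) -> bool:
    """True if any devnode pattern matches the kernel device name.

    re.search approximates POSIX extended regex matching (an assumption; the
    patterns here use no syntax on which the two differ).
    """
    return any(re.search(p, devname) for p in patterns)


def sd_name_major8(minor: int) -> str:
    """SCSI disk or partition name for block major 8 (sda..sdp, 16 minors each)."""
    if not 0 <= minor < 256:
        raise ValueError("major 8 only covers minors 0-255")
    disk = "sd" + chr(ord("a") + minor // 16)
    part = minor % 16                      # 0 means the whole disk
    return disk if part == 0 else f"{disk}{part}"


if __name__ == "__main__":
    # Path devices from the dmsetup status output in this report.
    for minor in (16, 32, 48, 64, 80, 96):
        name = sd_name_major8(minor)
        print(f"8:{minor} -> {name}, blacklisted: {is_blacklisted(name)}")
```

Note that the sda pattern is anchored with $, so it excludes sda and its partitions but not a later disk such as sdaa.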


-- System Information:
Debian Release: 8.5
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/40 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages multipath-tools depends on:
ii  initscripts         2.88dsf-59
ii  kpartx              0.5.0-6+deb8u2
ii  libaio1             0.3.110-1
ii  libc6               2.19-18+deb8u4
ii  libdevmapper1.02.1  2:1.02.90-2.2
ii  libgcc1             1:4.9.2-10
ii  libreadline6        6.3-8+b3
ii  libudev1            215-17+deb8u4
ii  lsb-base            4.1+Debian13+nmu1
ii  udev                215-17+deb8u4

multipath-tools recommends no packages.

Versions of packages multipath-tools suggests:
pn  multipath-tools-boot  <none>

-- Configuration Files:
/etc/init.d/multipath-tools-boot [Errno 2] No such file or directory: u'/etc/init.d/multipath-tools-boot'

-- no debconf information



More information about the pkg-lvm-maintainers mailing list