Bug#740701: multipath-tools: mkfs fails "Add. Sense: Incompatible medium installed"

Ritesh Raj Sarraf rrs at debian.org
Tue Mar 4 18:19:52 UTC 2014


Let's attemt the email in chunks.


On 03/04/2014 01:19 PM, Bill MacAllister wrote:
> Package: multipath-tools
> Version: 0.4.9+git0.9c3c5172-1
> Severity: important
>
> Dear Maintainer,
>
> * What led up to the situation?
>
> We use multipath connected devices on our OpenAFS servers.  We are in the 
> process of attempting to upgrade the systems to wheezy.  Our normal 
> procedure is to completely evacuate a server and rebuild it from scratch 
> and reconfigure the storage.

What is your Storage Array like ?

>      
> * What exactly did you do (or not do) that was effective (or
>   ineffective)?
>
> On the systems we have tested:
>
>   - mkfs on squeeze with a 2.6.32-5-amd64 kernel SUCCEEDS
>   - mkfs on squeeze with a 3.2 backport kernel FAILS
>   - mkfs on wheezy FAILS
>   - mkfs on jessie FAILS
>
> There is one exception to the above list.  We do have one wheezy system 
> where we are able to mkfs file systems on multipath served devices.  These 
> systems are built using FAI with the configuration be supplied by puppet.  
> We have not been able to figure out what is unique about this system.
>
> The current system we have available to test with has a qlogic HBA card.
> We have also tested with Emulex cards with the same results.

This alone doesn't help. Can you please provide the make model ?


>
> When mkfs fails multipathd starts spewing errors on the console:
>
> afssvr20:/etc/rsyslog.d# mkfs -t ext4 /dev/mapper/mpathb-part1
> mke2fs 1.42.9 (4-Feb-2014)
> [11712.731701] sd 3:0:0:20: [sdd] Unhandled sense code
> [11712.800335] sd 3:0:0:20: [sdd]
> [11712.838877] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [11712.905537] sd 3:0:0:20: [sdd]
> [11712.944077] Sense Key : Medium Error [current]
> [11712.998246] sd 3:0:0:20: [sdd]
> [11713.036788] Add. Sense: Incompatible medium installed
> [11713.097195] sd 3:0:0:20: [sdd] CDB:
> [11713.139903] Unmap/Read sub-channel: 42 00 00 00 00 00 00 00 18 00
> [11713.212987] end_request: I/O error, dev sdd, sector 128
> [11713.276032] device-mapper: multipath: Failing path 8:48.
> [11713.340113] sd 4:0:1:20: [sdc] Unhandled sense code
> [11713.398455] sd 4:0:1:20: [sdc]
> [11713.437003] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [11713.503669] sd 4:0:1:20: [sdc]
> [11713.542218] Sense Key : Medium Error [current]
> [11713.596408] sd 4:0:1:20: [sdc]
> [11713.634948] Add. Sense: Incompatible medium installed
> [11713.695357] sd 4:0:1:20: [sdc] CDB:
> [11713.738077] Unmap/Read sub-channel: 42 00 00 00 00 00 00 00 18 00
> [11713.811160] end_request: I/O error, dev sdc, sector 128
> [11713.874213] device-mapper: multipath: Failing path 8:32.
> [11713.938320] sd 3:0:1:20: [sde] Unhandled sense code
> [11713.996662] sd 3:0:1:20: [sde]
> [11714.035206] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [11714.101868] sd 3:0:1:20: [sde]
> [11714.140413] Sense Key : Medium Error [current]
> [11714.194590] sd 3:0:1:20: [sde]
> [11714.233136] Add. Sense: Incompatible medium installed
> [11714.293553] sd 3:0:1:20: [sde] CDB:
> [11714.336268] Unmap/Read sub-channel: 42 00 00 00 00 00 00 00 18 00
> [11714.409350] end_request: I/O error, dev sde, sector 128
> [11714.472357] device-mapper: multipath: Failing path 8:64.
> [11714.536430] sd 4:0:0:20: [sdb] Unhandled sense code
> [11714.594768] sd 4:0:0:20: [sdb]
> [11714.633306] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [11714.699970] sd 4:0:0:20: [sdb]
> [11714.738506] Sense Key : Medium Error [current]
> [11714.792681] sd 4:0:0:20: [sdb]
> [11714.831221] Add. Sense: Incompatible medium installed
> [11714.891625] sd 4:0:0:20: [sdb] CDB:
> [11714.934333] Unmap/Read sub-channel: 42 00 00 00 00 00 00 00 18 00
> [11715.007395] end_request: I/O error, dev sdb, sector 128
> [11715.070374] device-mapper: multipath: Failing path 8:16.
> [11715.134475] device-mapper: multipath: Failing path 8:48.
> [11715.134690] sd 4:0:1:20: [sdc] Unhandled sense code
> [11715.134691] sd 4:0:1:20: [sdc]
> [11715.134692] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> .....

Hmmm!!! If I make the count, from these logs, it shows that all your
paths went faulty.
Either, you had permanent faulty paths, or else the status of the paths
wasn't updated because of a bug.


> Here is the output from multipath -ll -v3

A query with -v3 will alwasy instruct multipath to go update its status
entries. So while you might think it was reporting the current status,
it actually is reporting the new status.

>
> Mar 03 23:16:44 | loading /lib/multipath/libcheckdirectio.so checker
> Mar 03 23:16:44 | loading /lib/multipath/libprioconst.so prioritizer
> Mar 03 23:16:44 | sda: udev property ID_WWN whitelisted
> Mar 03 23:16:44 | sda: not found in pathvec
> Mar 03 23:16:44 | sda: mask = 0x25
> Mar 03 23:16:44 | sda: dev_t = 8:0
> Mar 03 23:16:44 | sda: size = 142082048
> Mar 03 23:16:44 | sda: vendor = DELL    
> Mar 03 23:16:44 | sda: product = PERC 6/i        
> Mar 03 23:16:44 | sda: rev = 1.21
> Mar 03 23:16:44 | sda: h:b:t:l = 0:2:0:0
> Mar 03 23:16:44 | sda: path state = running
>
> Mar 03 23:16:44 | sda: get_state
> Mar 03 23:16:44 | sda: path checker = directio (internal default)
> Mar 03 23:16:44 | sda: checker timeout = 90 ms (sysfs setting)
> Mar 03 23:16:44 | directio: starting new request
> Mar 03 23:16:44 | directio: io finished 4096/0
> Mar 03 23:16:44 | sda: state = up
> Mar 03 23:16:44 | sdc: udev property ID_WWN whitelisted
> Mar 03 23:16:44 | sdc: not found in pathvec
> Mar 03 23:16:44 | sdc: mask = 0x25
> Mar 03 23:16:44 | sdc: dev_t = 8:32
> Mar 03 23:16:44 | sdc: size = 11488198656
> Mar 03 23:16:44 | sdc: vendor = NETAPP  
> Mar 03 23:16:44 | sdc: product = LUN             
> Mar 03 23:16:44 | sdc: rev = 811a
> Mar 03 23:16:44 | sdc: h:b:t:l = 3:0:0:20
> Mar 03 23:16:44 | SCSI target 3:0:0 -> FC rport 3:0-0
> Mar 03 23:16:44 | sdc: tgt_node_name = 0x500a09808d44a0ea
> Mar 03 23:16:44 | sdc: path state = running

Ah!!! You're on NetApp Array. :-)

> Mar 03 23:16:44 | sdc: get_state
> Mar 03 23:16:44 | loading /lib/multipath/libchecktur.so checker
> Mar 03 23:16:44 | sdc: path checker = tur (controller setting)
> Mar 03 23:16:44 | sdc: checker timeout = 30 ms (sysfs setting)
> Mar 03 23:16:44 | sdc: state = up
> Mar 03 23:16:44 | sde: udev property ID_WWN whitelisted
> Mar 03 23:16:44 | sde: not found in pathvec
> Mar 03 23:16:44 | sde: mask = 0x25
> Mar 03 23:16:44 | sde: dev_t = 8:64
> Mar 03 23:16:44 | sde: size = 11488198656
> Mar 03 23:16:44 | sde: vendor = NETAPP  
> Mar 03 23:16:44 | sde: product = LUN             
> Mar 03 23:16:44 | sde: rev = 811a
> Mar 03 23:16:44 | sde: h:b:t:l = 3:0:1:20
> Mar 03 23:16:44 | SCSI target 3:0:1 -> FC rport 3:0-1
> Mar 03 23:16:44 | sde: tgt_node_name = 0x500a09808d44a0ea
> Mar 03 23:16:44 | sde: path state = running
> Mar 03 23:16:44 | sde: get_state
> Mar 03 23:16:44 | sde: path checker = tur (controller setting)
> Mar 03 23:16:44 | sde: checker timeout = 30 ms (sysfs setting)
> Mar 03 23:16:44 | sde: state = up
> Mar 03 23:16:44 | sdb: udev property ID_WWN whitelisted
> Mar 03 23:16:44 | sdb: not found in pathvec
> Mar 03 23:16:44 | sdb: mask = 0x25
> Mar 03 23:16:44 | sdb: dev_t = 8:16
> Mar 03 23:16:44 | sdb: size = 11488198656
> Mar 03 23:16:44 | sdb: vendor = NETAPP  
> Mar 03 23:16:44 | sdb: product = LUN             
> Mar 03 23:16:44 | sdb: rev = 811a
> Mar 03 23:16:44 | sdb: h:b:t:l = 4:0:0:20
> Mar 03 23:16:44 | SCSI target 4:0:0 -> FC rport 4:0-0
> Mar 03 23:16:44 | sdb: tgt_node_name = 0x500a09808d44a0ea
> Mar 03 23:16:44 | sdb: path state = running
>
> Mar 03 23:16:44 | sdb: get_state
> Mar 03 23:16:44 | sdb: path checker = tur (controller setting)
> Mar 03 23:16:44 | sdb: checker timeout = 30 ms (sysfs setting)
> Mar 03 23:16:44 | sdb: state = up
> Mar 03 23:16:44 | sdd: udev property ID_WWN whitelisted
> Mar 03 23:16:44 | sdd: not found in pathvec
> Mar 03 23:16:44 | sdd: mask = 0x25
> Mar 03 23:16:44 | sdd: dev_t = 8:48
> Mar 03 23:16:44 | sdd: size = 11488198656
> Mar 03 23:16:44 | sdd: vendor = NETAPP  
> Mar 03 23:16:44 | sdd: product = LUN             
> Mar 03 23:16:44 | sdd: rev = 811a
> Mar 03 23:16:44 | sdd: h:b:t:l = 4:0:1:20
> Mar 03 23:16:44 | SCSI target 4:0:1 -> FC rport 4:0-1
> Mar 03 23:16:44 | sdd: tgt_node_name = 0x500a09808d44a0ea
> Mar 03 23:16:44 | sdd: path state = running
>
> Mar 03 23:16:44 | sdd: get_state
> Mar 03 23:16:44 | sdd: path checker = tur (controller setting)
> Mar 03 23:16:44 | sdd: checker timeout = 30 ms (sysfs setting)
> Mar 03 23:16:44 | sdd: state = up
> Mar 03 23:16:44 | sr0: blacklisted, udev property missing
> Mar 03 23:16:44 | dm-0: blacklisted, udev property missing
> Mar 03 23:16:44 | dm-1: blacklisted, udev property missing
> Mar 03 23:16:44 | dm-2: blacklisted, udev property missing
> ===== paths list =====
> uuid hcil     dev dev_t pri dm_st chk_st vend/prod/rev             dev_st 
>      0:2:0:0  sda 8:0   -1  undef ready  DELL    ,PERC 6/i         running
>      3:0:0:20 sdc 8:32  -1  undef ready  NETAPP  ,LUN              running
>      3:0:1:20 sde 8:64  -1  undef ready  NETAPP  ,LUN              running
>      4:0:0:20 sdb 8:16  -1  undef ready  NETAPP  ,LUN              running
> Mar 03 23:16:44 | params = 4 queue_if_no_path pg_init_retries 50 retain_attached
> _hw_handler 0 2 1 service-time 0 2 2 8:64 1 1 8:16 1 1 service-time 0 2 2 8:32 1
>  1 8:48 1 1 
> Mar 03 23:16:44 | status = 2 0 0 0 2 1 A 0 2 2 8:64 A 0 0 1 8:16 A 0 0 1 E 0 2 2
>  8:32 A 0 0 1 8:48 A 0 0 1 
> Mar 03 23:16:44 | mpathb: disassemble map [4 queue_if_no_path pg_init_retries 50
>  retain_attached_hw_handler 0 2 1 service-time 0 2 2 8:64 1 1 8:16 1 1 service-t
> ime 0 2 2 8:32 1 1 8:48 1 1 ]
> Mar 03 23:16:44 | sde: mask = 0x8
> Mar 03 23:16:44 | sde: path state = running
>
> Mar 03 23:16:44 | sde: detect_prio = 2 (controller default)
> Mar 03 23:16:44 | loading /lib/multipath/libprioalua.so prioritizer
> Mar 03 23:16:44 | sde: prio = alua (detected setting)
> Mar 03 23:16:44 | reported target port group is 0
> Mar 03 23:16:44 | aas = 00 [active/optimized]
> Mar 03 23:16:44 | sde: alua prio = 50
> Mar 03 23:16:44 | sdb: mask = 0x8
> Mar 03 23:16:44 | sdb: path state = running
>
> Mar 03 23:16:44 | sdb: detect_prio = 2 (controller default)
> Mar 03 23:16:44 | sdb: prio = alua (detected setting)
> Mar 03 23:16:44 | reported target port group is 0
> Mar 03 23:16:44 | aas = 00 [active/optimized]
> Mar 03 23:16:44 | sdb: alua prio = 50
> Mar 03 23:16:44 | sdc: mask = 0x8
> Mar 03 23:16:44 | sdc: path state = running
>
> Mar 03 23:16:44 | sdc: detect_prio = 2 (controller default)
> Mar 03 23:16:44 | sdc: prio = alua (detected setting)
> Mar 03 23:16:44 | reported target port group is 1
> Mar 03 23:16:44 | aas = 01 [active/non-optimized]
> Mar 03 23:16:44 | sdc: alua prio = 10
> Mar 03 23:16:44 | sdd: mask = 0x8
> Mar 03 23:16:44 | sdd: path state = running
>
> Mar 03 23:16:44 | sdd: detect_prio = 2 (controller default)
> Mar 03 23:16:44 | sdd: prio = alua (detected setting)
> Mar 03 23:16:44 | reported target port group is 1
> Mar 03 23:16:44 | aas = 01 [active/non-optimized]
> Mar 03 23:16:44 | sdd: alua prio = 10
> Mar 03 23:16:44 | mpathb: disassemble status [2 0 0 0 2 1 A 0 2 2 8:64 A 0 0 1 8
> :16 A 0 0 1 E 0 2 2 8:32 A 0 0 1 8:48 A 0 0 1 ]
> mpathb (360a9800064704842534a6c5063513577) dm-0 NETAPP  ,LUN             
> size=5.3T features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_han
> dle' hwhandler='0' wp=rw
> |-+- policy='service-time 0' prio=50 status=active
> | |- 3:0:1:20 sde 8:64 active ready running
> | `- 4:0:0:20 sdb 8:16 active ready running
> `-+- policy='service-time 0' prio=10 status=enabled
>   |- 3:0:0:20 sdc 8:32 active ready running
>   `- 4:0:1:20 sdd 8:48 active ready running

And as expected, now everything is perfect.

From what I look into your logs, it is hard to conclude the cause. It
looks like a case of stale path status, which actually failed, when you
did a write() using the mkfs command.
If you have the resources, you may want to try the directio path checker
to see if it provide the real status.

Also at the time when mkfs failed, was the SCSI device responding ? I
mean the TUR command definitely was, but what about, say, running the
mkfs directly on the SCSI device ??

The other observation is that the SCSI devices have an ALUA priority
defined (inherited the default, I guess), whereas your mulitpath
hardware handler is set to none. For NetApp, on the newer ONTAP
versions, it is highly recommended to use ALUA.


The conclusion is simple. You've seen mkfs fail because all the paths
were faulty.

-- 
Ritesh Raj Sarraf | http://people.debian.org/~rrs
Debian - The Universal Operating System


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <http://lists.alioth.debian.org/pipermail/pkg-lvm-maintainers/attachments/20140304/e0560e4f/attachment.sig>


More information about the pkg-lvm-maintainers mailing list