Bug#659762: lvm2: LVM commands freeze after snapshot delete fails

Frank Steinborn steinex at nognu.de
Fri Jul 26 15:14:00 UTC 2013


Hi,

we have made some progress debugging this. We set up a Dell PowerEdge R620
(the same hardware as in our DRBD cluster where the problem occurs). Since
no one in this thread had brought DRBD into play, I did not expect it to
interact with this bug. However, we were not able to reproduce the problem
with LVM2 alone (e.g. set up an LV, do I/O on it, remove it, wait for the
hang).

So we installed a second machine and put DRBD on top of the LVs. And voilà:
as soon as we create a snapshot of the LV that DRBD sits on and then remove
that snapshot, the removal fails roughly one time out of three.
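
Roughly, the sequence looks like this (VG/LV names as above; the 40G
snapshot size is just what we happen to use, and the DRBD resource on top
of lv0 is already configured and carrying the normal cluster workload):

root@drbd-primary:~# lvcreate --snapshot --size 40G --name lv0-snap /dev/vg0/lv0
  ... let the snapshot fill up for a while under the usual DRBD I/O ...
root@drbd-primary:~# lvremove --force /dev/vg0/lv0-snap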

Some facts:

root@drbd-primary:~# lvremove --force /dev/vg0/lv0-snap
  Unable to deactivate open vg0-lv0--snap-cow (254:3)
  Failed to resume lv0-snap.
  libdevmapper exiting with 1 device(s) still suspended.

After this, "dmsetup info" gives the following output:

<<< snip >>>

Name:              vg0-lv0--snap
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      254, 1
Number of targets: 1
UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYy4WFhwy43CZA1g7zKFGF915pLAOIPvFZ

Name:              vg0-lv0-real
State:             ACTIVE
Read Ahead:        0
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      254, 2
Number of targets: 1
UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYC3ppjt1CZ3AcZR2hNz1VT5CHdM4RR32j-real

Name:              vg0-lv0
State:             SUSPENDED
Read Ahead:        256
Tables present:    LIVE & INACTIVE
Open count:        2
Event number:      0
Major, minor:      254, 0
Number of targets: 1
UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYC3ppjt1CZ3AcZR2hNz1VT5CHdM4RR32j

Name:              vg0-lv0--snap-cow
State:             ACTIVE
Read Ahead:        0
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      254, 3
Number of targets: 1
UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYy4WFhwy43CZA1g7zKFGF915pLAOIPvFZ-cow

<<< snap >>>

As you can see, the LV that DRBD sits on (vg0-lv0) is now left in state
SUSPENDED, which renders the cluster non-functional: I/O stalls on both the
primary and the secondary node until someone runs "dmsetup resume
/dev/vg0/lv0".
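
In case it helps anyone hitting the same thing: the stuck device is easy to
spot in the full "dmsetup info" output, and resuming it is all we do as a
workaround, nothing more sophisticated:

root@drbd-primary:~# dmsetup info | grep -B 1 SUSPENDED
root@drbd-primary:~# dmsetup resume /dev/vg0/lv0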

Another interesting thing we have seen: after running "dmsetup resume
/dev/vg0/lv0", lv0-snap no longer appears to be a snapshot, judging by the
lvs output below (lv0-snap has no origin anymore):

  LV       VG   Attr     LSize   Pool Origin Data%  Move Log Copy%  Convert
  lv0      vg0  -wi-ao-- 200.00g
  lv0-snap vg0  -wi-a---  40.00g
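
We have not dug much deeper into this yet, but a way to double-check it at
the device-mapper level (names taken from the "dmsetup info" output above)
would be to look at the tables; if lv0-snap has really lost its origin, its
table should show a plain linear target instead of a snapshot target:

root@drbd-primary:~# dmsetup table vg0-lv0
root@drbd-primary:~# dmsetup table vg0-lv0--snap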


Some miscellaneous notes:
* It _feels_ like it only happens once the snapshot is at least roughly
50-60% full (see the command sketch after this list).
* We can trigger something similar even without DRBD. In that case, however,
the LV never ends up in SUSPENDED state and a second lvremove always
succeeds.
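
To pin down the 50-60% observation we simply keep an eye on the Data%
column of the default lvs output while the snapshot fills up, e.g.:

root@drbd-primary:~# watch -n 5 lvs vg0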

That's all we have so far. I have already discussed this privately with
waldi at debian.org, and we will (probably) give him remote access to this
system as soon as the setup is reachable from the outside.

Please let me know if I can provide any further information to get this
fixed. I have put drbd-dev in CC; maybe someone over there has an idea?

@drbd-dev: the system is Debian wheezy with DRBD 8.3.11 and lvm2 2.02.95.

Thanks,
Frank