Bug#659762: lvm2: LVM commands freeze after snapshot delete fails

Urban Loesch bind at enas.net
Fri Jul 26 15:45:22 UTC 2013


Hi,

we had the same problems with Debian Wheezy, LVM2 and DRBD.
But this does not seem to be DRBD-related; it looks like some problem between LVM and udevd.

See:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=549691

Stopping udevd before taking the snapshot and starting it again after removing the
snapshot solved the problem for us. It's only a workaround, but it works for us.
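For reference, the workaround can be scripted roughly like this. This is only a sketch: the VG/LV names and snapshot size are example values, `service udev stop`/`start` assumes the sysvinit udev script shipped on Wheezy, and by default the script only prints the commands (pass --run to actually execute them):

```shell
#!/bin/sh
# Sketch of the udevd workaround: quiesce udevd around snapshot
# create/remove. VG/LV names and the snapshot size are examples only.
# By default the commands are just printed; pass --run to execute them.

VG=vg0
LV=lv0
SNAP=${LV}-snap
DO_RUN=0
if [ "$1" = "--run" ]; then DO_RUN=1; fi

run() {
    echo "+ $*"                           # show the command being (dry-)run
    if [ "$DO_RUN" = 1 ]; then "$@"; fi
}

run service udev stop                     # keep udevd away from the snapshot devices
run lvcreate --snapshot --size 40G --name "$SNAP" "/dev/$VG/$LV"
# ... back up the contents of /dev/$VG/$SNAP here ...
run lvremove --force "/dev/$VG/$SNAP"
run service udev start                    # restore normal udev operation
```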

Regards
Urban


On 26.07.2013 17:14, Frank Steinborn wrote:
> Hi,
>
> we are a bit further in debugging this. We installed a Dell PowerEdge R620 (the same hardware as used in our DRBD cluster where this problem happens). As
> no one in this thread had brought DRBD into play, I didn't expect any interaction with it related to this bug. However, we were not able to reproduce with
> just LVM2 (e.g. create an LV, do I/O on the LV, remove the LV, hang).
>
> So we installed a second machine and put DRBD on top of the LVs. And voilà: as soon as we create a snapshot of the LV that DRBD sits on top of and remove
> this snapshot, the removal fails roughly 1/3 of the time.
>
> Some facts:
>
> root at drbd-primary:~# lvremove --force /dev/vg0/lv0-snap
>    Unable to deactivate open vg0-lv0--snap-cow (254:3)
>    Failed to resume lv0-snap.
>    libdevmapper exiting with 1 device(s) still suspended.
>
> After this, "dmsetup info" gives the following output:
>
> <<< snip >>>
>
> Name:              vg0-lv0--snap
> State:             ACTIVE
> Read Ahead:        256
> Tables present:    LIVE
> Open count:        0
> Event number:      0
> Major, minor:      254, 1
> Number of targets: 1
> UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYy4WFhwy43CZA1g7zKFGF915pLAOIPvFZ
>
> Name:              vg0-lv0-real
> State:             ACTIVE
> Read Ahead:        0
> Tables present:    LIVE
> Open count:        1
> Event number:      0
> Major, minor:      254, 2
> Number of targets: 1
> UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYC3ppjt1CZ3AcZR2hNz1VT5CHdM4RR32j-real
>
> Name:              vg0-lv0
> State:             SUSPENDED
> Read Ahead:        256
> Tables present:    LIVE & INACTIVE
> Open count:        2
> Event number:      0
> Major, minor:      254, 0
> Number of targets: 1
> UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYC3ppjt1CZ3AcZR2hNz1VT5CHdM4RR32j
>
> Name:              vg0-lv0--snap-cow
> State:             ACTIVE
> Read Ahead:        0
> Tables present:    LIVE
> Open count:        0
> Event number:      0
> Major, minor:      254, 3
> Number of targets: 1
> UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYy4WFhwy43CZA1g7zKFGF915pLAOIPvFZ-cow
>
> <<< snap >>>
>
> As you can see, the origin LV (vg0-lv0) with DRBD on top is now in state SUSPENDED - which causes the cluster to be non-functional, as I/O operations stall on both
> the primary and the secondary node until one runs "dmsetup resume /dev/vg0/lv0".
>
> Another interesting issue we've seen: after doing "dmsetup resume /dev/vg0/lv0", lv0-snap doesn't appear to be a snapshot anymore, given the output of
> lvs (lv0-snap has no origin anymore):
>
>    LV       VG   Attr     LSize   Pool Origin Data%  Move Log Copy%  Convert
>    lv0      vg0  -wi-ao-- 200.00g
>    lv0-snap vg0  -wi-a---  40.00g
>
>
> Some miscellaneous notes:
> * It _feels_ like it only happens when the snapshot is at least somewhere around 50-60% full.
> * We can trigger something like this even without DRBD. When triggered that way, however, the LV never ends up in the SUSPENDED state, and a second try of
> lvremove always succeeds.
>
> That's all we have so far. I already had a private conversation with waldi at debian.org about this, and we will (probably) provide
> him with remote access to this system as soon as the setup is reachable from the outside.
>
> Please let me know if I can provide any more information to get this fixed. I've put drbd-dev in CC; maybe someone over there has an idea about this?
>
> @drbd-dev: the system is Debian Wheezy, with DRBD 8.3.11 and lvm2 2.02.95.
>
> Thanks,
> Frank
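The quoted dmsetup output also suggests a quick way to find devices that were left in the SUSPENDED state after a failed lvremove, so they can be resumed with "dmsetup resume". A small sketch (the helper name is made up here; the awk relies on the Name:/State: key-value layout that "dmsetup info" prints):

```shell
#!/bin/sh
# Print the names of device-mapper devices whose State is SUSPENDED,
# reading "dmsetup info"-style records (Name:/State: key-value lines)
# on stdin. "suspended_devices" is just an illustrative helper name.
suspended_devices() {
    awk '/^Name:/  { name = $2 }
         /^State:/ { if ($2 == "SUSPENDED") print name }'
}

# Usage (not run here):
#   dmsetup info | suspended_devices | xargs -r -n1 dmsetup resume
```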



More information about the pkg-lvm-maintainers mailing list