Bug#686189: pvmove hangs after moving one of many LVs, all IO on affected LVs hangs

Pokotilenko Kostik casper at meteor.dp.ua
Wed Aug 29 17:59:46 UTC 2012


Package: lvm2
Version: 2.02.66-5
Severity: critical
Tags: upstream

First of all, links to this problem:
https://bugzilla.redhat.com/show_bug.cgi?id=602516
https://bugzilla.redhat.com/show_bug.cgi?id=706036

As stated in the links above, this bug is fixed upstream in 2.02.86; squeeze still
has 2.02.66 and, as far as I can see, no fix has been backported.

Also as stated in the links above, there is a workaround for this bug: move one LV at a time, i.e.:
pvmove -i0 -n $lvname
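
As a sketch, the workaround above could be scripted as a loop over the LVs of the
affected VG. The helper below only builds the per-LV pvmove command lines (the
function name, PV path, and LV names are hypothetical), so it can be read and run
without touching real volumes:

```shell
# Hypothetical helper: print one "move a single LV" command per LV name,
# mirroring the workaround's pvmove -i0 -n form.
build_pvmove_cmds() {
    src_pv=$1
    shift
    for lv in "$@"; do
        echo "pvmove -i0 -n $lv $src_pv"
    done
}

# Example with made-up LV names:
build_pvmove_cmds /dev/sda8 lvroot lvhome
# prints:
# pvmove -i0 -n lvroot /dev/sda8
# pvmove -i0 -n lvhome /dev/sda8
```

In real use the LV names would come from something like
lvs --noheadings -o lv_name VGa, and each pvmove would be run to completion
before starting the next.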

Now about my story...

I recently upgraded to squeeze and wanted to use the ability of the new grub-pc to boot
directly from LVM over RAID1. This way I could migrate my system from plain partitions
to LVM over RAID and have failover across 3 drives in RAID1.

To do so I had to convert one drive at a time. I successfully converted 2 drives and
started to convert the 3rd one in exactly the same way, which resulted in pvmove
hanging and all IO to the affected LVs hanging.

The layout before converting the 3rd drive was:

sda[1,2,5,6,7]: old unused system partitions
sda8: LVM PV of VGa
sda9: Raid5 member

sdb1: Raid1 member
sdb3: LVM PV of VGb
sdb4: Raid5 member

sdc1: Raid1 member
sdc3: LVM PV of VGc
sdc4: Raid5 member

md0: Raid5 of sda9, sdb4, sdc4
md1: Raid1 of missing, sdb1, sdc1

md0: PV of VGraid5
md1: PV of VGsystem

As I did not have a spare drive to move to, I used an LV on VGraid5 as a PV for VGa:

lvcreate -n pvVGa VGraid5
pvcreate /dev/mapper/VGraid5-pvVGa
vgextend VGa /dev/mapper/VGraid5-pvVGa

And started the move:

pvmove /dev/sda8 /dev/mapper/VGraid5-pvVGa
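
The whole nested-PV setup and move can be sketched as one sequence. This is a
dry-run printer, not a definitive script: by default it only echoes the commands
(they would otherwise need root and real devices), and the 50G size is a made-up
placeholder, since the report does not show the lvcreate size argument:

```shell
# Dry-run sketch of the nested-PV setup and move. DRY_RUN=1 (the default)
# only prints each command; the 50G size is a hypothetical placeholder.
nested_pv_move() {
    vg_src=$1; src_pv=$2; vg_pool=$3; lv=$4; size=$5
    for cmd in \
        "lvcreate -L $size -n $lv $vg_pool" \
        "pvcreate /dev/mapper/$vg_pool-$lv" \
        "vgextend $vg_src /dev/mapper/$vg_pool-$lv" \
        "pvmove $src_pv /dev/mapper/$vg_pool-$lv"
    do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$cmd"
        else
            $cmd || return 1
        fi
    done
}

# Example with the VG/PV names from this report:
nested_pv_move VGa /dev/sda8 VGraid5 pvVGa 50G
```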

The process of moving data started, but got stuck at about 26%.
At this point pvmove, any other LVM command, and every process that tried to
access LVs on VGa ended up in the D state, and I was not able to kill -9 them.

Any other attempt to access data on VGa hung as well: the device's IO queue
never shrank, and atop showed 100% device usage with no read/write activity
(nothing was ever dequeued).

The system did not respond to reboot/shutdown, so to resolve the problem
quickly I had to shut down whatever I could and hard-reset. I was not even able
to sync, as it also hung.

After the reboot VGa did not activate, because the interrupted pvmove had left
it in an inconsistent state. As I figured out, the first LV in VGa had been
moved successfully, a mirror had been created for the second one, and that
mirror was left out of sync.

pvmove --abort cancelled the move, and I was able to activate VGa. All the data
was there, safe.
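
The recovery steps above can be sketched as follows. Again this is a dry-run
printer (the function name is made up; both commands would need root), shown only
to make the abort-then-reactivate order explicit:

```shell
# Recovery sketch after an interrupted pvmove: abort the stale move first,
# then re-activate the VG. DRY_RUN=1 (the default) only prints the commands.
recover_after_pvmove_crash() {
    vg=$1
    for cmd in "pvmove --abort" "vgchange -ay $vg"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$cmd"
        else
            $cmd || return 1
        fi
    done
}

recover_after_pvmove_crash VGa
```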

Then I made sure nothing was accessing VGa and started pvmove again with the
same command, and this time it finished the move successfully.

During the first, failed move VGa was being actively accessed by one of the kvm
guests, which, I think, together with the multi-level dm/lv/pv/lv setup,
triggered the suspend/lock race explained in the links above.

Squeeze should provide a fix for this problem, as LVM is considered
production-stable and many people rely on it.

-- System Information:
Debian Release: 6.0.5
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-0.bpo.2-amd64 (SMP w/8 CPU cores)
Locale: LANG=ru_UA.UTF-8, LC_CTYPE=ru_UA.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages lvm2 depends on:
ii  dmsetup                 2:1.02.48-5      The Linux Kernel Device Mapper use
ii  libc6                   2.11.3-3         Embedded GNU C Library: Shared lib
ii  libdevmapper1.02.1      2:1.02.48-5      The Linux Kernel Device Mapper use
ii  libreadline5            5.2-7            GNU readline and history libraries
ii  libudev0                164-3            libudev shared library
ii  lsb-base                3.2-23.2squeeze1 Linux Standard Base 3.2 init scrip

lvm2 recommends no packages.

lvm2 suggests no packages.

-- no debconf information
