Bug#549660: md raid1 + lvm2 + snapshot resulted in lvm2 hang
Phil Ten
phil.info at dafweb.com
Mon Oct 5 10:30:01 UTC 2009
Package: lvm2
Version: 2.02.39-7
Severity: important
Hello,
This problem looks similar to #419209, but I believe
it is different because in my case the snapshot
was created successfully, and the volume stalled
later, while writing to the snapshot.
I am running Proxmox 1.3:
Linux ns300364.ovh.net 2.6.24-7-pve #1 SMP PREEMPT Fri Aug 21 09:07:39 CEST 2009 x86_64 GNU/Linux
The system was installed a couple of weeks ago and is not an upgrade.
Snapshots worked fine until the problem occurred,
despite no changes to the disk configuration.
My configuration:
md1: md RAID 1 + ext3 mounted as /
md0: md RAID 1 + lvm2, divided into 2 ext3 volumes, vmdata and vmbackups, mounted as /var/lib/vz and /backups.
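For reference, the backup flow that triggers the hang is roughly the following. This is a sketch only: the 5G snapshot size, the snapshot name "vmdata-snap", the mount point and the rsync invocation are illustrative assumptions, not my exact script. It prints the commands as a dry run unless DO=1 is set.

```shell
#!/bin/sh
# Sketch of the snapshot backup flow (names/sizes illustrative, not my
# actual script). Dry run by default; set DO=1 to really execute.
run() { if [ "${DO:-0}" = 1 ]; then "$@"; else echo "would run: $*"; fi; }

run lvcreate -s -L 5G -n vmdata-snap /dev/data/vmdata  # snapshot of vmdata
run mount -o ro /dev/data/vmdata-snap /mnt/snap        # mount it read-only
run rsync -a /mnt/snap/ /backups/vmdata/               # copy onto vmbackups
run umount /mnt/snap
run lvremove -f /dev/data/vmdata-snap                  # the step that hangs
```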
root@ns300364:/backups/tmp# lvdisplay
--- Logical volume ---
LV Name /dev/data/vmdata
VG Name data
LV UUID 9CzFBp-k7fV-wlls-qeeG-v7Or-u1pq-9XhKKy
LV Write Access read/write
LV Status available
# open 1
LV Size 309.57 GB
Current LE 79250
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 254:1
--- Logical volume ---
LV Name /dev/data/vmbackup
VG Name data
LV UUID jzCjXx-IodU-chBx-Aw3L-JUbv-dRho-vaOFCl
LV Write Access read/write
LV Status available
# open 1
LV Size 309.57 GB
Current LE 79250
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 254:4
root@ns300364:~# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90
Creation Time : Tue Sep 15 17:48:43 2009
Raid Level : raid1
Array Size : 664986496 (634.18 GiB 680.95 GB)
Used Dev Size : 664986496 (634.18 GiB 680.95 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Sun Oct 4 02:02:04 2009
State : active, recovering
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Rebuild Status : 29% complete
UUID : ab296276:ea3e622e:7008e345:84b8f442 (local to host ns300364.ovh.net)
Events : 0.17
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 8 19 1 active sync /dev/sdb3
Symptoms:
- snapshot creation for vmdata OK
- backup on vmbackups started OK
- after writing about 1 GB the snapshot stalled, by which I mean that all
requests to read files on the lvm volumes hang.
However, "ls" and "cd" do work and I can get directory listings.
Any command that reads file contents (e.g. cat, cp, mv) stalls the ssh session.
In particular, "cat /backups/phil.log" also stalls the session.
Note that the snapshot is for volume vmdata, while the "cat" above concerns volume vmbackups.
- smartctl does not report any problem (including the long self-test)
- the "wa" (I/O wait) figure in "top" is stuck at 99%; CPU usage is near zero.
- the snapshot is visible in /dev/mapper
- the snapshot cannot be removed (lvremove -f); again no error is reported, it just hangs with no output at all.
- the system seems to work fine as long as nothing tries to read from either
of the 2 lvm2 volumes.
- no error reported in messages or syslog.
- it seems an md check started after the snapshot creation. This check
also stalled, at 29% (speed=0K/sec), again with no error reported.
- a soft reboot did not work
- a hard reboot worked, but an md resync started and stalled at 0.1%, leaving the system in the same state as before the hard reboot.
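For anyone trying to confirm the same hang, these read-only checks are what I looked at; they are safe to repeat. The readers hanging on the lvm volumes sit in uninterruptible sleep (state D), which is what drives the 99% "wa" figure in top.

```shell
#!/bin/sh
# Read-only checks while the box is wedged; all safe to repeat.
cat /proc/mdstat 2>/dev/null || true   # resync progress: stuck at 29%, speed=0K/sec
# List processes stuck in uninterruptible sleep (state D), i.e. the
# hung readers on the lvm volumes:
ps -eo state,pid,cmd 2>/dev/null | awk '$1 == "D"'
```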
To recover a working system I marked sdb3 as faulty, removed it from the raid1,
and hard rebooted. That worked: I could remove the snapshot and access the data
on both lvm volumes. Since then I have not tried to create a snapshot, and the
system seems to work fine.
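The recovery steps above, as a dry-run sketch. The device names are from my setup; the snapshot name "vmdata-snap" is illustrative (I did not record the exact name), and the final re-add is what I would do once things are stable, not something the report above describes. It prints the plan unless DO=1 is set.

```shell
#!/bin/sh
# Dry-run sketch of the recovery (device names from my box; snapshot
# name illustrative). Prints the plan unless DO=1.
run() { if [ "${DO:-0}" = 1 ]; then "$@"; else echo "would run: $*"; fi; }

run mdadm /dev/md0 --fail   /dev/sdb3   # mark the second mirror half faulty
run mdadm /dev/md0 --remove /dev/sdb3   # detach it from the raid1
# ...hard reboot here; after boot the stale snapshot could finally go:
run lvremove -f /dev/data/vmdata-snap   # name illustrative
run mdadm /dev/md0 --add /dev/sdb3      # re-add sdb3 later and let md resync
```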
Greetings,
Phil Ten