[Virtual-pkg-base-maintainers] Bug#759706: base: bad write performance on software RAID 1
Dominique Barton
dbarton at confirm.ch
Fri Aug 29 15:28:29 UTC 2014
Package: base
Severity: normal
Tags: lfs
Hi
Sorry, I don't know whether this bug belongs to the base package or to the mdadm package.
I hope base is OK, because I thought the mdadm package covers only the mdadm utility and not the md/dm layer itself.
I have some serious performance issues w/ my software RAID 1, especially write performance.
This is my setup:
LV/ext4 (mounted w/ noatime)
--------------------
lvm (VG ssdvg on PV /dev/md0)
--------------------
/dev/md0 (RAID 1)
--------------------
/dev/sda2 /dev/sdb2 (identical SSDs)
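For reference, the whole stack can be inspected like this (just a sketch; pick whichever tools you prefer):
~snip~
lsblk                # block device tree: sda2/sdb2 -> md0 -> LVs
pvs; vgs; lvs        # LVM view: PV /dev/md0, VG ssdvg and its LVs
cat /proc/mdstat     # md view: members and sync state of md0
~snap~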
Monitoring has sent a lot of alerts in the past because of high I/O wait CPU usage.
So I started to look at the RAID by testing its performance via iozone.
The results were awful: approx. 4-9 MB/s write throughput (w/ direct I/O enabled).
I thought one of the SSDs was failing and tested them both separately (S.M.A.R.T. and iozone),
but they both seem OK.
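(A sketch of the individual tests, not my exact invocations:)
~snip~
# short S.M.A.R.T. self-test, then read the result once it has finished
smartctl -t short /dev/sda
smartctl -l selftest /dev/sda
# single-threaded direct-I/O write test against a scratch file on the disk
# (/mnt/sda-test is a placeholder mount point)
iozone -s 1g -r 4m -i 0 -I -f /mnt/sda-test/file1
~snap~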
There is something odd w/ the RAID /dev/md0, but I don't have any test scenarios left :(
Here's my test setup:
~snip~
# remove one SSD from active RAID
mdadm --manage /dev/md0 --set-faulty /dev/sda2
mdadm --manage /dev/md0 --remove /dev/sda2
# create new RAID w/ only one SSD
mdadm --create /dev/md1 --level=mirror --raid-devices=2 /dev/sda2 missing
# create new PV / VG
pvcreate /dev/md1
vgcreate testvg /dev/md1
# now I have two RAID 1 devices, each w/ a single SSD
# create some LVs for testing
lvcreate -L5G -niozone ssdvg # the original VG/PV/RAID w/ the performance issues
mkfs.ext4 /dev/mapper/ssdvg-iozone
lvcreate -L5G -niozone testvg # this is the new VG/PV/RAID
mkfs.ext4 /dev/mapper/testvg-iozone
# mount the LVs
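# (!$ below is bash history expansion for the previous command's last argument)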
mkdir -p /mnt/iozone-ssdvg
mount /dev/mapper/ssdvg-iozone !$
mkdir -p /mnt/iozone-testvg
mount /dev/mapper/testvg-iozone !$
~snap~
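At this point both arrays should show up as degraded mirrors w/ one member each, which can be double-checked with something like:
~snip~
cat /proc/mdstat          # expect one single-member mirror per array
mdadm --detail /dev/md0
mdadm --detail /dev/md1
~snap~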
With the test setup prepared, I was ready to run iozone:
~snip~
iozone -t 2 -F /mnt/iozone-ssdvg/file1 /mnt/iozone-ssdvg/file2 -s 1g -r 4m -i 0 -I
...
Children see throughput for 2 initial writers = 19015.39 KB/sec
Parent sees throughput for 2 initial writers = 18931.88 KB/sec
Min throughput per process = 9385.20 KB/sec
Max throughput per process = 9630.19 KB/sec
Avg throughput per process = 9507.69 KB/sec
Min xfer = 1024000.00 KB
...
iozone -t 2 -F /mnt/iozone-testvg/file1 /mnt/iozone-testvg/file2 -s 1g -r 4m -i 0 -I
...
Children see throughput for 2 initial writers = 267037.86 KB/sec
Parent sees throughput for 2 initial writers = 266448.39 KB/sec
Min throughput per process = 133126.77 KB/sec
Max throughput per process = 133911.09 KB/sec
Avg throughput per process = 133518.93 KB/sec
Min xfer = 1044480.00 KB
...
~snap~
As you can see, the write performance on /dev/md0 is ~9 MB/s per process, while /dev/md1 reaches ~130 MB/s.
I switched the disks (added /dev/sda2 back to /dev/md0, let it resync, removed /dev/sdb2 from /dev/md0, and created /dev/md1 with /dev/sdb2) and ran the same tests again.
Of course I was surprised to see the same results: /dev/md0 was still slow, while /dev/md1 rocked!
So the problem is not in the SSDs themselves, because /dev/md0 stays slow as hell no matter which disk it holds.
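For completeness, the swap went roughly like this (from memory, so a sketch rather than a verbatim log):
~snip~
vgchange -an testvg                        # release the test VG first
mdadm --stop /dev/md1
mdadm --zero-superblock /dev/sda2
mdadm --manage /dev/md0 --add /dev/sda2    # resync sda2 back into md0
# ...wait for the resync to finish, then pull the other disk out instead
mdadm --manage /dev/md0 --set-faulty /dev/sdb2
mdadm --manage /dev/md0 --remove /dev/sdb2
mdadm --create /dev/md1 --level=mirror --raid-devices=2 /dev/sdb2 missing
~snap~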
I also checked both RAIDs for differences:
~snip~
mdadm --detail /dev/md0 > /tmp/details.md0
mdadm --detail /dev/md1 > /tmp/details.md1
root@asteria:/mnt# diff /tmp/details.md0 /tmp/details.md1
1c1
< /dev/md0:
---
> /dev/md1:
3c3
< Creation Time : Wed Jan 15 22:09:46 2014
---
> Creation Time : Fri Aug 29 15:55:22 2014
11,12c11,12
< Update Time : Fri Aug 29 16:01:57 2014
< State : active, degraded
---
> Update Time : Fri Aug 29 15:59:39 2014
> State : clean, degraded
18,20c18,20
< Name : asteria:0 (local to host asteria)
< UUID : fd0c6a27:fddfb55b:3cc9c9d7:c9b59e8f
< Events : 61833
---
> Name : asteria:1 (local to host asteria)
> UUID : 88f5f694:c2e0e9ba:82c24871:06368106
> Events : 34
23,24c23,24
< 0 0 0 0 removed
< 2 8 18 1 active sync /dev/sdb2
---
> 0 8 2 0 active sync /dev/sda2
> 1 0 0 1 removed
~snap~
Nothing there...
Disks are looking good too:
~snip~
root@asteria:/mnt# smartctl -l selftest /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 5407 -
# 2 Short offline Completed without error 00% 5383 -
# 3 Short offline Completed without error 00% 5359 -
# 4 Short offline Completed without error 00% 5335 -
# 5 Extended offline Completed without error 00% 5292 -
# 6 Short offline Completed without error 00% 5288 -
# 7 Short offline Completed without error 00% 5264 -
# 8 Short offline Completed without error 00% 5217 -
# 9 Short offline Completed without error 00% 5193 -
#10 Short offline Completed without error 00% 5169 -
#11 Short offline Completed without error 00% 5145 -
#12 Extended offline Completed without error 00% 5126 -
#13 Short offline Completed without error 00% 5121 -
#14 Extended offline Completed without error 00% 5108 -
#15 Short offline Completed without error 00% 5104 -
#16 Short offline Aborted by host 90% 5104 -
root@asteria:/mnt# smartctl -l selftest /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 5407 -
# 2 Short offline Completed without error 00% 5383 -
# 3 Short offline Completed without error 00% 5359 -
# 4 Short offline Completed without error 00% 5335 -
# 5 Extended offline Completed without error 00% 5297 -
# 6 Short offline Completed without error 00% 5288 -
# 7 Short offline Completed without error 00% 5264 -
# 8 Short offline Completed without error 00% 5217 -
# 9 Short offline Completed without error 00% 5193 -
#10 Short offline Completed without error 00% 5169 -
#11 Short offline Completed without error 00% 5145 -
#12 Extended offline Completed without error 00% 5132 -
#13 Short offline Completed without error 00% 5121 -
#14 Extended offline Completed without error 00% 5114 -
#15 Short offline Completed without error 00% 5105 -
#16 Short offline Aborted by host 70% 5105 -
root@asteria:/mnt# parted /dev/sda print
Model: ATA Samsung SSD 840 (scsi)
Disk /dev/sda: 256GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number Start End Size File system Name Flags
1 1049kB 128MB 127MB fat32 boot
2 128MB 256GB 256GB ext4 raid
root@asteria:/mnt# parted /dev/sdb print
Model: ATA Samsung SSD 840 (scsi)
Disk /dev/sdb: 256GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number Start End Size File system Name Flags
1 1049kB 128MB 127MB fat32 boot
2 128MB 256GB 256GB raid
~snap~
I/O usage is also very low (dm-28 is an iSCSI LUN and is not stored on /dev/md0):
~snip~
root@asteria:/mnt# iostat -d -m 10
...
sda 0.00 0.00 0.00 0 0
sdb 24.30 0.00 0.15 0 1
md0 48.00 0.00 0.15 0 1
dm-0 0.00 0.00 0.00 0 0
dm-1 0.00 0.00 0.00 0 0
dm-2 0.00 0.00 0.00 0 0
dm-3 0.00 0.00 0.00 0 0
dm-4 0.00 0.00 0.00 0 0
dm-5 0.00 0.00 0.00 0 0
dm-6 0.00 0.00 0.00 0 0
dm-7 0.00 0.00 0.00 0 0
dm-8 5.70 0.00 0.01 0 0
dm-9 1.00 0.00 0.00 0 0
dm-10 0.00 0.00 0.00 0 0
dm-11 33.50 0.00 0.11 0 1
dm-12 2.40 0.00 0.01 0 0
dm-13 0.00 0.00 0.00 0 0
dm-14 5.40 0.00 0.02 0 0
dm-15 0.00 0.00 0.00 0 0
dm-16 0.00 0.00 0.00 0 0
dm-17 0.00 0.00 0.00 0 0
sdc 17.30 0.00 0.82 0 8
sdd 0.00 0.00 0.00 0 0
dm-18 0.00 0.00 0.00 0 0
dm-19 0.00 0.00 0.00 0 0
dm-20 0.00 0.00 0.00 0 0
dm-21 28.50 0.00 0.11 0 1
dm-22 0.00 0.00 0.00 0 0
dm-23 0.00 0.00 0.00 0 0
dm-24 0.00 0.00 0.00 0 0
dm-25 0.00 0.00 0.00 0 0
dm-26 0.00 0.00 0.00 0 0
dm-27 0.00 0.00 0.00 0 0
dm-28 182.30 0.00 0.71 0 7
md1 0.00 0.00 0.00 0 0
dm-29 0.00 0.00 0.00 0 0
dm-30 0.00 0.00 0.00 0 0
~snap~
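One more thing I could still test is writing to the LVs directly, bypassing ext4, to rule the filesystem in or out, e.g.:
~snip~
# destructive for the LV contents, but these LVs only hold iozone test data
dd if=/dev/zero of=/dev/mapper/ssdvg-iozone bs=4M count=256 oflag=direct
dd if=/dev/zero of=/dev/mapper/testvg-iozone bs=4M count=256 oflag=direct
~snap~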
Any ideas or hints on where the write issues are coming from, or what I can test next?
Cheers
Domi
-- System Information:
Debian Release: 7.6
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 3.2.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash