Bug#578352: closed by martin f krafft <madduck at debian.org> (Re: Bug#578352: mdadm: failed devices become spares!)
Pierre Vignéras
pierre at vigneras.name
Tue Apr 27 18:17:36 UTC 2010
On Tuesday 20 April 2010, you wrote:
> also sprach Pierre Vignéras <pierre at vigneras.name> [2010.04.20.1317 +0200]:
> > Apr 12 20:10:02 phobos mdadm[3157]: Fail event detected on md device
> > /dev/md2, component device /dev/sdf1
> > Apr 12 20:11:02 phobos mdadm[3157]: SpareActive event detected on md
> > device /dev/md2, component device /dev/sdf1
> >
> > And at that time I was neither logged in nor did I touch
> > that NFS server (neither the USB drives nor the server itself).
> > Actually, I discovered the problem the day after. So the first question
> > is:
> >
> > is it normal that after a failure detected on /dev/sdf1 it becomes
> > a spare (again if I understand the syslog message correctly)?
>
> It seems like the drive goes offline and comes back, and then
> I think it would be normal. Are there no kernel messages about this?
Well, here is the content of my kern.log:
Apr 12 19:22:44 phobos kernel: [5768580.538554] ip_tables: (C) 2000-2006
Netfilter Core Team
Apr 12 20:10:02 phobos kernel: [5771419.310123] sd 5:0:0:0: [sdf] Result:
hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_RETRY
Apr 12 20:10:02 phobos kernel: [5771419.310144] end_request: I/O error, dev
sdf, sector 115347706
Apr 12 20:10:02 phobos kernel: [5771419.310156] raid10: Disk failure on sdf1,
disabling device.
Apr 12 20:10:02 phobos kernel: [5771419.310158] raid10: Operation continuing
on 3 devices.
Apr 12 20:10:02 phobos kernel: [5771419.323466] RAID10 conf printout:
Apr 12 20:10:02 phobos kernel: [5771419.323480] --- wd:3 rd:4
Apr 12 20:10:02 phobos kernel: [5771419.323488] disk 0, wo:0, o:1, dev:sdd1
Apr 12 20:10:02 phobos kernel: [5771419.323495] disk 1, wo:1, o:0, dev:sdf1
Apr 12 20:10:02 phobos kernel: [5771419.323501] disk 2, wo:0, o:1, dev:sdc1
Apr 12 20:10:02 phobos kernel: [5771419.323508] disk 3, wo:0, o:1, dev:sde1
Apr 12 20:10:02 phobos kernel: [5771419.323801] RAID10 conf printout:
Apr 12 20:10:02 phobos kernel: [5771419.323813] --- wd:3 rd:4
Apr 12 20:10:02 phobos kernel: [5771419.323820] disk 0, wo:0, o:1, dev:sdd1
Apr 12 20:10:02 phobos kernel: [5771419.323826] disk 2, wo:0, o:1, dev:sdc1
Apr 12 20:10:02 phobos kernel: [5771419.323833] disk 3, wo:0, o:1, dev:sde1
Apr 13 08:00:02 phobos kernel: [5814019.091249] sd 2:0:0:0: [sdd] Result:
hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_RETRY
Apr 13 08:00:02 phobos kernel: [5814019.091272] end_request: I/O error, dev
sdd, sector 115351425
Apr 13 08:00:02 phobos kernel: [5814019.091283] raid10: Disk failure on sdd1,
disabling device.
Apr 13 08:00:02 phobos kernel: [5814019.091285] raid10: Operation continuing
on 2 devices.
Apr 13 08:00:02 phobos kernel: [5814019.110225] md: recovery of RAID array md2
Apr 13 08:00:02 phobos kernel: [5814019.110250] md: minimum _guaranteed_
speed: 1000 KB/sec/disk.
Apr 13 08:00:02 phobos kernel: [5814019.110265] md: using maximum available
idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Apr 13 08:00:02 phobos kernel: [5814019.110293] md: using 128k window, over a
total of 312568576 blocks.
Apr 13 08:00:02 phobos kernel: [5814019.110308] md: resuming recovery of md2
from checkpoint.
Apr 13 08:00:02 phobos kernel: [5814019.110323] md: md2: recovery done.
Apr 13 08:00:02 phobos kernel: [5814019.133498] sd 2:0:0:0: [sdd] Result:
hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_RETRY
Apr 13 08:00:02 phobos kernel: [5814019.133533] end_request: I/O error, dev
sdd, sector 115351428
Apr 13 08:00:02 phobos kernel: [5814019.133842] I/O error in filesystem
("dm-7") meta-data dev dm-7 block 0x1403d63
("xlog_iodone") error 5 buf count 32768
Apr 13 08:00:02 phobos kernel: [5814019.133876] xfs_force_shutdown(dm-7,0x2)
called from line 1026 of file
fs/xfs/xfs_log.c. Return address = 0xf8a351e2
Apr 13 08:00:02 phobos kernel: [5814019.133942] Filesystem "dm-7": Log I/O
Error Detected. Shutting down filesystem: dm-7
Apr 13 08:00:02 phobos kernel: [5814019.133966] Please umount the filesystem,
and rectify the problem(s)
Apr 13 08:00:02 phobos kernel: [5814019.136669] RAID10 conf printout:
Apr 13 08:00:02 phobos kernel: [5814019.136690] --- wd:2 rd:4
Apr 13 08:00:02 phobos kernel: [5814019.136704] disk 0, wo:1, o:0, dev:sdd1
Apr 13 08:00:02 phobos kernel: [5814019.136718] disk 2, wo:0, o:1, dev:sdc1
Apr 13 08:00:02 phobos kernel: [5814019.136731] disk 3, wo:0, o:1, dev:sde1
Apr 13 08:00:02 phobos kernel: [5814019.139509] RAID10 conf printout:
Apr 13 08:00:02 phobos kernel: [5814019.139529] --- wd:2 rd:4
Apr 13 08:00:02 phobos kernel: [5814019.139542] disk 0, wo:1, o:0, dev:sdd1
Apr 13 08:00:02 phobos kernel: [5814019.139556] disk 2, wo:0, o:1, dev:sdc1
Apr 13 08:00:02 phobos kernel: [5814019.139569] disk 3, wo:0, o:1, dev:sde1
Apr 13 08:00:02 phobos kernel: [5814019.140077] xfs_force_shutdown(dm-7,0x2)
called from line 789 of file
fs/xfs/xfs_log.c. Return address = 0xf8a36400
Apr 13 08:00:02 phobos kernel: [5814019.140376] RAID10 conf printout:
Apr 13 08:00:02 phobos kernel: [5814019.140394] --- wd:2 rd:4
Apr 13 08:00:02 phobos kernel: [5814019.140408] disk 2, wo:0, o:1, dev:sdc1
Apr 13 08:00:02 phobos kernel: [5814019.140421] disk 3, wo:0, o:1, dev:sde1
Apr 13 08:00:02 phobos kernel: [5814019.143330] nfsd: non-standard errno: 5
Apr 13 08:00:02 phobos kernel: [5814019.143806] nfsd: non-standard errno: 5
Apr 13 08:00:02 phobos kernel: [5814019.144412] nfsd: non-standard errno: 5
[...and so on...]
Apr 13 08:00:03 phobos kernel: [5814019.732063] nfsd: non-standard errno: 5
Apr 13 08:00:04 phobos kernel: [5814021.301124] Filesystem "dm-7":
xfs_log_force: error 5 returned.
Apr 13 08:00:05 phobos kernel: [5814021.653520] nfsd: non-standard errno: 5
Apr 13 08:00:05 phobos last message repeated 25 times
Apr 13 08:00:05 phobos kernel: [5814021.653521] nfsd: non-standard errno: 5
Apr 13 08:00:05 phobos last message repeated 27 times
[...and so on...]
Apr 13 08:00:05 phobos kernel: [5814021.680261] nfsd: non-standard errno: 5
Apr 13 08:00:07 phobos kernel: [5814024.288015] Filesystem "dm-7":
xfs_log_force: error 5 returned.
Apr 13 08:00:43 phobos kernel: [5814060.296016] Filesystem "dm-7":
xfs_log_force: error 5 returned.
Apr 13 08:01:19 phobos kernel: [5814096.296013] Filesystem "dm-7":
xfs_log_force: error 5 returned.
Apr 13 08:01:55 phobos kernel: [5814132.296014] Filesystem "dm-7":
xfs_log_force: error 5 returned.
Apr 13 08:02:31 phobos kernel: [5814168.296015] Filesystem "dm-7":
xfs_log_force: error 5 returned.
[...and so on...]
Apr 13 18:47:31 phobos kernel: [5852868.316015] Filesystem "dm-7":
xfs_log_force: error 5 returned.
Apr 13 18:47:37 phobos kernel: [5852873.772006] I/O error in filesystem
("dm-6") meta-data dev dm-6 block
0x6b9e8 ("xfs_trans_read_buf") error 5 buf count 4096
Apr 13 18:47:37 phobos last message repeated 20 times
Apr 13 18:47:37 phobos kernel: [5852873.799028] I/O error in filesystem
("dm-6") meta-data dev dm-6 block
0x6b9e8 ("xfs_trans_read_buf") error 5 buf count 4096
[...and so on...]
Apr 13 18:49:22 phobos kernel: [5852979.352288] I/O error in filesystem
("dm-6") meta-data dev dm-6 block
0x165a80 ("xfs_trans_read_buf") error 5 buf count 8192
Apr 13 18:49:22 phobos kernel: [5852979.352288] xfs_imap_to_bp:
xfs_trans_read_buf()returned an error 5 on dm-6.
Returning error.
[...and so on...]
Apr 13 19:22:12 phobos kernel: [5854948.560651] Filesystem "dm-7":
xfs_log_force: error 5 returned.
Apr 13 19:22:12 phobos kernel: [5854948.560686] xfs_force_shutdown(dm-7,0x1)
called from line 420 of file
fs/xfs/xfs_rw.c. Return address = 0xf8a48ce7
Apr 13 19:22:15 phobos kernel: [5854951.619319] Device dm-6, XFS metadata
write error block 0x63ff3d8 in dm-6
Then I rebooted...
Apr 13 19:22:15 phobos kernel: Kernel logging (proc) stopped.
Apr 13 19:22:15 phobos kernel: Kernel log daemon terminating
Apr 13 19:25:07 phobos kernel: klogd 1.5.0#5, log source = /proc/kmsg started.
Apr 13 19:25:07 phobos kernel: [ 0.000000] Initializing cgroup subsys
cpuset
Apr 13 19:25:07 phobos kernel: [ 0.000000] Initializing cgroup subsys cpu
Apr 13 19:25:07 phobos kernel: [ 0.000000] Linux version 2.6.26-2-686
(Debian 2.6.26-21lenny4)
(dannf at debian.org) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25))
#1 SMP Tue Mar 9 17:35:51 UTC
2010
Apr 13 19:25:07 phobos kernel: [ 0.000000] BIOS-provided physical RAM map:
...
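The disk-failure events in a log like the one above can be pulled out with a short script. This is only a minimal sketch in Python, assuming syslog-style lines in which wrapped continuations have already been rejoined into one line each; the name `find_md_failures` and the sample list are hypothetical and not part of mdadm or the kernel.

```python
import re

# Assumption: each event is on a single line of the form
#   "<Mon> <day> <hh:mm:ss> <host> kernel: [...] raidN: Disk failure on <dev>, ..."
FAILURE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel:.*?"
    r"(?P<level>raid\d+): Disk failure on (?P<dev>\w+),"
)


def find_md_failures(lines):
    """Return (timestamp, raid level, device) for each md disk-failure line."""
    events = []
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            events.append(
                (match.group("stamp"), match.group("level"), match.group("dev"))
            )
    return events


# Sample lines copied from the kern.log excerpt, with wraps rejoined.
sample = [
    "Apr 12 20:10:02 phobos kernel: [5771419.310156] raid10: Disk failure on sdf1, disabling device.",
    "Apr 12 20:10:02 phobos kernel: [5771419.310158] raid10: Operation continuing on 3 devices.",
    "Apr 13 08:00:02 phobos kernel: [5814019.091283] raid10: Disk failure on sdd1, disabling device.",
]

for event in find_md_failures(sample):
    print(event)
```

Run against the full kern.log, this would list only the two failures (sdf1 on Apr 12, sdd1 on Apr 13), which makes the roughly twelve-hour gap between them easier to see among the XFS and nfsd noise.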
> > what should I do to recover my data? You suggest removing the
> > previous one. I don't get how:
> >
> > mdadm /dev/md2 --remove ?? (according to /proc/mdstat, /dev/sdf1
> > and /dev/sdc1 are now spares).
>
> Try
>
> mdadm --remove /dev/md2 /dev/sdc1
> mdadm --add /dev/md2 /dev/sdc1
> mdadm --remove /dev/md2 /dev/sdf1
> mdadm --add /dev/md2 /dev/sdf1
>
> and post all the output.
Ok. Here is the result:
phobos:/var/log# mdadm -Q --detail /dev/md2
/dev/md2:
Version : 00.90
Creation Time : Thu Aug 6 01:59:44 2009
Raid Level : raid10
Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Tue Apr 13 19:22:21 2010
State : active, degraded, Not Started
Active Devices : 2
Working Devices : 4
Failed Devices : 0
Spare Devices : 2
Layout : near=2, far=1
Chunk Size : 64K
UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
Events : 0.90612
Number Major Minor RaidDevice State
0 0 0 0 removed
1 0 0 1 removed
2 8 49 2 active sync /dev/sdd1
3 8 65 3 active sync /dev/sde1
4 8 81 - spare /dev/sdf1
5 8 33 - spare /dev/sdc1
phobos:/var/log# mdadm --remove /dev/md2 /dev/sdc1
mdadm: hot remove failed for /dev/sdc1: No such device
phobos:/var/log#
Strange, isn't it?
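To cross-check which members md2 reports, the device table printed by `mdadm --detail` above can be parsed mechanically. The following is a minimal Python sketch, assuming the column layout shown in that output (Number, Major, Minor, RaidDevice, State, and an optional device path); `parse_detail_table` is a hypothetical helper, not part of mdadm.

```python
def parse_detail_table(text):
    """Parse the device table at the end of `mdadm --detail` output.

    Assumes the header "Number Major Minor RaidDevice State" and that a
    device path, when present, is the last field and starts with /dev/.
    """
    rows = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:4] == ["Number", "Major", "Minor", "RaidDevice"]:
            in_table = True
            continue
        # Skip anything before the table and any non-row line after it.
        if not in_table or len(fields) < 5 or not fields[0].isdigit():
            continue
        number, major, minor, raid_device = fields[:4]
        rest = fields[4:]
        device = rest.pop() if rest[-1].startswith("/dev/") else None
        rows.append({
            "number": int(number),
            "major": int(major),
            "minor": int(minor),
            "raid_device": None if raid_device == "-" else int(raid_device),
            "state": " ".join(rest),
            "device": device,
        })
    return rows


# Table copied from the mdadm output above.
sample = """\
Number Major Minor RaidDevice State
0 0 0 0 removed
1 0 0 1 removed
2 8 49 2 active sync /dev/sdd1
3 8 65 3 active sync /dev/sde1
4 8 81 - spare /dev/sdf1
5 8 33 - spare /dev/sdc1
"""

rows = parse_detail_table(sample)
spares = [r["device"] for r in rows if r["state"] == "spare"]
empty_slots = [r["raid_device"] for r in rows if r["state"] == "removed"]
print("spares:", spares)
print("empty slots:", empty_slots)
```

The major and minor numbers in each row can then be compared with what `ls -l` shows for the same device node.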
phobos:/var/log# ls -l /dev/sdc1
brw-rw---- 1 root floppy 8, 33 2010-04-13 23:31 /dev/sdc1
phobos:/var/log#
Well, it seems strange to me that the group is 'floppy',
but it is the same for all USB drives. So I guess it's fine.
phobos:/var/log# ls -l /dev/sd*
brw-rw---- 1 root disk 8, 0 2010-04-13 19:24 /dev/sda
brw-rw---- 1 root disk 8, 1 2010-04-13 19:24 /dev/sda1
brw-rw---- 1 root disk 8, 2 2010-04-13 19:24 /dev/sda2
brw-rw---- 1 root disk 8, 16 2010-04-13 19:24 /dev/sdb
brw-rw---- 1 root disk 8, 17 2010-04-13 19:24 /dev/sdb1
brw-rw---- 1 root disk 8, 18 2010-04-13 19:24 /dev/sdb2
brw-rw---- 1 root disk 8, 19 2010-04-13 19:24 /dev/sdb3
brw-rw---- 1 root floppy 8, 32 2010-04-13 19:24 /dev/sdc
brw-rw---- 1 root floppy 8, 33 2010-04-13 23:31 /dev/sdc1
brw-rw---- 1 root floppy 8, 34 2010-04-13 19:24 /dev/sdc2
brw-rw---- 1 root floppy 8, 35 2010-04-13 19:24 /dev/sdc3
brw-rw---- 1 root floppy 8, 48 2010-04-13 19:24 /dev/sdd
brw-rw---- 1 root floppy 8, 49 2010-04-13 23:31 /dev/sdd1
brw-rw---- 1 root floppy 8, 64 2010-04-13 19:24 /dev/sde
brw-rw---- 1 root floppy 8, 65 2010-04-13 23:31 /dev/sde1
brw-rw---- 1 root floppy 8, 66 2010-04-13 19:24 /dev/sde2
brw-rw---- 1 root floppy 8, 67 2010-04-13 19:24 /dev/sde3
brw-rw---- 1 root floppy 8, 80 2010-04-13 19:24 /dev/sdf
brw-rw---- 1 root floppy 8, 81 2010-04-13 23:31 /dev/sdf1
brw-rw---- 1 root floppy 8, 82 2010-04-13 19:24 /dev/sdf2
phobos:/var/log#
By the way, I can access that drive:
phobos:/var/log# strings /dev/sdc1|head
LABELONE
LVM2 001oQcXx95Eazcja3nCnP2owexSjLk1sFiZ
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTz
LVM2 x[5A%r0N*>
rs.RW.1 {
id = "11SAPG-98i8-6zl2-3xtU-1W5d-0mmA-0N30vU"
seqno = 1
status = ["RESIZEABLE", "READ", "WRITE"]
extent_size = 8192
max_lv = 0
phobos:/var/log#
So, I am lost...
Thanks for your time.
Regards.
--
Pierre Vignéras