[Virtual-pkg-base-maintainers] Bug#462229: sata - sata_nv - sata link fails on heavy load

hoover at gmx.at hoover at gmx.at
Wed Jan 23 10:55:40 UTC 2008


Package: base
Severity: critical
Justification: causes serious data loss



-- System Information:
Debian Release: etch
  APT prefers: stable
  APT policy: (1001, 'stable')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18 customized
Locale: LANG=de_AT.UTF-8, LC_CTYPE=de_AT.UTF-8 (charmap=UTF-8)

Motherboard: ASUS M2NPV-MX
Chipset: NFORCE-MCP51, chipset revision 161

libata version 2.00
sata_nv 0000:00:0e:0: version 2.0


I encountered two strange problems concerning my SATA-drives.

Chapter I)

One SAMSUNG SP084N PATA drive (/) [hda]
One SAMSUNG SP2004C Rev: VM10 / 05 SATA drive (payload) [sda] sata1
 -> Using LVM2 (2.02.06-4) on non / partitions

On every boot I get the following message. I think it only
means that no further drive is attached to the second port.
If so, the wording is a little confusing ...

-----------
ata2: SATA link down (SStatus 0 SControl 300)
ATA: abnormal status 0x7F on port 0x977
	Vendor: ATA	Model: SP2004C
	Type: Direct-Access	ANSI SCSI revision: 05
-----------
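
For reference, the SStatus value in the "link down" line can be read
field by field. A minimal Python sketch, assuming the usual layout of the
SATA SStatus register (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11)
and assuming the kernel prints the value in hexadecimal. The name
decode_sstatus and the wording of the descriptions are made up for
illustration:

```python
# Assumed SStatus field meanings (SATA specification); other values are reserved.
DET = {
    0: "no device detected",
    1: "device detected, no phy communication",
    3: "device detected, phy communication established",
    4: "phy offline",
}
SPD = {0: "no speed negotiated", 1: "Gen1 (1.5 Gbps)", 2: "Gen2 (3.0 Gbps)"}
IPM = {0: "no device or no communication", 1: "active", 2: "partial", 6: "slumber"}


def decode_sstatus(hex_text):
    """Split an SStatus value, given as the hex string from the log, into its fields."""
    value = int(hex_text, 16)
    return {
        "det": DET.get(value & 0xF, "reserved"),
        "spd": SPD.get((value >> 4) & 0xF, "reserved"),
        "ipm": IPM.get((value >> 8) & 0xF, "reserved"),
    }


# "0" is the value from the boot message above, "123" the value that
# appears in the "SATA link up 3.0 Gbps" lines of Chapter II.
for raw in ("0", "123"):
    print(raw, decode_sstatus(raw))
```

Under these assumptions "SStatus 0" is consistent with a port that simply
has nothing plugged in, which would make the boot message harmless.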


sda1 (LVM) is used by Samba. When I tried to restore the data
(about 90 GB) over the network (Gigabit) from a Windows backup
client to the new Debian server, the first 33% was copied without
problems. Then the errors started:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: (BMDMA stat 0x20)
ata1.00: tag 0 cmd 0x35 Emask 0x1 stat 0x51 err 0x4 (device error)
ata1: EH complete
-----------
ata1.00: soft resetting port
ata1.00: limiting speed to UDMA/66
ata1.00: configured for UDMA/66
ata1.00: sd 0:0:0:0: SCSI error: return code = 0x08000002
------------
ata1.00: end_request: I/O error, dev sda, sector 31464335
ata1.00: printk: 127 messages suppressed
ata1.00: Buffer I/O error on device dm-6, logical block 3932986
ata1.00: lost page write due to I/O error on dm-6
sata1: EH complete
------------
sata1.00 speed down requested but no transfer mode left
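
The stat and err bytes in the "device error" line can be decoded by
hand. A small Python sketch, assuming the classic ATA task-file bit names
(taken from older ATA specifications; newer revisions rename or obsolete
some of these bits). The names STATUS_BITS, ERROR_BITS and decode_bits
are made up for illustration:

```python
# Assumed bit names of the ATA status register.
STATUS_BITS = {
    0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
    0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR",
}
# Assumed bit names of the ATA error register.
ERROR_BITS = {
    0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
    0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "AMNF",
}


def decode_bits(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True) if value & bit]


# Values from the log line: "cmd 0x35 Emask 0x1 stat 0x51 err 0x4"
print("stat 0x51:", decode_bits(0x51, STATUS_BITS))
print("err  0x04:", decode_bits(0x04, ERROR_BITS))
```

With these tables, stat 0x51 reads as DRDY, DSC and ERR, and err 0x4 as
ABRT. Command 0x35 is WRITE DMA EXT in the ATA command set, so the line
reports a DMA write that the device aborted.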

The transfer rate dropped to 0.01 kb/s, and the filesystem was
damaged beyond repair.

I tried this three times (new fs etc.). After the third attempt
I was able to repair the fs, and I left the already copied data on the
drive because I thought the drive was defective in those particular places.
After that I was able to copy all the data and no more problems occurred
that day, but a few days later the same situation came up again.


Because of these problems I moved on to


Chapter II)

One SAMSUNG SP084N PATA drive (/) [hda]
One SAMSUNG SP2004C Rev: VM10 / 05 SATA drive (payload) [sda] sata1
 -> Using LVM2 (2.02.06-4) on non / partitions
One SEAGATE ST3250410AS Rev: 3.AA /05 [sdb] sata2
One SEAGATE ST3250410AS Rev: 3.AA /05 [sdc] sata3
 -> sdb1 and sdc2 in a RAID1 array (without LVM)


On every boot I get the following message. I think it only
means that no further drive is attached. If so, the wording is a
little confusing ... compare with Chapter I)

-----------
ata4: SATA link down (SStatus 0 SControl 300)
ATA: abnormal status 0x7F on port 0x967
	Vendor: ATA	Model: ST3250410AS Rev: 3.AA
	Type: Direct-Access	ANSI SCSI revision: 05
-----------

So I copied all the previously backed-up data from hda and from a
Windows backup client to the newly created RAID1 array (90 GB).

Everything went fine, at least with sdb and sdc.
But I got these messages while I was copying the data from
hda and the network to /dev/md0 (sdb and sdc).

-----------
ata1: port is slow to respond, please be patient
ata1: soft resetting port
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through
ata1: port is slow to respond, please be patient
ata1: soft resetting port
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through
-----------

ATTENTION!!! sda was not involved in this copy at all; it was only
mounted (there were no open files). Therefore there was no
data loss this time either.

So it seems that sata_nv (or maybe the mainboard) has a problem
with one (?) of the SATA ports under heavy load. Again: all of
these situations occurred under heavy load (copying from multiple
sources to 'one' destination).
I have no other explanation for why I get these errors
on sata1/sda without doing anything on it.



Regards,
Anton Huber
