Bug#405919: checkarray does not report or fix mismatch_cnt issues

Tim Small tim at seoss.co.uk
Tue Sep 28 16:10:47 UTC 2010


Package: mdadm
Version: 3.1.4-1+8efb9d1
Severity: normal
Tags: patch

How about these patches?

Sorry for the earlier noise.

BTW, would you prefer that I open a new bug for this one?

Cheers,

Tim.

*** mdadm-logcheck-patch.diff
--- mdadm.orig	2010-09-28 16:45:03.000000000 +0100
+++ /etc/logcheck/ignore.d.server/mdadm	2010-09-28 16:58:25.000000000 +0100
@@ -17,7 +17,7 @@
 ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ kernel:( \[ *[[:digit:]]+\.[[:digit:]]+\])? RAID([01456]|10) conf printout:$
 ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ kernel:( \[ *[[:digit:]]+\.[[:digit:]]+\])?[[:space:]]+---( [wrf]d:[[:digit:]]+){2,3}$
 ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ kernel:( \[ *[[:digit:]]+\.[[:digit:]]+\])?[[:space:]]+disk [[:digit:]]+,( wo:[[:digit:]]+,)? o:[[:digit:]]+, dev:[[:alnum:]]+$
-^\w{3} [ :0-9]{11} [._[:alnum:]-]+ mdadm(\[[[:digit:]]+\])?: Rebuild((Start|Finish)ed|[[:digit:]]+) event detected on md device /dev/[-_./[:alnum:]]+$
+^\w{3} [ :0-9]{11} [._[:alnum:]-]+ mdadm(\[[[:digit:]]+\])?: Rebuild((Start|Finish)ed|[[:digit:]]+) event detected on md device /dev/[-_./[:alnum:]]+(, component device  ?mismatches found: [[:digit:]]+)?$
 ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ mdadm(\[[[:digit:]]+\])?: SpareActive event detected on md device /dev/[-_./[:alnum:]]+, component device /dev/[-_./[:alnum:]]+$
 ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ mdadm(\[[[:digit:]]+\])?: (New|Degraded)Array event detected on md device /dev/[-_./[:alnum:]]+$
 ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ mdadm(\[[[:digit:]]+\])?: DeviceDisappeared event detected on md device /dev/[-_./[:alnum:]]+$
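
A rough way to sanity-check the amended Rebuild pattern is sketched below. The pattern is copied from the "+" line of the patch above. The `matches` helper and the log lines fed to it are made up for illustration, and GNU grep is assumed for `\w` support.

```shell
#!/bin/sh
# Pattern copied verbatim from the "+" line of the logcheck patch.
pattern='^\w{3} [ :0-9]{11} [._[:alnum:]-]+ mdadm(\[[[:digit:]]+\])?: Rebuild((Start|Finish)ed|[[:digit:]]+) event detected on md device /dev/[-_./[:alnum:]]+(, component device  ?mismatches found: [[:digit:]]+)?$'

# Hypothetical helper: succeeds if the given log line matches the pattern.
# Assumes GNU grep, which accepts \w in extended regular expressions.
matches() {
    printf '%s\n' "$1" | grep -Eq "$pattern"
}

# Made-up sample lines, shaped to fit the format the pattern expects.
with_suffix='Sep 28 16:58:25 host1 mdadm[1234]: RebuildFinished event detected on md device /dev/md0, component device  mismatches found: 128'
without_suffix='Sep 28 16:58:25 host1 mdadm[1234]: RebuildFinished event detected on md device /dev/md0'

matches "$with_suffix"    && echo "ignored: line with mismatch suffix"
matches "$without_suffix" && echo "ignored: line without suffix"
```

Any line the pattern fails to match would still be reported by logcheck, which is the intended behaviour for unexpected messages.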

*** /home/tim/mdadm-mismatch-fix.diff
--- /etc/cron.daily/mdadm.old	2010-09-28 15:35:15.954390947 +0100
+++ /etc/cron.daily/mdadm	2010-09-28 17:07:19.954518154 +0100
@@ -15,4 +15,59 @@
 MDADM=/sbin/mdadm
 [ -x $MDADM ] || exit 0 # package may be removed but not purged
 
+PRINT_SUMMARY=0
+
+for mcnt in /sys/block/md?/md/mismatch_cnt
+do
+	if [ -f "$mcnt" ]
+	then
+		read cnt < "$mcnt"
+		read level < "$( dirname "$mcnt" )/level"
+		if [ "$cnt" != 0 ] && [ "$level" != "raid10" ] && [ "$level" != "raid1" ]
+		then
+			cat << WARN_TEXT
+
+Warning - $mcnt indicates that the associated RAID
+device has $cnt sectors in which the data on one array member is inconsistent
+with the data on the other array member(s).
+WARN_TEXT
+			PRINT_SUMMARY=1
+		fi
+	fi
+done
+
+# Fall through: print the summary below if needed, then run the monitor.
+
+
+if [ $PRINT_SUMMARY != 0 ]
+then
+	cat << WARN_TEXT
+
+DATA LOSS MAY HAVE OCCURRED.
+
+This condition may have been caused by one or more of the following events:
+
+. A power failure whilst the array was being written to.
+. Data corruption by a faulty hard disk drive, drive controller, cabling, RAM,
+    motherboard, PSU, etc.
+. A kernel bug.
+. An array being forcibly created in an inconsistent state using the
+    "--assume-clean" argument to mdadm.
+
+This count is updated when the md subsystem carries out a 'check' or
+'repair' action.  In the case of 'repair' it reflects the number of
+mismatched sectors prior to carrying out the repair.
+
+Once you have fixed the error, carry out a 'check' action to reset the count
+to zero.
+
+Note that this check is only applied to arrays which aren't RAID1 or RAID10,
+due to a kernel limitation.  See the md (section 4) manual page, and the
+following URL for details:
+
+https://raid.wiki.kernel.org/index.php/Linux_Raid#Frequently_Asked_Questions_-_FAQ
+
+WARN_TEXT
+fi
+
 exec $MDADM --monitor --scan --oneshot
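
For anyone wanting to try the mismatch logic outside of cron, here is a minimal sketch of the same check as a function. It is not part of the patch: the name `report_mismatches` and its optional sysfs-root argument are invented here so that it can be pointed at a fake directory tree, and it globs `md*` rather than the `md?` used in the patch.

```shell
#!/bin/sh
# Sketch only: print "<array> <level> <count>" for each md array whose
# mismatch_cnt is non-zero, skipping raid1 and raid10 as the patch does.
# The optional first argument (hypothetical) overrides the sysfs root.
report_mismatches() {
    root=${1:-/sys/block}
    for mcnt in "$root"/md*/md/mismatch_cnt
    do
        # An unmatched glob is left as a literal string; skip it.
        [ -f "$mcnt" ] || continue
        mddir=$(dirname "$mcnt")
        read -r cnt < "$mcnt"
        read -r level < "$mddir/level"
        case $level in
            raid1|raid10) continue ;;
        esac
        if [ "$cnt" != 0 ]
        then
            printf '%s %s %s\n' "$(basename "$(dirname "$mddir")")" "$level" "$cnt"
        fi
    done
}
```

Calling `report_mismatches` with no argument reads the live /sys/block tree. A fresh count can be requested by writing `check` to the array's md/sync_action file in sysfs, as root.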




