Bug#599821: mdadm: mismatch_cnt is not well checked or reported

Tim Small tim at seoss.co.uk
Mon Oct 11 16:21:48 UTC 2010


Package: mdadm
Version: 3.1.4-1+8efb9d1
Severity: normal
Tags: patch

An array check is triggered monthly.  Whilst this verifies that every disk
sector occupied by the array is readable, it does not alert the administrator
to latent data corruption which may have occurred and left the contents of
the array inconsistent.  The mdadm package includes a set of logcheck rules,
so the mismatch_cnt value does get emailed out, but only if the logcheck
package is installed.
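For illustration, the same kind of check can be started by hand through md's sysfs interface (described in md(4)). Writing `check` to `sync_action` starts it, `sync_action` reads back `idle` once it has finished, and `mismatch_cnt` then holds the result. This is a minimal sketch that assumes root privileges and an existing array. The function name `md_check_and_count` and the `MD_SYSDIR` override are hypothetical, added only so the logic can be tried against a scratch directory.

```shell
# Minimal sketch: start an md 'check' and print mismatch_cnt when it ends.
# Assumes root and /sys/block/<dev>/md/{sync_action,mismatch_cnt} (see md(4)).
# MD_SYSDIR is a hypothetical override so the function can run on a fake tree.
md_check_and_count() {
	dir=${MD_SYSDIR:-/sys/block/$1/md}

	# Ask md to read every stripe and compare the copies/parity.
	echo check > "$dir/sync_action" || return 1

	# sync_action reads back 'check' while running and 'idle' when done.
	while read -r state < "$dir/sync_action" && [ "$state" != idle ]
	do
		sleep 1
	done

	# Number of sectors found to be inconsistent during the check.
	read -r cnt < "$dir/mismatch_cnt"
	echo "$1: mismatch_cnt=$cnt"
}
```

On a real system, `md_check_and_count md0` run as root blocks until the check completes, which can take hours on a large array.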

Unfortunately, emailing the raw count is frequently not the right thing to
do, because the mismatch count is not meaningful for RAID1 and RAID10 (due
to what is arguably a kernel bug).

The patches below fix the logcheck rules, and complain more vocally about
inconsistencies on those arrays where it makes sense to do so.



*** mdadm-logcheck-patch.diff
--- mdadm.orig	2010-09-28 16:45:03.000000000 +0100
+++ /etc/logcheck/ignore.d.server/mdadm	2010-09-28 16:58:25.000000000 +0100
@@ -17,7 +17,7 @@
 ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ kernel:( \[ *[[:digit:]]+\.[[:digit:]]+\])? RAID([01456]|10) conf printout:$
 ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ kernel:( \[ *[[:digit:]]+\.[[:digit:]]+\])?[[:space:]]+---( [wrf]d:[[:digit:]]+){2,3}$
 ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ kernel:( \[ *[[:digit:]]+\.[[:digit:]]+\])?[[:space:]]+disk [[:digit:]]+,( wo:[[:digit:]]+,)? o:[[:digit:]]+, dev:[[:alnum:]]+$
-^\w{3} [ :0-9]{11} [._[:alnum:]-]+ mdadm(\[[[:digit:]]+\])?: Rebuild((Start|Finish)ed|[[:digit:]]+) event detected on md device /dev/[-_./[:alnum:]]+$
+^\w{3} [ :0-9]{11} [._[:alnum:]-]+ mdadm(\[[[:digit:]]+\])?: Rebuild((Start|Finish)ed|[[:digit:]]+) event detected on md device /dev/[-_./[:alnum:]]+(, component device  ?mismatches found: [[:digit:]]+)?$
 ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ mdadm(\[[[:digit:]]+\])?: SpareActive event detected on md device /dev/[-_./[:alnum:]]+, component device /dev/[-_./[:alnum:]]+$
 ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ mdadm(\[[[:digit:]]+\])?: (New|Degraded)Array event detected on md device /dev/[-_./[:alnum:]]+$
 ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ mdadm(\[[[:digit:]]+\])?: DeviceDisappeared event detected on md device /dev/[-_./[:alnum:]]+$
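One way to sanity-check the changed rule is to feed the old and new versions a sample log line with `grep -E`, since logcheck applies its rules as extended (egrep-style) regular expressions. The host name, PID and count in the sample line are invented, and its format is inferred from the regex itself rather than copied from real mdadm output. `\w` is a GNU grep extension that the rules already rely on.

```shell
# Old and new versions of the Rebuild* rule from the logcheck patch.
old_rule='^\w{3} [ :0-9]{11} [._[:alnum:]-]+ mdadm(\[[[:digit:]]+\])?: Rebuild((Start|Finish)ed|[[:digit:]]+) event detected on md device /dev/[-_./[:alnum:]]+$'
new_rule='^\w{3} [ :0-9]{11} [._[:alnum:]-]+ mdadm(\[[[:digit:]]+\])?: Rebuild((Start|Finish)ed|[[:digit:]]+) event detected on md device /dev/[-_./[:alnum:]]+(, component device  ?mismatches found: [[:digit:]]+)?$'

# Hypothetical syslog line: host name, PID and count are made up.
line='Oct 11 16:21:48 myhost mdadm[1234]: RebuildFinished event detected on md device /dev/md0, component device  mismatches found: 12'

# Succeeds if the rule matches, i.e. logcheck would ignore the line.
matches() { printf '%s\n' "$2" | grep -Eq -- "$1"; }

matches "$old_rule" "$line" && echo "old rule: ignored" || echo "old rule: reported"
matches "$new_rule" "$line" && echo "new rule: ignored" || echo "new rule: reported"
```

With these inputs the old rule lets the line through to the logcheck report, while the new rule suppresses it.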

*** /home/tim/mdadm-mismatch-fix.diff
--- /etc/cron.daily/mdadm.old	2010-09-28 15:35:15.954390947 +0100
+++ /etc/cron.daily/mdadm	2010-09-28 17:07:19.954518154 +0100
@@ -15,4 +15,57 @@
 MDADM=/sbin/mdadm
 [ -x $MDADM ] || exit 0 # package may be removed but not purged
 
+PRINT_SUMMARY=0
+
+for mcnt in /sys/block/md*/md/mismatch_cnt
+do
+	if [ -f "$mcnt" ]
+	then
+		read cnt < "$mcnt"
+		read level < "$(dirname "$mcnt")/level"
+		# mismatch_cnt is not meaningful on RAID1/RAID10, so skip those
+		if [ "$cnt" != 0 ] && [ "$level" != "raid1" ] && [ "$level" != "raid10" ]
+		then
+			cat << WARN_TEXT
+
+Warning - $mcnt indicates that the associated RAID
+device has $cnt sectors in which the data on one array member is inconsistent
+with the data on the other array member(s).
+WARN_TEXT
+			PRINT_SUMMARY=1
+		fi
+	fi
+done
+
+if [ $PRINT_SUMMARY != 0 ]
+then
+	cat << WARN_TEXT
+
+DATA LOSS MAY HAVE OCCURRED.
+
+This condition may have been caused by one or more of the following events:
+
+. A power failure whilst the array was being written to.
+. Data corruption caused by a faulty hard disk drive, drive controller,
+    cabling, RAM, motherboard, PSU, etc.
+. A kernel bug.
+. An array being forcibly created in an inconsistent state using the
+    "--assume-clean" argument to mdadm.
+
+This count is updated when the md subsystem carries out a 'check' or
+'repair' action.  In the case of 'repair', it reflects the number of
+mismatched sectors prior to carrying out the repair.
+
+Once you have fixed the error, carry out a 'check' action to reset the count
+to zero.
+
+Note that this check is skipped for RAID1 and RAID10 arrays, on which a
+non-zero count can occur without any data corruption.  See the md(4)
+manual page and the following URL for details:
+
+https://raid.wiki.kernel.org/index.php/Linux_Raid#Frequently_Asked_Questions_-_FAQ
+
+WARN_TEXT
+fi
+
 exec $MDADM --monitor --scan --oneshot
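The loop added to /etc/cron.daily/mdadm can be exercised without real arrays by pointing it at a scratch directory laid out like /sys/block. The sketch below mirrors the patch's logic (skip RAID1/RAID10, warn on any other level with a non-zero count). The helper name `scan_mismatches` and its root-directory argument are hypothetical, introduced only so the logic can be tested.

```shell
# Sketch of the cron job's mismatch loop, parameterised on the sysfs root.
# scan_mismatches is a hypothetical helper; pass /sys/block on a real system.
# Returns 0 if at least one array was reported (like grep), 1 otherwise.
scan_mismatches() {
	root=$1
	found=1

	for mcnt in "$root"/md*/md/mismatch_cnt
	do
		# If nothing matched, the pattern is left unexpanded; skip it.
		# Arrays without a mismatch_cnt attribute are never matched.
		[ -f "$mcnt" ] || continue
		read -r cnt < "$mcnt"
		read -r level < "$(dirname "$mcnt")/level"

		# mismatch_cnt is not meaningful for RAID1/RAID10, so ignore them.
		case $level in
		raid1|raid10) continue ;;
		esac

		if [ "$cnt" != 0 ]
		then
			echo "Warning - $mcnt reports $cnt mismatched sectors"
			found=0
		fi
	done
	return $found
}
```

A `case` statement keeps the list of excluded levels easy to extend if other levels turn out to need the same treatment.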





More information about the pkg-mdadm-devel mailing list