Bug#658701: mdadm: should send email if mismatches are reported by a check
Michael Tokarev
mjt at tls.msk.ru
Sun Feb 5 14:22:37 UTC 2012
On 05.02.2012 16:34, Russell Coker wrote:
> Package: mdadm
> Version: 3.2.3-2
> Severity: important
>
> Feb 5 22:55:09 xev mdadm[20730]: RebuildFinished event detected on md device /dev/md0, component device mismatches found: 20608 (on raid level 1)
>
> When a check initiated by /etc/cron.d/mdadm finds an error mdadm will discover
> this and log an error such as the above with facility DAEMON. But it doesn't
> send an email.
This is the same issue as discussed in #599821 and #588516. I'll think about
merging all 3 together.
> I believe that this is a serious bug, it seems to me that one of the most
> significant conditions it can encounter that should be immediately reported to
> the sysadmin is the fact that the contents of disks are changing and breaking
> RAID consistency!
Yes, that's the condition it may encounter indeed. The question is WHY - under normal
conditions there should be no such errors.
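For reference, the count that ends up in the log line is also exposed by the
kernel in sysfs (for the array above, /sys/block/md0/md/mismatch_cnt). A
minimal sketch of reading it follows; the helper name read_mismatch_cnt is
hypothetical and not something mdadm provides, and the caller is assumed to
have opened the file already.

```c
#include <stdio.h>

/* Hypothetical helper, not part of mdadm.
 * Reads a single decimal count from an already opened stream, e.g. one
 * opened on /sys/block/md0/md/mismatch_cnt.
 * Returns the count, or -1 if the stream is NULL or holds no number. */
static long read_mismatch_cnt(FILE *fp)
{
    long n;

    if (fp == NULL || fscanf(fp, "%ld", &n) != 1)
        return -1;
    return n;
}
```

A non-zero value after a "check" run is what produces the "mismatches found"
message quoted above.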
There are two points here.
First, a formal one. Would it be a serious issue if such a check weren't done at
all? I think that in that case this bug report wouldn't exist to start with.
And second, more to the point, Neil gave a very good writeup of these checks and
repairs of raid arrays, about deciding which part/component of the array is
"more right". Unfortunately I can't find it right now.
>
> For a 3-disk mirror or a RAID-6 such an error can be reliably corrected as long
> as all the other disks are fine. If you have an array with double-redundancy
> and one disk fails entirely while another returns dodgey data then you lose,
> and obviously anyone who creates a doubly-redundant array wants protection
> against that sort of thing.
>
> With a RAID-1 or RAID-5 array every mismatch is an indication of real data
> corruption and is very important.
>
> The following patch makes mdadm send email about such events.
>
> --- /tmp/Monitor.c 2012-02-05 23:28:41.873079816 +1100
> +++ ./Monitor.c 2012-02-05 23:32:03.961132380 +1100
> @@ -364,6 +364,7 @@
> (strncmp(event, "Fail", 4)==0 ||
> strncmp(event, "Test", 4)==0 ||
> strncmp(event, "Spares", 6)==0 ||
> + (strncmp(event, "RebuildFinished", 15)==0 && disc) ||
> strncmp(event, "Degrade", 7)==0)) {
> FILE *mp = popen(Sendmail, "w");
> if (mp) {
>
This might be a more interesting approach than the ones already offered in the
two other patches mentioned above.
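To spell out what the patched condition does, here is a standalone sketch of
just the event test. The function name event_wants_mail is hypothetical, and
the surrounding checks in Monitor.c are left out. It also assumes that disc
is non-NULL for a RebuildFinished event only when mismatches were found; that
is my reading of the patch, not something the hunk itself shows.

```c
#include <string.h>

/* Hypothetical helper mirroring the patched test from the hunk above.
 * event: event name as passed to the alert code, e.g. "RebuildFinished".
 * disc:  extra string passed with the event; ASSUMED to be non-NULL for
 *        RebuildFinished only when mismatches were reported.
 * Returns 1 if the event would lead to mail, 0 otherwise. */
static int event_wants_mail(const char *event, const char *disc)
{
    return strncmp(event, "Fail", 4) == 0 ||
           strncmp(event, "Test", 4) == 0 ||
           strncmp(event, "Spares", 6) == 0 ||
           (strncmp(event, "RebuildFinished", 15) == 0 && disc != NULL) ||
           strncmp(event, "Degrade", 7) == 0;
}
```

With this test a clean RebuildFinished (disc == NULL) stays quiet, while one
carrying a mismatch string is mailed like Fail or Degrade events.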
/mjt