Bug#298495: [Logcheck-devel] Bug#298495: logcheck-database: add nagios unreachable filter

Geoff Crompton geoff.crompton at strategicdata.com.au
Wed Mar 23 01:35:56 UTC 2005


maximilian attems wrote:
> On Wed, 09 Mar 2005, Geoff Crompton wrote:
> 
> 
>>maximilian attems wrote:
>>
>>>>=== nagios
>>>>==================================================================
>>>>--- nagios  (revision 55)
>>>>+++ nagios  (local)
>>>>@@ -10,6 +10,7 @@
>>>>^\w{3} [ :0-9]{11} [._[:alnum:]-]+ nagios: SERVICE NOTIFICATION: 
>>>>[._[:alnum:]-]+;[._[:alnum:]-]+;[^;]+;OK;.*$
>>>>^\w{3} [ :0-9]{11} [._[:alnum:]-]+ nagios: HOST ALERT: 
>>>>[._[:alnum:]-]+;DOWN;(SOFT|HARD);.*$
>>>>^\w{3} [ :0-9]{11} [._[:alnum:]-]+ nagios: HOST ALERT: 
>>>>[._[:alnum:]-]+;UP;(SOFT|HARD);.*$
>>>>+^\w{3} [ :0-9]{11} [._[:alnum:]-]+ nagios: HOST ALERT: 
>>>>[._[:alnum:]-]+;UNREACHABLE;(SOFT|HARD);.*$
>>>>^\w{3} [ :0-9]{11} [._[:alnum:]-]+ nagios: HOST NOTIFICATION: 
>>>>[._[:alnum:]-]+;[._[:alnum:]-]+;DOWN;.*$
>>>>^\w{3} [ :0-9]{11} [._[:alnum:]-]+ nagios: HOST NOTIFICATION: 
>>>>[._[:alnum:]-]+;[._[:alnum:]-]+;UP;.*$
>>>>^\w{3} [ :0-9]{11} [._[:alnum:]-]+ nagios: HOST DOWNTIME ALERT: 
>>>>[._[:alnum:]-]+;STOPPED;.*$
>>>>
>>>
>>>could you post some of the loglines they are intended to suppress.
>>>
>>>.* should only be used for remote supplied strings,
>>>where we have _no_ control on what gets supplied.
>>>
>>
>>Here are some sample loglines: (Please excuse if they are linewrapped, 
>>I've separated them out to make it clear which ones are/were full lines)
>>
>>Mar  7 16:51:50 sd01 nagios: HOST ALERT: 
>>wire-server;UNREACHABLE;HARD;10;CRITICAL - Plugin timed out after 10 seconds
>>
>>Mar  7 17:40:50 sd01 nagios: HOST ALERT: 
>>wire-server;UNREACHABLE;HARD;10;/bin/ping 202.137.92.18 -n -c 1
>>
>>Mar  7 23:54:09 sd01 nagios: HOST ALERT: 
>>philoz-server;UNREACHABLE;HARD;10;PING CRITICAL - Packet loss = 0%, RTA 
>>= 8861.88 ms
>>
>>Mar  9 02:29:39 sd01 nagios: HOST ALERT: 
>>oe-server;UNREACHABLE;HARD;10;Socket timeout after
>>10 seconds
> 
> ok, but they are all only for the UNREACHABLE case,
> so i could add those 4 rules below to logcheck cvs,
> but that won't help you match yet.
> hope we can nail more of them.
> 
> ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ nagios: HOST ALERT: [._[:alnum:]-]+;UNREACHABLE;(SOFT|HARD);[0-9]+;CRITICAL - Plugin timed out after [0-9]+ seconds$
> ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ nagios: HOST ALERT: [._[:alnum:]-]+;UNREACHABLE;(SOFT|HARD);[0-9]+;/bin/ping [.0-9]{7,15} -n -c [0-9]$
> ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ nagios: HOST ALERT: [._[:alnum:]-]+;UNREACHABLE;(SOFT|HARD);[0-9]+;PING CRITICAL - Packet loss = [0-9]%, RTA = [.0-9]+ ms$
> ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ nagios: HOST ALERT: [._[:alnum:]-]+;UNREACHABLE;(SOFT|HARD);[0-9]+;Socket timeout after [0-9]+ seconds$
> 
> please try those rules above, 
> harden your rules if possible. no '.*' please.
> also send in the related messages so we can check.
> 
> thanks for your feedback.
> maks
> 

I only sent UNREACHABLE examples because those are the only nagios lines 
showing up in my logcheck emails (apart from ones that I should see, 
like nagios stopping or starting). The rest are already matched in 
/etc/logcheck/ignore.d.server/nagios.
I also used '.*' in my patch because that's what the other rules in 
/etc/logcheck/ignore.d.server/nagios do.
The problem with nagios filtering is that the message text on all these 
lines is generated by plugins, so there is a huge variety of potential 
strings. In my mind, once you have matched up to the (SOFT|HARD) field, 
it doesn't matter what the rest of the line says, because that 
information gets reported in other places (on the nagios web interface, 
or sent via email or pager if configured). So it doesn't matter that an 
admin may not see it in a logcheck email: people run nagios precisely so 
that they get alerted about these things in the way they choose.
What are your thoughts on this?
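
To make the trade-off concrete, here is a rough sketch (not anything 
shipped with logcheck) that runs the sample lines through the four 
specific rules and through the single '.*' rule with grep -E, which as 
far as I know is close to how logcheck applies its ignore files. The 
temporary file names are placeholders, and the fifth log line is a 
hypothetical variation of the third sample (100% packet loss), not 
taken from a real log.

```shell
#!/bin/sh
# Sketch: count the lines that would still be reported under each rule set.
# All paths under $tmp are placeholders, not real logcheck locations.
tmp=$(mktemp -d)

# Four sample lines from this thread (unwrapped), plus one HYPOTHETICAL
# line (the last one) standing in for plugin output with no rule yet.
cat > "$tmp/sample.log" <<'EOF'
Mar  7 16:51:50 sd01 nagios: HOST ALERT: wire-server;UNREACHABLE;HARD;10;CRITICAL - Plugin timed out after 10 seconds
Mar  7 17:40:50 sd01 nagios: HOST ALERT: wire-server;UNREACHABLE;HARD;10;/bin/ping 202.137.92.18 -n -c 1
Mar  7 23:54:09 sd01 nagios: HOST ALERT: philoz-server;UNREACHABLE;HARD;10;PING CRITICAL - Packet loss = 0%, RTA = 8861.88 ms
Mar  9 02:29:39 sd01 nagios: HOST ALERT: oe-server;UNREACHABLE;HARD;10;Socket timeout after 10 seconds
Mar  9 02:30:39 sd01 nagios: HOST ALERT: oe-server;UNREACHABLE;HARD;10;PING CRITICAL - Packet loss = 100%, RTA = 0.00 ms
EOF

# The four specific rules proposed earlier in the thread.
cat > "$tmp/specific.rules" <<'EOF'
^\w{3} [ :0-9]{11} [._[:alnum:]-]+ nagios: HOST ALERT: [._[:alnum:]-]+;UNREACHABLE;(SOFT|HARD);[0-9]+;CRITICAL - Plugin timed out after [0-9]+ seconds$
^\w{3} [ :0-9]{11} [._[:alnum:]-]+ nagios: HOST ALERT: [._[:alnum:]-]+;UNREACHABLE;(SOFT|HARD);[0-9]+;/bin/ping [.0-9]{7,15} -n -c [0-9]$
^\w{3} [ :0-9]{11} [._[:alnum:]-]+ nagios: HOST ALERT: [._[:alnum:]-]+;UNREACHABLE;(SOFT|HARD);[0-9]+;PING CRITICAL - Packet loss = [0-9]%, RTA = [.0-9]+ ms$
^\w{3} [ :0-9]{11} [._[:alnum:]-]+ nagios: HOST ALERT: [._[:alnum:]-]+;UNREACHABLE;(SOFT|HARD);[0-9]+;Socket timeout after [0-9]+ seconds$
EOF

# The single catch-all rule from the patch.
cat > "$tmp/catchall.rules" <<'EOF'
^\w{3} [ :0-9]{11} [-._[:alnum:]]+ nagios: HOST ALERT: [-._[:alnum:]]+;UNREACHABLE;(SOFT|HARD);.*$
EOF

# -v inverts the match, -c counts: the result is the number of lines that
# no rule filtered, i.e. lines that would still land in the report.
unmatched_specific=$(grep -Evc -f "$tmp/specific.rules" "$tmp/sample.log")
unmatched_catchall=$(grep -Evc -f "$tmp/catchall.rules" "$tmp/sample.log")

echo "still reported with specific rules: $unmatched_specific"
echo "still reported with catch-all rule: $unmatched_catchall"

rm -rf "$tmp"
```

The specific rules let the hypothetical line through because 
'Packet loss = [0-9]%' only allows a single digit, whereas the 
catch-all rule filters anything after the (SOFT|HARD) field.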

Also, I should mention that the patch I sent you had a bug. It should read:
^\w{3} [ :0-9]{11} [-._[:alnum:]]+ nagios: HOST ALERT: [-._[:alnum:]]+;UNREACHABLE;(SOFT|HARD);.*$
I think I was missing cases where the host was sd-01. (Or similar hosts 
with a '-' in their name).
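
A quick way to test this is to run both bracket expressions against a 
hyphenated hostname. This is only a sketch: the log line is a made-up 
variation of the samples with the syslog host set to sd-01. POSIX treats 
a hyphen as literal when it is first or last in a bracket expression, so 
on a conforming grep both forms may well match, in which case the missed 
lines had some other cause.

```shell
#!/bin/sh
# Sketch: does moving the hyphen from the end of the bracket expression to
# the front change whether a hyphenated syslog hostname matches?
# The line below is HYPOTHETICAL (sample format, hostname changed to sd-01).
line='Mar  7 16:51:50 sd-01 nagios: HOST ALERT: wire-server;UNREACHABLE;HARD;10;Socket timeout after 10 seconds'

# Rule as originally sent (hyphen last) and as corrected (hyphen first).
trailing='^\w{3} [ :0-9]{11} [._[:alnum:]-]+ nagios: HOST ALERT: [._[:alnum:]-]+;UNREACHABLE;(SOFT|HARD);.*$'
leading='^\w{3} [ :0-9]{11} [-._[:alnum:]]+ nagios: HOST ALERT: [-._[:alnum:]]+;UNREACHABLE;(SOFT|HARD);.*$'

# grep -c prints the number of matching input lines (0 or 1 here).
trailing_hits=$(printf '%s\n' "$line" | grep -Ec -e "$trailing")
leading_hits=$(printf '%s\n' "$line" | grep -Ec -e "$leading")

echo "hyphen last:  $trailing_hits"
echo "hyphen first: $leading_hits"
```

If both counts come out the same, either form of the rule is safe to 
use for hosts with a '-' in their name.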

-- 
Geoff Crompton
Debian System Administrator
Strategic Data
+61 3 9340 9000




