[Logcheck-devel] Bug#401259: logcheck: logcheck needs to override locale for grep
Frédéric Brière
fbriere at fbriere.net
Sun Aug 23 17:55:41 UTC 2009
On Sat, Dec 02, 2006 at 01:17:28AM -0500, Chris Hanson wrote:
> The reason it doesn't match is that the "R in a circle" character is
> encoded in the log file as using the ISO 8859-1 code 0xae, but this
> isn't a valid first byte of a UTF-8 code. Consequently, the "."
> pattern doesn't match it. In fact, I don't think there's _any_ way to
> match this byte sequence in a UTF-8 locale.
I guess [eg]libc's regex functions are a bit strict about their input.
However, grep also comes with its own DFA-based functions, which are
more lax about encoding errors; they are normally skipped for multibyte
encodings, but can be forced with GREP_USE_DFA=1.
> Unfortunately I'm not sure what to do about this, because it's not
> obvious how the log-file messages relate to the locale. This message
They don't, at least not reliably. There's stuff in there, like ssh
usernames, that comes directly from nefarious people who don't give a
rat's ass about your particular selection of encoding.
> One thing that works in this case is to set "LC_ALL=C" prior to
> calling grep. But if the log files sometimes contain UTF-8 coding,
> this will mess that up
I doubt this would be a problem. Pretty much everything that is matched
explicitly in any rule (hostname, IP address, process ID) is in ASCII.
Any chunk of arbitrary data should be matched with something like .* or
[^[:space:]]+, which will work whether it was decoded or not.
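A minimal sketch of that claim, assuming GNU grep and a POSIX shell. The sample log line and the helper names match_raw and sample are made up for illustration. It feeds grep a line containing the raw ISO 8859-1 byte 0xae and tries both wildcard forms with the locale forced to "C":

```shell
# Hypothetical helper: count the lines on stdin that match an extended
# regex, with the locale forced to "C" so grep works on raw bytes.
match_raw() {
    LC_ALL=C grep -c -E -e "$1"
}

# Made-up log line; \256 is octal for 0xae, the ISO 8859-1 "R in a circle".
sample() {
    printf 'Aug 23 17:55:41 host prog[123]: Foo\256 started\n'
}

# The explicit parts of each rule are plain ASCII; the arbitrary chunk
# containing the stray byte is covered by .* or [^[:space:]]+.
sample | match_raw 'prog\[[0-9]+\]: .* started$'
sample | match_raw 'prog\[[0-9]+\]: [^[:space:]]+ started$'
```

Each command prints the number of matching lines, so a count of 1 means the rule matched despite the undecodable byte.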
Now, it's true that POSIX restricts the "C" locale to 7-bit characters,
but both grep and eglibc appear to deal with binary characters just fine.
One unfortunate side-effect is that any error messages from grep will
therefore be in English, but that's probably a lesser evil.
(LC_MESSAGES cannot be left as is, since mixing different encodings is
not supported.)
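To illustrate that side effect, here is a small sketch assuming GNU grep on a glibc system. The fr_FR.UTF-8 locale name, the helper name grep_error and the missing path are placeholders, and the locale does not need to be installed. Since LC_ALL takes precedence over every other LC_* variable, the LC_MESSAGES setting is ignored and the diagnostic comes out untranslated:

```shell
# Hypothetical helper: make grep fail on a file that does not exist and
# send its diagnostic to stdout.
grep_error() {
    # LC_ALL=C overrides LC_MESSAGES, so the message is not translated.
    # grep exits with status 2 here; "|| true" keeps the caller going.
    LC_MESSAGES=fr_FR.UTF-8 LC_ALL=C \
        grep -e x /nonexistent/logcheck-example 2>&1 || true
}

grep_error
```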
--
Never trust an operating system you don't have sources for. ;-)
-- Unknown source