[Popcon-developers] stable Packages.gz is UTF-8
Bill Allombert
Bill.Allombert at math.u-bordeaux1.fr
Mon May 5 20:03:15 UTC 2008
On Mon, May 05, 2008 at 08:55:32PM +0200, Petter Reinholdtsen wrote:
> [Bill Allombert]
> > So I would like to know the specific issue you met, and to fix it
> > properly.
>
> The issue I ran into was that popcon.debian.org was no longer being
> updated because popcon.pl crashed. I tracked it down to a problem
> with reading the Packages.gz file as UTF-8 and finding a non-UTF-8
> character. I solved it by picking the 8-bit charset that seemed to
> match the file best, ISO-8859-1. Any idea how to guess charset when
> the content is mixed?
ISO-8859-1 does not match 'best' at all. It just so happens that any
byte sequence is valid ISO-8859-1, so you will not get an error, but a
broken result instead.
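
To illustrate the point above, here is a minimal sketch in Python (not the
Perl used by popcon.pl). The sample field and the name in it are made up for
the illustration. It shows that decoding as ISO-8859-1 never fails, and that
UTF-8 input decoded this way silently turns into mojibake:

```python
# Hypothetical sample line; "José" is encoded as UTF-8 (é = bytes C3 A9).
raw = "Maintainer: José\n".encode("utf-8")

# ISO-8859-1 maps every byte 0x00..0xFF to a code point, so this cannot fail.
decoded = raw.decode("iso-8859-1")

# The two UTF-8 bytes of "é" come out as two separate characters: "Ã" and "©".
print(repr(decoded))

# Even a buffer containing every possible byte value decodes without error.
every_byte = bytes(range(256))
all_decoded = every_byte.decode("iso-8859-1")
```

No exception is raised in either case, which is why the absence of an error
says nothing about whether the charset was the right one.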
The file is (according to Debian policy) in UTF-8, so we should use
:encoding(UTF-8) instead of :encoding(ISO-8859-1), but not :utf8, which
does not validate its input. This way non-UTF-8 characters get replaced
by the � character.
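
The behaviour described above can be sketched in Python (again, not the Perl
layer itself, only an analogue of it). The mixed-encoding sample data and the
helper name decode_lenient are assumptions made for the illustration. Strict
UTF-8 decoding rejects the stray byte, while lenient decoding substitutes
U+FFFD and keeps the rest of the text intact:

```python
# Hypothetical mixed input: first line is UTF-8, second line is ISO-8859-1.
mixed = "José".encode("utf-8") + b"\n" + "René".encode("iso-8859-1") + b"\n"


def decode_lenient(data: bytes) -> str:
    """Decode as UTF-8, replacing malformed bytes with U+FFFD."""
    return data.decode("utf-8", errors="replace")


# Strict decoding raises on the lone 0xE9 byte in the second line.
try:
    mixed.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Lenient decoding keeps the valid UTF-8 line and marks only the bad byte.
lenient = decode_lenient(mixed)
print(repr(lenient))
```

The valid line survives unchanged and only the offending byte is marked, so
the damage stays visible and local instead of spreading through the file.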
Done in CVS.
Cheers,
--
Bill. <ballombe at debian.org>
Imagine a large red swirl here.