Bug#717729: iceweasel: Fallback Character Encoding doesn't work

Francesco Poli invernomuto at paranoici.org
Tue Jul 15 17:17:17 UTC 2014


Control: found -1 iceweasel/30.0-2


On Wed, 24 Jul 2013 13:34:08 +0200 Carlo Stemberger wrote:

[...]
> Hi,
> I set the Fallback Character Encoding to UTF-8, but decoding doesn't
> work properly with this[1] page. No problem by using Chromium.

Hello,
I am also experiencing this issue.

Actually, the situation seems to be even worse with the version currently
in Debian testing (iceweasel/30.0-2).
I have the following locale settings:

  $ locale
  LANG=en_US.UTF-8
  LANGUAGE=
  LC_CTYPE="en_US.UTF-8"
  LC_NUMERIC="en_US.UTF-8"
  LC_TIME="en_US.UTF-8"
  LC_COLLATE="en_US.UTF-8"
  LC_MONETARY="en_US.UTF-8"
  LC_MESSAGES="en_US.UTF-8"
  LC_PAPER="en_US.UTF-8"
  LC_NAME="en_US.UTF-8"
  LC_ADDRESS="en_US.UTF-8"
  LC_TELEPHONE="en_US.UTF-8"
  LC_MEASUREMENT="en_US.UTF-8"
  LC_IDENTIFICATION="en_US.UTF-8"
  LC_ALL=

and I set Fallback Character Encoding to "Default for Current Locale"
(Edit menu, Preferences, Content section, Advanced... dialog window),
which, as I understand it, should result in UTF-8 in my case!
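
As a rough check of that expectation, the codeset can be read off the
locale name itself (on the command line, `locale charmap` reports the
same information). The sketch below is only an illustration: the helper
name `codeset_of` is made up here, and this is not claimed to be how
Iceweasel derives its fallback encoding.

```python
def codeset_of(locale_name):
    """Return the codeset part of a POSIX locale name, or None if absent."""
    # POSIX locale names look like: language[_territory][.codeset][@modifier]
    base = locale_name.split("@", 1)[0]   # drop an optional @modifier
    if "." not in base:
        return None                       # e.g. "C" or "en_US" carry no codeset
    return base.split(".", 1)[1]


# Value taken from the LANG setting shown in the locale output above.
print(codeset_of("en_US.UTF-8"))
```

For the settings listed above this yields UTF-8, which is why a
locale-derived fallback would be expected to be "Unicode".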

Despite all this, I often see that pages which do not explicitly declare
the charset are displayed with "Western" character encoding (I see this
in the View menu, Character Encoding submenu). I would instead expect
to see them displayed with "Unicode" encoding...

One example is the web archive for Debian mailing lists, such as:
https://lists.debian.org/debian-security-tracker/2014/07/maillist.html

Another example is the following minimal HTML file:

  $ cat hello.html 
  <html>
  <head>
    <title>Hello!</title>
  </head>
  <body>
    <h1>Hello → to you!</h1>
  </body>
  </html>
  $ iceweasel --new-tab hello.html

which is incorrectly displayed with "Western" character encoding.
Manually setting "Unicode" encoding (View menu, Character Encoding
submenu) makes the arrow show up correctly.
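
The effect of the wrong choice can be reproduced outside the browser.
The sketch below assumes that "Western" corresponds to windows-1252
(Python codec name `cp1252`) and that hello.html is saved as UTF-8;
both are assumptions, not something verified against Iceweasel itself.

```python
# The arrow used in hello.html, U+2192 RIGHTWARDS ARROW.
arrow = "\u2192"

# Bytes as they would be stored in a UTF-8 encoded file.
utf8_bytes = arrow.encode("utf-8")

# Same bytes interpreted with the (assumed) "Western" encoding ...
as_western = utf8_bytes.decode("cp1252")

# ... and with the encoding the file was actually written in.
as_unicode = utf8_bytes.decode("utf-8")

print(utf8_bytes)   # the three bytes making up the arrow
print(as_western)   # three unrelated characters instead of the arrow
print(as_unicode)   # the arrow itself
```

Each of the three UTF-8 bytes is mapped to a separate windows-1252
character, so the heading shows three stray characters where the
arrow should be.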

Adding XML and DOCTYPE declarations does not seem to help:

  $ cat hello_strict.html 
  <?xml version="1.0" encoding="utf-8"?>
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  
  <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <title>Hello!</title>
  </head>
  <body>
    <h1>Hello → to you!</h1>
  </body>
  </html>
  $ iceweasel --new-tab hello_strict.html

which is again incorrectly displayed with "Western" character encoding.

Adding the Content-Type meta declaration finally makes Iceweasel
recognize the actual encoding (UTF-8):

  $ cat hello_strict_expchar.html 
  <?xml version="1.0" encoding="utf-8"?>
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  
  <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <title>Hello!</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  </head>
  <body>
    <h1>Hello → to you!</h1>
  </body>
  </html>
  $ iceweasel --new-tab hello_strict_expchar.html

but in this final case, I understand that the fallback mechanism is
not used at all (please correct me if I am wrong).
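
To make the expected behaviour concrete, here is a deliberately
simplified sketch of the decision described above: a charset declared
in a meta element wins, and the fallback only applies when no
declaration is found. The function name `pick_encoding`, the regular
expression and the 1024-byte scan window are assumptions made for this
illustration; real browsers use a more elaborate sniffing algorithm.

```python
import re

# Crude pattern for a charset declared inside a <meta ...> element.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE
)


def pick_encoding(raw, fallback):
    """Return the declared charset if one is found, else the fallback."""
    match = META_CHARSET.search(raw[:1024])  # only look near the start
    if match:
        return match.group(1).decode("ascii").lower()
    return fallback


# Trimmed-down versions of the test files shown above.
no_meta = "<html><head><title>Hello!</title></head></html>".encode("utf-8")
with_meta = (
    "<html><head><title>Hello!</title>"
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
    "</head></html>"
).encode("utf-8")

print(pick_encoding(with_meta, "windows-1252"))  # declaration is used
print(pick_encoding(no_meta, "utf-8"))           # fallback is used
```

Under this model, the file with the meta declaration never consults
the fallback, while hello.html depends on it entirely.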


Is there any progress on this bug?
Please fix it and/or forward the report upstream.

Thanks for your time!
Bye.




-- 
 http://www.inventati.org/frx/
 fsck is a four letter word...
..................................................... Francesco Poli .
 GnuPG key fpr == CA01 1147 9CD2 EFDF FB82  3925 3E1C 27E1 1F69 BFFE