[Women-website] Language Negotiation and charsets

Jutta Wrage jw@witch.westfalen.de
Thu, 31 Mar 2005 11:20:00 +0200



On Thursday, 31.03.05 at 07:38, Thierry Reding wrote:

> I think you misunderstood me. I do know how content negotiation works

Are you sure all members of the list know?

> The site to which I provided a link in my previous mail handles this
> by appending the language suffix to every link of a localized page.
> For instance, if a page links to `foo/bar', then the link on the
> Spanish page would become `foo/bar.es.html'

But what if there is no Spanish translation of that page? The main 
Debian website has a lot of such pages...
It would make the build process much more complicated, as it would 
always have to check whether a translated page is available before 
writing a link, and all the links would have to be included in the 
build process, or people editing the pages would have to check 
manually whether the page is available.
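To illustrate the extra work involved: a build step along these lines (the function and file names are made up) would have to run for every single link on every page:

```shell
# Sketch: emit a language-suffixed link only if the translated page
# actually exists; otherwise fall back to the plain link and let
# content negotiation pick the language.
link_for() {
    page=$1   # e.g. foo/bar
    lang=$2   # e.g. es
    if [ -f "$page.$lang.html" ]; then
        echo "$page.$lang.html"
    else
        echo "$page"
    fi
}
```

And this only covers links to pages inside the same tree that the build can see.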

As all current browsers support negotiation, we can assume that people 
are able to read linked documentation on setting up their browser 
language. If someone wants to read another language, it is not that 
difficult to add the language code to the URL manually. I have done 
that a lot to review pages when testing changes for www.d.org.

>  so by clicking on that link from the Spanish page, it will take you
> to `foo/bar' in Spanish, and *not* use the negotiated language. This
> way you can permanently stay with the language that you choose from
> the language bar.
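For reference, the negotiated setup being discussed is typically configured on the Apache side roughly like this (a sketch, not the actual Alioth configuration):

```apache
# Enable content negotiation so that a request for `foo/bar' picks
# foo/bar.<lang>.html based on the browser's Accept-Language header.
Options +MultiViews
AddLanguage en .en
AddLanguage es .es
AddLanguage de .de
LanguagePriority en es de
ForceLanguagePriority Prefer Fallback
# A link written explicitly as `foo/bar.es.html' bypasses this
# negotiation entirely, which is the behaviour described above.
```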

Staying permanently with one language while not changing the 
negotiation settings seems like a contradiction to me.

> I think that this is the way we should handle that on the Debian Women
> website. I believe it is the most intuitive way.

It would break things for people who land on pages by accident and 
would not get their preferred language back automatically. And I 
think there are more of those than there are people wanting to call 
pages your way.

> I wouldn't go so far as to call all those people lazy.

I stuck to that (letting the server set the encoding) on German-only 
websites for a long time (we still have _very_ old pages). And so I 
would call it laziness.

> After all some of them do a lot of work =) I also wouldn't say that
> it "is not the way it should work".

Oh no, do not argue with the amount of work. Some people spend a lot 
of time building websites which are not accessible at all. I would 
not call that the way it should work...

> Besides, there is iconv which can convert files to pretty much any
> encoding you can think of. In my opinion, it is neither lazy nor
> futile to try and only use one charset that can account for all
> others. In the end, it makes life easier for all.

Having to remember to call iconv after each edit makes life easier?
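For reference, the call itself is short; what does not scale is remembering it after every edit. A self-contained example (file names made up), converting a Latin-1 file to UTF-8:

```shell
# A Latin-1 sample file ("Käse", a-umlaut as the single byte 0xE4):
printf 'K\344se\n' > sample.de.txt
# iconv writes to stdout, so convert via a temporary file and move it
# into place.
iconv -f ISO-8859-1 -t UTF-8 sample.de.txt > sample.de.txt.tmp
mv sample.de.txt.tmp sample.de.txt
```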

> I have never encountered the same problem. It has happened to me that
> characters are not displayed correctly, but never were the contents
> of a file damaged.

Sometimes it simply refuses to write the changes. You are using Debian 
stable, aren't you? - Just a question referring to your "never"... I 
have also never encountered some of the problems people get if they 
have testing installed.

>> - perl 5.6 has bad and incomplete unicode support. To get
>> makedictutf8.pl working I had to steal libs from perl 5.8, else
>> there was no way to get it working correctly. - Hmm... alioth is
>> running Woody, too. So there is no way to run makedictutf8.pl there
>> without making changes to the system.
>
> Alioth does have iconv. Maybe that could be included in the build
> process of the dicts?

That would need a complete rewrite and testing of the dict build 
process. But the program is not restricted, so everyone may do it - 
and keep track of the enhancements I or others include in the 
original source.

> I do know the problem. In my experience, the easiest (and in my
> opinion best) way you can go about this is to use either UTF-8 or
> only US ASCII with HTML entities. I do not think HTML entities are a
> viable option, though.

If US ASCII is used, you will not find HTML entities for all 
characters. But besides that, really no one wants to write Thai, 
Punjabi, Cyrillic, Chinese or even German or Danish in HTML entities. 
I am fairly happy that we got rid of those silly things years ago.
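For illustration, the same German word both ways (the raw form written with octal escapes for the two UTF-8 bytes):

```shell
# "Käse" as raw UTF-8: the a-umlaut is the two bytes 0xC3 0xA4.
printf 'K\303\244se\n'
# The same word in US-ASCII with an HTML entity -- imagine typing
# whole Thai or Cyrillic pages this way:
printf 'K&auml;se\n'
```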


>> Hope the summary of things I've found out while working on the
>> dicts is understandable. If not, feel free to ask.
>
> The way I see it, there are two possibilities: 1) we go and make
> everything UTF-8 from the start or 2) we go the way of different
> charsets for different languages.

The dictionary pages on the debian-women website have been UTF-8 for 
more than a few weeks now.
But the problem is the other pages, I think. And the dictionary 
source files. You may try to convince the people submitting 
dictionary entries to deliver strict UTF-8, or just find someone who 
has enough spare time to handle the additional work of updating the 
source files.

In general I would prefer
- making things as easy as possible for people contributing
- not putting principles above people's needs
- trying to find a solution that does not force people to spend time 
on additional work on things that could be solved better and with 
less work

> I think by now it is clear that I prefer the first choice. This
> would mainly involve either having a policy about only using UTF-8
> when editing files or processing every file which is not proper
> UTF-8 with iconv.

That should be discussed with everyone under a fitting subject. I 
would prefer using the existing mechanisms and just letting people 
decide which charset they want for making translations. The 
conversion can then be done by the build scripts.
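Such a build-script conversion could be sketched like this (the per-language charset table and the file naming are hypothetical):

```shell
# Map each language to the charset its translators chose to work in.
charset_for() {
    case $1 in
        de) echo ISO-8859-1 ;;
        pl) echo ISO-8859-2 ;;
        ru) echo KOI8-R ;;
        *)  echo UTF-8 ;;
    esac
}

# Normalize one source page to UTF-8 for the build.
build_page() {
    src=$1                  # e.g. index.de.wml
    base=${src%.wml}        # index.de
    lang=${base##*.}        # de
    iconv -f "$(charset_for "$lang")" -t UTF-8 "$src" > "$base.utf8.wml"
}
```

That way translators keep their preferred encoding and the output is still uniform.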

> The second approach would need the configuration on Alioth to not
> provide UTF-8 as default charset and have us use a different header
> for different languages.

I think WML has mechanisms for doing such things.

> Additionally, this would mean that files still need to be
> processed with iconv in case people working on the language are using a
> different encoding.

If someone wants to edit files in, let's say, Chinese in UTF-8 while 
Chinese is set to another encoding because the first translators 
decided so - yes, then the one editing in UTF-8 would need to convert 
with iconv twice (before and after editing). The same goes for me, if 
someone else does the main translation work on the German pages and 
decides to have them in UTF-8. If only the HTML tags are edited, that 
would work without conversion.
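The double conversion described above looks roughly like this (the file name and charsets are only examples):

```shell
# A page kept in ISO-8859-1 (a-umlaut as the single byte 0xE4):
printf 'K\344se\n' > page.de.wml
# Before editing: convert to the encoding the editor works in.
iconv -f ISO-8859-1 -t UTF-8 page.de.wml > page.de.wml.edit
# ... edit page.de.wml.edit in a UTF-8 editor ...
# After editing: convert back so the page keeps its agreed charset.
iconv -f UTF-8 -t ISO-8859-1 page.de.wml.edit > page.de.wml
rm page.de.wml.edit
```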

> Questions,
> comments, suggestions?

I am not sure whether all the people who will be affected by that are 
subscribed here.

greetings

Jutta

-- 
http://www.witch.westfalen.de
http://witch.muensterland.org

