enrico at enricozini.org
Thu May 28 22:29:29 UTC 2009
I've just had a long IRC discussion with Arne Goetje about fonts and
locales, which may have kick-started an effort, long wished for by
Justin among many others, to replace culture::* with something
better. If we manage, we are in for quite a celebration :)
To give us some context, let's quote from Justin B. Rye:
wbritish-huge, texlive-lang-ukenglish, iceweasel-l10n-en-gb, and so
on and so on (plus all their en-AU, en-CA, en-NZ and en-US variants)
are still untaggable. Unfortunately, the obvious candidate tagname,
culture::english, exposes the facet's confusion over what it's
trying to tag. Try telling the hosts of debconf7 that their culture
What this facet is actually useful for is distinguishing *locales*.
Not cultures, not countries, not languages; locales. So we should
just be following the established standard and using locale::en:gb,
or maybe l10n:en-gb.
(Anyone who thinks it distinguishes cultures needs to explain how
Norwegians are three times more culturally distinct than Cuba,
Chile, Spain and Mexico put together; anyone who thinks it's
countries can show us Esperantia on a map; and anyone who thinks
it's languages should explain why culture::taiwanese is being used
for Mandarin in traditional characters rather than the Taiwanese
language. No, no, it's *locales*, I tell you.)
ISO has 3 distinctions:
iso15924: script (Latin, Arabic, Devanagari, Han...)
iso3166: territory (USA, United Kingdom, Italy...)
iso639: language (English, Italian, Spanish, Portuguese, ...)
iso15924 (script) can easily be turned into a facet:
Description: Writing script
Tags can be taken from the list in iso15924, which on my system
contains 137 entries. This can be useful for packages like fonts and
software like OCR, handwriting recognition, language teaching, input
- "iso15924" or "writing-script"? "script" is ambiguous,
"writing-script" is long, "iso15924" is obscure.
- should we create all 137 tags right away, or only add them on
request as the need arises, maybe in this case without the "at
least 7 packages" rule? I'd go for creating them on request.
If no one objects, in a week or so I can create an "iso15924" facet
with an initial selection of scripts (Arne tells me that we should have
font packages for all these scripts: Arab, Armn, Beng, Bopo, Brai, Cans,
Cyrl, Deva, Ethi, Geor, Grek, Gujr, Guru, Hang, Hani, Hans, Hant, Hebr,
Hira, Jpan, Kana, Khmr, Knda, Kore, Laoo, Latn, Mlym, Mong, Mymr, Orya,
Sinh, Syrc, Taml, Tavt, Telu, Thai, Tibt, Yiii, Zsym).
For everything other than script, things are more complicated.
For territory and languages, Justin is right: we want locales.
Locales could be encoded as tags: Justin gives the examples of
locale::en:gb or l10n:en-gb, and we can even do locale:en_GB or
locale:it_IT.UTF-8 without breaking any software I know of.
This would make it possible, for example, to take all uncommented lines
in /etc/locale.gen, remove the .encoding parts, turn those into tags and
look for packages that have such tags.
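
As a rough illustration of the /etc/locale.gen idea, here is a minimal
Python sketch. It assumes a "locale::" facet name and a tag spelling like
"locale::en_GB", neither of which has been decided; the function names and
the package-to-tags mapping in the usage note are hypothetical and are not
an existing debtags API. Keeping an "@modifier" suffix while dropping the
encoding is also an assumption.

```python
def locale_tags(locale_gen_text, facet="locale"):
    """Turn the uncommented lines of a locale.gen file into tag names.

    The facet name and the "facet::name" spelling are assumptions.
    """
    tags = set()
    for line in locale_gen_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out locales
        name = line.split()[0]                  # "it_IT.UTF-8 UTF-8" -> "it_IT.UTF-8"
        name, sep, modifier = name.partition("@")
        name = name.split(".", 1)[0]            # drop the .encoding part
        tags.add(f"{facet}::{name}{sep}{modifier}")
    return sorted(tags)


def packages_with_tags(package_tags, wanted):
    """Return the packages carrying at least one of the wanted tags.

    package_tags is a hypothetical mapping: package name -> iterable of tags.
    """
    wanted = set(wanted)
    return sorted(pkg for pkg, tags in package_tags.items()
                  if wanted & set(tags))
```

One would feed it the contents of /etc/locale.gen, for instance
packages_with_tags(db, locale_tags(open("/etc/locale.gen").read())),
where db stands for whatever tag database is at hand. Only plain tag
equality is needed, which is the kind of operation we already use for
other tags.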
What other use cases could there be for a facet containing locales?
If everything can be done with tag operations that we already use for
other tags, we can get such a facet started.
Otherwise, we may want something more like rfc4647 (see
http://www.rfc-editor.org/rfc/rfc4647.txt) and in that case we are
better off with a different field in the Packages file.
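
To show what "more like rfc4647" would mean in practice, here is a small
Python sketch of the basic filtering scheme from RFC 4647, section 3.3.1:
a language range matches a tag if the two are equal ignoring case, or if
the range is a prefix of the tag followed by "-"; the range "*" matches
everything. The function name is made up for illustration, and the tags
are assumed to be RFC 4646-style language tags such as "en-GB" rather than
POSIX locale names.

```python
def basic_filter(language_range, tags):
    """Return the tags matched by language_range (RFC 4647 basic filtering)."""
    rng = language_range.lower()
    matches = []
    for tag in tags:
        t = tag.lower()
        # Match on equality, or on a prefix that ends at a "-" boundary.
        if rng == "*" or t == rng or t.startswith(rng + "-"):
            matches.append(tag)
    return matches
```

With this, a request for "en" would select "en-GB" and "en-US" but not
"eng". A plain tag comparison cannot express that kind of prefix match,
which is why this route points towards a separate field rather than a facet.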
My gut feeling is that tags would end up being too restrictive a way
to work with locale information. But I'd like to have feedback
GPG key: 4096R/E7AD5568 2009-05-08 Enrico Zini <enrico at enricozini.org>