[Advacs-discuss] Standards
Helmut Wollmersdorfer
helmut@wollmersdorfer.at
Thu, 26 Aug 2004 14:39:25 +0200
Oliver Elphick wrote:
> Let me stress that, so far, all these "standards" are up for
> discussion. I've just put my own ideas up. Once we agree them, though,
> we will need to be consistent, so we need to get them right now, if we
> can.
>
> On Thu, 2004-08-26 at 11:08, Helmut Wollmersdorfer wrote:
>
>>Documentation
>>-------------
>>
>>| Where possible, documentation should be written in SGML for the
>>| Docbook dtd.
>>
>>Should we choose docbook-xml? When writing man pages I ran into
>>problems such as defining a character encoding explicitly. I googled
>>for nearly a whole day to solve this problem. The only thing I learned
>>was that docbook-xml should be preferred.
>>
>>I only have experience with docbook-sgml. But if docbook-xml is well
>>supported by the Debian packaging utilities, we should use xml.
> I have no experience of it either. I will go along with this if people
> agree that it is a good thing. What we will also need to know is what
> tools to use to write for it; I like to use xemacs, but there are bugs
> in the psgml code in that. Is there a different module for xml? Does
> anybody know?
I use a battery of editors; each has its advantages and disadvantages.
OpenOffice has support for docbook-sgml, but it is very poor (and
unusable). For docs I use kate or kwrite, as they have syntax
highlighting, support for different character encodings (and
conversion), and spell checking.
We will have to find out.
>>Character Encoding
>>------------------
>>We definitely should only support UTF-8. This means that everything is
>>either "true" (7-bit) ASCII, which is compatible with UTF-8, or it is
>>interpreted as UTF-8.
>>
>>I don't know how well this is supported by Python or Eiffel, but I
>>assume that their character functions can handle it.
> Yes for both, and I think Unicode is definitely the way to go.
Perl has also supported it since 5.8 (AFAIK). You can also declare that
the source code itself is in UTF-8. But using text literals in source
code is always a bad idea for maintenance anyway.
>>Up to now all my files in CVS are in UTF-8. I can convert them back to
>>ISO-8859-1 if necessary. But that would mean maintaining a table of
>>default encodings for each language, and developing functions and
>>scripts to properly handle the different encodings.
> I noticed that there were strange characters showing; I suppose one has
> to be in a UTF8 locale to read them properly.
Look in the configuration options of your preferred editor or viewer.
AFAIK it's not necessary to have a UTF-8 locale; your interface just
has to be able to handle the encoding. You only need it as your locale
if you want to use UTF-8 on the shell - I think.
UTF-8 locale support now works well under Sarge - AFAIK. The only
remaining problems are with mc, but a patch can be applied.
There are also problems in communication between differently configured
machines if the interfaces do not support conversion. Restricting the
characters _used_ in the names of files, options, parameters etc. to
the ASCII subset as far as possible is therefore always a good approach.
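A check like the one suggested above could be sketched as follows (the helper name is hypothetical, not from any agreed standard):

```python
# Sketch: reject non-ASCII names for files, options and parameters,
# per the restriction discussed above.
def is_safe_name(name: str) -> bool:
    # Printable 7-bit ASCII only; no spaces or control characters.
    return all(33 <= ord(c) <= 126 for c in name)

print(is_safe_name("invoice_2004.sgml"))   # True
print(is_safe_name("Rechnungsübersicht"))  # False
```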
> This doesn't really affect British and American users much, since we
> don't use accented characters,
By the linguistic definition the British alphabet does have accented
characters, but I have never seen them in modern technical English.
In spell checkers you can choose whether accented or unaccented English
is applied.
> but most of the rest of the world will be
> affected. Should we assume that they know enough to get the display
> right? or do we need to write extra explanations?
For plain text files we need the explanation. It can go in "standards"
or in a comment on the first line of the file. Formats like xml or html
support declaring the encoding, and most browsers and http servers can
deal with that.
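For instance, a DocBook XML file declares its encoding in the very first line (a sketch; the DocBook XML 4.2 public identifier is shown, but any agreed version would do):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<article lang="en">
  <title>Example</title>
</article>
```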
Helmut Wollmersdorfer