[Advacs-discuss] Standards

Helmut Wollmersdorfer helmut@wollmersdorfer.at
Thu, 26 Aug 2004 14:39:25 +0200


Oliver Elphick wrote:
> Let me stress that, so far, all these "standards" are up for
> discussion.  I've just put my own ideas up.  Once we agree them, though,
> we will need to be consistent, so we need to get them right now, if we
> can.
> 
> On Thu, 2004-08-26 at 11:08, Helmut Wollmersdorfer wrote:
> 
>>Documentation
>>-------------
>>
>>| Where possible, documentation should be written in SGML for the
>>| Docbook dtd.
>>
>>Should we choose docbook-xml? In writing man pages I ran into
>>problems such as defining a character encoding explicitly. I googled
>>around for nearly a day to solve this problem. The only thing I
>>learned was that docbook-xml should be preferred.
>>
>>I only have experience with docbook-sgml. But if docbook-xml is well
>>supported by the Debian packaging utilities, we should use XML.

> I have no experience of it either.  I will go along with this if people
> agree that it is a good thing.  What we will also need to know is what
> tools to use to write for it; I like to use xemacs, but there are bugs
> in the psgml code in it.  Is there a different module for XML?  Does
> anybody know?

I use a battery of editors, each with its (dis)advantages. OpenOffice has 
support for docbook-sgml, but it is very poor (and in practice unusable). 
For docs I use kate or kwrite, as they have syntax highlighting, support 
for different character encodings (and conversion), and spell checking.

We will have to find that out.
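
For what it's worth, the explicit-encoding problem I mentioned should be 
simpler with docbook-xml, because the XML declaration at the top of the 
file names the encoding directly. Just as a sketch (the DTD version is 
only an example, we would still have to agree on one):

  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
    "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">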

>>Character Encoding
>>------------------

>>We should definitely support only UTF-8. This means that everything is
>>either "true" (7-bit) ASCII, which is compatible with UTF-8, or it is
>>interpreted as UTF-8.
>>
>>I don't know how well this is supported by Python or Eiffel, but I
>>assume that they can handle it in their character functions.

> Yes for both, and I think Unicode is definitely the way to go.

Perl has supported it since 5.8 as well (AFAIK). You can also specify 
that the source code itself is in UTF-8. But using text literals in 
source code is always a bad idea for maintenance anyway.
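
In Perl that declaration is the "use utf8;" pragma; in Python it is a 
coding comment on the first line. A minimal sketch in current Python, 
just to show that the character functions really work per character and 
not per byte:

  # -*- coding: utf-8 -*-
  # The comment above declares the encoding of this source file (PEP 263).
  s = "Österreich"                 # a string is a sequence of characters, not bytes
  print(len(s))                    # 10 characters ...
  print(len(s.encode("utf-8")))    # ... but 11 bytes when encoded as UTF-8
  print(s.upper())                 # case mapping handles the accented letter too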

>>So far, all my files in CVS are in UTF-8. I can convert them back to
>>ISO-8859-1 if necessary. But that would mean maintaining a table of
>>default encodings for each language, and developing functions and
>>scripts for proper handling of the different encodings.

> I noticed that there were strange characters showing; I suppose one has
> to be in a UTF-8 locale to read them properly.

Look in the configuration options of your preferred editor or viewer. 
AFAIK it's not necessary to have UTF-8 as your locale; your interface 
only has to be able to handle it. You will need it as your locale only 
if you want to use it on the shell - I think.
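
(And converting the files in CVS back to ISO-8859-1, as mentioned above, 
would be trivial if we ever decided to do it - a sketch in Python, with 
made-up file names:

  # re-encode one file from UTF-8 to ISO-8859-1
  with open("doc.utf8.txt", encoding="utf-8") as src, \
       open("doc.latin1.txt", "w", encoding="iso-8859-1") as dst:
      dst.write(src.read())

The real work would be the table of per-language defaults, not the 
conversion itself.)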

Support for UTF-8 locales works well now under Sarge - AFAIK. There are 
only problems with mc, but a patch can be applied.

There are also problems in communication between differently configured 
machines if the interfaces do not support conversion. Thus restricting 
the characters _used_ in the naming of files, options, parameters etc. 
to the ASCII subset, as far as possible, is always a good approach.
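
Such a rule is also easy to check automatically. A small sketch (the 
function name is my own invention):

  def is_safe_name(name):
      # true only if the name stays within the 7-bit ASCII subset
      return all(ord(c) < 128 for c in name)

  print(is_safe_name("invoice_2004.txt"))    # True
  print(is_safe_name("Rechnungsprüfung"))    # False - contains a non-ASCII character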

> This doesn't really affect British and American users much, since we
> don't use accented characters, 

By the linguistic definition the British alphabet does have accented 
characters, but in modern technical English I have never seen them. 
In spell checkers you can choose whether accented or unaccented English 
should be applied.

> but most of the rest of the world will be
> affected.  Should we assume that they know enough to get the display
> right? or do we need to write extra explanations?

For normal text files we need the explanation. It can go in "standards" 
or in a comment on the first line of the text file. Formats like XML or 
HTML support declaring the encoding, and most browsers and HTTP servers 
can deal with this.
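
For example (both lines only as illustrations): in HTML the encoding can 
be declared in the head,

  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

and for plain text files the best we can do is a first-line comment that 
at least some editors (Emacs, for one) understand:

  # -*- coding: utf-8 -*-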

Helmut Wollmersdorfer