Bug#290774: fixed in subversion 1.3.1-3

Vincent Lefevre vincent at vinc17.org
Sun May 7 13:17:03 UTC 2006


On 2006-05-07 04:51:46 -0500, Peter Samuelson wrote:
> [Vincent Lefevre]
> > There are also other file formats, such as ogg files, whose meta-data
> > are encoded in UTF-8.
> 
> You can't put subversion keywords in vorbis comments anyway; it's not a
> text-based format.

You're wrong. It is possible: meta-data are UTF-8 text directly
included in the ogg file. Look with "strings" for instance.
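
To make that concrete, here is a rough Python sketch of what "strings"
does. The byte blob is made up for illustration (it is not a real Ogg
stream), and printable_runs is a hypothetical helper name. The sketch
only matches ASCII, so non-ASCII UTF-8 sequences would split a run.

```python
import re

def printable_runs(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes."""
    pattern = re.compile(
        rb"[\x20-\x7e]{" + str(min_len).encode("ascii") + rb",}"
    )
    return pattern.findall(data)

# Made-up binary blob: a text field embedded between non-text bytes.
blob = b"\x00\x01\x0d\x00\x00\x00TITLE=Example\x00\xff"
runs = printable_runs(blob)
print(runs)
```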

> The reason I ask is that the user's LC_CTYPE is already used to
> determine the encoding of filenames.

This is a different matter. This one doesn't hurt very much (it will
only break if users with different locales access the working copy)
and won't break file formats. Concerning the file contents, the
encoding is fixed on the repository side, and Subversion doesn't
perform any conversion into the user's locale encoding.

> I am showing the link between that and the content of XML documents,
> and why you want the two encodings to be the same.

I didn't say I wanted these two encodings to be the same.

> You're also biased toward UTF-8 content.

Yes, but one needs to make a choice, and UTF-8 is the common one.
Otherwise, the alternative is to transliterate into US-ASCII (which
could even be better).

> Files can be in any encoding. Why do you assume that users will
> never produce XML files, or indeed random source code, in
> ISO-8859-2?

In this case they shouldn't use keywords in them, and should wait for
bug 2332 to be fixed. Using the locale doesn't solve this problem
anyway, since different users may use different locales.

Also, ISO-8859-2 shouldn't be used in XML for files that are meant
to be shared (unless the users agree to use it), because an XML
parser isn't required to support ISO-8859-2. UTF-8 is OK.

> > Well, in his second sentence, Julian said: "... is better than
> > mixed locales."
> 
> He's agreeing with my objection, where I ask what the point is of
> localising the language of a date string but not localising the
> encoding.

Subversion doesn't localize the encoding. Your patch doesn't fix that.

Also, one may wonder if the date should be localized at all. IMHO,
it is an error to do that globally. For instance, why would you
include a French keyword expansion in an English file? The right
solution is to improve the keyword mechanism (e.g. bug 890). Your
patch is premature.

>  Are you trying to argue that the encoding is specific to the file
> but that the human language is not?

The problem is that the file encoding is fixed and doesn't depend on
the user's locale. So the same should hold for keyword expansion, in
order to avoid mixed locales.

>  That seems pretty absurd to me. Either localise both (what I think)
> or localise neither (what Ivan Zhakov thinks).

*Currently* it's much better to localize neither. This won't break
anything.

> > > Ivan thinks keywords should not be localised at all, which also
> > > solves the problem, but that's a lot harder to implement.
> > 
> > No, it doesn't solve the problem.
> 
> Sure it does.

No, you'll still have problems with non-ASCII characters (remember
that they can also occur in user names).

Here's a summary of the pros and the cons of different solutions
before charset information can be stored in properties (as suggested
in bug 2332):

    * Using UTF-8 (current behavior):
      + Pros: fixed encoding; no loss; compatible with file formats
        based on UTF-8, which are common (UTF-8 is more or less the
        default encoding nowadays).
      + Cons: may be incompatible with some documents.

    * Using US-ASCII (transliteration):
      + Pros: fixed encoding; compatible with any encoding (except
        EBCDIC, but this one is not tractable) and any file format.
      + Cons: small loss for non-ASCII characters.

    * Using the encoding specified by the locale:
      + Pros: compatible with tools that don't understand encodings
        different from the one specified by the locale.
      + Cons: all the documents using keywords must have the same
        encoding; also requires every user of the repository to use
        the same locale or a compatible one (which may require root
        access to install, or may not even be available on some
        OSes); if externals are used, the corresponding repositories
        must assume compatible encodings; not backward compatible.
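
A rough Python sketch of the three options above, applied to a UTF-8
document. The author name, the sample document and the helper names
are assumptions for illustration; this is not how Subversion
implements keyword expansion, and ISO-8859-1 merely stands in for
some user's locale encoding.

```python
import unicodedata

def transliterate_ascii(text):
    """Lossy transliteration: drop combining marks, '?' for the rest."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return stripped.encode("ascii", "replace").decode("ascii")

def expand_author(file_bytes, author, keyword_encoding):
    """Replace the $Author$ placeholder, encoding the expansion as given."""
    expanded = "$Author: {} $".format(author).encode(keyword_encoding)
    return file_bytes.replace(b"$Author$", expanded)

def is_valid(data, encoding):
    """Tell whether data decodes cleanly in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

author = "Lef\u00e8vre"  # hypothetical non-ASCII user name
doc = "<note>caf\u00e9 $Author$</note>".encode("utf-8")  # UTF-8 file

with_utf8 = expand_author(doc, author, "utf-8")
with_locale = expand_author(doc, author, "iso-8859-1")
with_ascii = expand_author(doc, transliterate_ascii(author), "ascii")

for label, data in [("UTF-8", with_utf8),
                    ("locale (ISO-8859-1)", with_locale),
                    ("US-ASCII", with_ascii)]:
    print(label, "-> still valid UTF-8:", is_valid(data, "utf-8"))
```

In this sketch, only the locale-based expansion leaves the UTF-8
document with a byte sequence that no longer decodes, which is the
"mixed locales" problem; the US-ASCII variant stays valid at the cost
of losing the accent.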

-- 
Vincent Lefèvre <vincent at vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / SPACES project at LORIA




