[Pkg-postgresql-public] Locale sanitizing [was: Re: Changing default encoding to unicode?]

Oliver Elphick olly@lfix.co.uk
Mon, 08 Nov 2004 12:24:14 +0000

On Mon, 2004-11-08 at 12:38 +0100, Martin Pitt wrote:
> I attach the current changeset here. If nobody
> objects, I will commit it tomorrow.

 #. Type: select
@@ -178,24 +188,25 @@
 msgid ""
 "Use of any locale but C will somewhat reduce the efficiency of index access, "
 "because  sorting by national collating order is rather less efficient than "
-"sorting by ASCII sequence."
+"sorting by ASCII sequence. But 'C' is not capable of representing any "
+"characters outside the 7-bit ASCII range."
 msgstr ""

I think that is not entirely accurate.

My understanding is that locale C implies SQL_ASCII encoding.  Although
the ASCII set does not define characters above 127, SQL_ASCII does not
reject characters above 127 as your text implies; it simply accepts
whatever it is given, without interpretation.

If you feed non-ASCII data into a database encoded in SQL_ASCII, you can
get it out again provided that the locale or the client encoding you use
when retrieving it is the same as the one you were using when you
entered it.  But if you change the client encoding, no translation is done;
the stored bytes are returned exactly as they were entered.
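
To make that concrete, here is a minimal Python sketch of the behaviour
described above.  It does not talk to PostgreSQL at all; the helpers
store_uninterpreted and fetch are hypothetical names that only mimic a
store which keeps bytes exactly as given, and the sample string and the
encodings chosen are my own assumptions for illustration.

```python
# Hypothetical helpers: they only mimic a store that keeps bytes as given,
# which is how SQL_ASCII is described above. No PostgreSQL is involved.

def store_uninterpreted(text: str, client_encoding: str) -> bytes:
    """Encode on the client side; the 'database' keeps the raw bytes."""
    return text.encode(client_encoding)


def fetch(raw: bytes, client_encoding: str) -> str:
    """Hand the same bytes back; only the client decides what they mean."""
    return raw.decode(client_encoding)


# In Latin-1, e-acute is the single byte 0xE9.
raw = store_uninterpreted("caf\u00e9", "latin-1")

# Same client encoding on the way out: the text survives.
same = fetch(raw, "latin-1")

# A different single-byte encoding: no error, but a different character.
other = fetch(raw, "koi8-r")

# UTF-8: a lone 0xE9 byte is not a valid sequence, so decoding fails.
try:
    fetch(raw, "utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(same == "caf\u00e9", other == "caf\u00e9", utf8_ok)
# prints: True False False
```

The stored bytes never change; only the reader's assumption about them
does, which is why a round trip works only when the encodings match.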

I suggest that last sentence should read:

"However, 'C' is not capable of interpreting characters outside the
7-bit ASCII range, and should probably not be used if any data is going
to contain such characters."

Oliver Elphick                                          olly@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
GPG: 1024D/A54310EA  92C8 39E7 280E 3631 3F0E  1EC0 5664 7A2F A543 10EA
     "And whosoever liveth and believeth in me shall never 
      die. Believest thou this?"    John 11:26