[Pkg-postgresql-public] Locale sanitizing [was: Re: Changing default encoding to unicode?]
Oliver Elphick
olly@lfix.co.uk
Mon, 08 Nov 2004 12:24:14 +0000
On Mon, 2004-11-08 at 12:38 +0100, Martin Pitt wrote:
> I attach the current changeset here. If nobody
> objects, I will commit it tomorrow.
#. Type: select
@@ -178,24 +188,25 @@
msgid ""
"Use of any locale but C will somewhat reduce the efficiency of index access, "
"because sorting by national collating order is rather less efficient than "
-"sorting by ASCII sequence."
+"sorting by ASCII sequence. But 'C' is not capable of representing any "
+"characters outside the 7-bit ASCII range."
msgstr ""
I think that is not entirely accurate.
My understanding is that locale C implies SQL_ASCII encoding. Although
the ASCII set does not define characters above 127, SQL_ASCII does not
reject characters above 127 as your text implies; it simply accepts
whatever it is given, without interpretation.
If you feed non-ASCII data into a database encoded in SQL_ASCII, you can
get it out again provided that the locale or the client encoding you use
when retrieving it is the same as the one you were using when you
entered it. But if you change the client encoding, no translation is
done to match it.
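
To illustrate the point, here is a minimal sketch in Python rather than
against a real server. The list "stored" and the helpers insert() and
select() are hypothetical stand-ins for a SQL_ASCII column, not
PostgreSQL APIs; the only assumption is that bytes are kept exactly as
received and handed back without conversion.

```python
# Hypothetical stand-in for a SQL_ASCII column: bytes are kept as given.
stored = []

def insert(text, client_encoding):
    # The client encodes the text; the "server" stores the bytes untouched.
    stored.append(text.encode(client_encoding))

def select(index, client_encoding):
    # No translation is done; the client decodes the same bytes itself.
    return stored[index].decode(client_encoding, errors="replace")

insert("café", "latin-1")

same = select(0, "latin-1")   # same client encoding as on insert
other = select(0, "utf-8")    # different client encoding, same bytes

print(repr(same))
print(repr(other))
```

With the same client encoding the text round-trips; with a different
one the stored byte above 127 is handed back unchanged and the client
misreads it.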
I suggest that last sentence should read:
"However, 'C' is not capable of interpreting characters outside the
7-bit ASCII range, and should probably not be used if any data is going
to contain such characters."
--
Oliver Elphick olly@lfix.co.uk
Isle of Wight http://www.lfix.co.uk/oliver
GPG: 1024D/A54310EA 92C8 39E7 280E 3631 3F0E 1EC0 5664 7A2F A543 10EA
========================================
"And whosoever liveth and believeth in me shall never
die. Believest thou this?" John 11:26