[Pkg-postgresql-public] Bug#790507: pg_createcluster should default to UTF8 clusters, not SQL_ASCII
josh at postgresql.org
Mon Jun 29 22:35:19 UTC 2015
Summary: pg_createcluster defaults to creating clusters in SQL_ASCII.
It should use UTF8 encoding with the C locale instead.
What currently happens:
If no locale is configured before PostgreSQL is installed (i.e.
/etc/default/locale is missing), pg_createcluster creates a new database
cluster in SQL_ASCII encoding. This includes the initial creation of
the "main" cluster on installation.
What should happen instead:
The cluster should be created in encoding UTF8 with "C" locale.
SQL_ASCII clusters should only be created if the user creates them
manually using -e SQL_ASCII. This means that if installation can't
sort out the locale issues, a data directory should not be created.
Why this is a problem:
SQL_ASCII is not a real encoding; it just stores whatever string bytes
are handed to it, including completely invalid character codes. This
means that, if a user gets a SQL_ASCII database which they don't expect,
not only can the database store garbage which will cause application
issues, but it also becomes very hard for the user to move the data to a
real encoding because manual cleanup of all strings is required.
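
To illustrate the cleanup problem, here is a minimal Python sketch (not PostgreSQL code). It assumes two hypothetical clients wrote the same name into a SQL_ASCII column, one sending Latin-1 bytes and one sending UTF-8 bytes. The sample name and the helper needs_manual_cleanup are made up for this example.

```python
# Hypothetical example data: the same name as sent by two different clients.
latin1_row = "José".encode("latin-1")  # b'Jos\xe9'
utf8_row = "José".encode("utf-8")      # b'Jos\xc3\xa9'

# SQL_ASCII stores whatever bytes it is handed, so both rows would sit
# side by side in the same column with no record of their encoding.
stored_rows = [latin1_row, utf8_row]


def needs_manual_cleanup(raw: bytes) -> bool:
    """Return True if the bytes are not valid UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


for raw in stored_rows:
    print(raw, "needs cleanup" if needs_manual_cleanup(raw) else "valid UTF-8")
```

The Latin-1 row fails UTF-8 validation, so a move to a UTF8 database would need someone to decide, string by string, what the original encoding was.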
SQL_ASCII is only still supported by PostgreSQL for legacy reasons.
To be completely clear: SQL_ASCII "encoding" in a database is a trap,
and causes data corruption and other issues for users down the line. We
should not be creating SQL_ASCII databases by default; they should only
happen if users specifically request them (using -e).
Ubuntu 12.04, 14.04 and Debian Jessie do not create /etc/default/locale
by default on headless server installs (or container-based installs) in
my testing. As a result, every user of such an install gets a SQL_ASCII
cluster by default.
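
A rough way to check whether a host is in this state, sketched in Python. The function name has_default_locale is hypothetical, and treating only LANG and LC_ALL as "configured" is my assumption; the report itself only covers the case where the file is missing entirely.

```python
import os


def has_default_locale(path: str = "/etc/default/locale") -> bool:
    """Return True if the file exists and sets LANG or LC_ALL.

    Assumption: only LANG and LC_ALL are checked; this is an
    illustrative guess, not what pg_createcluster itself does.
    """
    if not os.path.isfile(path):
        return False
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.strip()
            # Skip comments and anything that is not KEY=VALUE.
            if line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            if key.strip() in ("LANG", "LC_ALL") and value.strip().strip('"'):
                return True
    return False


if __name__ == "__main__":
    print("locale configured:", has_default_locale())
```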
AFAIK, this issue exists on all versions of pg_createcluster. I tested
Ubuntu 14.04 and 12.04 and Debian Jessie. Ubuntu was tested with the
PGDG packages; Jessie, with the official Debian packages.
"initdb --encoding UTF8 --locale C" works as expected on a system with
no /etc/default/locale, because locale=C always works. pg_createcluster
-e UTF8 also does the right thing.
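
For reference, the explicit invocation can be assembled as below. This is a sketch only: the helper name is hypothetical, the version "9.4" and cluster name "main" are example values, and I am assuming pg_createcluster's --locale option is passed through to initdb as its manual page describes. It only builds and prints the command; it does not run it.

```python
import shlex
from typing import List


def explicit_utf8_cluster_command(version: str, name: str) -> List[str]:
    """Build a pg_createcluster command that requests UTF8 and the C locale."""
    return ["pg_createcluster", "-e", "UTF8", "--locale=C", version, name]


# Example values; substitute the installed PostgreSQL version and cluster name.
print(shlex.join(explicit_utf8_cluster_command("9.4", "main")))
```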
I've looked over the code for pg_createcluster and for
maintscripts-functions, and I can't figure out where encoding is getting
set to SQL_ASCII.