[Debburn-devel] Character sets in UDF
Florent Rougon
f.rougon at free.fr
Wed Jan 24 15:47:20 CET 2007
Hi,
Many thanks for your helpful answer.
Peter Samuelson <peter at p12n.org> wrote:
> The kernel knows how to interpret filenames on the UDF filesystem; what
> it doesn't know is how you would like them presented to your process
> via syscalls like readdir() and open(). Your terminal settings and
> LC_CTYPE are per-process, not system-wide. That is why the iocharset
> mount option exists: so you can tell the kernel what character set you
> are using. Not so you can tell it what character set the DVD is using:
> it already knows that.
I have the impression that this design is broken, considering Unix is
multi-user. If the iocharset option indicates the charset the kernel
will use in readdir(), open() and friends, then this cannot work if
several users on the system use different charsets in LC_CTYPE.
What I would expect:

  root mounts /dev/foobar somewhere accessible to the users

  User 1: has an LC_CTYPE specifying ISO 8859-1
          readdir() and friends should return strings in ISO 8859-1, no?

  User 2: has an LC_CTYPE specifying UTF-8
          readdir() and friends should return strings in UTF-8, no?

For this to work, the kernel would have to read the LC_CTYPE value for
the calling process when doing a readdir() or an open(). But maybe the
kernel devs don't want to do that (a syscall whose behavior depends on
an environment variable), or simply don't want to mess with locales, I
don't know.
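
To make the two-user scenario concrete, here is a minimal user-space sketch in Python of what a per-process conversion would mean. It is only an illustration, not how the kernel works: the helper names codeset_of() and name_for_process(), the locale names and the filename "café.txt" are all made up for the example.

```python
def codeset_of(lc_ctype: str) -> str:
    """Extract the codeset from a locale name such as 'fr_FR.ISO-8859-1'."""
    # Drop an optional '@modifier', then keep what follows the first '.'.
    base = lc_ctype.split("@", 1)[0]
    if "." not in base:
        raise ValueError(f"no codeset in locale name: {lc_ctype!r}")
    return base.split(".", 1)[1]


def name_for_process(name: str, lc_ctype: str) -> bytes:
    """Encode a filename the way a process with this LC_CTYPE would expect it."""
    return name.encode(codeset_of(lc_ctype))


# Hypothetical filename, as it would be known on the filesystem.
name = "café.txt"

user1 = name_for_process(name, "fr_FR.ISO-8859-1")  # User 1
user2 = name_for_process(name, "fr_FR.UTF-8")       # User 2

print(user1)  # b'caf\xe9.txt'
print(user2)  # b'caf\xc3\xa9.txt'
```

The same name yields two different byte strings, which is why a single iocharset chosen at mount time can only suit one of the two users.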
> Now this is interesting - that should not have worked. The actual
> parameter you want is "iocharset=iso8859-15".
>
> The reason it worked is that you made a typo: you said "utf-8" instead
> of "utf8". So it failed to load it, and instead loaded the default NLS
> map, which is a kernel config option (CONFIG_NLS_DEFAULT) and in your
> case is probably set to either "iso8859-1" or "iso8859-15".
Exactly. Well spotted! The dmesg output confirms that "utf-8" wasn't
recognized, and my kernel was compiled with
CONFIG_NLS_DEFAULT="iso8859-15", as you guessed.
I then tried mounting with iocharset=iso8859-15, and it does work.
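
The "utf-8" versus "utf8" pitfall comes from locale codeset names and kernel NLS names being spelled differently. A small sketch of a normalization follows, in Python. The function name nls_name() is hypothetical, and the mapping only covers the spellings mentioned in this thread (utf8 and the iso8859-N family); anything else is passed through lowercased, which may or may not be a valid NLS name.

```python
import re


def nls_name(codeset: str) -> str:
    """Map a locale-style codeset name to the spelling used for iocharset=."""
    c = codeset.strip().lower()
    # The kernel NLS table is called "utf8", without a hyphen.
    if c in ("utf-8", "utf8"):
        return "utf8"
    # "ISO-8859-15", "iso_8859-15", "iso8859-15" -> "iso8859-15"
    m = re.fullmatch(r"iso[-_]?8859[-_]?(\d+)", c)
    if m:
        return f"iso8859-{m.group(1)}"
    # Unknown spelling: return it unchanged (assumption: caller checks it).
    return c


print("iocharset=" + nls_name("UTF-8"))
print("iocharset=" + nls_name("ISO-8859-15"))
```

This only fixes the spelling; it does not address the multi-user problem, since the option is still chosen once, at mount time.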
> If you think 'mount' should automatically parse LC_CTYPE and pass the
> appropriate iocharset= parameter to the kernel, you should take that up
> with the util-linux people.
I don't think it's the right thing to do, again because a mount can very
well be done by root for *several* users who use different charsets...
For the multi-user scenario to work, the charset and encoding to use for
filenames should not be determined at mount time, but whenever a process
accesses the filesystem.
> Same issue with ISO-9660, in fact it's even worse with Rock Ridge: UDF
> and Joliet have a well-defined character set (and the iocharset=
> parameter), but I _believe_ Rock Ridge does not - it just stores
> filenames with no reference to character set.
Well, for Rock Ridge, the question is addressed on page 6 of the 1.12
draft, which can be downloaded at:
ftp://ftp.ymi.com/pub/rockridge/rrip112.ps
... but I still don't have the answer, since "it depends": they refer to
"the portable filename character set as defined in POSIX:2.2.2.60"
(grmpf).
Anyway, I'm pretty sure it's possible with genisoimage to combine Joliet
and Rock Ridge: the former for filenames, the latter for Unix
permissions, symlinks, etc. (but I don't know what happens in this case
if the two extensions specify different names for the same file...).
Regards,
--
Florent