[Debburn-devel] Character sets in UDF

Peter Samuelson peter at p12n.org
Wed Jan 24 22:15:11 CET 2007


[Florent Rougon]
> I have the impression that this design is broken, considering Unix is
> multi-user. If the iocharset option indicates the charset the kernel
> will use in readdir(), open() and friends, then this cannot work if
> several users on the system use different charsets in LC_CTYPE.

Yes, it is a limitation.  I think this is one of the reasons Linux
distributions have been trying to push users towards UTF-8 in the past
few years.  For that matter, some apps (Gtk+) seem to believe all
filenames are UTF-8 regardless of their LC_CTYPE - I suppose that's
another solution.

Win32 solves the problem by having syscalls that use UTF-16LE
unconditionally.

> the kernel would have to read the LC_CTYPE value for the calling
> process when doing a readdir() or an open().

The Linux kernel developers would never go for that.  The kernel
doesn't and shouldn't know anything about parsing LANG and LC_*
variables; that's purely a userspace concern.

What would be possible instead, noting that applications reach every
system call through a thin wrapper function in libc6, would be for libc6
itself to do these translations inside the file access wrappers.
However, that too is a hard sell, for several reasons:

 - It breaks compatibility in the form of developer expectations (and
   thus will be sure to break some apps somewhere).

 - What should readdir() do if the LC_CTYPE charset cannot represent a
   particular filename?  If it returns anything, the app will expect to
   be able to stat() whatever filename it returns.

 - In the general case, Unix filesystems do not have locale information
   in them.  The only exceptions supported by Linux are vfat, ntfs,
   iso9660 (with Joliet), udf, and jfs.  (Note all of these except udf
   came from Windows or OS/2.)  Most Unix filesystems abide by the
   philosophy that a filename is just a string of bytes that can
   include any byte except "/" or NUL.  Thus a filename's character set
   is simply whatever the app that created the file used.
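
To make the last two points concrete, here is a rough Python sketch of
what a translating readdir() wrapper would run into.  It is only an
illustration, not anything libc6 actually does: the function name
translate_name and the charsets chosen are my own assumptions.

```python
from typing import Optional


def translate_name(raw: bytes, disk_charset: str, user_charset: str) -> Optional[bytes]:
    """Re-encode an on-disk filename for a user's charset (hypothetical helper).

    Returns None when the bytes cannot be decoded from disk_charset or
    the resulting characters cannot be represented in user_charset.
    """
    try:
        return raw.decode(disk_charset).encode(user_charset)
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None


# "cafe" with an acute accent, stored as UTF-8, translates to ISO-8859-1.
print(translate_name(b"caf\xc3\xa9", "utf-8", "iso-8859-1"))

# The ISO-8859-1 bytes for the same name are not valid UTF-8, so a
# wrapper that assumes UTF-8 on disk has nothing to decode.
print(translate_name(b"caf\xe9", "utf-8", "iso-8859-1"))

# A Japanese name stored as UTF-8 has no ISO-8859-1 representation, so
# there is no filename the wrapper could return and later stat().
kanji_name = "\u65e5\u672c.txt".encode("utf-8")
print(translate_name(kanji_name, "utf-8", "iso-8859-1"))
```

The None results are exactly the cases where readdir() would have to
either hide the file or invent a name that open() and stat() then have
to map back.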

> > If you think 'mount' should automatically parse LC_CTYPE and pass the
> > appropriate iocharset= parameter to the kernel, you should take that up
> > with the util-linux people.
> 
> I don't think it's the right thing to do, again because a mount can very
> well be done by root for *several* users who use different charsets...

Right - but maybe better than nothing.  In the common case, I suspect
all users on a given system _are_ using the same charset.  And
particularly with removable media, it's often mounted by a non-root
user, and only that user is really interested in it.
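
For what it's worth, the guessing step itself would be small.  Below is
a minimal Python sketch of how a mount helper might derive an
iocharset= value from the caller's locale variables.  The function name
iocharset_from_env and the short mapping table are assumptions of mine
for illustration; util-linux has no such code as far as this thread
establishes, and a real table would need to be checked against the NLS
modules the kernel actually ships.

```python
import os
from typing import Mapping, Optional

# Hypothetical, deliberately incomplete mapping from normalized locale
# codesets to kernel NLS table names as used by iocharset=.
CODESET_TO_IOCHARSET = {
    "utf8": "utf8",
    "iso88591": "iso8859-1",
    "iso885915": "iso8859-15",
    "koi8r": "koi8-r",
}


def iocharset_from_env(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Guess an iocharset= value from LC_ALL, LC_CTYPE, LANG (in that order)."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # empty variables count as unset
            break
    else:
        return None
    # Locale names look like language[_territory][.codeset][@modifier].
    if "." not in value:
        return None
    codeset = value.split(".", 1)[1].split("@", 1)[0]
    key = codeset.lower().replace("-", "").replace("_", "")
    return CODESET_TO_IOCHARSET.get(key)


print(iocharset_from_env({"LANG": "fr_FR.ISO-8859-1"}))
print(iocharset_from_env({"LC_CTYPE": "ja_JP.UTF-8", "LANG": "C"}))
```

Returning None for anything unrecognized leaves the kernel default in
place, which seems safer than passing a wrong guess.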

> Well, for Rock Ridge, it is defined on page 6 of the 1.12 draft, which
> can be downloaded at:
> 
>   ftp://ftp.ymi.com/pub/rockridge/rrip112.ps

The portable filename character set is, I think, a subset of ASCII
(letters, digits, ".", "_" and "-") that excludes not only "/" and NUL
but several other bytes that can be problematic on some OSes, like ":".
But when Rock Ridge (and POSIX) talk about a "character set", they
really mean a "set of bytes"; there is no specific implied mapping
between bytes and characters.

> Anyway, I'm pretty sure it's possible with genisoimage to combine Joliet
> and RR ; the former for filenames, the latter for Unix permissions,
> symlinks, etc. (but I don't know what happens in this case if both
> extensions specify different names for the same file...).

Right, "genisoimage -r -J" is fully supported and I usually use it.
Windows OSes use only the Joliet information; most Unix OSes use only
Rock Ridge.  Linux can use either one, but if both are present, it uses
only Rock Ridge.  (And ignores the iocharset= option in that case,
since the on-disc character set is not known.)

Peter