[Debburn-devel] Character sets in UDF
Peter Samuelson
peter at p12n.org
Tue Jan 23 23:08:38 CET 2007
[Florent Rougon]
> # mount /dev/dvd
> # ls -l /media/dvd0
> total 998238
> -r--r--r-- 1 4294967295 4294967295 1022195172 2007-01-22 22:03 Test avec un nom un peu long comportant des mots accentués, voilà .foobar
>
> -> This is UTF-8 interpreted as ISO-8859-15 (my LC_CTYPE)
> (checked with iconv)
The kernel knows how to interpret filenames on the UDF filesystem; what
it doesn't know is how you would like them presented to your process
via syscalls like readdir() and open(). Your terminal settings and
LC_CTYPE are per-process, not system-wide. That is why the iocharset
mount option exists: so you can tell the kernel what character set you
are using. Not so you can tell it what character set the DVD is using:
it already knows that.
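
To make the symptom in the first listing concrete, here is a minimal Python sketch. It assumes the kernel hands the process UTF-8 bytes while the terminal decodes them as ISO-8859-15. The shortened filename is illustrative only.

```python
# A shortened version of the filename from the listing (illustrative).
name = "accentués, voilà.foobar"

# Bytes the process receives if the kernel emits UTF-8.
raw = name.encode("utf-8")

# What a terminal that expects ISO-8859-15 makes of those same bytes.
garbled = raw.decode("iso8859-15")

# Each accented letter is two bytes in UTF-8, so it shows up as two characters.
print(repr(garbled))

# Nothing is lost: reversing the two steps recovers the original name.
recovered = garbled.encode("iso8859-15").decode("utf-8")
print(recovered == name)
```

The bytes on the disc are fine in this scenario; only the interpretation on the way out differs, which is the part iocharset= controls.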
> # mount -o iocharset=utf-8 /dev/dvd
> # ls -l /media/dvd0
> total 998238
> -r--r--r-- 1 4294967295 4294967295 1022195172 2007-01-22 22:03 Test avec un nom un peu long comportant des mots accentués, voilà.foobar
>
> -> This is correct.
Now this is interesting - that should not have worked. The actual
parameter you want is "iocharset=iso8859-15".
The reason it worked is that you made a typo: you said "utf-8" instead
of "utf8". So the kernel failed to load an NLS table by that name, and
instead loaded the default NLS map, which is a kernel config option
(CONFIG_NLS_DEFAULT) and in your case is probably set to either
"iso8859-1" or "iso8859-15". Since ISO-8859-1 and ISO-8859-15 are
almost identical, this works.
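
As a quick check of "almost identical", the following Python sketch lists the byte values that decode differently in the two charsets, using Python's own codec tables (not the kernel's NLS tables, which I assume agree with them). The sample string is illustrative.

```python
# Byte values whose meaning differs between ISO-8859-1 and ISO-8859-15.
differing = [
    b for b in range(256)
    if bytes([b]).decode("iso8859-1") != bytes([b]).decode("iso8859-15")
]
print([hex(b) for b in differing])

# The accented letters in the test filename are not among them, so either
# table produces the same bytes for it.
sample = "accentués, voilà"
same_bytes = sample.encode("iso8859-1") == sample.encode("iso8859-15")
print(same_bytes)
```

Only a handful of positions differ (the euro sign and a few accented letters replaced some symbols), so a French filename without those characters comes out the same under either default.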
> I'm disappointed. If the charset and encoding for filenames were
> correctly specified in the UDF filesystem, I shouldn't need to pass
> any iocharset option to 'mount'.
If you think 'mount' should automatically parse LC_CTYPE and pass the
appropriate iocharset= parameter to the kernel, you should take that up
with the util-linux people.
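
For illustration only, this is roughly what such a mapping could look like. The helper name iocharset_for_codeset is hypothetical, and the output names are my assumption about kernel NLS table naming ("utf8", "iso8859-15"). This is not how mount behaves today.

```python
import re


def iocharset_for_codeset(codeset: str) -> str:
    """Hypothetical helper: guess an iocharset= value from a locale codeset."""
    cs = codeset.strip().lower()
    # Locales say "UTF-8"; the NLS table is assumed to be named "utf8".
    if cs in ("utf-8", "utf8"):
        return "utf8"
    # "ISO-8859-15", "iso8859_15", "ISO8859-15" -> "iso8859-15"
    match = re.fullmatch(r"iso[-_]?8859[-_]?(\d+)", cs)
    if match:
        return f"iso8859-{match.group(1)}"
    # Anything else is passed through lowercased; this is a guess.
    return cs


print(iocharset_for_codeset("ISO-8859-15"))
print(iocharset_for_codeset("UTF-8"))
```

On a real system the input could come from locale.nl_langinfo(locale.CODESET) after calling locale.setlocale(locale.LC_CTYPE, ""), and the result would go into the -o iocharset= option.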
> Since there is the 2 GB problem mentioned by Eduard, I'll try ISO-9660
> with Joliet and Rock Ridge extensions and see if it behaves better wrt
> charsets in file names.
Same issue with ISO-9660; in fact, it's even worse with Rock Ridge. UDF
and Joliet have a well-defined character set (and the iocharset=
parameter), but I _believe_ Rock Ridge does not: it just stores
filenames with no reference to character set.