[Debburn-devel] Character sets in UDF

Peter Samuelson peter at p12n.org
Tue Jan 23 23:08:38 CET 2007


[Florent Rougon]
> # mount /dev/dvd
> # ls -l /media/dvd0
> total 998238
> -r--r--r--  1 4294967295 4294967295 1022195172 2007-01-22 22:03 Test avec un nom un peu long comportant des mots accentués, voilà.foobar
> 
>   -> This is UTF-8 interpreted as ISO-8859-15 (my LC_CTYPE)
>      (checked with iconv)

The kernel knows how to interpret filenames on the UDF filesystem; what
it doesn't know is how you would like them presented to your process
via syscalls like readdir() and open().  Your terminal settings and
LC_CTYPE are per-process, not system-wide.  That is why the iocharset
mount option exists: so you can tell the kernel what character set you
are using.  It is not there so you can tell the kernel what character
set the DVD is using: it already knows that.
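
To make the mismatch concrete, here is a rough Python model of what
happened in the first listing.  It assumes the kernel handed out the
name as UTF-8 bytes and the terminal decoded those bytes as ISO-8859-15.
The function name is made up for illustration, and the charset names
are Python codec names, not kernel NLS names.

```python
# The on-disc name, as the kernel understands it (Unicode text).
name = "accentués, voilà.foobar"

def as_seen_by_process(name, iocharset, terminal_charset):
    """Hypothetical model: the kernel encodes the name with the iocharset,
    then the terminal decodes the resulting bytes with its own charset."""
    raw = name.encode(iocharset)  # bytes a process gets from readdir()
    return raw.decode(terminal_charset, errors="replace")

# Kernel emits UTF-8, terminal expects ISO-8859-15: accents come out garbled.
garbled = as_seen_by_process(name, "utf-8", "iso8859-15")

# Kernel emits ISO-8859-15, terminal expects ISO-8859-15: name is intact.
correct = as_seen_by_process(name, "iso8859-15", "iso8859-15")

print(repr(garbled))
print(repr(correct))
```

The disc contents are the same in both calls; only the charset used on
the way out to the process changes.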


> # mount -o iocharset=utf-8 /dev/dvd
> # ls -l /media/dvd0
> total 998238
> -r--r--r--  1 4294967295 4294967295 1022195172 2007-01-22 22:03 Test avec un nom un peu long comportant des mots accentués, voilà.foobar
> 
>   -> This is correct.

Now this is interesting - that should not have worked.  The parameter
you actually want is "iocharset=iso8859-15", to match your LC_CTYPE.

The reason it worked is that you made a typo: you said "utf-8" instead
of "utf8".  The kernel has no NLS table called "utf-8", so it failed to
load one and instead loaded the default NLS map, which is a kernel
config option (CONFIG_NLS_DEFAULT) and in your case is probably set to
either "iso8859-1" or "iso8859-15".  Since ISO-8859-1 and ISO-8859-15
are almost identical, this works.
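
A small Python sketch of the two points above.  The first half is only
a model of the fallback as I described it, not kernel code: the set of
available tables and the default value are assumptions standing in for
whatever your kernel was built with.  The second half lists the byte
values where the two charsets disagree, using Python's own codecs.

```python
# Assumed set of NLS tables the kernel can load (illustrative only).
AVAILABLE_NLS = {"utf8", "iso8859-1", "iso8859-15"}
# Stand-in for CONFIG_NLS_DEFAULT; the real value depends on the kernel build.
NLS_DEFAULT = "iso8859-15"

def pick_nls(requested):
    """Hypothetical model: use the requested table if it exists,
    otherwise fall back to the default."""
    return requested if requested in AVAILABLE_NLS else NLS_DEFAULT

print(pick_nls("utf-8"))  # misspelled name, falls back to the default
print(pick_nls("utf8"))   # correct name, table is found

# Byte values that decode differently in ISO-8859-1 and ISO-8859-15.
differing = [
    b for b in range(0x100)
    if bytes([b]).decode("iso8859-1") != bytes([b]).decode("iso8859-15")
]
print([hex(b) for b in differing])
```

Only a handful of positions differ, and the bytes for the accented
letters in the test filename (such as 0xE9 and 0xE0) are not among
them, which is why either default gives the right result here.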


> I'm disappointed. If the charset and encoding for filenames were
> correctly specified in the UDF filesystem, I shouldn't need to pass
> any iocharset option to 'mount'.

If you think 'mount' should automatically parse LC_CTYPE and pass the
appropriate iocharset= parameter to the kernel, you should take that up
with the util-linux people.
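
For what it's worth, the mapping itself would not be hard.  Below is a
hedged Python sketch of how a locale name could be turned into an
iocharset= option.  All three function names are hypothetical, nothing
like this exists in 'mount' as far as I know, and it only handles UTF-8
and the ISO-8859 family.

```python
import re

def codeset_of(locale_name):
    """Return the codeset part of a locale name such as
    'fr_FR.ISO-8859-15@euro', or None if it has none."""
    base = locale_name.split("@", 1)[0]  # drop a modifier like '@euro'
    if "." not in base:
        return None
    return base.split(".", 1)[1]

def iocharset_for_codeset(codeset):
    """Map a locale codeset to a kernel-style NLS name (illustrative)."""
    if codeset is None:
        return None
    name = codeset.strip().lower()
    if name in ("utf-8", "utf8"):
        return "utf8"
    match = re.fullmatch(r"iso[-_]?8859[-_]?(\d+)", name)
    if match:
        return "iso8859-" + match.group(1)
    return None  # unknown codeset: pass no option at all

def mount_option_for(locale_name):
    """Build the iocharset= option for a locale name, or None."""
    charset = iocharset_for_codeset(codeset_of(locale_name))
    return "iocharset=" + charset if charset else None

print(mount_option_for("fr_FR.ISO-8859-15@euro"))
print(mount_option_for("en_US.UTF-8"))
```

Returning None for anything unrecognised leaves the kernel default in
place rather than guessing.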

> Since there is the 2 GB problem mentioned by Eduard, I'll try ISO-9660
> with Joliet and Rock Ridge extensions and see if it behaves better wrt
> charsets in file names.

Same issue with ISO-9660.  In fact it's even worse with Rock Ridge: UDF
and Joliet have a well-defined character set (and the iocharset=
parameter), but I _believe_ Rock Ridge does not - it just stores
filenames with no reference to any character set.