[Dict-common-dev] UTF-8 and ispell
Rafael Laboissiere
rafael at debian.org
Sat Sep 29 11:17:43 UTC 2007
* Paul Boekholt <p.boekholt at gmail.com> [2007-09-29 12:43]:
> That sounds like a problem. I guess the string is longer than 256 characters.
> From the S-Lang manual:
> Although there is no imposed limit on the length of a string, string
> literals must be less than 256 characters in length. It is possible
> to construct strings longer than this by string concatenation, e.g.,
>
> "This is the first part of a long string"
> + " and this is the second part"
>
> Since DictionariesCommon generates S-Lang code, this limitation applies.
>
> Broadly, there are three ways to fix this:
> - catch the "String too long" error in a catch block. Tough luck for
> Bulgarian speakers.
> - Split the string up, as suggested in the manual. An example of how to do
> this can be found in the autotext.sl mode.
> - Instead of generating S-Lang code, generate some data file and provide an
> S-Lang script to parse those data. One way to do this would be to
> generate XML and parse that with the expat module. Another way would be
> to store the data in a SQLite table, and in fact the next version of
> autotext.sl may do that. Or maybe the readascii.sl library provided with
> slsh can be used for this. Note that if the string isn't sourced by
> S-Lang, you don't get the "\x{__}" substitution.
>
> I think I'd go for the second option.
I would go for a simpler solution, as I wrote previously in this thread:
just ask the aspell-bg maintainer to convert the info-aspell file to the
national character encoding. Perhaps we should also enforce this through
the dictionaries-common Policy.
I do not think that we would need more than 256 characters in a string in
jed-ispell-dicts.sl. If that happens in the future, I would go for the
second option as you suggested.
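A minimal sketch of that second option follows. It is written in Python purely for illustration: the generator in dictionaries-common is not necessarily Python, and `slang_concat` and `slang_escape` are hypothetical names. It assumes the 256-character limit applies to each literal's source text. So it caps every escaped chunk at 255 characters and never splits an escape sequence across two literals. Writing non-ASCII characters as `\x{HEX}` is also an assumption, based on the `\x{__}` substitution mentioned in the quoted message.

```python
"""Sketch: emit a long string as concatenated S-Lang string literals.

Assumptions (not taken from dictionaries-common): each literal's source
text, excluding the surrounding quotes, is capped at 255 characters, and
non-ASCII characters are spelled as \\x{HEX} escapes.
"""

MAX_LITERAL = 255  # S-Lang manual: literals must be shorter than 256 chars


def slang_escape(ch: str) -> str:
    """Return the S-Lang source spelling of a single character."""
    if ch == "\\":
        return "\\\\"
    if ch == '"':
        return '\\"'
    if ch == "\n":
        return "\\n"
    if ch == "\t":
        return "\\t"
    if ord(ch) > 127:
        # Assumed escape form for wide characters (S-Lang UTF-8 mode).
        return "\\x{%X}" % ord(ch)
    return ch


def slang_concat(text: str, limit: int = MAX_LITERAL) -> str:
    """Build an S-Lang expression of the form "part1" + "part2" + ..."""
    chunks = []
    current = ""
    for ch in text:
        token = slang_escape(ch)
        # Start a new literal rather than split an escape sequence.
        if current and len(current) + len(token) > limit:
            chunks.append(current)
            current = ""
        current += token
    chunks.append(current)
    return "\n  + ".join('"%s"' % chunk for chunk in chunks)


if __name__ == "__main__":
    # A 300-character string becomes two literals (255 + 45 characters).
    print(slang_concat("A" * 300))
```

Splitting on escaped tokens rather than on raw characters keeps each literal valid on its own, and it bounds the source length, which is never shorter than the decoded string.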
--
Rafael