[Pkg-texlive-maint] Re: [tex-live] Strange license of ukhyphen (fwd)
Petr Sojka
sojka at fi.muni.cz
Mon May 29 11:53:08 UTC 2006
> Werner LEMBERG wrote:
> > What about the following: Get a reliable list of UK English words
> > (probably sorted by frequency), apply the current UK patterns,
> > carefully check the results and regenerate the patterns.
> >
> good idea. curiously, my institution curates
> a 100 million word corpus of British English
> (http://www.natcorp.ox.ac.uk/), marked up
> to the word level; deriving a
> list of words from that would be a rather
> small bit of XML retrieval.
>
> If I get the list of words, does anyone
> else have the time and energy to make the
> experiment?
I am willing to do the pattern generation part.
But
-- The BNC wordlist (which I have too) is full of non-English words,
proper names, ...; who will do the cleanup?
-- The most time-consuming step is having the hyphenated
BNC wordlist checked by somebody who knows the etymology
of English words -- this is the rule OUP
uses in deciding on (UK) hyphenation points.
US people/publishers use quite different rules
(basically syllable-based).
Send me the cleaned UK wordlist and I'll do the bootstrap phase
(prepare the hyphenated list and list of
candidates for checking [potential exceptions]).
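
To make the bootstrap phase concrete, here is a rough Python sketch of
applying Liang-style patterns to a wordlist. Everything in it is an
assumption made for illustration: the names compile_patterns, hyphenate
and bootstrap are hypothetical, TOY_PATTERNS is a tiny illustrative set
and not the ukhyphen patterns, the 2/3 minimum fragment lengths are
TeX's usual defaults, and the rule for picking candidates (long words
that receive no break at all) is only one possible heuristic. The real
work would be done with patgen.

```python
import re

# Tiny illustrative pattern set (assumption: NOT the ukhyphen patterns).
TOY_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at",
                "1na", "n2at", "1tio", "2io", "o2n"]


def compile_patterns(patterns):
    """Map each pattern's letters to its list of inter-letter values."""
    table = {}
    for pattern in patterns:
        letters = re.sub(r"\d", "", pattern)
        values = [0] * (len(letters) + 1)
        pos = 0
        for ch in pattern:
            if ch.isdigit():
                values[pos] = int(ch)   # value for the gap before letters[pos]
            else:
                pos += 1
        table[letters] = values
    return table


def hyphenate(word, table, left_min=2, right_min=3):
    """Insert '-' at every gap whose highest competing value is odd."""
    padded = "." + word.lower() + "."   # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)    # scores[i] = gap before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = table.get(padded[start:end])
            if values is None:
                continue
            for offset, value in enumerate(values):
                scores[start + offset] = max(scores[start + offset], value)
    pieces, last = [], 0
    # The gap before word[k] is scores[k + 1] because of the leading dot.
    for k in range(left_min, len(word) - right_min + 1):
        if scores[k + 1] % 2 == 1:
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return "-".join(pieces)


def bootstrap(words, patterns, min_len=8):
    """Return the hyphenated list plus words flagged for manual checking.

    Hypothetical heuristic: flag words of at least min_len letters
    that received no break point.
    """
    table = compile_patterns(patterns)
    hyphenated = {w: hyphenate(w, table) for w in words}
    candidates = [w for w, h in hyphenated.items()
                  if "-" not in h and len(w) >= min_len]
    return hyphenated, candidates


if __name__ == "__main__":
    done, to_check = bootstrap(["hyphenation", "etymology", "cat"], TOY_PATTERNS)
    for word, result in done.items():
        print(word, "->", result)
    print("to check:", to_check)
```

With a real pattern file, the flagged words would go to the person
checking the list, and the corrected entries would feed the next
patgen run.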
All the best
--ps