[Shootout-list] Rule 30

John Skaller skaller@users.sourceforge.net
Fri, 20 May 2005 12:08:20 +1000


On Thu, 2005-05-19 at 13:32 -0700, Brent Fulgham wrote:

> Can anyone suggest a regular expression problem
> that involves:
> 
> 1)  A large input file so that we don't have to
>     have programs iterate over the same data multiple
>     times.
> 
> 2)  Involves useful regular expression features such
>     as capture?
> 
> I think we might need a couple of tests:
> 
> 1.  Find the elements in some big string.
> 2.  Revise some input document in some fashion.
> 
> Ideas?

Some kind of generic hyphenation problem?

Take a formatted article and reformat it to a
different column width. There would be tables of
rules describing which contexts permit hyphenation,
which contexts make a hyphen mandatory (so it must
not be removed), and which make it optional (so it
can be removed).
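
A minimal Python sketch of the reflow step follows. The assumptions are not part of the proposal: optional break points are marked in the input with soft hyphens (U+00AD), the rule tables have already been applied to place them, and ordinary "-" hyphens are treated as part of the word. Lines are filled greedily, and a word is broken at the rightmost soft hyphen that still fits.

```python
SOFT = "\u00ad"  # assumption: optional break points are marked with soft hyphens


def reflow(text: str, width: int) -> list[str]:
    """Greedy fill to `width` columns, breaking words only at soft hyphens."""
    lines: list[str] = []
    line = ""
    for word in text.split():
        while True:
            sep = " " if line else ""
            plain = word.replace(SOFT, "")
            if len(line) + len(sep) + len(plain) <= width:
                line += sep + plain          # whole word fits on this line
                break
            # Look for the rightmost soft hyphen (not at either end of the
            # word) whose head, plus a visible "-", still fits.
            head = None
            for i in range(len(word) - 2, 0, -1):
                if word[i] == SOFT:
                    candidate = word[:i].replace(SOFT, "") + "-"
                    if len(line) + len(sep) + len(candidate) <= width:
                        head, word = candidate, word[i + 1:]
                        break
            if head is not None:
                lines.append(line + sep + head)  # hyphenate, carry the rest over
                line = ""
            elif line:
                lines.append(line)           # no usable break: start a new line
                line = ""
            else:
                line = plain                 # too long even alone: let it overflow
                break
    if line:
        lines.append(line)
    return lines
```

For example, reflowing "the hy\u00adphen\u00ada\u00adtion rule" to 10 columns gives "the hy-", "phenation", "rule". Greedy filling keeps the sketch short; real typesetting software more often uses an optimal-fit line breaker such as Knuth-Plass.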

BTW: such an automaton is called a regular transducer.

BTW: as well as speed, the result can be judged for
quality. There is no reason to require everyone to
produce the same output, since outputs can be compared
by simply removing newlines and hyphens (except the
hyphens the rules mark as non-removable).
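
A minimal Python sketch of that comparison. The MANDATORY set is a hypothetical stand-in for the rule tables. The sketch also assumes that optional hyphens only survive in the output where a word was broken at the end of a line, so a line-final hyphen is removed and the word rejoined, unless the rejoined word is a mandatory compound. All other line breaks collapse to single spaces.

```python
import re

# Hypothetical stand-in for the rule tables: compounds whose hyphen is
# mandatory and must survive normalization.
MANDATORY = {"well-known", "e-mail"}


def normalize(text: str) -> str:
    """Reduce formatted output to a canonical form for comparison."""

    def rejoin(m: re.Match) -> str:
        left, right = m.group(1), m.group(2)
        word = f"{left}-{right}"
        # Keep the hyphen only if the rules say it is non-removable.
        return word if word.lower() in MANDATORY else left + right

    # A hyphen at the end of a line splits a word across lines: rejoin it.
    text = re.sub(r"([\w-]+)-\n[ \t]*([\w-]+)", rejoin, text)
    # Every remaining newline or run of spaces is just a word separator.
    return " ".join(text.split())
```

Two submissions formatted to different widths then match if their normalized forms are equal, e.g. "fox is well-\nknown for hyphen-\nation." and "fox is well-known\nfor hyphenation." both normalize to "fox is well-known for hyphenation.".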

Just an idea -- but one inspired by real production
typesetting software. 


-- 
John Skaller, skaller at users.sf.net
PO Box 401 Glebe, NSW 2037, Australia Ph:61-2-96600850 
Download Felix here: http://felix.sf.net