[ucto] 01/03: New upstream version 0.9.8

Maarten van Gompel proycon-guest at moszumanska.debian.org
Thu Nov 2 20:15:47 UTC 2017


This is an automated email from the git hooks/post-receive script.

proycon-guest pushed a commit to branch master
in repository ucto.

commit 98fae82a0f2e7e7ed8335832ee9cdff8c4a5f0a7
Author: proycon <proycon at anaproy.nl>
Date:   Thu Nov 2 21:07:31 2017 +0100

    New upstream version 0.9.8
---
 ChangeLog                | 491 +++++++++++++++++++++++++++++++++++++++++
 INSTALL                  | 370 +++++++++++++++++++++++++++++++
 Makefile.am              |   2 +-
 Makefile.in              |  30 +--
 NEWS                     |  31 +++
 README                   | 113 ----------
 aclocal.m4               |   1 -
 bootstrap.sh             |   3 -
 config.guess             | 165 ++++++++------
 config.h.in              |   3 -
 config.sub               |  56 +++--
 config/Makefile.in       |  10 +-
 configure                | 403 ++++++++++++++++------------------
 configure.ac             |  63 ++----
 docs/Makefile.in         |  10 +-
 docs/ucto.1              |  31 ++-
 include/Makefile.in      |  10 +-
 include/ucto/Makefile.in |  10 +-
 include/ucto/setting.h   |   1 +
 include/ucto/textcat.h   |   2 +-
 include/ucto/tokenize.h  |  22 +-
 install-sh               |  23 +-
 ltmain.sh                |  39 ++--
 m4/Makefile.in           |  10 +-
 m4/ax_icu_check.m4       |  86 --------
 m4/libtool.m4            |  27 ++-
 m4/ltsugar.m4            |   7 +-
 m4/lt~obsolete.m4        |   7 +-
 m4/pkg.m4                | 217 +++++++-----------
 src/Makefile.am          |   5 +-
 src/Makefile.in          |  15 +-
 src/setting.cxx          |  39 +++-
 src/textcat.cxx          |   4 +-
 src/tokenize.cxx         | 560 +++++++++++++++++++++++++++++++++++------------
 src/ucto.cxx             | 268 +++++++++++++++++------
 src/unicode.cxx          |  17 +-
 tests/Makefile.in        |  10 +-
 ucto.pc.in               |   1 -
 38 files changed, 2116 insertions(+), 1046 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 4cfc5fa..25afd34 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,490 @@
+2017-10-23  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* tests/testutt, tests/testutt.ok, tests/utt2.xml: added another
+	utterance test.
+
+2017-10-22  Maarten van Gompel <proycon at anaproy.nl>
+
+	* src/tokenize.cxx: Attempted fix for utterance/sentence problem #37
+
+2017-10-22  Maarten van Gompel <proycon at anaproy.nl>
+
+	* src/tokenize.cxx: another related comment
+
+2017-10-22  Maarten van Gompel <proycon at anaproy.nl>
+
+	* src/tokenize.cxx: just added a comment/suggestion on detection
+	structure elements
+
+2017-10-19  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* NEWS: small folia ==> FoLiA edit
+
+2017-10-18  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* configure.ac: bumped version after release
+
+2017-10-18  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* NEWS: some typos in NEWS
+
+2017-10-18  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* NEWS: Updated NEWS with old news from 23-01-2017
+
+2017-10-17  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* NEWS: some news
+
+2017-10-11  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx, tests/testfoliain.ok: fixed
+	textredundancy="full". Now it adds text up to the highest level.
+
+2017-10-11  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* tests/testfoliain, tests/testfoliain.ok, tests/textproblem.xml: 
+	added and modified tests, after change in FoLiA parser
+
+2017-10-11  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* include/ucto/tokenize.h, src/tokenize.cxx: added a
+	setTextRedundancy member
+
+2017-10-10  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* tests/partest2_folia.nl.xml, tests/partest_folia.nl.xml,
+	tests/testfolia.ok, tests/testfolia2.ok, tests/testfoliain.ok,
+	tests/testlang.ok, tests/testutt.ok: adapted tests to changed
+	textredundancy level
+
+2017-10-10  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx, src/ucto.cxx: changed textredundancy default to
+	'minimal'
+
+2017-10-10  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* tests/testfoliain.ok: adapted test to changed <br/> handling
+
+2017-10-10  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx: for now, disable the <br/> handling. It is too
+	complicated.
+
+2017-10-02  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* tests/testfolia2, tests/testfolia2.ok, tests/testfoliain.ok: fixed
+	tests
+
+2017-10-02  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* include/ucto/tokenize.h, src/tokenize.cxx, src/ucto.cxx,
+	tests/testfolia, tests/testfoliain, tests/testfoliain.ok: 
+	implemented --textredundancy option (replaces --noredundanttext)
+
+2017-10-02  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* include/ucto/tokenize.h, src/tokenize.cxx: removed an unused
+	function. Give a warning when attempting to set language on metadata
+	of non-native type
+
+2017-10-02  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* configure.ac: re-instated --with-icu in configure.ac
+
+2017-09-28  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx: added safeguards around set_metadata
+
+2017-09-27  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx: the default is doRedundantText == true
+
+2017-09-27  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* tests/testfoliain: adapted test to check automagically detecting
+	folia
+
+2017-09-27  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/ucto.cxx: automatically switch to -F or -X when the input or
+	output file has a '.xml' extension
+
+2017-09-27  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* tests/testfolia2, tests/testfolia2.ok: modified test to also test
+	-T option
+
+2017-09-26  Maarten van Gompel <proycon at anaproy.nl>
+
+	* src/ucto.cxx: added CLST, Nijmegen to --version
+
+2017-09-26  Maarten van Gompel <proycon at anaproy.nl>
+
+	* src/ucto.cxx: Added shortcut option for --noredundanttext (-T) and
+	changed help text a bit #31
+
+2017-09-26  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* tests/testfolia.ok: add updated file, missing from previous commit
+
+2017-09-26  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* include/ucto/tokenize.h, src/tokenize.cxx, src/ucto.cxx,
+	tests/testfolia, tests/testfoliain, tests/testfoliain.ok: 
+	implemented a --noredundanttext option, and added tests
+
+2017-09-12  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* configure.ac: be sure to use recent libfolia
+
+2017-09-12  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx, tests/testfoliain.ok: set textclass on <w> when
+	outputclass != inputclass
+
+2017-09-11  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* configure.ac: use C++!
+
+2017-08-30  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* ucto.pc.in: removed icu requirement
+
+2017-08-30  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* : commit 5ee40601de62c8612f4660a7748151fee7ea9929 Author: Ko van
+	der Sloot <K.vanderSloot at let.ru.nl> Date:   Wed Aug 30 16:24:06 2017
+	+0200
+
+2017-08-30  Maarten van Gompel <proycon at anaproy.nl>
+
+	* docs/ucto_manual.tex: typo fix (and automatic trailing space
+	stuff)
+
+2017-08-21  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* tests/folia9a.xml, tests/folia9b.xml, tests/testfoliain,
+	tests/testfoliain.ok: added test documents with embedded tabs,
+	newlines and multiple spaces.
+
+2017-08-18  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* tests/folia8.xml: new file
+
+2017-08-18  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* configure.ac, tests/testfoliain, tests/testfoliain.ok: added a
+	test with an xml comment inside a <t>
+
+2017-08-17  Maarten van Gompel <proycon at anaproy.nl>
+
+	* src/tokenize.cxx, src/ucto.cxx: language fix
+
+2017-08-15  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx: added some more debug lines
+
+2017-08-14  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx: try to generate IDs based on the parent's ID or
+	their parent's ID.
+
+2017-07-27  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* .travis.yml: add libtar-dev too
+
+2017-07-25  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* : commit 00c3b9e94e36331b756f67110c0fc940ff83075d Author: Ko van
+	der Sloot <K.vanderSloot at let.ru.nl> Date:   Tue Jul 25 10:45:38 2017
+	+0200
+
+2017-07-20  Maarten van Gompel <proycon at anaproy.nl>
+
+	* tests/testall: use python2 explicitly
+
+2017-07-20  Maarten van Gompel <proycon at anaproy.nl>
+
+	* tests/test.py: use python 2 explicitly
+
+2017-07-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx, tests/testutt.ok: fixed utterance handling
+	(quite hacky)
+
+2017-07-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* tests/testall, tests/testutt, tests/utt.xml: added a (yet failing)
+	test
+
+2017-07-18  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* .travis.yml: attempt to fix clang test on travis
+
+2017-07-18  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx: disable filtering in XML files in more cases
+
+2017-06-28  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* .travis.yml: attempt to fix build
+
+2017-06-28  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* tests/testfoliain.ok: adapted test, now that newline handling is fixed
+
+2017-06-28  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* include/ucto/tokenize.h, src/tokenize.cxx: added code to handle
+	embedded newlines in FoLiA documents.
+
+2017-06-26  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx: adapted to changed libfolia
+
+2017-06-01  Maarten van Gompel <proycon at anaproy.nl>
+
+	* : commit 2037878fff5e9bb47911c1a0c54b9c79291754fc Author: Maarten
+	van Gompel <proycon at anaproy.nl> Date:   Thu Jun 1 21:30:05 2017
+	+0200
+
+2017-05-22  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/setting.cxx, src/tokenize.cxx, src/ucto.cxx,
+	tests/testfiles2.ok, tests/testfoliain.ok, tests/testlang.ok,
+	tests/testoption2.ok, tests/testslash.ok: sorted out logging and
+	such a bit.
+
+2017-05-22  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* tests/testfoliain.ok, tests/testlang.ok, tests/testslash.ok: 
+	adapted tests
+
+2017-05-22  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/ucto.cxx: No longer SILENTLY set --filter=NO for FoLiA with
+	equal input and output class
+
+2017-05-22  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/ucto.cxx, tests/testnormalisation: added a --filter option.
+	Supersedes -f (which could only switch filtering OFF)
+
+2017-05-17  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* tests/folia1.xml, tests/testfoliain, tests/testfoliain.ok: 
+	enhanced and extended folia testing
+
+2017-05-17  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx, src/ucto.cxx, tests/testfoliain.ok: Disable
+	filtering of characters on FoLiA input with same inputclass and
+	outputclass
+
+2017-05-10  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* tests/filter.xml, tests/testfoliain.ok, tests/testtext,
+	tests/testtext.ok: added a test, and adapted to changed results
+
+2017-05-10  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx: now we adapt text on <s> and <p> to the lower
+	layers
+
+2017-05-10  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* configure.ac: simplified configuration
+
+2017-05-10  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* .travis.yml: added IRC notification
+
+2017-05-10  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* tests/testlang.ok: adapted test after a fix in libfolia
+
+2017-05-10  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* docs/ucto.1, src/ucto.cxx: update manpage. Fixed typo.
+
+2017-05-09  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* Makefile.am, configure.ac, ucto.pc.in: more configuration cleanup.
+
+2017-05-08  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* bootstrap.sh, configure.ac: modernized build system
+
+2017-05-03  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx: still a leak was left. plugging...
+
+2017-05-03  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/setting.cxx, src/tokenize.cxx: fixed a memory leak
+
+2017-04-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx: added some comment
+
+2017-04-10  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx: better debug output
+
+2017-04-10  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* tests/folia7.xml, tests/testfolia, tests/testfoliain,
+	tests/testfoliain.ok: added a test
+
+2017-04-04  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* .travis.yml: revert back to default g++
+
+2017-03-30  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx: numb edits
+
+2017-03-28  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* include/ucto/tokenize.h, src/tokenize.cxx, src/ucto.cxx,
+	tests/folia-lang-2.xml, tests/testlang: started implementing
+	language detection in FoLiA input too. Not done, nothing broke (yet)
+
+2017-03-27  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx: fixed a problem with log token detection
+
+2017-03-14  Maarten van Gompel <proycon at anaproy.nl>
+
+	* : Merge pull request #17 from sanmai-NL/speed_up_CI_build Limit network transfers, add `ccache`
+
+2017-03-01  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx: Oops. A function got lost... :{
+
+2017-02-27  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/ucto.cxx: removed redundant mention of the configfile (it is
+	empty > 90% of the time)
+
+2017-02-27  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* include/ucto/tokenize.h, src/tokenize.cxx: in case of problems in
+	tokenizeLine(), we display the offending line number OR the FoLiA
+	element ID.
+
+2017-02-26  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx: for extremely long 'words' display a part of the
+	offending input. Also corrected a typo.
+
+2017-02-21  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/setting.cxx, src/ucto.cxx: give better information when
+	language is missing or wrong
+
+2017-02-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/ucto.cxx: updated usage()
+
+2017-02-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* docs/ucto.1: updated ucto man page
+
+2017-02-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* .travis.yml: another final attempt :{
+
+2017-02-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* .travis.yml: final attempt
+
+2017-02-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* .travis.yml: getting closer?
+
+2017-02-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* .travis.yml: wow, how difficult this is
+
+2017-02-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* .travis.yml: next try
+
+2017-02-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* .travis.yml: another attempt
+
+2017-02-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* .travis.yml: attempt to fix
+
+2017-02-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* .travis.yml: modernized Travis config
+
+2017-02-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* .travis.yml: added dependency for travis
+
+2017-02-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/ucto.cxx: Warn about use of unsupported languages. Don't use
+	'generic' by default.
+
+2017-02-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/ucto.cxx: check specified languages against the installed ones
+
+2017-02-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* include/ucto/setting.h, src/setting.cxx, src/ucto.cxx: use a set
+	to store results, not a vector
+
+2017-02-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* include/ucto/setting.h, src/setting.cxx, src/ucto.cxx: added a
+	function to search for installed languages
+
+2017-02-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx: typo corrected
+
+2017-02-20  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx: choke on words of 2500 characters or more
+
+2017-02-08  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* include/ucto/tokenize.h, src/tokenize.cxx: some more repair
+	concerning outputclass
+
+2017-02-08  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/tokenize.cxx, src/ucto.cxx: when using the --textclass option,
+	make sure --inputclass and --outputclass are not used.
+
+2017-02-07  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* include/ucto/tokenize.h, src/Makefile.am, src/tokenize.cxx: 
+	attempt to speed up some stuff
+
+2017-02-02  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* src/Makefile.am, src/tokenize.cxx: minor changes
+
+2017-01-24  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* include/ucto/textcat.h, src/Makefile.am, src/setting.cxx,
+	src/textcat.cxx, src/tokenize.cxx, src/ucto.cxx, src/unicode.cxx: 
+	some refactoring to satisfy static checkers
+
+2017-01-23  Ko van der Sloot <K.vanderSloot at let.ru.nl>
+
+	* configure.ac: bumped version after release
+
 2017-01-23  Maarten van Gompel <proycon at anaproy.nl>
 
 	* configure.ac: rely on uctodata 0.4
@@ -38,6 +525,10 @@
 	* config/Makefile.am, src/Makefile.am: install and look for
 	datafiles in $PREFIX/share/ucto
 
+2017-01-18  Sander Maijers <S.N.Maijers at gmail.com>
+
+	* .travis.yml: Speed up CI builds
+
 2017-01-18  Ko van der Sloot <K.vanderSloot at let.ru.nl>
 
 	* tests/test.nl.tok.V, tests/test.nl.txt: added more DATE testcases
diff --git a/INSTALL b/INSTALL
new file mode 100644
index 0000000..2099840
--- /dev/null
+++ b/INSTALL
@@ -0,0 +1,370 @@
+Installation Instructions
+*************************
+
+Copyright (C) 1994-1996, 1999-2002, 2004-2013 Free Software Foundation,
+Inc.
+
+   Copying and distribution of this file, with or without modification,
+are permitted in any medium without royalty provided the copyright
+notice and this notice are preserved.  This file is offered as-is,
+without warranty of any kind.
+
+Basic Installation
+==================
+
+   Briefly, the shell command `./configure && make && make install'
+should configure, build, and install this package.  The following
+more-detailed instructions are generic; see the `README' file for
+instructions specific to this package.  Some packages provide this
+`INSTALL' file but do not implement all of the features documented
+below.  The lack of an optional feature in a given package is not
+necessarily a bug.  More recommendations for GNU packages can be found
+in *note Makefile Conventions: (standards)Makefile Conventions.
+
+   The `configure' shell script attempts to guess correct values for
+various system-dependent variables used during compilation.  It uses
+those values to create a `Makefile' in each directory of the package.
+It may also create one or more `.h' files containing system-dependent
+definitions.  Finally, it creates a shell script `config.status' that
+you can run in the future to recreate the current configuration, and a
+file `config.log' containing compiler output (useful mainly for
+debugging `configure').
+
+   It can also use an optional file (typically called `config.cache'
+and enabled with `--cache-file=config.cache' or simply `-C') that saves
+the results of its tests to speed up reconfiguring.  Caching is
+disabled by default to prevent problems with accidental use of stale
+cache files.
+
+   If you need to do unusual things to compile the package, please try
+to figure out how `configure' could check whether to do them, and mail
+diffs or instructions to the address given in the `README' so they can
+be considered for the next release.  If you are using the cache, and at
+some point `config.cache' contains results you don't want to keep, you
+may remove or edit it.
+
+   The file `configure.ac' (or `configure.in') is used to create
+`configure' by a program called `autoconf'.  You need `configure.ac' if
+you want to change it or regenerate `configure' using a newer version
+of `autoconf'.
+
+   The simplest way to compile this package is:
+
+  1. `cd' to the directory containing the package's source code and type
+     `./configure' to configure the package for your system.
+
+     Running `configure' might take a while.  While running, it prints
+     some messages telling which features it is checking for.
+
+  2. Type `make' to compile the package.
+
+  3. Optionally, type `make check' to run any self-tests that come with
+     the package, generally using the just-built uninstalled binaries.
+
+  4. Type `make install' to install the programs and any data files and
+     documentation.  When installing into a prefix owned by root, it is
+     recommended that the package be configured and built as a regular
+     user, and only the `make install' phase executed with root
+     privileges.
+
+  5. Optionally, type `make installcheck' to repeat any self-tests, but
+     this time using the binaries in their final installed location.
+     This target does not install anything.  Running this target as a
+     regular user, particularly if the prior `make install' required
+     root privileges, verifies that the installation completed
+     correctly.
+
+  6. You can remove the program binaries and object files from the
+     source code directory by typing `make clean'.  To also remove the
+     files that `configure' created (so you can compile the package for
+     a different kind of computer), type `make distclean'.  There is
+     also a `make maintainer-clean' target, but that is intended mainly
+     for the package's developers.  If you use it, you may have to get
+     all sorts of other programs in order to regenerate files that came
+     with the distribution.
+
+  7. Often, you can also type `make uninstall' to remove the installed
+     files again.  In practice, not all packages have tested that
+     uninstallation works correctly, even though it is required by the
+     GNU Coding Standards.
+
+  8. Some packages, particularly those that use Automake, provide `make
+     distcheck', which can be used by developers to test that all other
+     targets like `make install' and `make uninstall' work correctly.
+     This target is generally not run by end users.
+
+Compilers and Options
+=====================
+
+   Some systems require unusual options for compilation or linking that
+the `configure' script does not know about.  Run `./configure --help'
+for details on some of the pertinent environment variables.
+
+   You can give `configure' initial values for configuration parameters
+by setting variables in the command line or in the environment.  Here
+is an example:
+
+     ./configure CC=c99 CFLAGS=-g LIBS=-lposix
+
+   *Note Defining Variables::, for more details.
+
+Compiling For Multiple Architectures
+====================================
+
+   You can compile the package for more than one kind of computer at the
+same time, by placing the object files for each architecture in their
+own directory.  To do this, you can use GNU `make'.  `cd' to the
+directory where you want the object files and executables to go and run
+the `configure' script.  `configure' automatically checks for the
+source code in the directory that `configure' is in and in `..'.  This
+is known as a "VPATH" build.
+
+   With a non-GNU `make', it is safer to compile the package for one
+architecture at a time in the source code directory.  After you have
+installed the package for one architecture, use `make distclean' before
+reconfiguring for another architecture.
+
+   On MacOS X 10.5 and later systems, you can create libraries and
+executables that work on multiple system types--known as "fat" or
+"universal" binaries--by specifying multiple `-arch' options to the
+compiler but only a single `-arch' option to the preprocessor.  Like
+this:
+
+     ./configure CC="gcc -arch i386 -arch x86_64 -arch ppc -arch ppc64" \
+                 CXX="g++ -arch i386 -arch x86_64 -arch ppc -arch ppc64" \
+                 CPP="gcc -E" CXXCPP="g++ -E"
+
+   This is not guaranteed to produce working output in all cases, you
+may have to build one architecture at a time and combine the results
+using the `lipo' tool if you have problems.
+
+Installation Names
+==================
+
+   By default, `make install' installs the package's commands under
+`/usr/local/bin', include files under `/usr/local/include', etc.  You
+can specify an installation prefix other than `/usr/local' by giving
+`configure' the option `--prefix=PREFIX', where PREFIX must be an
+absolute file name.
+
+   You can specify separate installation prefixes for
+architecture-specific files and architecture-independent files.  If you
+pass the option `--exec-prefix=PREFIX' to `configure', the package uses
+PREFIX as the prefix for installing programs and libraries.
+Documentation and other data files still use the regular prefix.
+
+   In addition, if you use an unusual directory layout you can give
+options like `--bindir=DIR' to specify different values for particular
+kinds of files.  Run `configure --help' for a list of the directories
+you can set and what kinds of files go in them.  In general, the
+default for these options is expressed in terms of `${prefix}', so that
+specifying just `--prefix' will affect all of the other directory
+specifications that were not explicitly provided.
+
+   The most portable way to affect installation locations is to pass the
+correct locations to `configure'; however, many packages provide one or
+both of the following shortcuts of passing variable assignments to the
+`make install' command line to change installation locations without
+having to reconfigure or recompile.
+
+   The first method involves providing an override variable for each
+affected directory.  For example, `make install
+prefix=/alternate/directory' will choose an alternate location for all
+directory configuration variables that were expressed in terms of
+`${prefix}'.  Any directories that were specified during `configure',
+but not in terms of `${prefix}', must each be overridden at install
+time for the entire installation to be relocated.  The approach of
+makefile variable overrides for each directory variable is required by
+the GNU Coding Standards, and ideally causes no recompilation.
+However, some platforms have known limitations with the semantics of
+shared libraries that end up requiring recompilation when using this
+method, particularly noticeable in packages that use GNU Libtool.
+
+   The second method involves providing the `DESTDIR' variable.  For
+example, `make install DESTDIR=/alternate/directory' will prepend
+`/alternate/directory' before all installation names.  The approach of
+`DESTDIR' overrides is not required by the GNU Coding Standards, and
+does not work on platforms that have drive letters.  On the other hand,
+it does better at avoiding recompilation issues, and works well even
+when some directory options were not specified in terms of `${prefix}'
+at `configure' time.
+
+Optional Features
+=================
+
+   If the package supports it, you can cause programs to be installed
+with an extra prefix or suffix on their names by giving `configure' the
+option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'.
+
+   Some packages pay attention to `--enable-FEATURE' options to
+`configure', where FEATURE indicates an optional part of the package.
+They may also pay attention to `--with-PACKAGE' options, where PACKAGE
+is something like `gnu-as' or `x' (for the X Window System).  The
+`README' should mention any `--enable-' and `--with-' options that the
+package recognizes.
+
+   For packages that use the X Window System, `configure' can usually
+find the X include and library files automatically, but if it doesn't,
+you can use the `configure' options `--x-includes=DIR' and
+`--x-libraries=DIR' to specify their locations.
+
+   Some packages offer the ability to configure how verbose the
+execution of `make' will be.  For these packages, running `./configure
+--enable-silent-rules' sets the default to minimal output, which can be
+overridden with `make V=1'; while running `./configure
+--disable-silent-rules' sets the default to verbose, which can be
+overridden with `make V=0'.
+
+Particular systems
+==================
+
+   On HP-UX, the default C compiler is not ANSI C compatible.  If GNU
+CC is not installed, it is recommended to use the following options in
+order to use an ANSI C compiler:
+
+     ./configure CC="cc -Ae -D_XOPEN_SOURCE=500"
+
+and if that doesn't work, install pre-built binaries of GCC for HP-UX.
+
+   HP-UX `make' updates targets which have the same time stamps as
+their prerequisites, which makes it generally unusable when shipped
+generated files such as `configure' are involved.  Use GNU `make'
+instead.
+
+   On OSF/1 a.k.a. Tru64, some versions of the default C compiler cannot
+parse its `<wchar.h>' header file.  The option `-nodtk' can be used as
+a workaround.  If GNU CC is not installed, it is therefore recommended
+to try
+
+     ./configure CC="cc"
+
+and if that doesn't work, try
+
+     ./configure CC="cc -nodtk"
+
+   On Solaris, don't put `/usr/ucb' early in your `PATH'.  This
+directory contains several dysfunctional programs; working variants of
+these programs are available in `/usr/bin'.  So, if you need `/usr/ucb'
+in your `PATH', put it _after_ `/usr/bin'.
+
+   On Haiku, software installed for all users goes in `/boot/common',
+not `/usr/local'.  It is recommended to use the following options:
+
+     ./configure --prefix=/boot/common
+
+Specifying the System Type
+==========================
+
+   There may be some features `configure' cannot figure out
+automatically, but needs to determine by the type of machine the package
+will run on.  Usually, assuming the package is built to be run on the
+_same_ architectures, `configure' can figure that out, but if it prints
+a message saying it cannot guess the machine type, give it the
+`--build=TYPE' option.  TYPE can either be a short name for the system
+type, such as `sun4', or a canonical name which has the form:
+
+     CPU-COMPANY-SYSTEM
+
+where SYSTEM can have one of these forms:
+
+     OS
+     KERNEL-OS
+
+   See the file `config.sub' for the possible values of each field.  If
+`config.sub' isn't included in this package, then this package doesn't
+need to know the machine type.
+
+   If you are _building_ compiler tools for cross-compiling, you should
+use the option `--target=TYPE' to select the type of system they will
+produce code for.
+
+   If you want to _use_ a cross compiler, that generates code for a
+platform different from the build platform, you should specify the
+"host" platform (i.e., that on which the generated programs will
+eventually be run) with `--host=TYPE'.
+
+Sharing Defaults
+================
+
+   If you want to set default values for `configure' scripts to share,
+you can create a site shell script called `config.site' that gives
+default values for variables like `CC', `cache_file', and `prefix'.
+`configure' looks for `PREFIX/share/config.site' if it exists, then
+`PREFIX/etc/config.site' if it exists.  Or, you can set the
+`CONFIG_SITE' environment variable to the location of the site script.
+A warning: not all `configure' scripts look for a site script.
+
+Defining Variables
+==================
+
+   Variables not defined in a site shell script can be set in the
+environment passed to `configure'.  However, some packages may run
+configure again during the build, and the customized values of these
+variables may be lost.  In order to avoid this problem, you should set
+them in the `configure' command line, using `VAR=value'.  For example:
+
+     ./configure CC=/usr/local2/bin/gcc
+
+causes the specified `gcc' to be used as the C compiler (unless it is
+overridden in the site shell script).
+
+Unfortunately, this technique does not work for `CONFIG_SHELL' due to
+an Autoconf limitation.  Until the limitation is lifted, you can use
+this workaround:
+
+     CONFIG_SHELL=/bin/bash ./configure CONFIG_SHELL=/bin/bash
+
+`configure' Invocation
+======================
+
+   `configure' recognizes the following options to control how it
+operates.
+
+`--help'
+`-h'
+     Print a summary of all of the options to `configure', and exit.
+
+`--help=short'
+`--help=recursive'
+     Print a summary of the options unique to this package's
+     `configure', and exit.  The `short' variant lists options used
+     only in the top level, while the `recursive' variant lists options
+     also present in any nested packages.
+
+`--version'
+`-V'
+     Print the version of Autoconf used to generate the `configure'
+     script, and exit.
+
+`--cache-file=FILE'
+     Enable the cache: use and save the results of the tests in FILE,
+     traditionally `config.cache'.  FILE defaults to `/dev/null' to
+     disable caching.
+
+`--config-cache'
+`-C'
+     Alias for `--cache-file=config.cache'.
+
+`--quiet'
+`--silent'
+`-q'
+     Do not print messages saying which checks are being made.  To
+     suppress all normal output, redirect it to `/dev/null' (any error
+     messages will still be shown).
+
+`--srcdir=DIR'
+     Look for the package's source code in directory DIR.  Usually
+     `configure' can determine that directory automatically.
+
+`--prefix=DIR'
+     Use DIR as the installation prefix.  *note Installation Names::
+     for more details, including other options available for fine-tuning
+     the installation locations.
+
+`--no-create'
+`-n'
+     Run the configure checks, but stop before creating any output
+     files.
+
+`configure' also accepts some other, not widely useful, options.  Run
+`configure --help' for more details.
diff --git a/Makefile.am b/Makefile.am
index 76d6153..72104ba 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -5,7 +5,7 @@ SUBDIRS = src include m4 config docs tests
 EXTRA_DIST = bootstrap.sh AUTHORS TODO NEWS ucto.pc.in ucto-icu.pc.in
 
 pkgconfigdir = $(libdir)/pkgconfig
-pkgconfig_DATA = ucto.pc ucto-icu.pc
+pkgconfig_DATA = ucto.pc
 
 ChangeLog: NEWS
 	git pull; git2cl > ChangeLog
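
With ucto-icu.pc dropped from pkgconfig_DATA above, downstream builds would
query only the remaining ucto.pc. As a minimal sketch (assuming pkg-config can
find the file under $(libdir)/pkgconfig), a dependent build could pick up the
compile and link flags with:

    $ pkg-config --cflags --libs ucto
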
diff --git a/Makefile.in b/Makefile.in
index 0fb55a4..d6652da 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -90,8 +90,7 @@ build_triplet = @build@
 host_triplet = @host@
 subdir = .
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
-am__aclocal_m4_deps = $(top_srcdir)/m4/ax_icu_check.m4 \
-	$(top_srcdir)/m4/ax_lib_readline.m4 \
+am__aclocal_m4_deps = $(top_srcdir)/m4/ax_lib_readline.m4 \
 	$(top_srcdir)/m4/libtool.m4 $(top_srcdir)/m4/ltoptions.m4 \
 	$(top_srcdir)/m4/ltsugar.m4 $(top_srcdir)/m4/ltversion.m4 \
 	$(top_srcdir)/m4/lt~obsolete.m4 $(top_srcdir)/m4/pkg.m4 \
@@ -104,7 +103,7 @@ am__CONFIG_DISTCLEAN_FILES = config.status config.cache config.log \
  configure.lineno config.status.lineno
 mkinstalldirs = $(install_sh) -d
 CONFIG_HEADER = config.h
-CONFIG_CLEAN_FILES = ucto.pc ucto-icu.pc
+CONFIG_CLEAN_FILES = ucto.pc
 CONFIG_CLEAN_VPATH_FILES =
 AM_V_P = $(am__v_P_ at AM_V@)
 am__v_P_ = $(am__v_P_ at AM_DEFAULT_V@)
@@ -193,9 +192,9 @@ CTAGS = ctags
 CSCOPE = cscope
 DIST_SUBDIRS = $(SUBDIRS)
 am__DIST_COMMON = $(srcdir)/Makefile.in $(srcdir)/config.h.in \
-	$(srcdir)/ucto-icu.pc.in $(srcdir)/ucto.pc.in AUTHORS COPYING \
-	ChangeLog NEWS README TODO compile config.guess config.sub \
-	depcomp install-sh ltmain.sh missing
+	$(srcdir)/ucto.pc.in AUTHORS COPYING ChangeLog INSTALL NEWS \
+	TODO compile config.guess config.sub depcomp install-sh \
+	ltmain.sh missing
 DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST)
 distdir = $(PACKAGE)-$(VERSION)
 top_distdir = $(distdir)
@@ -269,13 +268,7 @@ EXEEXT = @EXEEXT@
 FGREP = @FGREP@
 GREP = @GREP@
 ICU_CFLAGS = @ICU_CFLAGS@
-ICU_CONFIG = @ICU_CONFIG@
-ICU_CPPSEARCHPATH = @ICU_CPPSEARCHPATH@
-ICU_CXXFLAGS = @ICU_CXXFLAGS@
-ICU_IOLIBS = @ICU_IOLIBS@
-ICU_LIBPATH = @ICU_LIBPATH@
 ICU_LIBS = @ICU_LIBS@
-ICU_VERSION = @ICU_VERSION@
 INSTALL = @INSTALL@
 INSTALL_DATA = @INSTALL_DATA@
 INSTALL_PROGRAM = @INSTALL_PROGRAM@
@@ -366,6 +359,7 @@ pdfdir = @pdfdir@
 prefix = @prefix@
 program_transform_name = @program_transform_name@
 psdir = @psdir@
+runstatedir = @runstatedir@
 sbindir = @sbindir@
 sharedstatedir = @sharedstatedir@
 srcdir = @srcdir@
@@ -382,7 +376,7 @@ ACLOCAL_AMFLAGS = -I m4 --install
 SUBDIRS = src include m4 config docs tests
 EXTRA_DIST = bootstrap.sh AUTHORS TODO NEWS ucto.pc.in ucto-icu.pc.in
 pkgconfigdir = $(libdir)/pkgconfig
-pkgconfig_DATA = ucto.pc ucto-icu.pc
+pkgconfig_DATA = ucto.pc
 all: config.h
 	$(MAKE) $(AM_MAKEFLAGS) all-recursive
 
@@ -437,8 +431,6 @@ distclean-hdr:
 	-rm -f config.h stamp-h1
 ucto.pc: $(top_builddir)/config.status $(srcdir)/ucto.pc.in
 	cd $(top_builddir) && $(SHELL) ./config.status $@
-ucto-icu.pc: $(top_builddir)/config.status $(srcdir)/ucto-icu.pc.in
-	cd $(top_builddir) && $(SHELL) ./config.status $@
 
 mostlyclean-libtool:
 	-rm -f *.lo
@@ -641,7 +633,7 @@ distdir: $(DISTFILES)
 	  ! -type d ! -perm -444 -exec $(install_sh) -c -m a+r {} {} \; \
 	|| chmod -R a+r "$(distdir)"
 dist-gzip: distdir
-	tardir=$(distdir) && $(am__tar) | eval GZIP= gzip $(GZIP_ENV) -c >$(distdir).tar.gz
+	tardir=$(distdir) && $(am__tar) | GZIP=$(GZIP_ENV) gzip -c >$(distdir).tar.gz
 	$(am__post_remove_distdir)
 
 dist-bzip2: distdir
@@ -667,7 +659,7 @@ dist-shar: distdir
 	@echo WARNING: "Support for shar distribution archives is" \
 	               "deprecated." >&2
 	@echo WARNING: "It will be removed altogether in Automake 2.0" >&2
-	shar $(distdir) | eval GZIP= gzip $(GZIP_ENV) -c >$(distdir).shar.gz
+	shar $(distdir) | GZIP=$(GZIP_ENV) gzip -c >$(distdir).shar.gz
 	$(am__post_remove_distdir)
 
 dist-zip: distdir
@@ -685,7 +677,7 @@ dist dist-all:
 distcheck: dist
 	case '$(DIST_ARCHIVES)' in \
 	*.tar.gz*) \
-	  eval GZIP= gzip $(GZIP_ENV) -dc $(distdir).tar.gz | $(am__untar) ;;\
+	  GZIP=$(GZIP_ENV) gzip -dc $(distdir).tar.gz | $(am__untar) ;;\
 	*.tar.bz2*) \
 	  bzip2 -dc $(distdir).tar.bz2 | $(am__untar) ;;\
 	*.tar.lz*) \
@@ -695,7 +687,7 @@ distcheck: dist
 	*.tar.Z*) \
 	  uncompress -c $(distdir).tar.Z | $(am__untar) ;;\
 	*.shar.gz*) \
-	  eval GZIP= gzip $(GZIP_ENV) -dc $(distdir).shar.gz | unshar ;;\
+	  GZIP=$(GZIP_ENV) gzip -dc $(distdir).shar.gz | unshar ;;\
 	*.zip*) \
 	  unzip $(distdir).zip ;;\
 	esac
diff --git a/NEWS b/NEWS
index b95d3ab..e747015 100644
--- a/NEWS
+++ b/NEWS
@@ -1,3 +1,34 @@
+0.9.8 2017-10-23
+[Ko vd Sloot]
+Bugfix release.
+ * fixed utterance handling in FoLiA input. Don't try sentence detection!
+
+0.9.7 2017-10-17
+[Ko van der Sloot]
+ * added textredundancy option, default is 'minimal'
+ * small adaptations to work with FoLiA 1.5 specs
+   - set textclass on words when outputclass != inputclass
+   - DON'T filter special characters when inputclass == outputclass
+ * -F (folia input) is automatically set for .xml files
+ * more robust against texts with embedded tabs, etc.
+ * more and better tests added
+ * better logging and error messaging
+ * improved language handling. TODO: Language detection in FoLiA
+ * bug fixes:
+   - correctly handle xml-comment inside a <t>
+   - better id generation when parent has no id
+   - better reaction on overly long 'words'
+
+0.9.6 2017-01-23
+[Maarten van Gompel]
+* Moving data files from etc/ to share/, as they are data files rather than
+  configuration files meant to be edited.
+* Requires uctodata >= 0.4.
+* Should solve debian packaging issues (#18)
+* Minor updates to the manual (#2)
+* Some refactoring/code cleanup, temper expectations regarding ucto's
+  date-tagging abilities (#16, thanks also to @sanmai-NL)
+
 0.9.5 2017-01-06
 [Ko van der Sloot]
 Bug fix release:
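
As a rough sketch of how the 0.9.7/0.9.8 additions listed above combine on the
command line (the file names are made up, and the --option=value syntax and
the 'minimal' default are assumptions based on the NEWS and ChangeLog entries):

    # FoLiA input is detected automatically from the .xml extension (-F),
    # and redundant text in the output is kept to a minimum
    $ ucto -L eng --textredundancy=minimal input.folia.xml output.folia.xml
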
diff --git a/README b/README
deleted file mode 100644
index 1cfd7f8..0000000
--- a/README
+++ /dev/null
@@ -1,113 +0,0 @@
-[![Build Status](https://travis-ci.org/LanguageMachines/ucto.svg?branch=master)](https://travis-ci.org/LanguageMachines/ucto) [![Language Machines Badge](http://applejack.science.ru.nl/lamabadge.php/ucto)](http://applejack.science.ru.nl/languagemachines/) 
-
-================================
-Ucto - A rule-based tokeniser
-================================
-
-    Centre for Language and Speech technology, Radboud University Nijmegen
-    Induction of Linguistic Knowledge Research Group, Tilburg University
-
-Website: https://languagemachines.github.io/ucto/
-
-Ucto tokenizes text files: it separates words from punctuation, and splits
-sentences. This is one of the first tasks for almost any Natural Language
-Processing application. Ucto offers several other basic preprocessing steps,
-such as changing case, that you can use to make your text suitable for further
-processing such as indexing, part-of-speech tagging, or machine translation.
-
-Ucto comes with tokenisation rules for several languages (packaged separately)
-and can be easily extended to suit other languages. It has been incorporated
-for tokenizing Dutch text in Frog (https://languagemachines.github.io/frog),
-our Dutch morpho-syntactic processor.
-
-The software is intended to be used from the command-line by researchers in
-Natural Language Processing or related areas, as well as software developers.
-An [Ucto python binding](https://github.com/proycon/python-ucto) is also available
-separately.
-
-Features:
-
-- Comes with tokenization rules for English, Dutch, French, Italian, Turkish,
-  Spanish, Portuguese and Swedish; easily extendible to other languages. Rules
-  consist of regular expressions and lists. They are
-  packaged separately as [uctodata](https://github.com/LanguageMachines/uctodata).
-- Recognizes units, currencies, abbreviations, and simple dates and times like dd-mm-yyyy
-- Recognizes paired quote spans, sentences, and paragraphs.
-- Produces UTF8 encoding and NFC output normalization, optionally accepting
-  other input encodings as well.
-- Ligature normalization (can undo, for instance, fi and fl as single codepoints).
-- Optional conversion to all lowercase or uppercase.
-- Supports [FoLiA XML](https://proycon.github.io/folia)
-
-Ucto was written by Maarten van Gompel and Ko van der Sloot. Work on Ucto was
-funded by NWO, the Netherlands Organisation for Scientific Research, under the
-Implicit Linguistics project, the CLARIN-NL program, and the CLARIAH project.
-
-This software is available under the GNU General Public License v3 (see the file
-COPYING).
-
-------------------------------------------------------------
-Installation
-------------------------------------------------------------
-
-To install ucto, first check whether your distribution's package manager has an up-to-date package for it.
-If not, for easy installation of ucto and all dependencies, it is included as part of our software
-distribution [LaMachine](https://proycon.github.io/LaMachine).
-
-To compile and install manually from source, provided you have all the
-dependencies installed:
-
-    $ bash bootstrap.sh
-    $ ./configure
-    $ make
-    $ sudo make install
-
-You will need current versions of the following dependencies of our software:
-
-* [ticcutils](https://github.com/LanguageMachine/ticcutils) - A shared utility library
-* [libfolia](https://github.com/LanguageMachines/libfolia)  - A library for the FoLiA format.
-* [uctodata](https://github.com/LanguageMachines/uctodata)  - Data files for ucto, packaged separately
-
-As well as the following 3rd party dependencies:
-
-* ``icu`` - A C++ library for Unicode and Globalization support. On Debian/Ubuntu systems, install the package libicu-dev.
-* ``libxml2`` - An XML library. On Debian/Ubuntu systems install the package libxml2-dev.
-* A sane build environment with a C++ compiler (e.g. gcc or clang), autotools, libtool, pkg-config
-
-------------------------------------------------------------
-Usage
-------------------------------------------------------------
-
-Tokenize an English text file to standard output; tokens will be
-space-separated and sentences delimited by ``<utt>``:
-
-    $ ucto -L eng yourfile.txt 
-
-The -L flag specifies the language (as a three-letter ISO 639-3 code), provided
-a configuration file exists for that language. The configurations are provided
-separately, for various languages, in the
-[uctodata](https://github.com/LanguageMachines/uctodata) package. Note that
-older versions of ucto used different two-letter codes, so you may need to
-update the way you invoke ucto.
-
-To output to file instead of standard output, just add another
-positional argument with the desired output filename.
-
-If you want each sentence on a separate line (i.e. newline delimited rather than delimited by
-``<utt>``), then pass the ``-n`` flag. If each sentence is already on one line
-in the input and you want to leave it at that, pass the ``-m`` flag.
-
-Tokenize plain text to [FoLiA XML](https://proycon.github.io/folia) using the ``-X`` flag; you can specify an ID
-for the FoLiA document using the ``--id=`` flag:
-
-    $ ucto -L eng -X --id=hamlet hamlet.txt hamlet.folia.xml
-
-Note that in the FoLiA XML output, ucto encodes the class of the token (date, url, smiley, etc...) based
-on the rule that matched.
-
-For further documentation, consult the [ucto
-manual](https://github.com/LanguageMachines/ucto/blob/master/docs/ucto_manual.pdf).
-
-
-
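
To illustrate the flags described in the removed README's Usage section, a
short sketch (the input and output file names are hypothetical; -L, -n, -X and
--id are the flags documented in the text above):

    # one sentence per line instead of <utt> delimiters
    $ ucto -L eng -n hamlet.txt hamlet.tok.txt
    # plain text to FoLiA XML with an explicit document ID
    $ ucto -L eng -X --id=hamlet hamlet.txt hamlet.folia.xml
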
diff --git a/aclocal.m4 b/aclocal.m4
index 0ce58dc..b5923f1 100644
--- a/aclocal.m4
+++ b/aclocal.m4
@@ -1150,7 +1150,6 @@ AC_SUBST([am__tar])
 AC_SUBST([am__untar])
 ]) # _AM_PROG_TAR
 
-m4_include([m4/ax_icu_check.m4])
 m4_include([m4/ax_lib_readline.m4])
 m4_include([m4/libtool.m4])
 m4_include([m4/ltoptions.m4])
diff --git a/bootstrap.sh b/bootstrap.sh
index 8a5b8bc..de12d31 100644
--- a/bootstrap.sh
+++ b/bootstrap.sh
@@ -1,6 +1,3 @@
-# $Id$
-# $URL$
-
 # bootstrap - script to bootstrap the distribution rolling engine
 
 # usage:
diff --git a/config.guess b/config.guess
index 6c32c86..2e9ad7f 100755
--- a/config.guess
+++ b/config.guess
@@ -1,8 +1,8 @@
 #! /bin/sh
 # Attempt to guess a canonical system name.
-#   Copyright 1992-2014 Free Software Foundation, Inc.
+#   Copyright 1992-2016 Free Software Foundation, Inc.
 
-timestamp='2014-11-04'
+timestamp='2016-10-02'
 
 # This file is free software; you can redistribute it and/or modify it
 # under the terms of the GNU General Public License as published by
@@ -27,7 +27,7 @@ timestamp='2014-11-04'
 # Originally written by Per Bothner; maintained since 2000 by Ben Elliston.
 #
 # You can get the latest version of this script from:
-# http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD
+# http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess
 #
 # Please send patches to <config-patches at gnu.org>.
 
@@ -50,7 +50,7 @@ version="\
 GNU config.guess ($timestamp)
 
 Originally written by Per Bothner.
-Copyright 1992-2014 Free Software Foundation, Inc.
+Copyright 1992-2016 Free Software Foundation, Inc.
 
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE."
@@ -168,19 +168,29 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in
 	# Note: NetBSD doesn't particularly care about the vendor
 	# portion of the name.  We always set it to "unknown".
 	sysctl="sysctl -n hw.machine_arch"
-	UNAME_MACHINE_ARCH=`(/sbin/$sysctl 2>/dev/null || \
-	    /usr/sbin/$sysctl 2>/dev/null || echo unknown)`
+	UNAME_MACHINE_ARCH=`(uname -p 2>/dev/null || \
+	    /sbin/$sysctl 2>/dev/null || \
+	    /usr/sbin/$sysctl 2>/dev/null || \
+	    echo unknown)`
 	case "${UNAME_MACHINE_ARCH}" in
 	    armeb) machine=armeb-unknown ;;
 	    arm*) machine=arm-unknown ;;
 	    sh3el) machine=shl-unknown ;;
 	    sh3eb) machine=sh-unknown ;;
 	    sh5el) machine=sh5le-unknown ;;
+	    earmv*)
+		arch=`echo ${UNAME_MACHINE_ARCH} | sed -e 's,^e\(armv[0-9]\).*$,\1,'`
+		endian=`echo ${UNAME_MACHINE_ARCH} | sed -ne 's,^.*\(eb\)$,\1,p'`
+		machine=${arch}${endian}-unknown
+		;;
 	    *) machine=${UNAME_MACHINE_ARCH}-unknown ;;
 	esac
 	# The Operating System including object format, if it has switched
-	# to ELF recently, or will in the future.
+	# to ELF recently (or will in the future) and ABI.
 	case "${UNAME_MACHINE_ARCH}" in
+	    earm*)
+		os=netbsdelf
+		;;
 	    arm*|i386|m68k|ns32k|sh3*|sparc|vax)
 		eval $set_cc_for_build
 		if echo __ELF__ | $CC_FOR_BUILD -E - 2>/dev/null \
@@ -197,6 +207,13 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in
 		os=netbsd
 		;;
 	esac
+	# Determine ABI tags.
+	case "${UNAME_MACHINE_ARCH}" in
+	    earm*)
+		expr='s/^earmv[0-9]/-eabi/;s/eb$//'
+		abi=`echo ${UNAME_MACHINE_ARCH} | sed -e "$expr"`
+		;;
+	esac
 	# The OS release
 	# Debian GNU/NetBSD machines have a different userland, and
 	# thus, need a distinct triplet. However, they do not need
@@ -207,13 +224,13 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in
 		release='-gnu'
 		;;
 	    *)
-		release=`echo ${UNAME_RELEASE}|sed -e 's/[-_].*/\./'`
+		release=`echo ${UNAME_RELEASE} | sed -e 's/[-_].*//' | cut -d. -f1,2`
 		;;
 	esac
 	# Since CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM:
 	# contains redundant information, the shorter form:
 	# CPU_TYPE-MANUFACTURER-OPERATING_SYSTEM is used.
-	echo "${machine}-${os}${release}"
+	echo "${machine}-${os}${release}${abi}"
 	exit ;;
     *:Bitrig:*:*)
 	UNAME_MACHINE_ARCH=`arch | sed 's/Bitrig.//'`
@@ -223,6 +240,10 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in
 	UNAME_MACHINE_ARCH=`arch | sed 's/OpenBSD.//'`
 	echo ${UNAME_MACHINE_ARCH}-unknown-openbsd${UNAME_RELEASE}
 	exit ;;
+    *:LibertyBSD:*:*)
+	UNAME_MACHINE_ARCH=`arch | sed 's/^.*BSD\.//'`
+	echo ${UNAME_MACHINE_ARCH}-unknown-libertybsd${UNAME_RELEASE}
+	exit ;;
     *:ekkoBSD:*:*)
 	echo ${UNAME_MACHINE}-unknown-ekkobsd${UNAME_RELEASE}
 	exit ;;
@@ -235,6 +256,9 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in
     *:MirBSD:*:*)
 	echo ${UNAME_MACHINE}-unknown-mirbsd${UNAME_RELEASE}
 	exit ;;
+    *:Sortix:*:*)
+	echo ${UNAME_MACHINE}-unknown-sortix
+	exit ;;
     alpha:OSF1:*:*)
 	case $UNAME_RELEASE in
 	*4.0)
@@ -251,42 +275,42 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in
 	ALPHA_CPU_TYPE=`/usr/sbin/psrinfo -v | sed -n -e 's/^  The alpha \(.*\) processor.*$/\1/p' | head -n 1`
 	case "$ALPHA_CPU_TYPE" in
 	    "EV4 (21064)")
-		UNAME_MACHINE="alpha" ;;
+		UNAME_MACHINE=alpha ;;
 	    "EV4.5 (21064)")
-		UNAME_MACHINE="alpha" ;;
+		UNAME_MACHINE=alpha ;;
 	    "LCA4 (21066/21068)")
-		UNAME_MACHINE="alpha" ;;
+		UNAME_MACHINE=alpha ;;
 	    "EV5 (21164)")
-		UNAME_MACHINE="alphaev5" ;;
+		UNAME_MACHINE=alphaev5 ;;
 	    "EV5.6 (21164A)")
-		UNAME_MACHINE="alphaev56" ;;
+		UNAME_MACHINE=alphaev56 ;;
 	    "EV5.6 (21164PC)")
-		UNAME_MACHINE="alphapca56" ;;
+		UNAME_MACHINE=alphapca56 ;;
 	    "EV5.7 (21164PC)")
-		UNAME_MACHINE="alphapca57" ;;
+		UNAME_MACHINE=alphapca57 ;;
 	    "EV6 (21264)")
-		UNAME_MACHINE="alphaev6" ;;
+		UNAME_MACHINE=alphaev6 ;;
 	    "EV6.7 (21264A)")
-		UNAME_MACHINE="alphaev67" ;;
+		UNAME_MACHINE=alphaev67 ;;
 	    "EV6.8CB (21264C)")
-		UNAME_MACHINE="alphaev68" ;;
+		UNAME_MACHINE=alphaev68 ;;
 	    "EV6.8AL (21264B)")
-		UNAME_MACHINE="alphaev68" ;;
+		UNAME_MACHINE=alphaev68 ;;
 	    "EV6.8CX (21264D)")
-		UNAME_MACHINE="alphaev68" ;;
+		UNAME_MACHINE=alphaev68 ;;
 	    "EV6.9A (21264/EV69A)")
-		UNAME_MACHINE="alphaev69" ;;
+		UNAME_MACHINE=alphaev69 ;;
 	    "EV7 (21364)")
-		UNAME_MACHINE="alphaev7" ;;
+		UNAME_MACHINE=alphaev7 ;;
 	    "EV7.9 (21364A)")
-		UNAME_MACHINE="alphaev79" ;;
+		UNAME_MACHINE=alphaev79 ;;
 	esac
 	# A Pn.n version is a patched version.
 	# A Vn.n version is a released version.
 	# A Tn.n version is a released field test version.
 	# A Xn.n version is an unreleased experimental baselevel.
 	# 1.2 uses "1.2" for uname -r.
-	echo ${UNAME_MACHINE}-dec-osf`echo ${UNAME_RELEASE} | sed -e 's/^[PVTX]//' | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz'`
+	echo ${UNAME_MACHINE}-dec-osf`echo ${UNAME_RELEASE} | sed -e 's/^[PVTX]//' | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz`
 	# Reset EXIT trap before exiting to avoid spurious non-zero exit code.
 	exitcode=$?
 	trap '' 0
@@ -359,16 +383,16 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in
 	exit ;;
     i86pc:SunOS:5.*:* | i86xen:SunOS:5.*:*)
 	eval $set_cc_for_build
-	SUN_ARCH="i386"
+	SUN_ARCH=i386
 	# If there is a compiler, see if it is configured for 64-bit objects.
 	# Note that the Sun cc does not turn __LP64__ into 1 like gcc does.
 	# This test works for both compilers.
-	if [ "$CC_FOR_BUILD" != 'no_compiler_found' ]; then
+	if [ "$CC_FOR_BUILD" != no_compiler_found ]; then
 	    if (echo '#ifdef __amd64'; echo IS_64BIT_ARCH; echo '#endif') | \
-		(CCOPTS= $CC_FOR_BUILD -E - 2>/dev/null) | \
+		(CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) | \
 		grep IS_64BIT_ARCH >/dev/null
 	    then
-		SUN_ARCH="x86_64"
+		SUN_ARCH=x86_64
 	    fi
 	fi
 	echo ${SUN_ARCH}-pc-solaris2`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'`
@@ -393,7 +417,7 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in
 	exit ;;
     sun*:*:4.2BSD:*)
 	UNAME_RELEASE=`(sed 1q /etc/motd | awk '{print substr($5,1,3)}') 2>/dev/null`
-	test "x${UNAME_RELEASE}" = "x" && UNAME_RELEASE=3
+	test "x${UNAME_RELEASE}" = x && UNAME_RELEASE=3
 	case "`/bin/arch`" in
 	    sun3)
 		echo m68k-sun-sunos${UNAME_RELEASE}
@@ -618,13 +642,13 @@ EOF
 		    sc_cpu_version=`/usr/bin/getconf SC_CPU_VERSION 2>/dev/null`
 		    sc_kernel_bits=`/usr/bin/getconf SC_KERNEL_BITS 2>/dev/null`
 		    case "${sc_cpu_version}" in
-		      523) HP_ARCH="hppa1.0" ;; # CPU_PA_RISC1_0
-		      528) HP_ARCH="hppa1.1" ;; # CPU_PA_RISC1_1
+		      523) HP_ARCH=hppa1.0 ;; # CPU_PA_RISC1_0
+		      528) HP_ARCH=hppa1.1 ;; # CPU_PA_RISC1_1
 		      532)                      # CPU_PA_RISC2_0
 			case "${sc_kernel_bits}" in
-			  32) HP_ARCH="hppa2.0n" ;;
-			  64) HP_ARCH="hppa2.0w" ;;
-			  '') HP_ARCH="hppa2.0" ;;   # HP-UX 10.20
+			  32) HP_ARCH=hppa2.0n ;;
+			  64) HP_ARCH=hppa2.0w ;;
+			  '') HP_ARCH=hppa2.0 ;;   # HP-UX 10.20
 			esac ;;
 		    esac
 		fi
@@ -663,11 +687,11 @@ EOF
 		    exit (0);
 		}
 EOF
-		    (CCOPTS= $CC_FOR_BUILD -o $dummy $dummy.c 2>/dev/null) && HP_ARCH=`$dummy`
+		    (CCOPTS="" $CC_FOR_BUILD -o $dummy $dummy.c 2>/dev/null) && HP_ARCH=`$dummy`
 		    test -z "$HP_ARCH" && HP_ARCH=hppa
 		fi ;;
 	esac
-	if [ ${HP_ARCH} = "hppa2.0w" ]
+	if [ ${HP_ARCH} = hppa2.0w ]
 	then
 	    eval $set_cc_for_build
 
@@ -680,12 +704,12 @@ EOF
 	    # $ CC_FOR_BUILD="cc +DA2.0w" ./config.guess
 	    # => hppa64-hp-hpux11.23
 
-	    if echo __LP64__ | (CCOPTS= $CC_FOR_BUILD -E - 2>/dev/null) |
+	    if echo __LP64__ | (CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) |
 		grep -q __LP64__
 	    then
-		HP_ARCH="hppa2.0w"
+		HP_ARCH=hppa2.0w
 	    else
-		HP_ARCH="hppa64"
+		HP_ARCH=hppa64
 	    fi
 	fi
 	echo ${HP_ARCH}-hp-hpux${HPUX_REV}
@@ -790,14 +814,14 @@ EOF
 	echo craynv-cray-unicosmp${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/'
 	exit ;;
     F30[01]:UNIX_System_V:*:* | F700:UNIX_System_V:*:*)
-	FUJITSU_PROC=`uname -m | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz'`
-	FUJITSU_SYS=`uname -p | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' | sed -e 's/\///'`
+	FUJITSU_PROC=`uname -m | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz`
+	FUJITSU_SYS=`uname -p | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz | sed -e 's/\///'`
 	FUJITSU_REL=`echo ${UNAME_RELEASE} | sed -e 's/ /_/'`
 	echo "${FUJITSU_PROC}-fujitsu-${FUJITSU_SYS}${FUJITSU_REL}"
 	exit ;;
     5000:UNIX_System_V:4.*:*)
-	FUJITSU_SYS=`uname -p | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' | sed -e 's/\///'`
-	FUJITSU_REL=`echo ${UNAME_RELEASE} | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' | sed -e 's/ /_/'`
+	FUJITSU_SYS=`uname -p | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz | sed -e 's/\///'`
+	FUJITSU_REL=`echo ${UNAME_RELEASE} | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz | sed -e 's/ /_/'`
 	echo "sparc-fujitsu-${FUJITSU_SYS}${FUJITSU_REL}"
 	exit ;;
     i*86:BSD/386:*:* | i*86:BSD/OS:*:* | *:Ascend\ Embedded/OS:*:*)
@@ -879,7 +903,7 @@ EOF
 	exit ;;
     *:GNU/*:*:*)
 	# other systems with GNU libc and userland
-	echo ${UNAME_MACHINE}-unknown-`echo ${UNAME_SYSTEM} | sed 's,^[^/]*/,,' | tr '[A-Z]' '[a-z]'``echo ${UNAME_RELEASE}|sed -e 's/[-(].*//'`-${LIBC}
+	echo ${UNAME_MACHINE}-unknown-`echo ${UNAME_SYSTEM} | sed 's,^[^/]*/,,' | tr "[:upper:]" "[:lower:]"``echo ${UNAME_RELEASE}|sed -e 's/[-(].*//'`-${LIBC}
 	exit ;;
     i*86:Minix:*:*)
 	echo ${UNAME_MACHINE}-pc-minix
@@ -902,7 +926,7 @@ EOF
 	  EV68*) UNAME_MACHINE=alphaev68 ;;
 	esac
 	objdump --private-headers /bin/sh | grep -q ld.so.1
-	if test "$?" = 0 ; then LIBC="gnulibc1" ; fi
+	if test "$?" = 0 ; then LIBC=gnulibc1 ; fi
 	echo ${UNAME_MACHINE}-unknown-linux-${LIBC}
 	exit ;;
     arc:Linux:*:* | arceb:Linux:*:*)
@@ -933,6 +957,9 @@ EOF
     crisv32:Linux:*:*)
 	echo ${UNAME_MACHINE}-axis-linux-${LIBC}
 	exit ;;
+    e2k:Linux:*:*)
+	echo ${UNAME_MACHINE}-unknown-linux-${LIBC}
+	exit ;;
     frv:Linux:*:*)
 	echo ${UNAME_MACHINE}-unknown-linux-${LIBC}
 	exit ;;
@@ -945,6 +972,9 @@ EOF
     ia64:Linux:*:*)
 	echo ${UNAME_MACHINE}-unknown-linux-${LIBC}
 	exit ;;
+    k1om:Linux:*:*)
+	echo ${UNAME_MACHINE}-unknown-linux-${LIBC}
+	exit ;;
     m32r*:Linux:*:*)
 	echo ${UNAME_MACHINE}-unknown-linux-${LIBC}
 	exit ;;
@@ -970,6 +1000,9 @@ EOF
 	eval `$CC_FOR_BUILD -E $dummy.c 2>/dev/null | grep '^CPU'`
 	test x"${CPU}" != x && { echo "${CPU}-unknown-linux-${LIBC}"; exit; }
 	;;
+    mips64el:Linux:*:*)
+	echo ${UNAME_MACHINE}-unknown-linux-${LIBC}
+	exit ;;
     openrisc*:Linux:*:*)
 	echo or1k-unknown-linux-${LIBC}
 	exit ;;
@@ -1002,6 +1035,9 @@ EOF
     ppcle:Linux:*:*)
 	echo powerpcle-unknown-linux-${LIBC}
 	exit ;;
+    riscv32:Linux:*:* | riscv64:Linux:*:*)
+	echo ${UNAME_MACHINE}-unknown-linux-${LIBC}
+	exit ;;
     s390:Linux:*:* | s390x:Linux:*:*)
 	echo ${UNAME_MACHINE}-ibm-linux-${LIBC}
 	exit ;;
@@ -1021,7 +1057,7 @@ EOF
 	echo ${UNAME_MACHINE}-dec-linux-${LIBC}
 	exit ;;
     x86_64:Linux:*:*)
-	echo ${UNAME_MACHINE}-unknown-linux-${LIBC}
+	echo ${UNAME_MACHINE}-pc-linux-${LIBC}
 	exit ;;
     xtensa*:Linux:*:*)
 	echo ${UNAME_MACHINE}-unknown-linux-${LIBC}
@@ -1100,7 +1136,7 @@ EOF
 	# uname -m prints for DJGPP always 'pc', but it prints nothing about
 	# the processor, so we play safe by assuming i586.
 	# Note: whatever this is, it MUST be the same as what config.sub
-	# prints for the "djgpp" host, or else GDB configury will decide that
+	# prints for the "djgpp" host, or else GDB configure will decide that
 	# this is a cross-build.
 	echo i586-pc-msdosdjgpp
 	exit ;;
@@ -1249,6 +1285,9 @@ EOF
     SX-8R:SUPER-UX:*:*)
 	echo sx8r-nec-superux${UNAME_RELEASE}
 	exit ;;
+    SX-ACE:SUPER-UX:*:*)
+	echo sxace-nec-superux${UNAME_RELEASE}
+	exit ;;
     Power*:Rhapsody:*:*)
 	echo powerpc-apple-rhapsody${UNAME_RELEASE}
 	exit ;;
@@ -1262,9 +1301,9 @@ EOF
 	    UNAME_PROCESSOR=powerpc
 	fi
 	if test `echo "$UNAME_RELEASE" | sed -e 's/\..*//'` -le 10 ; then
-	    if [ "$CC_FOR_BUILD" != 'no_compiler_found' ]; then
+	    if [ "$CC_FOR_BUILD" != no_compiler_found ]; then
 		if (echo '#ifdef __LP64__'; echo IS_64BIT_ARCH; echo '#endif') | \
-		    (CCOPTS= $CC_FOR_BUILD -E - 2>/dev/null) | \
+		    (CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) | \
 		    grep IS_64BIT_ARCH >/dev/null
 		then
 		    case $UNAME_PROCESSOR in
@@ -1286,7 +1325,7 @@ EOF
 	exit ;;
     *:procnto*:*:* | *:QNX:[0123456789]*:*)
 	UNAME_PROCESSOR=`uname -p`
-	if test "$UNAME_PROCESSOR" = "x86"; then
+	if test "$UNAME_PROCESSOR" = x86; then
 		UNAME_PROCESSOR=i386
 		UNAME_MACHINE=pc
 	fi
@@ -1317,7 +1356,7 @@ EOF
 	# "uname -m" is not consistent, so use $cputype instead. 386
 	# is converted to i386 for consistency with other x86
 	# operating systems.
-	if test "$cputype" = "386"; then
+	if test "$cputype" = 386; then
 	    UNAME_MACHINE=i386
 	else
 	    UNAME_MACHINE="$cputype"
@@ -1359,7 +1398,7 @@ EOF
 	echo i386-pc-xenix
 	exit ;;
     i*86:skyos:*:*)
-	echo ${UNAME_MACHINE}-pc-skyos`echo ${UNAME_RELEASE}` | sed -e 's/ .*$//'
+	echo ${UNAME_MACHINE}-pc-skyos`echo ${UNAME_RELEASE} | sed -e 's/ .*$//'`
 	exit ;;
     i*86:rdos:*:*)
 	echo ${UNAME_MACHINE}-pc-rdos
@@ -1370,23 +1409,25 @@ EOF
     x86_64:VMkernel:*:*)
 	echo ${UNAME_MACHINE}-unknown-esx
 	exit ;;
+    amd64:Isilon\ OneFS:*:*)
+	echo x86_64-unknown-onefs
+	exit ;;
 esac
 
 cat >&2 <<EOF
 $0: unable to guess system type
 
-This script, last modified $timestamp, has failed to recognize
-the operating system you are using. It is advised that you
-download the most up to date version of the config scripts from
+This script (version $timestamp), has failed to recognize the
+operating system you are using. If your script is old, overwrite
+config.guess and config.sub with the latest versions from:
 
-  http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD
+  http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess
 and
-  http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD
+  http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub
 
-If the version you run ($0) is already up to date, please
-send the following data and any information you think might be
-pertinent to <config-patches at gnu.org> in order to provide the needed
-information to handle your system.
+If $0 has already been updated, send the following data and any
+information you think might be pertinent to config-patches at gnu.org to
+provide the necessary information to handle your system.
 
 config.guess timestamp = $timestamp
 
diff --git a/config.h.in b/config.h.in
index 57d94b1..31c5bb2 100644
--- a/config.h.in
+++ b/config.h.in
@@ -6,9 +6,6 @@
 /* Define to 1 if you have the <history.h> header file. */
 #undef HAVE_HISTORY_H
 
-/* we want to use ICU */
-#undef HAVE_ICU
-
 /* Define to 1 if you have the <inttypes.h> header file. */
 #undef HAVE_INTTYPES_H
 
diff --git a/config.sub b/config.sub
index 7ffe373..dd2ca93 100755
--- a/config.sub
+++ b/config.sub
@@ -1,8 +1,8 @@
 #! /bin/sh
 # Configuration validation subroutine script.
-#   Copyright 1992-2014 Free Software Foundation, Inc.
+#   Copyright 1992-2016 Free Software Foundation, Inc.
 
-timestamp='2014-12-03'
+timestamp='2016-11-04'
 
 # This file is free software; you can redistribute it and/or modify it
 # under the terms of the GNU General Public License as published by
@@ -33,7 +33,7 @@ timestamp='2014-12-03'
 # Otherwise, we print the canonical config type on stdout and succeed.
 
 # You can get the latest version of this script from:
-# http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD
+# http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub
 
 # This file is supposed to be the same for all GNU packages
 # and recognize all the CPU types, system types and aliases
@@ -53,8 +53,7 @@ timestamp='2014-12-03'
 me=`echo "$0" | sed -e 's,.*/,,'`
 
 usage="\
-Usage: $0 [OPTION] CPU-MFR-OPSYS
-       $0 [OPTION] ALIAS
+Usage: $0 [OPTION] CPU-MFR-OPSYS or ALIAS
 
 Canonicalize a configuration name.
 
@@ -68,7 +67,7 @@ Report bugs and patches to <config-patches at gnu.org>."
 version="\
 GNU config.sub ($timestamp)
 
-Copyright 1992-2014 Free Software Foundation, Inc.
+Copyright 1992-2016 Free Software Foundation, Inc.
 
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE."
@@ -117,8 +116,8 @@ maybe_os=`echo $1 | sed 's/^\(.*\)-\([^-]*-[^-]*\)$/\2/'`
 case $maybe_os in
   nto-qnx* | linux-gnu* | linux-android* | linux-dietlibc | linux-newlib* | \
   linux-musl* | linux-uclibc* | uclinux-uclibc* | uclinux-gnu* | kfreebsd*-gnu* | \
-  knetbsd*-gnu* | netbsd*-gnu* | \
-  kopensolaris*-gnu* | \
+  knetbsd*-gnu* | netbsd*-gnu* | netbsd*-eabi* | \
+  kopensolaris*-gnu* | cloudabi*-eabi* | \
   storm-chaos* | os2-emx* | rtmk-nova*)
     os=-$maybe_os
     basic_machine=`echo $1 | sed 's/^\(.*\)-\([^-]*-[^-]*\)$/\1/'`
@@ -255,12 +254,13 @@ case $basic_machine in
 	| arc | arceb \
 	| arm | arm[bl]e | arme[lb] | armv[2-8] | armv[3-8][lb] | armv7[arm] \
 	| avr | avr32 \
+	| ba \
 	| be32 | be64 \
 	| bfin \
 	| c4x | c8051 | clipper \
 	| d10v | d30v | dlx | dsp16xx \
-	| epiphany \
-	| fido | fr30 | frv \
+	| e2k | epiphany \
+	| fido | fr30 | frv | ft32 \
 	| h8300 | h8500 | hppa | hppa1.[01] | hppa2.0 | hppa2.0[nw] | hppa64 \
 	| hexagon \
 	| i370 | i860 | i960 | ia64 \
@@ -301,11 +301,12 @@ case $basic_machine in
 	| open8 | or1k | or1knd | or32 \
 	| pdp10 | pdp11 | pj | pjl \
 	| powerpc | powerpc64 | powerpc64le | powerpcle \
+	| pru \
 	| pyramid \
 	| riscv32 | riscv64 \
 	| rl78 | rx \
 	| score \
-	| sh | sh[1234] | sh[24]a | sh[24]aeb | sh[23]e | sh[34]eb | sheb | shbe | shle | sh[1234]le | sh3ele \
+	| sh | sh[1234] | sh[24]a | sh[24]aeb | sh[23]e | sh[234]eb | sheb | shbe | shle | sh[1234]le | sh3ele \
 	| sh64 | sh64le \
 	| sparc | sparc64 | sparc64b | sparc64v | sparc86x | sparclet | sparclite \
 	| sparcv8 | sparcv9 | sparcv9b | sparcv9v \
@@ -376,12 +377,13 @@ case $basic_machine in
 	| alphapca5[67]-* | alpha64pca5[67]-* | arc-* | arceb-* \
 	| arm-*  | armbe-* | armle-* | armeb-* | armv*-* \
 	| avr-* | avr32-* \
+	| ba-* \
 	| be32-* | be64-* \
 	| bfin-* | bs2000-* \
 	| c[123]* | c30-* | [cjt]90-* | c4x-* \
 	| c8051-* | clipper-* | craynv-* | cydra-* \
 	| d10v-* | d30v-* | dlx-* \
-	| elxsi-* \
+	| e2k-* | elxsi-* \
 	| f30[01]-* | f700-* | fido-* | fr30-* | frv-* | fx80-* \
 	| h8300-* | h8500-* \
 	| hppa-* | hppa1.[01]-* | hppa2.0-* | hppa2.0[nw]-* | hppa64-* \
@@ -427,13 +429,15 @@ case $basic_machine in
 	| orion-* \
 	| pdp10-* | pdp11-* | pj-* | pjl-* | pn-* | power-* \
 	| powerpc-* | powerpc64-* | powerpc64le-* | powerpcle-* \
+	| pru-* \
 	| pyramid-* \
+	| riscv32-* | riscv64-* \
 	| rl78-* | romp-* | rs6000-* | rx-* \
 	| sh-* | sh[1234]-* | sh[24]a-* | sh[24]aeb-* | sh[23]e-* | sh[34]eb-* | sheb-* | shbe-* \
 	| shle-* | sh[1234]le-* | sh3ele-* | sh64-* | sh64le-* \
 	| sparc-* | sparc64-* | sparc64b-* | sparc64v-* | sparc86x-* | sparclet-* \
 	| sparclite-* \
-	| sparcv8-* | sparcv9-* | sparcv9b-* | sparcv9v-* | sv1-* | sx?-* \
+	| sparcv8-* | sparcv9-* | sparcv9b-* | sparcv9v-* | sv1-* | sx*-* \
 	| tahoe-* \
 	| tic30-* | tic4x-* | tic54x-* | tic55x-* | tic6x-* | tic80-* \
 	| tile*-* \
@@ -518,6 +522,9 @@ case $basic_machine in
 		basic_machine=i386-pc
 		os=-aros
 		;;
+	asmjs)
+		basic_machine=asmjs-unknown
+		;;
 	aux)
 		basic_machine=m68k-apple
 		os=-aux
@@ -638,6 +645,14 @@ case $basic_machine in
 		basic_machine=m68k-bull
 		os=-sysv3
 		;;
+	e500v[12])
+		basic_machine=powerpc-unknown
+		os=$os"spe"
+		;;
+	e500v[12]-*)
+		basic_machine=powerpc-`echo $basic_machine | sed 's/^[^-]*-//'`
+		os=$os"spe"
+		;;
 	ebmon29k)
 		basic_machine=a29k-amd
 		os=-ebmon
@@ -1017,7 +1032,7 @@ case $basic_machine in
 	ppc-* | ppcbe-*)
 		basic_machine=powerpc-`echo $basic_machine | sed 's/^[^-]*-//'`
 		;;
-	ppcle | powerpclittle | ppc-le | powerpc-little)
+	ppcle | powerpclittle)
 		basic_machine=powerpcle-unknown
 		;;
 	ppcle-* | powerpclittle-*)
@@ -1027,7 +1042,7 @@ case $basic_machine in
 		;;
 	ppc64-*) basic_machine=powerpc64-`echo $basic_machine | sed 's/^[^-]*-//'`
 		;;
-	ppc64le | powerpc64little | ppc64-le | powerpc64-little)
+	ppc64le | powerpc64little)
 		basic_machine=powerpc64le-unknown
 		;;
 	ppc64le-* | powerpc64little-*)
@@ -1373,18 +1388,18 @@ case $os in
 	      | -hpux* | -unos* | -osf* | -luna* | -dgux* | -auroraux* | -solaris* \
 	      | -sym* | -kopensolaris* | -plan9* \
 	      | -amigaos* | -amigados* | -msdos* | -newsos* | -unicos* | -aof* \
-	      | -aos* | -aros* \
+	      | -aos* | -aros* | -cloudabi* | -sortix* \
 	      | -nindy* | -vxsim* | -vxworks* | -ebmon* | -hms* | -mvs* \
 	      | -clix* | -riscos* | -uniplus* | -iris* | -rtu* | -xenix* \
 	      | -hiux* | -386bsd* | -knetbsd* | -mirbsd* | -netbsd* \
-	      | -bitrig* | -openbsd* | -solidbsd* \
+	      | -bitrig* | -openbsd* | -solidbsd* | -libertybsd* \
 	      | -ekkobsd* | -kfreebsd* | -freebsd* | -riscix* | -lynxos* \
 	      | -bosx* | -nextstep* | -cxux* | -aout* | -elf* | -oabi* \
 	      | -ptx* | -coff* | -ecoff* | -winnt* | -domain* | -vsta* \
 	      | -udi* | -eabi* | -lites* | -ieee* | -go32* | -aux* \
 	      | -chorusos* | -chorusrdb* | -cegcc* \
 	      | -cygwin* | -msys* | -pe* | -psos* | -moss* | -proelf* | -rtems* \
-	      | -mingw32* | -mingw64* | -linux-gnu* | -linux-android* \
+	      | -midipix* | -mingw32* | -mingw64* | -linux-gnu* | -linux-android* \
 	      | -linux-newlib* | -linux-musl* | -linux-uclibc* \
 	      | -uxpv* | -beos* | -mpeix* | -udk* | -moxiebox* \
 	      | -interix* | -uwin* | -mks* | -rhapsody* | -darwin* | -opened* \
@@ -1393,7 +1408,8 @@ case $os in
 	      | -os2* | -vos* | -palmos* | -uclinux* | -nucleus* \
 	      | -morphos* | -superux* | -rtmk* | -rtmk-nova* | -windiss* \
 	      | -powermax* | -dnix* | -nx6 | -nx7 | -sei* | -dragonfly* \
-	      | -skyos* | -haiku* | -rdos* | -toppers* | -drops* | -es* | -tirtos*)
+	      | -skyos* | -haiku* | -rdos* | -toppers* | -drops* | -es* \
+	      | -onefs* | -tirtos* | -phoenix* | -fuchsia*)
 	# Remember, each alternative MUST END IN *, to match a version number.
 		;;
 	-qnx*)
@@ -1525,6 +1541,8 @@ case $os in
 		;;
 	-nacl*)
 		;;
+	-ios)
+		;;
 	-none)
 		;;
 	*)
diff --git a/config/Makefile.in b/config/Makefile.in
index 788454b..b71e2eb 100644
--- a/config/Makefile.in
+++ b/config/Makefile.in
@@ -90,8 +90,7 @@ build_triplet = @build@
 host_triplet = @host@
 subdir = config
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
-am__aclocal_m4_deps = $(top_srcdir)/m4/ax_icu_check.m4 \
-	$(top_srcdir)/m4/ax_lib_readline.m4 \
+am__aclocal_m4_deps = $(top_srcdir)/m4/ax_lib_readline.m4 \
 	$(top_srcdir)/m4/libtool.m4 $(top_srcdir)/m4/ltoptions.m4 \
 	$(top_srcdir)/m4/ltsugar.m4 $(top_srcdir)/m4/ltversion.m4 \
 	$(top_srcdir)/m4/lt~obsolete.m4 $(top_srcdir)/m4/pkg.m4 \
@@ -185,13 +184,7 @@ EXEEXT = @EXEEXT@
 FGREP = @FGREP@
 GREP = @GREP@
 ICU_CFLAGS = @ICU_CFLAGS@
-ICU_CONFIG = @ICU_CONFIG@
-ICU_CPPSEARCHPATH = @ICU_CPPSEARCHPATH@
-ICU_CXXFLAGS = @ICU_CXXFLAGS@
-ICU_IOLIBS = @ICU_IOLIBS@
-ICU_LIBPATH = @ICU_LIBPATH@
 ICU_LIBS = @ICU_LIBS@
-ICU_VERSION = @ICU_VERSION@
 INSTALL = @INSTALL@
 INSTALL_DATA = @INSTALL_DATA@
 INSTALL_PROGRAM = @INSTALL_PROGRAM@
@@ -282,6 +275,7 @@ pdfdir = @pdfdir@
 prefix = @prefix@
 program_transform_name = @program_transform_name@
 psdir = @psdir@
+runstatedir = @runstatedir@
 sbindir = @sbindir@
 sharedstatedir = @sharedstatedir@
 srcdir = @srcdir@
diff --git a/configure b/configure
index 5876020..7920b2e 100755
--- a/configure
+++ b/configure
@@ -1,6 +1,6 @@
 #! /bin/sh
 # Guess values for system-dependent variables and create Makefiles.
-# Generated by GNU Autoconf 2.69 for ucto 0.9.6.
+# Generated by GNU Autoconf 2.69 for ucto 0.9.8.
 #
 # Report bugs to <lamasoftware at science.ru.nl>.
 #
@@ -590,8 +590,8 @@ MAKEFLAGS=
 # Identity of this package.
 PACKAGE_NAME='ucto'
 PACKAGE_TARNAME='ucto'
-PACKAGE_VERSION='0.9.6'
-PACKAGE_STRING='ucto 0.9.6'
+PACKAGE_VERSION='0.9.8'
+PACKAGE_STRING='ucto 0.9.8'
 PACKAGE_BUGREPORT='lamasoftware at science.ru.nl'
 PACKAGE_URL=''
 
@@ -644,17 +644,11 @@ folia_LIBS
 folia_CFLAGS
 XML2_LIBS
 XML2_CFLAGS
+ICU_LIBS
+ICU_CFLAGS
 PKG_CONFIG_LIBDIR
 PKG_CONFIG_PATH
 PKG_CONFIG
-ICU_IOLIBS
-ICU_LIBS
-ICU_LIBPATH
-ICU_VERSION
-ICU_CPPSEARCHPATH
-ICU_CXXFLAGS
-ICU_CFLAGS
-ICU_CONFIG
 CXXCPP
 CPP
 LT_SYS_LIBRARY_PATH
@@ -757,6 +751,7 @@ infodir
 docdir
 oldincludedir
 includedir
+runstatedir
 localstatedir
 sharedstatedir
 sysconfdir
@@ -790,8 +785,6 @@ with_gnu_ld
 with_sysroot
 enable_libtool_lock
 with_icu
-with_folia
-with_ticcutils
 '
       ac_precious_vars='build_alias
 host_alias
@@ -810,6 +803,8 @@ CXXCPP
 PKG_CONFIG
 PKG_CONFIG_PATH
 PKG_CONFIG_LIBDIR
+ICU_CFLAGS
+ICU_LIBS
 XML2_CFLAGS
 XML2_LIBS
 folia_CFLAGS
@@ -856,6 +851,7 @@ datadir='${datarootdir}'
 sysconfdir='${prefix}/etc'
 sharedstatedir='${prefix}/com'
 localstatedir='${prefix}/var'
+runstatedir='${localstatedir}/run'
 includedir='${prefix}/include'
 oldincludedir='/usr/include'
 docdir='${datarootdir}/doc/${PACKAGE_TARNAME}'
@@ -1108,6 +1104,15 @@ do
   | -silent | --silent | --silen | --sile | --sil)
     silent=yes ;;
 
+  -runstatedir | --runstatedir | --runstatedi | --runstated \
+  | --runstate | --runstat | --runsta | --runst | --runs \
+  | --run | --ru | --r)
+    ac_prev=runstatedir ;;
+  -runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \
+  | --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \
+  | --run=* | --ru=* | --r=*)
+    runstatedir=$ac_optarg ;;
+
   -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb)
     ac_prev=sbindir ;;
   -sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \
@@ -1245,7 +1250,7 @@ fi
 for ac_var in	exec_prefix prefix bindir sbindir libexecdir datarootdir \
 		datadir sysconfdir sharedstatedir localstatedir includedir \
 		oldincludedir docdir infodir htmldir dvidir pdfdir psdir \
-		libdir localedir mandir
+		libdir localedir mandir runstatedir
 do
   eval ac_val=\$$ac_var
   # Remove trailing slashes.
@@ -1358,7 +1363,7 @@ if test "$ac_init_help" = "long"; then
   # Omit some internal or obsolete options to make the list less imposing.
   # This message is too long to be a string in the A/UX 3.1 sh.
   cat <<_ACEOF
-\`configure' configures ucto 0.9.6 to adapt to many kinds of systems.
+\`configure' configures ucto 0.9.8 to adapt to many kinds of systems.
 
 Usage: $0 [OPTION]... [VAR=VALUE]...
 
@@ -1398,6 +1403,7 @@ Fine tuning of the installation directories:
   --sysconfdir=DIR        read-only single-machine data [PREFIX/etc]
   --sharedstatedir=DIR    modifiable architecture-independent data [PREFIX/com]
   --localstatedir=DIR     modifiable single-machine data [PREFIX/var]
+  --runstatedir=DIR       modifiable per-process data [LOCALSTATEDIR/run]
   --libdir=DIR            object code libraries [EPREFIX/lib]
   --includedir=DIR        C header files [PREFIX/include]
   --oldincludedir=DIR     C header files for non-gcc [/usr/include]
@@ -1428,7 +1434,7 @@ fi
 
 if test -n "$ac_init_help"; then
   case $ac_init_help in
-     short | recursive ) echo "Configuration of ucto 0.9.6:";;
+     short | recursive ) echo "Configuration of ucto 0.9.8:";;
    esac
   cat <<\_ACEOF
 
@@ -1459,13 +1465,7 @@ Optional Packages:
   --with-gnu-ld           assume the C compiler uses GNU ld [default=no]
   --with-sysroot[=DIR]    Search for dependent libraries within DIR (or the
                           compiler's sysroot if not specified).
-  --with-icu=DIR       use ICU installed in <DIR>
-  --with-folia=DIR       use libfolia installed in <DIR>;
-               note that you can install folia in a non-default directory with
-               ./configure --prefix=<DIR> in the folia installation directory
-  --with-ticcutils=DIR       use ticcutils installed in <DIR>;
-               note that you can install ticcutils in a non-default directory with
-               ./configure --prefix=<DIR> in the ticcutils installation directory
+  --with-icu=DIR       use icu installed in <DIR>
 
 Some influential environment variables:
   CXX         C++ compiler command
@@ -1486,6 +1486,8 @@ Some influential environment variables:
               directories to add to pkg-config's search path
   PKG_CONFIG_LIBDIR
               path overriding pkg-config's built-in search path
+  ICU_CFLAGS  C compiler flags for ICU, overriding pkg-config
+  ICU_LIBS    linker flags for ICU, overriding pkg-config
   XML2_CFLAGS C compiler flags for XML2, overriding pkg-config
   XML2_LIBS   linker flags for XML2, overriding pkg-config
   folia_CFLAGS
@@ -1566,7 +1568,7 @@ fi
 test -n "$ac_init_help" && exit $ac_status
 if $ac_init_version; then
   cat <<\_ACEOF
-ucto configure 0.9.6
+ucto configure 0.9.8
 generated by GNU Autoconf 2.69
 
 Copyright (C) 2012 Free Software Foundation, Inc.
@@ -2186,7 +2188,7 @@ cat >config.log <<_ACEOF
 This file contains any messages produced by compilers while
 running configure, to aid debugging if configure makes a mistake.
 
-It was created by ucto $as_me 0.9.6, which was
+It was created by ucto $as_me 0.9.8, which was
 generated by GNU Autoconf 2.69.  Invocation command line was
 
   $ $0 $@
@@ -3049,7 +3051,7 @@ fi
 
 # Define the identity of the package.
  PACKAGE='ucto'
- VERSION='0.9.6'
+ VERSION='0.9.8'
 
 
 cat >>confdefs.h <<_ACEOF
@@ -3168,7 +3170,7 @@ if test -z "$CXX"; then
     CXX=$CCC
   else
     if test -n "$ac_tool_prefix"; then
-  for ac_prog in g++ c++
+  for ac_prog in c++
   do
     # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args.
 set dummy $ac_tool_prefix$ac_prog; ac_word=$2
@@ -3212,7 +3214,7 @@ fi
 fi
 if test -z "$CXX"; then
   ac_ct_CXX=$CXX
-  for ac_prog in g++ c++
+  for ac_prog in c++
 do
   # Extract the first word of "$ac_prog", so it can be a program name with args.
 set dummy $ac_prog; ac_word=$2
@@ -5897,7 +5899,7 @@ linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*)
   lt_cv_deplibs_check_method=pass_all
   ;;
 
-netbsd*)
+netbsd* | netbsdelf*-gnu)
   if echo __ELF__ | $CC -E - | $GREP __ELF__ > /dev/null; then
     lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so\.[0-9]+\.[0-9]+|_pic\.a)$'
   else
@@ -9601,6 +9603,9 @@ $as_echo_n "checking whether the $compiler linker ($LD) supports shared librarie
   openbsd* | bitrig*)
     with_gnu_ld=no
     ;;
+  linux* | k*bsd*-gnu | gnu*)
+    link_all_deplibs=no
+    ;;
   esac
 
   ld_shlibs=yes
@@ -9855,7 +9860,7 @@ _LT_EOF
       fi
       ;;
 
-    netbsd*)
+    netbsd* | netbsdelf*-gnu)
       if echo __ELF__ | $CC -E - | $GREP __ELF__ >/dev/null; then
 	archive_cmds='$LD -Bshareable $libobjs $deplibs $linker_flags -o $lib'
 	wlarc=
@@ -10525,6 +10530,7 @@ $as_echo "$lt_cv_irix_exported_symbol" >&6; }
 	if test yes = "$lt_cv_irix_exported_symbol"; then
           archive_expsym_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname `test -n "$verstring" && func_echo_all "$wl-set_version $wl$verstring"` $wl-update_registry $wl$output_objdir/so_locations $wl-exports_file $wl$export_symbols -o $lib'
 	fi
+	link_all_deplibs=no
       else
 	archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags -soname $soname `test -n "$verstring" && func_echo_all "-set_version $verstring"` -update_registry $output_objdir/so_locations -o $lib'
 	archive_expsym_cmds='$CC -shared $libobjs $deplibs $compiler_flags -soname $soname `test -n "$verstring" && func_echo_all "-set_version $verstring"` -update_registry $output_objdir/so_locations -exports_file $export_symbols -o $lib'
@@ -10546,7 +10552,7 @@ $as_echo "$lt_cv_irix_exported_symbol" >&6; }
       esac
       ;;
 
-    netbsd*)
+    netbsd* | netbsdelf*-gnu)
       if echo __ELF__ | $CC -E - | $GREP __ELF__ >/dev/null; then
 	archive_cmds='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags'  # a.out
       else
@@ -11661,6 +11667,18 @@ fi
   dynamic_linker='GNU/Linux ld.so'
   ;;
 
+netbsdelf*-gnu)
+  version_type=linux
+  need_lib_prefix=no
+  need_version=no
+  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}'
+  soname_spec='${libname}${release}${shared_ext}$major'
+  shlibpath_var=LD_LIBRARY_PATH
+  shlibpath_overrides_runpath=no
+  hardcode_into_libs=yes
+  dynamic_linker='NetBSD ld.elf_so'
+  ;;
+
 netbsd*)
   version_type=sunos
   need_lib_prefix=no
@@ -14555,7 +14573,7 @@ lt_prog_compiler_static_CXX=
 	    ;;
 	esac
 	;;
-      netbsd*)
+      netbsd* | netbsdelf*-gnu)
 	;;
       *qnx* | *nto*)
         # QNX uses GNU C++, but need to define -shared option too, otherwise
@@ -14930,6 +14948,9 @@ $as_echo_n "checking whether the $compiler linker ($LD) supports shared librarie
       ;;
     esac
     ;;
+  linux* | k*bsd*-gnu | gnu*)
+    link_all_deplibs_CXX=no
+    ;;
   *)
     export_symbols_cmds_CXX='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols'
     ;;
@@ -15623,6 +15644,18 @@ fi
   dynamic_linker='GNU/Linux ld.so'
   ;;
 
+netbsdelf*-gnu)
+  version_type=linux
+  need_lib_prefix=no
+  need_version=no
+  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}'
+  soname_spec='${libname}${release}${shared_ext}$major'
+  shlibpath_var=LD_LIBRARY_PATH
+  shlibpath_overrides_runpath=no
+  hardcode_into_libs=yes
+  dynamic_linker='NetBSD ld.elf_so'
+  ;;
+
 netbsd*)
   version_type=sunos
   need_lib_prefix=no
@@ -16281,12 +16314,6 @@ done
   fi
 
 
-# ugly hack when PKG_CONFIG_PATH isn't defined.
-# couldn't get it to work otherwise
-if test "x$PKG_CONFIG_PATH" = x; then
-    export PKG_CONFIG_PATH=""
-fi
-#AC_MSG_NOTICE( [pkg-config search path:$PKG_CONFIG_PATH dus] )
 for ac_header in libexttextcat/textcat.h
 do :
   ac_fn_cxx_check_header_mongrel "$LINENO" "libexttextcat/textcat.h" "ac_cv_header_libexttextcat_textcat_h" "$ac_includes_default"
@@ -16391,153 +16418,8 @@ $as_echo "$as_me: Unable to find textcat library. textcat support not available"
 fi
 
 
-useICU=1;
-# inspired by feh-1.3.4/configure.ac.  Tnx Tom Gilbert and feh hackers.
-
-# Check whether --with-icu was given.
-if test "${with_icu+set}" = set; then :
-  withval=$with_icu; if test "$with_icu" = "no"; then
-           useICU=0
-	else
-	   CXXFLAGS="$CXXFLAGS -I$withval/include"
-           LIBS="-L$withval/lib $LIBS"
-	fi
-fi
-
-
-if test "$useICU" = "1"; then
-
-  succeeded=no
-
-  if test -z "$ICU_CONFIG"; then
-     # Extract the first word of "icu-config", so it can be a program name with args.
-set dummy icu-config; ac_word=$2
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
-$as_echo_n "checking for $ac_word... " >&6; }
-if ${ac_cv_path_ICU_CONFIG+:} false; then :
-  $as_echo_n "(cached) " >&6
-else
-  case $ICU_CONFIG in
-  [\\/]* | ?:[\\/]*)
-  ac_cv_path_ICU_CONFIG="$ICU_CONFIG" # Let the user override the test with a path.
-  ;;
-  *)
-  as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
-for as_dir in $PATH
-do
-  IFS=$as_save_IFS
-  test -z "$as_dir" && as_dir=.
-    for ac_exec_ext in '' $ac_executable_extensions; do
-  if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
-    ac_cv_path_ICU_CONFIG="$as_dir/$ac_word$ac_exec_ext"
-    $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
-    break 2
-  fi
-done
-  done
-IFS=$as_save_IFS
-
-  test -z "$ac_cv_path_ICU_CONFIG" && ac_cv_path_ICU_CONFIG="no"
-  ;;
-esac
-fi
-ICU_CONFIG=$ac_cv_path_ICU_CONFIG
-if test -n "$ICU_CONFIG"; then
-  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ICU_CONFIG" >&5
-$as_echo "$ICU_CONFIG" >&6; }
-else
-  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
-$as_echo "no" >&6; }
-fi
-
-
-  fi
-
-  if test "$ICU_CONFIG" = "no" ; then
-     echo "*** The icu-config script could not be found. Make sure it is"
-     echo "*** in your path, and that taglib is properly installed."
-     echo "*** Or see http://www.icu-project.org/"
-  else
-	ICU_VERSION=`$ICU_CONFIG --version`
-	{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for ICU >= 5.2" >&5
-$as_echo_n "checking for ICU >= 5.2... " >&6; }
-	VERSION_CHECK=`expr $ICU_VERSION \>\= 5.2`
-	if test "$VERSION_CHECK" = "1" ; then
-	   { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
-$as_echo "yes" >&6; }
-	   succeeded=yes
-
-	   { $as_echo "$as_me:${as_lineno-$LINENO}: checking ICU_CFLAGS" >&5
-$as_echo_n "checking ICU_CFLAGS... " >&6; }
-	   ICU_CFLAGS=`$ICU_CONFIG --cflags`
-	   { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ICU_CFLAGS" >&5
-$as_echo "$ICU_CFLAGS" >&6; }
-
-	   { $as_echo "$as_me:${as_lineno-$LINENO}: checking ICU_CPPSEARCHPATH" >&5
-$as_echo_n "checking ICU_CPPSEARCHPATH... " >&6; }
-	   ICU_CPPSEARCHPATH=`$ICU_CONFIG --cppflags-searchpath`
-	   { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ICU_CPPSEARCHPATH" >&5
-$as_echo "$ICU_CPPSEARCHPATH" >&6; }
-
-	   { $as_echo "$as_me:${as_lineno-$LINENO}: checking ICU_CXXFLAGS" >&5
-$as_echo_n "checking ICU_CXXFLAGS... " >&6; }
-	   ICU_CXXFLAGS=`$ICU_CONFIG --cxxflags`
-	   { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ICU_CXXFLAGS" >&5
-$as_echo "$ICU_CXXFLAGS" >&6; }
-
-	   { $as_echo "$as_me:${as_lineno-$LINENO}: checking ICU_LIBS" >&5
-$as_echo_n "checking ICU_LIBS... " >&6; }
-	   ICU_LIBS=`$ICU_CONFIG --ldflags-libsonly`
-	   { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ICU_LIBS" >&5
-$as_echo "$ICU_LIBS" >&6; }
-
-	   { $as_echo "$as_me:${as_lineno-$LINENO}: checking ICU_LIBPATH" >&5
-$as_echo_n "checking ICU_LIBPATH... " >&6; }
-	   ICU_LIBPATH=`$ICU_CONFIG --ldflags-searchpath`
-	   { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ICU_LIBPATH" >&5
-$as_echo "$ICU_LIBPATH" >&6; }
-
-	   { $as_echo "$as_me:${as_lineno-$LINENO}: checking ICU_IOLIBS" >&5
-$as_echo_n "checking ICU_IOLIBS... " >&6; }
-	   ICU_IOLIBS=`$ICU_CONFIG --ldflags-icuio`
-	   { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ICU_IOLIBS" >&5
-$as_echo "$ICU_IOLIBS" >&6; }
-	else
-	   ICU_CFLAGS=""
-	   ICU_CXXFLAGS=""
-	   ICU_CPPSEARCHPATH=""
-	   ICU_LIBPATH=""
-	   ICU_LIBS=""
-	   ICU_IOLIBS=""
-	   ## If we have a custom action on failure, don't print errors, but
-	   ## do set a variable so people can do so.
-
-        fi
-
-
-
-
-
-
-
-
-  fi
-
-  if test $succeeded = yes; then
-     CXXFLAGS="$CXXFLAGS $ICU_CPPSEARCHPATH"
-		LIBS="$ICU_LIBPATH $ICU_LIBS $ICU_IOLIBS $LIBS"
-  else
-     { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
-$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
-as_fn_error $? "\"No ICU development environment found. Please check if libicu-dev or the like is installed\"
-See \`config.log' for more details" "$LINENO" 5; }
-  fi
-
-
-$as_echo "#define HAVE_ICU 1" >>confdefs.h
-
-else
-  as_fn_error $? "\"ICU support is required\"" "$LINENO" 5
+if test $prefix = "NONE"; then
+   prefix="$ac_default_prefix"
 fi
 
 
@@ -16661,6 +16543,114 @@ $as_echo "no" >&6; }
 	fi
 fi
 
+if test "x$PKG_CONFIG_PATH" = x; then
+    export PKG_CONFIG_PATH="$prefix/lib/pkgconfig"
+else
+    export PKG_CONFIG_PATH="$prefix/lib/pkgconfig:$PKG_CONFIG_PATH"
+fi
+
+
+# Check whether --with-icu was given.
+if test "${with_icu+set}" = set; then :
+  withval=$with_icu; PKG_CONFIG_PATH="$PKG_CONFIG_PATH:$withval/lib/pkgconfig"
+fi
+
+
+
+pkg_failed=no
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for ICU" >&5
+$as_echo_n "checking for ICU... " >&6; }
+
+if test -n "$ICU_CFLAGS"; then
+    pkg_cv_ICU_CFLAGS="$ICU_CFLAGS"
+ elif test -n "$PKG_CONFIG"; then
+    if test -n "$PKG_CONFIG" && \
+    { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"icu-uc >= 50 icu-io \""; } >&5
+  ($PKG_CONFIG --exists --print-errors "icu-uc >= 50 icu-io ") 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; then
+  pkg_cv_ICU_CFLAGS=`$PKG_CONFIG --cflags "icu-uc >= 50 icu-io " 2>/dev/null`
+		      test "x$?" != "x0" && pkg_failed=yes
+else
+  pkg_failed=yes
+fi
+ else
+    pkg_failed=untried
+fi
+if test -n "$ICU_LIBS"; then
+    pkg_cv_ICU_LIBS="$ICU_LIBS"
+ elif test -n "$PKG_CONFIG"; then
+    if test -n "$PKG_CONFIG" && \
+    { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"icu-uc >= 50 icu-io \""; } >&5
+  ($PKG_CONFIG --exists --print-errors "icu-uc >= 50 icu-io ") 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; then
+  pkg_cv_ICU_LIBS=`$PKG_CONFIG --libs "icu-uc >= 50 icu-io " 2>/dev/null`
+		      test "x$?" != "x0" && pkg_failed=yes
+else
+  pkg_failed=yes
+fi
+ else
+    pkg_failed=untried
+fi
+
+
+
+if test $pkg_failed = yes; then
+   	{ $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+
+if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
+        _pkg_short_errors_supported=yes
+else
+        _pkg_short_errors_supported=no
+fi
+        if test $_pkg_short_errors_supported = yes; then
+	        ICU_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "icu-uc >= 50 icu-io " 2>&1`
+        else
+	        ICU_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "icu-uc >= 50 icu-io " 2>&1`
+        fi
+	# Put the nasty error message in config.log where it belongs
+	echo "$ICU_PKG_ERRORS" >&5
+
+	as_fn_error $? "Package requirements (icu-uc >= 50 icu-io ) were not met:
+
+$ICU_PKG_ERRORS
+
+Consider adjusting the PKG_CONFIG_PATH environment variable if you
+installed software in a non-standard prefix.
+
+Alternatively, you may set the environment variables ICU_CFLAGS
+and ICU_LIBS to avoid the need to call pkg-config.
+See the pkg-config man page for more details." "$LINENO" 5
+elif test $pkg_failed = untried; then
+     	{ $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+	{ { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
+as_fn_error $? "The pkg-config script could not be found or is too old.  Make sure it
+is in your PATH or set the PKG_CONFIG environment variable to the full
+path to pkg-config.
+
+Alternatively, you may set the environment variables ICU_CFLAGS
+and ICU_LIBS to avoid the need to call pkg-config.
+See the pkg-config man page for more details.
+
+To get pkg-config, see <http://pkg-config.freedesktop.org/>.
+See \`config.log' for more details" "$LINENO" 5; }
+else
+	ICU_CFLAGS=$pkg_cv_ICU_CFLAGS
+	ICU_LIBS=$pkg_cv_ICU_LIBS
+        { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+$as_echo "yes" >&6; }
+
+fi
+CXXFLAGS="$CXXFLAGS $ICU_CFLAGS"
+LIBS="$ICU_LIBS $LIBS"
+
+
 pkg_failed=no
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for XML2" >&5
 $as_echo_n "checking for XML2... " >&6; }
@@ -16755,15 +16745,6 @@ CXXFLAGS="$CXXFLAGS $XML2_CFLAGS"
 LIBS="$LIBS $XML2_LIBS"
 
 
-# Check whether --with-folia was given.
-if test "${with_folia+set}" = set; then :
-  withval=$with_folia; PKG_CONFIG_PATH="$withval/lib/pkgconfig:$PKG_CONFIG_PATH"
-else
-  PKG_CONFIG_PATH="$prefix/lib/pkgconfig:$PKG_CONFIG_PATH"
-fi
-
-#AC_MSG_NOTICE( [pkg-config search path: $PKG_CONFIG_PATH] )
-
 pkg_failed=no
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for folia" >&5
 $as_echo_n "checking for folia... " >&6; }
@@ -16772,12 +16753,12 @@ if test -n "$folia_CFLAGS"; then
     pkg_cv_folia_CFLAGS="$folia_CFLAGS"
  elif test -n "$PKG_CONFIG"; then
     if test -n "$PKG_CONFIG" && \
-    { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"folia >= 1.0 \""; } >&5
-  ($PKG_CONFIG --exists --print-errors "folia >= 1.0 ") 2>&5
+    { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"folia >= 1.10 \""; } >&5
+  ($PKG_CONFIG --exists --print-errors "folia >= 1.10 ") 2>&5
   ac_status=$?
   $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }; then
-  pkg_cv_folia_CFLAGS=`$PKG_CONFIG --cflags "folia >= 1.0 " 2>/dev/null`
+  pkg_cv_folia_CFLAGS=`$PKG_CONFIG --cflags "folia >= 1.10 " 2>/dev/null`
 		      test "x$?" != "x0" && pkg_failed=yes
 else
   pkg_failed=yes
@@ -16789,12 +16770,12 @@ if test -n "$folia_LIBS"; then
     pkg_cv_folia_LIBS="$folia_LIBS"
  elif test -n "$PKG_CONFIG"; then
     if test -n "$PKG_CONFIG" && \
-    { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"folia >= 1.0 \""; } >&5
-  ($PKG_CONFIG --exists --print-errors "folia >= 1.0 ") 2>&5
+    { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"folia >= 1.10 \""; } >&5
+  ($PKG_CONFIG --exists --print-errors "folia >= 1.10 ") 2>&5
   ac_status=$?
   $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }; then
-  pkg_cv_folia_LIBS=`$PKG_CONFIG --libs "folia >= 1.0 " 2>/dev/null`
+  pkg_cv_folia_LIBS=`$PKG_CONFIG --libs "folia >= 1.10 " 2>/dev/null`
 		      test "x$?" != "x0" && pkg_failed=yes
 else
   pkg_failed=yes
@@ -16815,14 +16796,14 @@ else
         _pkg_short_errors_supported=no
 fi
         if test $_pkg_short_errors_supported = yes; then
-	        folia_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "folia >= 1.0 " 2>&1`
+	        folia_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "folia >= 1.10 " 2>&1`
         else
-	        folia_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "folia >= 1.0 " 2>&1`
+	        folia_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "folia >= 1.10 " 2>&1`
         fi
 	# Put the nasty error message in config.log where it belongs
 	echo "$folia_PKG_ERRORS" >&5
 
-	as_fn_error $? "Package requirements (folia >= 1.0 ) were not met:
+	as_fn_error $? "Package requirements (folia >= 1.10 ) were not met:
 
 $folia_PKG_ERRORS
 
@@ -16858,15 +16839,6 @@ CXXFLAGS="$folia_CFLAGS $CXXFLAGS"
 LIBS="$folia_LIBS $LIBS"
 
 
-# Check whether --with-ticcutils was given.
-if test "${with_ticcutils+set}" = set; then :
-  withval=$with_ticcutils; PKG_CONFIG_PATH="$PKG_CONFIG_PATH:$withval/lib/pkgconfig"
-else
-  PKG_CONFIG_PATH="$PKG_CONFIG_PATH:$prefix/lib/pkgconfig"
-fi
-
-#  AC_MSG_NOTICE( [pkg-config search path: $PKG_CONFIG_PATH] )
-
 pkg_failed=no
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ticcutils" >&5
 $as_echo_n "checking for ticcutils... " >&6; }
@@ -17129,7 +17101,7 @@ fi
 fi
 # Checks for library functions.
 
-ac_config_files="$ac_config_files Makefile ucto.pc ucto-icu.pc m4/Makefile config/Makefile docs/Makefile src/Makefile tests/Makefile include/Makefile include/ucto/Makefile"
+ac_config_files="$ac_config_files Makefile ucto.pc m4/Makefile config/Makefile docs/Makefile src/Makefile tests/Makefile include/Makefile include/ucto/Makefile"
 
 cat >confcache <<\_ACEOF
 # This file is a shell script that caches the results of configure
@@ -17665,7 +17637,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
 # report actual input values of CONFIG_FILES etc. instead of their
 # values after options handling.
 ac_log="
-This file was extended by ucto $as_me 0.9.6, which was
+This file was extended by ucto $as_me 0.9.8, which was
 generated by GNU Autoconf 2.69.  Invocation command line was
 
   CONFIG_FILES    = $CONFIG_FILES
@@ -17731,7 +17703,7 @@ _ACEOF
 cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
 ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`"
 ac_cs_version="\\
-ucto config.status 0.9.6
+ucto config.status 0.9.8
 configured by $0, generated by GNU Autoconf 2.69,
   with options \\"\$ac_cs_config\\"
 
@@ -18246,7 +18218,6 @@ do
     "libtool") CONFIG_COMMANDS="$CONFIG_COMMANDS libtool" ;;
     "Makefile") CONFIG_FILES="$CONFIG_FILES Makefile" ;;
     "ucto.pc") CONFIG_FILES="$CONFIG_FILES ucto.pc" ;;
-    "ucto-icu.pc") CONFIG_FILES="$CONFIG_FILES ucto-icu.pc" ;;
     "m4/Makefile") CONFIG_FILES="$CONFIG_FILES m4/Makefile" ;;
     "config/Makefile") CONFIG_FILES="$CONFIG_FILES config/Makefile" ;;
     "docs/Makefile") CONFIG_FILES="$CONFIG_FILES docs/Makefile" ;;
diff --git a/configure.ac b/configure.ac
index ca95513..604a15d 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2,7 +2,7 @@
 # Process this file with autoconf to produce a configure script.
 
 AC_PREREQ(2.59)
-AC_INIT([ucto], [0.9.6], [lamasoftware at science.ru.nl])
+AC_INIT([ucto], [0.9.8], [lamasoftware at science.ru.nl])
 AM_INIT_AUTOMAKE([foreign])
 AC_CONFIG_SRCDIR([configure.ac])
 AC_CONFIG_MACRO_DIR([m4])
@@ -19,7 +19,7 @@ else
 fi
 
 # Checks for programs.
-AC_PROG_CXX( [g++ c++] )
+AC_PROG_CXX( [c++] )
 
 if $cxx_flags_were_set; then
   CXXFLAGS=$CXXFLAGS
@@ -50,12 +50,6 @@ AC_TYPE_INT32_T
 
 AX_LIB_READLINE
 
-# ugly hack when PKG_CONFIG_PATH isn't defined.
-# couldn't get it to work otherwise
-if test "x$PKG_CONFIG_PATH" = x; then
-    export PKG_CONFIG_PATH=""
-fi
-#AC_MSG_NOTICE( [pkg-config search path:$PKG_CONFIG_PATH dus] )
 AC_CHECK_HEADERS([libexttextcat/textcat.h],
 	         [CXXFLAGS="$CXXFLAGS -I$prefix/include"],
 	         [AC_CHECK_HEADERS([libtextcat/textcat.h],
@@ -67,49 +61,35 @@ AC_CHECK_HEADERS([libexttextcat/textcat.h],
 AC_SEARCH_LIBS([textcat_Init],[exttextcat-2.0 exttextcat textcat],[AC_DEFINE(HAVE_TEXTCAT_LIB, 1, textcat_lib)],
   [AC_MSG_NOTICE([Unable to find textcat library. textcat support not available])])
 
-useICU=1;
-# inspired by feh-1.3.4/configure.ac.  Tnx Tom Gilbert and feh hackers.
-AC_ARG_WITH(icu,
-       [  --with-icu=DIR       use ICU installed in <DIR>],
-       [if test "$with_icu" = "no"; then
-           useICU=0
-	else
-	   CXXFLAGS="$CXXFLAGS -I$withval/include"
-           LIBS="-L$withval/lib $LIBS"
-	fi] )
-
-if test "$useICU" = "1"; then
-  AX_ICU_CHECK( [5.2],
-		[CXXFLAGS="$CXXFLAGS $ICU_CPPSEARCHPATH"
-		LIBS="$ICU_LIBPATH $ICU_LIBS $ICU_IOLIBS $LIBS"],
-		[AC_MSG_FAILURE( "No ICU development environment found. Please check if libicu-dev or the like is installed" )] )
-  AC_DEFINE(HAVE_ICU, 1, we want to use ICU )
+if test $prefix = "NONE"; then
+   prefix="$ac_default_prefix"
+fi
+
+PKG_PROG_PKG_CONFIG
+
+if test "x$PKG_CONFIG_PATH" = x; then
+    export PKG_CONFIG_PATH="$prefix/lib/pkgconfig"
 else
-  AC_MSG_ERROR("ICU support is required")
+    export PKG_CONFIG_PATH="$prefix/lib/pkgconfig:$PKG_CONFIG_PATH"
 fi
 
+AC_ARG_WITH(icu,
+       [  --with-icu=DIR       use icu installed in <DIR>],
+       [PKG_CONFIG_PATH="$PKG_CONFIG_PATH:$withval/lib/pkgconfig"],
+       [])
+
+PKG_CHECK_MODULES([ICU], [icu-uc >= 50 icu-io] )
+CXXFLAGS="$CXXFLAGS $ICU_CFLAGS"
+LIBS="$ICU_LIBS $LIBS"
+
 PKG_CHECK_MODULES([XML2], [libxml-2.0 >= 2.6.16] )
 CXXFLAGS="$CXXFLAGS $XML2_CFLAGS"
 LIBS="$LIBS $XML2_LIBS"
 
-AC_ARG_WITH(folia,
-       [  --with-folia=DIR       use libfolia installed in <DIR>;
-               note that you can install folia in a non-default directory with
-               ./configure --prefix=<DIR> in the folia installation directory],
-       [PKG_CONFIG_PATH="$withval/lib/pkgconfig:$PKG_CONFIG_PATH"],
-       [PKG_CONFIG_PATH="$prefix/lib/pkgconfig:$PKG_CONFIG_PATH"])
-#AC_MSG_NOTICE( [pkg-config search path: $PKG_CONFIG_PATH] )
-PKG_CHECK_MODULES([folia], [folia >= 1.0] )
+PKG_CHECK_MODULES([folia], [folia >= 1.10] )
 CXXFLAGS="$folia_CFLAGS $CXXFLAGS"
 LIBS="$folia_LIBS $LIBS"
 
-AC_ARG_WITH(ticcutils,
-       [  --with-ticcutils=DIR       use ticcutils installed in <DIR>;
-               note that you can install ticcutils in a non-default directory with
-               ./configure --prefix=<DIR> in the ticcutils installation directory],
-       [PKG_CONFIG_PATH="$PKG_CONFIG_PATH:$withval/lib/pkgconfig"],
-       [PKG_CONFIG_PATH="$PKG_CONFIG_PATH:$prefix/lib/pkgconfig"])
-#  AC_MSG_NOTICE( [pkg-config search path: $PKG_CONFIG_PATH] )
 PKG_CHECK_MODULES([ticcutils], [ticcutils >= 0.6] )
 CXXFLAGS="$CXXFLAGS $ticcutils_CFLAGS"
 LIBS="$LIBS $ticcutils_LIBS"
@@ -135,7 +115,6 @@ PKG_CHECK_MODULES(
 AC_OUTPUT([
   Makefile
   ucto.pc
-  ucto-icu.pc
   m4/Makefile
   config/Makefile
   docs/Makefile
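
The net effect of this configure.ac change is that ICU, libfolia and ticcutils are now all located through pkg-config; icu-config and the old --with-folia/--with-ticcutils options are gone, and --with-icu only extends PKG_CONFIG_PATH. A quick pre-flight check could look like the sketch below (the /opt/icu prefix is only an illustrative example, not something the build requires):

    # versions the new configure will ask pkg-config for
    pkg-config --modversion icu-uc icu-io    # icu-uc >= 50, plus icu-io
    pkg-config --modversion folia            # >= 1.10
    pkg-config --modversion ticcutils        # >= 0.6

    # ICU installed in a non-standard prefix: --with-icu now just adds
    # <DIR>/lib/pkgconfig to PKG_CONFIG_PATH
    ./configure --with-icu=/opt/icu
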
diff --git a/docs/Makefile.in b/docs/Makefile.in
index f1784e5..29cf52a 100644
--- a/docs/Makefile.in
+++ b/docs/Makefile.in
@@ -92,8 +92,7 @@ build_triplet = @build@
 host_triplet = @host@
 subdir = docs
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
-am__aclocal_m4_deps = $(top_srcdir)/m4/ax_icu_check.m4 \
-	$(top_srcdir)/m4/ax_lib_readline.m4 \
+am__aclocal_m4_deps = $(top_srcdir)/m4/ax_lib_readline.m4 \
 	$(top_srcdir)/m4/libtool.m4 $(top_srcdir)/m4/ltoptions.m4 \
 	$(top_srcdir)/m4/ltsugar.m4 $(top_srcdir)/m4/ltversion.m4 \
 	$(top_srcdir)/m4/lt~obsolete.m4 $(top_srcdir)/m4/pkg.m4 \
@@ -189,13 +188,7 @@ EXEEXT = @EXEEXT@
 FGREP = @FGREP@
 GREP = @GREP@
 ICU_CFLAGS = @ICU_CFLAGS@
-ICU_CONFIG = @ICU_CONFIG@
-ICU_CPPSEARCHPATH = @ICU_CPPSEARCHPATH@
-ICU_CXXFLAGS = @ICU_CXXFLAGS@
-ICU_IOLIBS = @ICU_IOLIBS@
-ICU_LIBPATH = @ICU_LIBPATH@
 ICU_LIBS = @ICU_LIBS@
-ICU_VERSION = @ICU_VERSION@
 INSTALL = @INSTALL@
 INSTALL_DATA = @INSTALL_DATA@
 INSTALL_PROGRAM = @INSTALL_PROGRAM@
@@ -286,6 +279,7 @@ pdfdir = @pdfdir@
 prefix = @prefix@
 program_transform_name = @program_transform_name@
 psdir = @psdir@
+runstatedir = @runstatedir@
 sbindir = @sbindir@
 sharedstatedir = @sharedstatedir@
 srcdir = @srcdir@
diff --git a/docs/ucto.1 b/docs/ucto.1
index ef02a3a..3c2c3c4 100644
--- a/docs/ucto.1
+++ b/docs/ucto.1
@@ -1,4 +1,4 @@
-.TH ucto 1 "2014 december 2"
+.TH ucto 1 "2017 may 10"
 
 .SH NAME
 ucto \- Unicode Tokenizer
@@ -40,11 +40,17 @@ disable filtering of special characters
 
 .BR \-L " language"
 .RS
- Automatically selects a configuration file by language code. 
+ Automatically selects a configuration file by language code.
  The language code is generally a three-letter iso-639-3 code.
 For example, 'fra' will select the file tokconfig\(hyfra from the installation directory
 .RE
 
+.BR \-\-detectlanguages =<lang1,lang2,..langn>
+.RS
+try to detect all the specified languages. The default language will be 'lang1'.
+(only useful for FoLiA output)
+.RE
+
 .BR \-l
 .RS
 Convert to all lowercase
@@ -60,6 +66,11 @@ Convert to all uppercase
 Emit one sentence per line on output
 .RE
 
+.BR \-\-normalize=class1,class2,..,classn
+.RS
+map all occurrences of tokens with class1,...,classn to their generic names. e.g. \-\-normalize=DATE will map all dates to the word {{DATE}}. Very useful to normalize tokens like URLs, dates, e\-mail addresses and so on.
+.RE
+
 .BR \-m
 .RS
 Assume one sentence per line on input
@@ -72,7 +83,7 @@ Don't tokenize, but perform input decoding and simple token role detection
 
 .BR \-\-filterpunct
 .RS
-remove most of the punctuation from the output. (not from abreviations!)
+remove most of the punctuation from the output. (not from abbreviations and embedded punctuation like John's)
 .RE
 
 .B \-P
@@ -111,11 +122,23 @@ set Verbose mode
 Read a FoLiA XML document, tokenize it, and output the modified doc. (this disables usage of most other options: \-nulPQvsS)
 .RE
 
-.BR \-\-textclass "cls"
+.B \-\-inputclass "cls"
 .RS
 When tokenizing a FoLiA XML document, search for text nodes of class 'cls'
 .RE
 
+.B \-\-outputclass "cls"
+.RS
+When tokenizing a FoLiA XML document, output the tokenized text in text nodes of class 'cls'
+.RE
+
+.B \-\-textclass "cls" (obsolete)
+.RS
+use 'cls' for input and output of text from FoLiA. Equivalent to both \-\-inputclass='cls' and \-\-outputclass='cls'.
+
+This option is obsolete and NOT recommended. Please use the separate \-\-inputclass= and \-\-outputclass options.
+.RE
+
 .B \-X
 .RS
 Output FoLiA XML. (this disables usage of most other options: \-nulPQvsS)
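
Taken together, the new man-page options combine with the existing flags roughly as follows; the file names and the nld/fra language codes are placeholders, only the option names come from the page above:

    # French text, map dates and URLs to the generic tokens {{DATE}} / {{URL}}
    ucto -L fra --normalize=DATE,URL input.txt output.txt

    # FoLiA in, FoLiA out: read text nodes of class 'OCR', write class 'current'
    ucto -F --inputclass=OCR --outputclass=current in.folia.xml out.folia.xml

    # per-sentence detection of the listed languages (only useful for FoLiA output)
    ucto --detectlanguages=nld,fra -X input.txt out.folia.xml
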
diff --git a/include/Makefile.in b/include/Makefile.in
index 4a6c158..69b750f 100644
--- a/include/Makefile.in
+++ b/include/Makefile.in
@@ -91,8 +91,7 @@ build_triplet = @build@
 host_triplet = @host@
 subdir = include
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
-am__aclocal_m4_deps = $(top_srcdir)/m4/ax_icu_check.m4 \
-	$(top_srcdir)/m4/ax_lib_readline.m4 \
+am__aclocal_m4_deps = $(top_srcdir)/m4/ax_lib_readline.m4 \
 	$(top_srcdir)/m4/libtool.m4 $(top_srcdir)/m4/ltoptions.m4 \
 	$(top_srcdir)/m4/ltsugar.m4 $(top_srcdir)/m4/ltversion.m4 \
 	$(top_srcdir)/m4/lt~obsolete.m4 $(top_srcdir)/m4/pkg.m4 \
@@ -217,13 +216,7 @@ EXEEXT = @EXEEXT@
 FGREP = @FGREP@
 GREP = @GREP@
 ICU_CFLAGS = @ICU_CFLAGS@
-ICU_CONFIG = @ICU_CONFIG@
-ICU_CPPSEARCHPATH = @ICU_CPPSEARCHPATH@
-ICU_CXXFLAGS = @ICU_CXXFLAGS@
-ICU_IOLIBS = @ICU_IOLIBS@
-ICU_LIBPATH = @ICU_LIBPATH@
 ICU_LIBS = @ICU_LIBS@
-ICU_VERSION = @ICU_VERSION@
 INSTALL = @INSTALL@
 INSTALL_DATA = @INSTALL_DATA@
 INSTALL_PROGRAM = @INSTALL_PROGRAM@
@@ -314,6 +307,7 @@ pdfdir = @pdfdir@
 prefix = @prefix@
 program_transform_name = @program_transform_name@
 psdir = @psdir@
+runstatedir = @runstatedir@
 sbindir = @sbindir@
 sharedstatedir = @sharedstatedir@
 srcdir = @srcdir@
diff --git a/include/ucto/Makefile.in b/include/ucto/Makefile.in
index cda6ffd..489b330 100644
--- a/include/ucto/Makefile.in
+++ b/include/ucto/Makefile.in
@@ -90,8 +90,7 @@ build_triplet = @build@
 host_triplet = @host@
 subdir = include/ucto
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
-am__aclocal_m4_deps = $(top_srcdir)/m4/ax_icu_check.m4 \
-	$(top_srcdir)/m4/ax_lib_readline.m4 \
+am__aclocal_m4_deps = $(top_srcdir)/m4/ax_lib_readline.m4 \
 	$(top_srcdir)/m4/libtool.m4 $(top_srcdir)/m4/ltoptions.m4 \
 	$(top_srcdir)/m4/ltsugar.m4 $(top_srcdir)/m4/ltversion.m4 \
 	$(top_srcdir)/m4/lt~obsolete.m4 $(top_srcdir)/m4/pkg.m4 \
@@ -204,13 +203,7 @@ EXEEXT = @EXEEXT@
 FGREP = @FGREP@
 GREP = @GREP@
 ICU_CFLAGS = @ICU_CFLAGS@
-ICU_CONFIG = @ICU_CONFIG@
-ICU_CPPSEARCHPATH = @ICU_CPPSEARCHPATH@
-ICU_CXXFLAGS = @ICU_CXXFLAGS@
-ICU_IOLIBS = @ICU_IOLIBS@
-ICU_LIBPATH = @ICU_LIBPATH@
 ICU_LIBS = @ICU_LIBS@
-ICU_VERSION = @ICU_VERSION@
 INSTALL = @INSTALL@
 INSTALL_DATA = @INSTALL_DATA@
 INSTALL_PROGRAM = @INSTALL_PROGRAM@
@@ -301,6 +294,7 @@ pdfdir = @pdfdir@
 prefix = @prefix@
 program_transform_name = @program_transform_name@
 psdir = @psdir@
+runstatedir = @runstatedir@
 sbindir = @sbindir@
 sharedstatedir = @sharedstatedir@
 srcdir = @srcdir@
diff --git a/include/ucto/setting.h b/include/ucto/setting.h
index 92cf1ac..6c630ca 100644
--- a/include/ucto/setting.h
+++ b/include/ucto/setting.h
@@ -94,6 +94,7 @@ namespace Tokenizer {
     void add_rule( const UnicodeString&, const std::vector<UnicodeString>& );
     void sortRules( std::map<UnicodeString, Rule *>&,
 		    const std::vector<UnicodeString>& );
+    static std::set<std::string> installed_languages();
     UnicodeString eosmarkers;
     std::vector<Rule *> rules;
     std::map<UnicodeString, Rule *> rulesmap;
diff --git a/include/ucto/textcat.h b/include/ucto/textcat.h
index 807a4dc..6a57720 100644
--- a/include/ucto/textcat.h
+++ b/include/ucto/textcat.h
@@ -55,7 +55,7 @@ extern "C" {
 
 class TextCat {
  public:
-  TextCat( const std::string& cf );
+  explicit TextCat( const std::string& cf );
   TextCat( const TextCat& in );
   ~TextCat();
   bool isInit() const { return TC != 0; };
diff --git a/include/ucto/tokenize.h b/include/ucto/tokenize.h
index f2938e3..da70e1b 100644
--- a/include/ucto/tokenize.h
+++ b/include/ucto/tokenize.h
@@ -51,8 +51,7 @@ namespace Tokenizer {
     BEGINQUOTE                  = 16,
     ENDQUOTE                    = 32,
     TEMPENDOFSENTENCE           = 64,
-    LISTITEM                    = 128, //reserved for future use
-    TITLE                       = 256 //reserved for future use
+    LINEBREAK                   = 128
   };
 
   std::ostream& operator<<( std::ostream&, const TokenRole& );
@@ -149,6 +148,7 @@ namespace Tokenizer {
 
     //return the sentence with the specified index in a Token vector;
     std::vector<Token> getSentence( int );
+    void extractSentencesAndFlush( int, std::vector<Token>&, const std::string& );
 
     //Get all sentences as a vector of strings (UTF-8 encoded)
     std::vector<std::string> getSentences();
@@ -185,6 +185,10 @@ namespace Tokenizer {
     bool setQuoteDetection( bool b=true ) { bool t = detectQuotes; detectQuotes = b; return t; }
     bool getQuoteDetection() const { return detectQuotes; }
 
+    //Enable language detection
+    bool setLangDetection( bool b=true ) { bool t = doDetectLang; doDetectLang = b; return t; }
+    bool getLangDetection() const { return doDetectLang; }
+
     //Enable filtering
     bool setFiltering( bool b=true ) {
       bool t = doFilter; doFilter = b; return t;
@@ -196,6 +200,8 @@ namespace Tokenizer {
     }
     bool getPunctFilter() const { return doPunctFilter; };
 
+    std::string setTextRedundancy( const std::string& );
+
     // set normalization mode
     std::string setNormalization( const std::string& s ) {
       return normalizer.setMode( s );
@@ -227,6 +233,7 @@ namespace Tokenizer {
     const std::string setTextClass( const std::string& cls) {
       std::string res = inputclass;
       inputclass = cls;
+      outputclass = cls;
       return res;
     }
     const std::string getInputClass( ) const { return inputclass; }
@@ -261,6 +268,9 @@ namespace Tokenizer {
 		       bool,
 		       const std::string&,
 		       const UnicodeString& ="" );
+    int tokenizeLine( const UnicodeString&,
+		      const std::string&,
+		      const std::string& );
 
     bool detectEos( size_t, const UnicodeString&, const Quoting& ) const;
     void detectSentenceBounds( const int offset,
@@ -276,7 +286,6 @@ namespace Tokenizer {
     bool u_isquote( UChar32,
 		    const Quoting& ) const;
     std::string checkBOM( std::istream& );
-    void outputTokensDoc( folia::Document&, const std::vector<Token>& ) const;
     void outputTokensDoc_init( folia::Document& ) const;
 
     int outputTokensXML( folia::FoliaElement *,
@@ -321,6 +330,13 @@ namespace Tokenizer {
     //has a paragraph been signaled?
     bool paragraphsignal;
 
+    //do we attempt to assign languages?
+    bool doDetectLang;
+
+    //do we percolate text up from <w> to <s> and <p> nodes? (FoLiA)
+    // values should be: 'full', 'minimal' or 'none'
+    std::string text_redundancy;
+
     //one sentence per line output
     bool sentenceperlineoutput;
     bool sentenceperlineinput;
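
For code linking against libucto, the new behaviour is exposed through plain setters next to the existing ones. A minimal sketch, assuming the tokenizer object is an instance of Tokenizer::TokenizerClass that has already been set up elsewhere (the class name and the prior initialisation are assumptions; only the setters themselves appear in this header):

    #include "ucto/tokenize.h"

    void enable_new_features( Tokenizer::TokenizerClass& tok ) {
      // assumption: 'tok' was initialised with a settings file before this call
      tok.setLangDetection( true );        // try to assign languages while tokenizing
      tok.setTextRedundancy( "minimal" );  // 'full', 'minimal' or 'none' (FoLiA text nodes)
      tok.setNormalization( "DATE,URL" );  // pre-existing setter, shown here for contrast
    }
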
diff --git a/install-sh b/install-sh
index 0b0fdcb..59990a1 100755
--- a/install-sh
+++ b/install-sh
@@ -1,7 +1,7 @@
 #!/bin/sh
 # install - install a program, script, or datafile
 
-scriptversion=2013-12-25.23; # UTC
+scriptversion=2014-09-12.12; # UTC
 
 # This originates from X11R5 (mit/util/scripts/install.sh), which was
 # later released in X11R6 (xc/config/util/install.sh) with the
@@ -324,34 +324,41 @@ do
             # is incompatible with FreeBSD 'install' when (umask & 300) != 0.
             ;;
           *)
+            # $RANDOM is not portable (e.g. dash);  use it when possible to
+            # lower collision chance
             tmpdir=${TMPDIR-/tmp}/ins$RANDOM-$$
-            trap 'ret=$?; rmdir "$tmpdir/d" "$tmpdir" 2>/dev/null; exit $ret' 0
+            trap 'ret=$?; rmdir "$tmpdir/a/b" "$tmpdir/a" "$tmpdir" 2>/dev/null; exit $ret' 0
 
+            # As "mkdir -p" follows symlinks and we work in /tmp possibly;  so
+            # create the $tmpdir first (and fail if unsuccessful) to make sure
+            # that nobody tries to guess the $tmpdir name.
             if (umask $mkdir_umask &&
-                exec $mkdirprog $mkdir_mode -p -- "$tmpdir/d") >/dev/null 2>&1
+                $mkdirprog $mkdir_mode "$tmpdir" &&
+                exec $mkdirprog $mkdir_mode -p -- "$tmpdir/a/b") >/dev/null 2>&1
             then
               if test -z "$dir_arg" || {
                    # Check for POSIX incompatibilities with -m.
                    # HP-UX 11.23 and IRIX 6.5 mkdir -m -p sets group- or
                    # other-writable bit of parent directory when it shouldn't.
                    # FreeBSD 6.1 mkdir -m -p sets mode of existing directory.
-                   ls_ld_tmpdir=`ls -ld "$tmpdir"`
+                   test_tmpdir="$tmpdir/a"
+                   ls_ld_tmpdir=`ls -ld "$test_tmpdir"`
                    case $ls_ld_tmpdir in
                      d????-?r-*) different_mode=700;;
                      d????-?--*) different_mode=755;;
                      *) false;;
                    esac &&
-                   $mkdirprog -m$different_mode -p -- "$tmpdir" && {
-                     ls_ld_tmpdir_1=`ls -ld "$tmpdir"`
+                   $mkdirprog -m$different_mode -p -- "$test_tmpdir" && {
+                     ls_ld_tmpdir_1=`ls -ld "$test_tmpdir"`
                      test "$ls_ld_tmpdir" = "$ls_ld_tmpdir_1"
                    }
                  }
               then posix_mkdir=:
               fi
-              rmdir "$tmpdir/d" "$tmpdir"
+              rmdir "$tmpdir/a/b" "$tmpdir/a" "$tmpdir"
             else
               # Remove any dirs left behind by ancient mkdir implementations.
-              rmdir ./$mkdir_mode ./-p ./-- 2>/dev/null
+              rmdir ./$mkdir_mode ./-p ./-- "$tmpdir" 2>/dev/null
             fi
             trap '' 0;;
         esac;;
diff --git a/ltmain.sh b/ltmain.sh
index 0f0a2da..a736cf9 100644
--- a/ltmain.sh
+++ b/ltmain.sh
@@ -31,7 +31,7 @@
 
 PROGRAM=libtool
 PACKAGE=libtool
-VERSION=2.4.6
+VERSION="2.4.6 Debian-2.4.6-2"
 package_revision=2.4.6
 
 
@@ -2068,12 +2068,12 @@ include the following information:
        compiler:       $LTCC
        compiler flags: $LTCFLAGS
        linker:         $LD (gnu? $with_gnu_ld)
-       version:        $progname (GNU libtool) 2.4.6
+       version:        $progname $scriptversion Debian-2.4.6-2
        automake:       `($AUTOMAKE --version) 2>/dev/null |$SED 1q`
        autoconf:       `($AUTOCONF --version) 2>/dev/null |$SED 1q`
 
 Report bugs to <bug-libtool at gnu.org>.
-GNU libtool home page: <http://www.gnu.org/software/libtool/>.
+GNU libtool home page: <http://www.gnu.org/s/libtool/>.
 General help using GNU software: <http://www.gnu.org/gethelp/>."
     exit 0
 }
@@ -7272,10 +7272,13 @@ func_mode_link ()
       # -tp=*                Portland pgcc target processor selection
       # --sysroot=*          for sysroot support
       # -O*, -g*, -flto*, -fwhopr*, -fuse-linker-plugin GCC link-time optimization
+      # -specs=*             GCC specs files
       # -stdlib=*            select c++ std lib with clang
+      # -fsanitize=*         Clang/GCC memory and address sanitizer
       -64|-mips[0-9]|-r[0-9][0-9]*|-xarch=*|-xtarget=*|+DA*|+DD*|-q*|-m*| \
       -t[45]*|-txscale*|-p|-pg|--coverage|-fprofile-*|-F*|@*|-tp=*|--sysroot=*| \
-      -O*|-g*|-flto*|-fwhopr*|-fuse-linker-plugin|-fstack-protector*|-stdlib=*)
+      -O*|-g*|-flto*|-fwhopr*|-fuse-linker-plugin|-fstack-protector*|-stdlib=*| \
+      -specs=*|-fsanitize=*)
         func_quote_for_eval "$arg"
 	arg=$func_quote_for_eval_result
         func_append compile_command " $arg"
@@ -7568,7 +7571,10 @@ func_mode_link ()
 	case $pass in
 	dlopen) libs=$dlfiles ;;
 	dlpreopen) libs=$dlprefiles ;;
-	link) libs="$deplibs %DEPLIBS% $dependency_libs" ;;
+	link)
+	  libs="$deplibs %DEPLIBS%"
+	  test "X$link_all_deplibs" != Xno && libs="$libs $dependency_libs"
+	  ;;
 	esac
       fi
       if test lib,dlpreopen = "$linkmode,$pass"; then
@@ -7887,19 +7893,19 @@ func_mode_link ()
 	    # It is a libtool convenience library, so add in its objects.
 	    func_append convenience " $ladir/$objdir/$old_library"
 	    func_append old_convenience " $ladir/$objdir/$old_library"
+	    tmp_libs=
+	    for deplib in $dependency_libs; do
+	      deplibs="$deplib $deplibs"
+	      if $opt_preserve_dup_deps; then
+		case "$tmp_libs " in
+		*" $deplib "*) func_append specialdeplibs " $deplib" ;;
+		esac
+	      fi
+	      func_append tmp_libs " $deplib"
+	    done
 	  elif test prog != "$linkmode" && test lib != "$linkmode"; then
 	    func_fatal_error "'$lib' is not a convenience library"
 	  fi
-	  tmp_libs=
-	  for deplib in $dependency_libs; do
-	    deplibs="$deplib $deplibs"
-	    if $opt_preserve_dup_deps; then
-	      case "$tmp_libs " in
-	      *" $deplib "*) func_append specialdeplibs " $deplib" ;;
-	      esac
-	    fi
-	    func_append tmp_libs " $deplib"
-	  done
 	  continue
 	fi # $pass = conv
 
@@ -8823,6 +8829,9 @@ func_mode_link ()
 	    revision=$number_minor
 	    lt_irix_increment=no
 	    ;;
+	  *)
+	    func_fatal_configuration "$modename: unknown library version type '$version_type'"
+	    ;;
 	  esac
 	  ;;
 	no)
diff --git a/m4/Makefile.in b/m4/Makefile.in
index 6fec252..823f8bc 100644
--- a/m4/Makefile.in
+++ b/m4/Makefile.in
@@ -92,8 +92,7 @@ build_triplet = @build@
 host_triplet = @host@
 subdir = m4
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
-am__aclocal_m4_deps = $(top_srcdir)/m4/ax_icu_check.m4 \
-	$(top_srcdir)/m4/ax_lib_readline.m4 \
+am__aclocal_m4_deps = $(top_srcdir)/m4/ax_lib_readline.m4 \
 	$(top_srcdir)/m4/libtool.m4 $(top_srcdir)/m4/ltoptions.m4 \
 	$(top_srcdir)/m4/ltsugar.m4 $(top_srcdir)/m4/ltversion.m4 \
 	$(top_srcdir)/m4/lt~obsolete.m4 $(top_srcdir)/m4/pkg.m4 \
@@ -158,13 +157,7 @@ EXEEXT = @EXEEXT@
 FGREP = @FGREP@
 GREP = @GREP@
 ICU_CFLAGS = @ICU_CFLAGS@
-ICU_CONFIG = @ICU_CONFIG@
-ICU_CPPSEARCHPATH = @ICU_CPPSEARCHPATH@
-ICU_CXXFLAGS = @ICU_CXXFLAGS@
-ICU_IOLIBS = @ICU_IOLIBS@
-ICU_LIBPATH = @ICU_LIBPATH@
 ICU_LIBS = @ICU_LIBS@
-ICU_VERSION = @ICU_VERSION@
 INSTALL = @INSTALL@
 INSTALL_DATA = @INSTALL_DATA@
 INSTALL_PROGRAM = @INSTALL_PROGRAM@
@@ -255,6 +248,7 @@ pdfdir = @pdfdir@
 prefix = @prefix@
 program_transform_name = @program_transform_name@
 psdir = @psdir@
+runstatedir = @runstatedir@
 sbindir = @sbindir@
 sharedstatedir = @sharedstatedir@
 srcdir = @srcdir@
diff --git a/m4/ax_icu_check.m4 b/m4/ax_icu_check.m4
deleted file mode 100644
index 3ffe425..0000000
--- a/m4/ax_icu_check.m4
+++ /dev/null
@@ -1,86 +0,0 @@
-dnl @synopsis AX_ICU_CHECK([version], [action-if], [action-if-not])
-dnl
-dnl Test for ICU support
-dnl
-dnl This will define ICU_LIBS, ICU_CFLAGS, ICU_CXXFLAGS, ICU_IOLIBS.
-dnl
-dnl Based on ac_check_icu (http://autoconf-archive.cryp.to/ac_check_icu.html)
-dnl by Akos Maroy <darkeye at tyrell.hu>.
-dnl
-dnl Portions Copyright 2005 Akos Maroy <darkeye at tyrell.hu>
-dnl Copying and distribution of this file, with or without modification,
-dnl are permitted in any medium without royalty provided the copyright
-dnl notice and this notice are preserved.
-dnl
-dnl @author Hunter Morris <huntermorris at gmail.com>
-dnl @version 2008-03-18
-AC_DEFUN([AX_ICU_CHECK], [
-  succeeded=no
-
-  if test -z "$ICU_CONFIG"; then
-     AC_PATH_PROG(ICU_CONFIG, icu-config, no)
-  fi
-
-  if test "$ICU_CONFIG" = "no" ; then
-     echo "*** The icu-config script could not be found. Make sure it is"
-     echo "*** in your path, and that taglib is properly installed."
-     echo "*** Or see http://www.icu-project.org/"
-  else
-	ICU_VERSION=`$ICU_CONFIG --version`
-	AC_MSG_CHECKING(for ICU >= $1)
-	VERSION_CHECK=`expr $ICU_VERSION \>\= $1`
-	if test "$VERSION_CHECK" = "1" ; then
-	   AC_MSG_RESULT(yes)
-	   succeeded=yes
-
-	   AC_MSG_CHECKING(ICU_CFLAGS)
-	   ICU_CFLAGS=`$ICU_CONFIG --cflags`
-	   AC_MSG_RESULT($ICU_CFLAGS)
-
-	   AC_MSG_CHECKING(ICU_CPPSEARCHPATH)
-	   ICU_CPPSEARCHPATH=`$ICU_CONFIG --cppflags-searchpath`
-	   AC_MSG_RESULT($ICU_CPPSEARCHPATH)
-
-	   AC_MSG_CHECKING(ICU_CXXFLAGS)
-	   ICU_CXXFLAGS=`$ICU_CONFIG --cxxflags`
-	   AC_MSG_RESULT($ICU_CXXFLAGS)
-	   
-	   AC_MSG_CHECKING(ICU_LIBS)
-	   ICU_LIBS=`$ICU_CONFIG --ldflags-libsonly`
-	   AC_MSG_RESULT($ICU_LIBS)
-
-	   AC_MSG_CHECKING(ICU_LIBPATH)
-	   ICU_LIBPATH=`$ICU_CONFIG --ldflags-searchpath`
-	   AC_MSG_RESULT($ICU_LIBPATH)
-
-	   AC_MSG_CHECKING(ICU_IOLIBS)
-	   ICU_IOLIBS=`$ICU_CONFIG --ldflags-icuio`
-	   AC_MSG_RESULT($ICU_IOLIBS)
-	else
-	   ICU_CFLAGS=""
-	   ICU_CXXFLAGS=""
-	   ICU_CPPSEARCHPATH=""
-	   ICU_LIBPATH=""
-	   ICU_LIBS=""
-	   ICU_IOLIBS=""
-	   ## If we have a custom action on failure, don't print errors, but
-	   ## do set a variable so people can do so.
-	   ifelse([$3], ,echo "can't find ICU >= $1",)
-        fi
-
-	AC_SUBST(ICU_CFLAGS)
-	AC_SUBST(ICU_CXXFLAGS)
-	AC_SUBST(ICU_CPPSEARCHPATH)
-	AC_SUBST(ICU_VERSION)
-	AC_SUBST(ICU_LIBPATH)
-	AC_SUBST(ICU_LIBS)
-	AC_SUBST(ICU_IOLIBS)
-  fi
-
-  if test $succeeded = yes; then
-     ifelse([$2], , :, [$2])
-  else
-     ifelse([$3], , AC_MSG_ERROR([Library requirements (ICU) not met.]), [$3])
-  fi
-])
-
diff --git a/m4/libtool.m4 b/m4/libtool.m4
index a3bc337..10ab284 100644
--- a/m4/libtool.m4
+++ b/m4/libtool.m4
@@ -2887,6 +2887,18 @@ linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*)
   dynamic_linker='GNU/Linux ld.so'
   ;;
 
+netbsdelf*-gnu)
+  version_type=linux
+  need_lib_prefix=no
+  need_version=no
+  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}'
+  soname_spec='${libname}${release}${shared_ext}$major'
+  shlibpath_var=LD_LIBRARY_PATH
+  shlibpath_overrides_runpath=no
+  hardcode_into_libs=yes
+  dynamic_linker='NetBSD ld.elf_so'
+  ;;
+
 netbsd*)
   version_type=sunos
   need_lib_prefix=no
@@ -3546,7 +3558,7 @@ linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*)
   lt_cv_deplibs_check_method=pass_all
   ;;
 
-netbsd*)
+netbsd* | netbsdelf*-gnu)
   if echo __ELF__ | $CC -E - | $GREP __ELF__ > /dev/null; then
     lt_cv_deplibs_check_method='match_pattern /lib[[^/]]+(\.so\.[[0-9]]+\.[[0-9]]+|_pic\.a)$'
   else
@@ -4424,7 +4436,7 @@ m4_if([$1], [CXX], [
 	    ;;
 	esac
 	;;
-      netbsd*)
+      netbsd* | netbsdelf*-gnu)
 	;;
       *qnx* | *nto*)
         # QNX uses GNU C++, but need to define -shared option too, otherwise
@@ -4936,6 +4948,9 @@ m4_if([$1], [CXX], [
       ;;
     esac
     ;;
+  linux* | k*bsd*-gnu | gnu*)
+    _LT_TAGVAR(link_all_deplibs, $1)=no
+    ;;
   *)
     _LT_TAGVAR(export_symbols_cmds, $1)='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols'
     ;;
@@ -4998,6 +5013,9 @@ dnl Note also adjust exclude_expsyms for C++ above.
   openbsd* | bitrig*)
     with_gnu_ld=no
     ;;
+  linux* | k*bsd*-gnu | gnu*)
+    _LT_TAGVAR(link_all_deplibs, $1)=no
+    ;;
   esac
 
   _LT_TAGVAR(ld_shlibs, $1)=yes
@@ -5252,7 +5270,7 @@ _LT_EOF
       fi
       ;;
 
-    netbsd*)
+    netbsd* | netbsdelf*-gnu)
       if echo __ELF__ | $CC -E - | $GREP __ELF__ >/dev/null; then
 	_LT_TAGVAR(archive_cmds, $1)='$LD -Bshareable $libobjs $deplibs $linker_flags -o $lib'
 	wlarc=
@@ -5773,6 +5791,7 @@ _LT_EOF
 	if test yes = "$lt_cv_irix_exported_symbol"; then
           _LT_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname `test -n "$verstring" && func_echo_all "$wl-set_version $wl$verstring"` $wl-update_registry $wl$output_objdir/so_locations $wl-exports_file $wl$export_symbols -o $lib'
 	fi
+	_LT_TAGVAR(link_all_deplibs, $1)=no
       else
 	_LT_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags -soname $soname `test -n "$verstring" && func_echo_all "-set_version $verstring"` -update_registry $output_objdir/so_locations -o $lib'
 	_LT_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags -soname $soname `test -n "$verstring" && func_echo_all "-set_version $verstring"` -update_registry $output_objdir/so_locations -exports_file $export_symbols -o $lib'
@@ -5794,7 +5813,7 @@ _LT_EOF
       esac
       ;;
 
-    netbsd*)
+    netbsd* | netbsdelf*-gnu)
       if echo __ELF__ | $CC -E - | $GREP __ELF__ >/dev/null; then
 	_LT_TAGVAR(archive_cmds, $1)='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags'  # a.out
       else
diff --git a/m4/ltsugar.m4 b/m4/ltsugar.m4
index 48bc934..9000a05 100644
--- a/m4/ltsugar.m4
+++ b/m4/ltsugar.m4
@@ -1,7 +1,6 @@
 # ltsugar.m4 -- libtool m4 base layer.                         -*-Autoconf-*-
 #
-# Copyright (C) 2004-2005, 2007-2008, 2011-2015 Free Software
-# Foundation, Inc.
+# Copyright (C) 2004, 2005, 2007, 2008 Free Software Foundation, Inc.
 # Written by Gary V. Vaughan, 2004
 #
 # This file is free software; the Free Software Foundation gives
@@ -34,7 +33,7 @@ m4_define([_lt_join],
 # ------------
 # Manipulate m4 lists.
 # These macros are necessary as long as will still need to support
-# Autoconf-2.59, which quotes differently.
+# Autoconf-2.59 which quotes differently.
 m4_define([lt_car], [[$1]])
 m4_define([lt_cdr],
 [m4_if([$#], 0, [m4_fatal([$0: cannot be called without arguments])],
@@ -45,7 +44,7 @@ m4_define([lt_unquote], $1)
 
 # lt_append(MACRO-NAME, STRING, [SEPARATOR])
 # ------------------------------------------
-# Redefine MACRO-NAME to hold its former content plus 'SEPARATOR''STRING'.
+# Redefine MACRO-NAME to hold its former content plus `SEPARATOR'`STRING'.
 # Note that neither SEPARATOR nor STRING are expanded; they are appended
 # to MACRO-NAME as is (leaving the expansion for when MACRO-NAME is invoked).
 # No SEPARATOR is output if MACRO-NAME was previously undefined (different
diff --git a/m4/lt~obsolete.m4 b/m4/lt~obsolete.m4
index c6b26f8..c573da9 100644
--- a/m4/lt~obsolete.m4
+++ b/m4/lt~obsolete.m4
@@ -1,7 +1,6 @@
 # lt~obsolete.m4 -- aclocal satisfying obsolete definitions.    -*-Autoconf-*-
 #
-#   Copyright (C) 2004-2005, 2007, 2009, 2011-2015 Free Software
-#   Foundation, Inc.
+#   Copyright (C) 2004, 2005, 2007, 2009 Free Software Foundation, Inc.
 #   Written by Scott James Remnant, 2004.
 #
 # This file is free software; the Free Software Foundation gives
@@ -12,7 +11,7 @@
 
 # These exist entirely to fool aclocal when bootstrapping libtool.
 #
-# In the past libtool.m4 has provided macros via AC_DEFUN (or AU_DEFUN),
+# In the past libtool.m4 has provided macros via AC_DEFUN (or AU_DEFUN)
 # which have later been changed to m4_define as they aren't part of the
 # exported API, or moved to Autoconf or Automake where they belong.
 #
@@ -26,7 +25,7 @@
 # included after everything else.  This provides aclocal with the
 # AC_DEFUNs it wants, but when m4 processes it, it doesn't do anything
 # because those macros already exist, or will be overwritten later.
-# We use AC_DEFUN over AU_DEFUN for compatibility with aclocal-1.6.
+# We use AC_DEFUN over AU_DEFUN for compatibility with aclocal-1.6. 
 #
 # Anytime we withdraw an AC_DEFUN or AU_DEFUN, remember to add it here.
 # Yes, that means every name once taken will need to remain here until
diff --git a/m4/pkg.m4 b/m4/pkg.m4
index 82bea96..c5b26b5 100644
--- a/m4/pkg.m4
+++ b/m4/pkg.m4
@@ -1,60 +1,29 @@
-dnl pkg.m4 - Macros to locate and utilise pkg-config.   -*- Autoconf -*-
-dnl serial 11 (pkg-config-0.29.1)
-dnl
-dnl Copyright © 2004 Scott James Remnant <scott at netsplit.com>.
-dnl Copyright © 2012-2015 Dan Nicholson <dbn.lists at gmail.com>
-dnl
-dnl This program is free software; you can redistribute it and/or modify
-dnl it under the terms of the GNU General Public License as published by
-dnl the Free Software Foundation; either version 2 of the License, or
-dnl (at your option) any later version.
-dnl
-dnl This program is distributed in the hope that it will be useful, but
-dnl WITHOUT ANY WARRANTY; without even the implied warranty of
-dnl MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-dnl General Public License for more details.
-dnl
-dnl You should have received a copy of the GNU General Public License
-dnl along with this program; if not, write to the Free Software
-dnl Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
-dnl 02111-1307, USA.
-dnl
-dnl As a special exception to the GNU General Public License, if you
-dnl distribute this file as part of a program that contains a
-dnl configuration script generated by Autoconf, you may include it under
-dnl the same distribution terms that you use for the rest of that
-dnl program.
-
-dnl PKG_PREREQ(MIN-VERSION)
-dnl -----------------------
-dnl Since: 0.29
-dnl
-dnl Verify that the version of the pkg-config macros are at least
-dnl MIN-VERSION. Unlike PKG_PROG_PKG_CONFIG, which checks the user's
-dnl installed version of pkg-config, this checks the developer's version
-dnl of pkg.m4 when generating configure.
-dnl
-dnl To ensure that this macro is defined, also add:
-dnl m4_ifndef([PKG_PREREQ],
-dnl     [m4_fatal([must install pkg-config 0.29 or later before running autoconf/autogen])])
-dnl
-dnl See the "Since" comment for each macro you use to see what version
-dnl of the macros you require.
-m4_defun([PKG_PREREQ],
-[m4_define([PKG_MACROS_VERSION], [0.29.1])
-m4_if(m4_version_compare(PKG_MACROS_VERSION, [$1]), -1,
-    [m4_fatal([pkg.m4 version $1 or higher is required but ]PKG_MACROS_VERSION[ found])])
-])dnl PKG_PREREQ
-
-dnl PKG_PROG_PKG_CONFIG([MIN-VERSION])
-dnl ----------------------------------
-dnl Since: 0.16
-dnl
-dnl Search for the pkg-config tool and set the PKG_CONFIG variable to
-dnl first found in the path. Checks that the version of pkg-config found
-dnl is at least MIN-VERSION. If MIN-VERSION is not specified, 0.9.0 is
-dnl used since that's the first version where most current features of
-dnl pkg-config existed.
+# pkg.m4 - Macros to locate and utilise pkg-config.            -*- Autoconf -*-
+# serial 1 (pkg-config-0.24)
+# 
+# Copyright © 2004 Scott James Remnant <scott at netsplit.com>.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+#
+# As a special exception to the GNU General Public License, if you
+# distribute this file as part of a program that contains a
+# configuration script generated by Autoconf, you may include it under
+# the same distribution terms that you use for the rest of that program.
+
+# PKG_PROG_PKG_CONFIG([MIN-VERSION])
+# ----------------------------------
 AC_DEFUN([PKG_PROG_PKG_CONFIG],
 [m4_pattern_forbid([^_?PKG_[A-Z_]+$])
 m4_pattern_allow([^PKG_CONFIG(_(PATH|LIBDIR|SYSROOT_DIR|ALLOW_SYSTEM_(CFLAGS|LIBS)))?$])
@@ -76,19 +45,18 @@ if test -n "$PKG_CONFIG"; then
 		PKG_CONFIG=""
 	fi
 fi[]dnl
-])dnl PKG_PROG_PKG_CONFIG
-
-dnl PKG_CHECK_EXISTS(MODULES, [ACTION-IF-FOUND], [ACTION-IF-NOT-FOUND])
-dnl -------------------------------------------------------------------
-dnl Since: 0.18
-dnl
-dnl Check to see whether a particular set of modules exists. Similar to
-dnl PKG_CHECK_MODULES(), but does not set variables or print errors.
-dnl
-dnl Please remember that m4 expands AC_REQUIRE([PKG_PROG_PKG_CONFIG])
-dnl only at the first occurence in configure.ac, so if the first place
-dnl it's called might be skipped (such as if it is within an "if", you
-dnl have to call PKG_CHECK_EXISTS manually
+])# PKG_PROG_PKG_CONFIG
+
+# PKG_CHECK_EXISTS(MODULES, [ACTION-IF-FOUND], [ACTION-IF-NOT-FOUND])
+#
+# Check to see whether a particular set of modules exists.  Similar
+# to PKG_CHECK_MODULES(), but does not set variables or print errors.
+#
+# Please remember that m4 expands AC_REQUIRE([PKG_PROG_PKG_CONFIG])
+# only at the first occurence in configure.ac, so if the first place
+# it's called might be skipped (such as if it is within an "if", you
+# have to call PKG_CHECK_EXISTS manually
+# --------------------------------------------------------------
 AC_DEFUN([PKG_CHECK_EXISTS],
 [AC_REQUIRE([PKG_PROG_PKG_CONFIG])dnl
 if test -n "$PKG_CONFIG" && \
@@ -98,10 +66,8 @@ m4_ifvaln([$3], [else
   $3])dnl
 fi])
 
-dnl _PKG_CONFIG([VARIABLE], [COMMAND], [MODULES])
-dnl ---------------------------------------------
-dnl Internal wrapper calling pkg-config via PKG_CONFIG and setting
-dnl pkg_failed based on the result.
+# _PKG_CONFIG([VARIABLE], [COMMAND], [MODULES])
+# ---------------------------------------------
 m4_define([_PKG_CONFIG],
 [if test -n "$$1"; then
     pkg_cv_[]$1="$$1"
@@ -113,11 +79,10 @@ m4_define([_PKG_CONFIG],
  else
     pkg_failed=untried
 fi[]dnl
-])dnl _PKG_CONFIG
+])# _PKG_CONFIG
 
-dnl _PKG_SHORT_ERRORS_SUPPORTED
-dnl ---------------------------
-dnl Internal check to see if pkg-config supports short errors.
+# _PKG_SHORT_ERRORS_SUPPORTED
+# -----------------------------
 AC_DEFUN([_PKG_SHORT_ERRORS_SUPPORTED],
 [AC_REQUIRE([PKG_PROG_PKG_CONFIG])
 if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
@@ -125,17 +90,19 @@ if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
 else
         _pkg_short_errors_supported=no
 fi[]dnl
-])dnl _PKG_SHORT_ERRORS_SUPPORTED
-
-
-dnl PKG_CHECK_MODULES(VARIABLE-PREFIX, MODULES, [ACTION-IF-FOUND],
-dnl   [ACTION-IF-NOT-FOUND])
-dnl --------------------------------------------------------------
-dnl Since: 0.4.0
-dnl
-dnl Note that if there is a possibility the first call to
-dnl PKG_CHECK_MODULES might not happen, you should be sure to include an
-dnl explicit call to PKG_PROG_PKG_CONFIG in your configure.ac
+])# _PKG_SHORT_ERRORS_SUPPORTED
+
+
+# PKG_CHECK_MODULES(VARIABLE-PREFIX, MODULES, [ACTION-IF-FOUND],
+# [ACTION-IF-NOT-FOUND])
+#
+#
+# Note that if there is a possibility the first call to
+# PKG_CHECK_MODULES might not happen, you should be sure to include an
+# explicit call to PKG_PROG_PKG_CONFIG in your configure.ac
+#
+#
+# --------------------------------------------------------------
 AC_DEFUN([PKG_CHECK_MODULES],
 [AC_REQUIRE([PKG_PROG_PKG_CONFIG])dnl
 AC_ARG_VAR([$1][_CFLAGS], [C compiler flags for $1, overriding pkg-config])dnl
@@ -189,40 +156,16 @@ else
         AC_MSG_RESULT([yes])
 	$3
 fi[]dnl
-])dnl PKG_CHECK_MODULES
-
-
-dnl PKG_CHECK_MODULES_STATIC(VARIABLE-PREFIX, MODULES, [ACTION-IF-FOUND],
-dnl   [ACTION-IF-NOT-FOUND])
-dnl ---------------------------------------------------------------------
-dnl Since: 0.29
-dnl
-dnl Checks for existence of MODULES and gathers its build flags with
-dnl static libraries enabled. Sets VARIABLE-PREFIX_CFLAGS from --cflags
-dnl and VARIABLE-PREFIX_LIBS from --libs.
-dnl
-dnl Note that if there is a possibility the first call to
-dnl PKG_CHECK_MODULES_STATIC might not happen, you should be sure to
-dnl include an explicit call to PKG_PROG_PKG_CONFIG in your
-dnl configure.ac.
-AC_DEFUN([PKG_CHECK_MODULES_STATIC],
-[AC_REQUIRE([PKG_PROG_PKG_CONFIG])dnl
-_save_PKG_CONFIG=$PKG_CONFIG
-PKG_CONFIG="$PKG_CONFIG --static"
-PKG_CHECK_MODULES($@)
-PKG_CONFIG=$_save_PKG_CONFIG[]dnl
-])dnl PKG_CHECK_MODULES_STATIC
+])# PKG_CHECK_MODULES
 
 
-dnl PKG_INSTALLDIR([DIRECTORY])
-dnl -------------------------
-dnl Since: 0.27
-dnl
-dnl Substitutes the variable pkgconfigdir as the location where a module
-dnl should install pkg-config .pc files. By default the directory is
-dnl $libdir/pkgconfig, but the default can be changed by passing
-dnl DIRECTORY. The user can override through the --with-pkgconfigdir
-dnl parameter.
+# PKG_INSTALLDIR(DIRECTORY)
+# -------------------------
+# Substitutes the variable pkgconfigdir as the location where a module
+# should install pkg-config .pc files. By default the directory is
+# $libdir/pkgconfig, but the default can be changed by passing
+# DIRECTORY. The user can override through the --with-pkgconfigdir
+# parameter.
 AC_DEFUN([PKG_INSTALLDIR],
 [m4_pushdef([pkg_default], [m4_default([$1], ['${libdir}/pkgconfig'])])
 m4_pushdef([pkg_description],
@@ -233,18 +176,16 @@ AC_ARG_WITH([pkgconfigdir],
 AC_SUBST([pkgconfigdir], [$with_pkgconfigdir])
 m4_popdef([pkg_default])
 m4_popdef([pkg_description])
-])dnl PKG_INSTALLDIR
+]) dnl PKG_INSTALLDIR
 
 
-dnl PKG_NOARCH_INSTALLDIR([DIRECTORY])
-dnl --------------------------------
-dnl Since: 0.27
-dnl
-dnl Substitutes the variable noarch_pkgconfigdir as the location where a
-dnl module should install arch-independent pkg-config .pc files. By
-dnl default the directory is $datadir/pkgconfig, but the default can be
-dnl changed by passing DIRECTORY. The user can override through the
-dnl --with-noarch-pkgconfigdir parameter.
+# PKG_NOARCH_INSTALLDIR(DIRECTORY)
+# -------------------------
+# Substitutes the variable noarch_pkgconfigdir as the location where a
+# module should install arch-independent pkg-config .pc files. By
+# default the directory is $datadir/pkgconfig, but the default can be
+# changed by passing DIRECTORY. The user can override through the
+# --with-noarch-pkgconfigdir parameter.
 AC_DEFUN([PKG_NOARCH_INSTALLDIR],
 [m4_pushdef([pkg_default], [m4_default([$1], ['${datadir}/pkgconfig'])])
 m4_pushdef([pkg_description],
@@ -255,15 +196,13 @@ AC_ARG_WITH([noarch-pkgconfigdir],
 AC_SUBST([noarch_pkgconfigdir], [$with_noarch_pkgconfigdir])
 m4_popdef([pkg_default])
 m4_popdef([pkg_description])
-])dnl PKG_NOARCH_INSTALLDIR
+]) dnl PKG_NOARCH_INSTALLDIR
 
 
-dnl PKG_CHECK_VAR(VARIABLE, MODULE, CONFIG-VARIABLE,
-dnl [ACTION-IF-FOUND], [ACTION-IF-NOT-FOUND])
-dnl -------------------------------------------
-dnl Since: 0.28
-dnl
-dnl Retrieves the value of the pkg-config variable for the given module.
+# PKG_CHECK_VAR(VARIABLE, MODULE, CONFIG-VARIABLE,
+# [ACTION-IF-FOUND], [ACTION-IF-NOT-FOUND])
+# -------------------------------------------
+# Retrieves the value of the pkg-config variable for the given module.
 AC_DEFUN([PKG_CHECK_VAR],
 [AC_REQUIRE([PKG_PROG_PKG_CONFIG])dnl
 AC_ARG_VAR([$1], [value of $3 for $2, overriding pkg-config])dnl
@@ -272,4 +211,4 @@ _PKG_CONFIG([$1], [variable="][$3]["], [$2])
 AS_VAR_COPY([$1], [pkg_cv_][$1])
 
 AS_VAR_IF([$1], [""], [$5], [$4])dnl
-])dnl PKG_CHECK_VAR
+])# PKG_CHECK_VAR
diff --git a/src/Makefile.am b/src/Makefile.am
index 74ff1ae..32693e7 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -1,8 +1,5 @@
-# $Id$
-# $URL $
-
 AM_CPPFLAGS = -I at top_srcdir@/include
-AM_CXXFLAGS = -DSYSCONF_PATH=\"$(datadir)\" -std=c++0x # -Weffc++
+AM_CXXFLAGS = -DSYSCONF_PATH=\"$(datadir)\" -std=c++0x -W -Wall -pedantic -O3 -g
 
 bin_PROGRAMS = ucto
 
diff --git a/src/Makefile.in b/src/Makefile.in
index 29a9f70..d73786d 100644
--- a/src/Makefile.in
+++ b/src/Makefile.in
@@ -14,9 +14,6 @@
 
 @SET_MAKE@
 
-# $Id$
-# $URL $
-
 
 VPATH = @srcdir@
 am__is_gnu_make = { \
@@ -95,8 +92,7 @@ host_triplet = @host@
 bin_PROGRAMS = ucto$(EXEEXT)
 subdir = src
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
-am__aclocal_m4_deps = $(top_srcdir)/m4/ax_icu_check.m4 \
-	$(top_srcdir)/m4/ax_lib_readline.m4 \
+am__aclocal_m4_deps = $(top_srcdir)/m4/ax_lib_readline.m4 \
 	$(top_srcdir)/m4/libtool.m4 $(top_srcdir)/m4/ltoptions.m4 \
 	$(top_srcdir)/m4/ltsugar.m4 $(top_srcdir)/m4/ltversion.m4 \
 	$(top_srcdir)/m4/lt~obsolete.m4 $(top_srcdir)/m4/pkg.m4 \
@@ -423,13 +419,7 @@ EXEEXT = @EXEEXT@
 FGREP = @FGREP@
 GREP = @GREP@
 ICU_CFLAGS = @ICU_CFLAGS@
-ICU_CONFIG = @ICU_CONFIG@
-ICU_CPPSEARCHPATH = @ICU_CPPSEARCHPATH@
-ICU_CXXFLAGS = @ICU_CXXFLAGS@
-ICU_IOLIBS = @ICU_IOLIBS@
-ICU_LIBPATH = @ICU_LIBPATH@
 ICU_LIBS = @ICU_LIBS@
-ICU_VERSION = @ICU_VERSION@
 INSTALL = @INSTALL@
 INSTALL_DATA = @INSTALL_DATA@
 INSTALL_PROGRAM = @INSTALL_PROGRAM@
@@ -520,6 +510,7 @@ pdfdir = @pdfdir@
 prefix = @prefix@
 program_transform_name = @program_transform_name@
 psdir = @psdir@
+runstatedir = @runstatedir@
 sbindir = @sbindir@
 sharedstatedir = @sharedstatedir@
 srcdir = @srcdir@
@@ -533,7 +524,7 @@ top_srcdir = @top_srcdir@
 uctodata_CFLAGS = @uctodata_CFLAGS@
 uctodata_LIBS = @uctodata_LIBS@
 AM_CPPFLAGS = -I at top_srcdir@/include
-AM_CXXFLAGS = -DSYSCONF_PATH=\"$(datadir)\" -std=c++0x # -Weffc++
+AM_CXXFLAGS = -DSYSCONF_PATH=\"$(datadir)\" -std=c++0x -W -Wall -pedantic -O3 -g
 LDADD = libucto.la
 ucto_SOURCES = ucto.cxx
 lib_LTLIBRARIES = libucto.la
diff --git a/src/setting.cxx b/src/setting.cxx
index e87f466..9d7ec69 100644
--- a/src/setting.cxx
+++ b/src/setting.cxx
@@ -117,7 +117,7 @@ namespace Tokenizer {
 
   class uLogicError: public std::logic_error {
   public:
-    uLogicError( const string& s ): logic_error( "ucto: logic error:" + s ){};
+    explicit uLogicError( const string& s ): logic_error( "ucto: logic error:" + s ){};
   };
 
   ostream& operator<<( ostream& os, const Quoting& q ){
@@ -230,12 +230,27 @@ namespace Tokenizer {
       delete rule;
     }
     rulesmap.clear();
-    delete theErrLog;
+  }
+
+  set<string> Setting::installed_languages() {
+    // we only return 'languages' which are installed as 'tokconfig-*'
+    //
+    vector<string> files = TiCC::searchFilesMatch( defaultConfigDir, "tokconfig-*" );
+    set<string> result;
+    for ( auto const& f : files ){
+      string base = TiCC::basename(f);
+      size_t pos = base.find("tokconfig-");
+      if ( pos == 0 ){
+	string lang = base.substr( 10 );
+	result.insert( lang );
+      }
+    }
+    return result;
   }
 
   bool Setting::readrules( const string& fname ){
     if ( tokDebug > 0 ){
-      *theErrLog << "%include " << fname << endl;
+      LOG << "%include " << fname << endl;
     }
     ifstream f( fname );
     if ( !f ){
@@ -248,7 +263,7 @@ namespace Tokenizer {
 	line.trim();
 	if ((line.length() > 0) && (line[0] != '#')) {
 	  if ( tokDebug >= 5 ){
-	    *theErrLog << "include line = " << rawline << endl;
+	    LOG << "include line = " << rawline << endl;
 	  }
 	  const int splitpoint = line.indexOf("=");
 	  if ( splitpoint < 0 ){
@@ -266,14 +281,14 @@ namespace Tokenizer {
 
   bool Setting::readfilters( const string& fname ){
     if ( tokDebug > 0 ){
-      *theErrLog << "%include " << fname << endl;
+      LOG << "%include " << fname << endl;
     }
     return filter.fill( fname );
   }
 
   bool Setting::readquotes( const string& fname ){
     if ( tokDebug > 0 ){
-      *theErrLog << "%include " << fname << endl;
+      LOG << "%include " << fname << endl;
     }
     ifstream f( fname );
     if ( !f ){
@@ -286,7 +301,7 @@ namespace Tokenizer {
 	line.trim();
 	if ((line.length() > 0) && (line[0] != '#')) {
 	  if ( tokDebug >= 5 ){
-	    *theErrLog << "include line = " << rawline << endl;
+	    LOG << "include line = " << rawline << endl;
 	  }
 	  int splitpoint = line.indexOf(" ");
 	  if ( splitpoint == -1 )
@@ -314,7 +329,7 @@ namespace Tokenizer {
 
   bool Setting::readeosmarkers( const string& fname ){
     if ( tokDebug > 0 ){
-      *theErrLog << "%include " << fname << endl;
+      LOG << "%include " << fname << endl;
     }
     ifstream f( fname );
     if ( !f ){
@@ -327,7 +342,7 @@ namespace Tokenizer {
 	line.trim();
 	if ((line.length() > 0) && (line[0] != '#')) {
 	  if ( tokDebug >= 5 ){
-	    *theErrLog << "include line = " << rawline << endl;
+	    LOG << "include line = " << rawline << endl;
 	  }
 	  if ( ( line.startsWith("\\u") && line.length() == 6 ) ||
 	       ( line.startsWith("\\U") && line.length() == 10 ) ){
@@ -346,7 +361,7 @@ namespace Tokenizer {
   bool Setting::readabbreviations( const string& fname,
 				   UnicodeString& abbreviations ){
     if ( tokDebug > 0 ){
-      *theErrLog << "%include " << fname << endl;
+      LOG << "%include " << fname << endl;
     }
     ifstream f( fname );
     if ( !f ){
@@ -359,7 +374,7 @@ namespace Tokenizer {
 	line.trim();
 	if ((line.length() > 0) && (line[0] != '#')) {
 	  if ( tokDebug >= 5 ){
-	    *theErrLog << "include line = " << rawline << endl;
+	    LOG << "include line = " << rawline << endl;
 	  }
 	  if ( !abbreviations.isEmpty())
 	    abbreviations += '|';
@@ -661,7 +676,7 @@ namespace Tokenizer {
 	    }
 	      break;
 	    default:
-	      throw uLogicError("unhandled case in switch");
+	      throw uLogicError( "unhandled case in switch" );
 	    }
 	  }
 	}
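
The new Setting::installed_languages() above scans defaultConfigDir for
tokconfig-* files and returns the matching language codes. Below is a minimal
usage sketch, not part of this commit; the "ucto/setting.h" header path is
taken from the source tree and the language codes in the comment are only
examples of what may be installed.

    #include <iostream>
    #include <set>
    #include <string>
    #include "ucto/setting.h"

    int main() {
      // enumerate the languages for which a tokconfig-* file is installed
      std::set<std::string> langs = Tokenizer::Setting::installed_languages();
      for ( const auto& lang : langs ) {
        std::cout << lang << std::endl;   // e.g. "nld" or "eng", if installed
      }
      return 0;
    }
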
diff --git a/src/textcat.cxx b/src/textcat.cxx
index ae5d97d..3fa0040 100644
--- a/src/textcat.cxx
+++ b/src/textcat.cxx
@@ -71,11 +71,11 @@ string TextCat::get_language( const string& in ) const {
 #else
 TextCat::~TextCat() {}
 
-TextCat::TextCat( const std::string& cf ) {
+TextCat::TextCat( const std::string& cf ): TC(0) {
   throw runtime_error( "TextCat::TextCat(" + cf + "): TextCat Support not available" );
 }
 
-TextCat::TextCat( const TextCat& in ) {
+TextCat::TextCat( const TextCat& in ): TC(0) {
   throw runtime_error( "TextCat::TextCat(): TextCat Support not available" );
 }
 
diff --git a/src/tokenize.cxx b/src/tokenize.cxx
index 502078a..274e1b8 100644
--- a/src/tokenize.cxx
+++ b/src/tokenize.cxx
@@ -74,17 +74,17 @@ namespace Tokenizer {
 
   class uRangeError: public std::out_of_range {
   public:
-    uRangeError( const string& s ): out_of_range( "ucto: out of range:" + s ){};
+    explicit uRangeError( const string& s ): out_of_range( "ucto: out of range:" + s ){};
   };
 
   class uLogicError: public std::logic_error {
   public:
-    uLogicError( const string& s ): logic_error( "ucto: logic error:" + s ){};
+    explicit uLogicError( const string& s ): logic_error( "ucto: logic error:" + s ){};
   };
 
   class uCodingError: public std::runtime_error {
   public:
-    uCodingError( const string& s ): runtime_error( "ucto: coding problem:" + s ){};
+    explicit uCodingError( const string& s ): runtime_error( "ucto: coding problem:" + s ){};
   };
 
 
@@ -153,18 +153,21 @@ namespace Tokenizer {
     doPunctFilter(false),
     detectPar(true),
     paragraphsignal(true),
+    doDetectLang(false),
+    text_redundancy("minimal"),
     sentenceperlineoutput(false),
     sentenceperlineinput(false),
     lowercase(false),
     uppercase(false),
     xmlout(false),
+    xmlin(false),
     passthru(false),
     inputclass("current"),
     outputclass("current"),
     tc( 0 )
   {
-    theErrLog = new TiCC::LogStream(cerr);
-    theErrLog->setstamp( NoStamp );
+    theErrLog = new TiCC::LogStream(cerr, "ucto" );
+    theErrLog->setstamp( StampMessage );
 #ifdef ENABLE_TEXTCAT
     string textcat_cfg = string(SYSCONF_PATH) + "/ucto/textcat.cfg";
     tc = new TextCat( textcat_cfg );
@@ -172,8 +175,21 @@ namespace Tokenizer {
   }
 
   TokenizerClass::~TokenizerClass(){
-    //    delete setting;
+    Setting *d = 0;
+    for ( const auto& s : settings ){
+      if ( s.first == "default" ){
+	// the 'default' may also return as a real 'language'
+	// avoid deleting it twice
+	d = s.second;
+	delete d;
+      }
+      if ( s.second != d ){
+	delete s.second;
+      }
+
+    }
     delete theErrLog;
+    delete tc;
   }
 
   bool TokenizerClass::reset( const string& lang ){
@@ -204,6 +220,18 @@ namespace Tokenizer {
     return old;
   }
 
+  string TokenizerClass::setTextRedundancy( const std::string& tr ){
+    if ( tr == "none" || tr == "minimal" || tr == "full" ){
+      string s = text_redundancy;
+      text_redundancy = tr;
+      return s;
+    }
+    else {
+      throw runtime_error( "illegal value '" + tr + "' for textredundancy. "
+			   "expected 'full', 'minimal' or 'none'." );
+    }
+  }
+
   void stripCR( string& s ){
     string::size_type pos = s.rfind( '\r' );
     if ( pos != string::npos ){
@@ -211,6 +239,63 @@ namespace Tokenizer {
     }
   }
 
+  void TokenizerClass::extractSentencesAndFlush( int numS,
+						 vector<Token>& outputTokens,
+						 const string& lang ){
+    int count = 0;
+    const int size = tokens.size();
+    short quotelevel = 0;
+    size_t begin = 0;
+    size_t end = 0;
+    for ( int i = 0; i < size; ++i ) {
+      if (tokens[i].role & NEWPARAGRAPH) {
+	quotelevel = 0;
+      }
+      else if (tokens[i].role & ENDQUOTE) {
+	--quotelevel;
+      }
+      if ( (tokens[i].role & BEGINOFSENTENCE)
+	   && (quotelevel == 0)) {
+	begin = i;
+      }
+      //FBK: QUOTELEVEL GOES UP BEFORE begin IS UPDATED... RESULTS IN DUPLICATE OUTPUT
+      if (tokens[i].role & BEGINQUOTE) {
+	++quotelevel;
+      }
+      if ((tokens[i].role & ENDOFSENTENCE) && (quotelevel == 0)) {
+	end = i+1;
+	tokens[begin].role |= BEGINOFSENTENCE;  //sanity check
+	if (tokDebug >= 1){
+	  LOG << "[tokenize] extracted sentence " << count << ", begin="<<begin << ",end="<< end << endl;
+	}
+	for ( size_t i=begin; i < end; ++i ){
+	  outputTokens.push_back( tokens[i] );
+	}
+	if ( ++count == numS ){
+	  if (tokDebug >= 1){
+	    LOG << "[tokenize] erase " << end  << " tokens from " << tokens.size() << endl;
+	  }
+	  tokens.erase( tokens.begin(),tokens.begin()+end );
+	  if ( !passthru ){
+	    if ( !settings[lang]->quotes.emptyStack() ) {
+	      settings[lang]->quotes.flushStack( end );
+	    }
+	  }
+	  //After flushing, the first token still in buffer (if any) is always a BEGINOFSENTENCE:
+	  if (!tokens.empty()) {
+	    tokens[0].role |= BEGINOFSENTENCE;
+	  }
+	  return;
+	}
+      }
+    }
+    if ( count < numS ){
+      throw uRangeError( "Not enough sentences exists in the buffer: ("
+			 + toString( count ) + " found. " + toString( numS)
+			 + " wanted)" );
+    }
+  }
+
   vector<Token> TokenizerClass::tokenizeStream( istream& IN,
 						const string& lang ) {
     vector<Token> outputTokens;
@@ -289,7 +374,7 @@ namespace Tokenizer {
 	    }
 	    language = lan;
 	  }
-	  tokenizeLine( input_line, language );
+	  tokenizeLine( input_line, language, "" );
 	}
 	numS = countSentences(); //count full sentences in token buffer
       }
@@ -297,15 +382,7 @@ namespace Tokenizer {
 	if ( tokDebug > 0 ){
 	  LOG << "[tokenize] " << numS << " sentence(s) in buffer, processing..." << endl;
 	}
-	for (int i = 0; i < numS; i++) {
-	  vector<Token> v = getSentence( i );
-	  outputTokens.insert( outputTokens.end(), v.begin(), v.end() );
-	}
-	// clear processed sentences from buffer
-	if ( tokDebug > 0 ){
-	  LOG << "[tokenize] flushing " << numS << " sentence(s) from buffer..." << endl;
-	}
-	flushSentences(numS, lang );
+	extractSentencesAndFlush( numS, outputTokens, lang );
 	return outputTokens;
       }
       else {
@@ -355,7 +432,7 @@ namespace Tokenizer {
 	if ( passthru )
 	  passthruLine( line, bos );
 	else
-	  tokenizeLine( line );
+	  tokenizeLine( line, lang );
 	numS = countSentences(); //count full sentences in token buffer
       }
       if ( numS > 0 ) {
@@ -385,7 +462,7 @@ namespace Tokenizer {
   folia::Document *TokenizerClass::tokenize( istream& IN ) {
     inputEncoding = checkBOM( IN );
     folia::Document *doc = new folia::Document( "id='" + docid + "'" );
-    if ( default_language != "none" ){
+    if ( /*doDetectLang &&*/ default_language != "none" ){
       if ( tokDebug > 0 ){
 	LOG << "[tokenize](stream): SET document language=" << default_language << endl;
       }
@@ -396,16 +473,23 @@ namespace Tokenizer {
     int parCount = 0;
     vector<Token> buffer;
     do {
-	vector<Token> v = tokenizeStream( IN );
-	for ( auto const& token : v ) {
-	  if ( token.role & NEWPARAGRAPH) {
-	    //process the buffer
-	    parCount = outputTokensXML( root, buffer, parCount );
-	    buffer.clear();
-	  }
-	  buffer.push_back( token );
+      if ( tokDebug > 0 ){
+	LOG << "[tokenize] looping on stream" << endl;
+      }
+      vector<Token> v = tokenizeStream( IN );
+      for ( auto const& token : v ) {
+	if ( token.role & NEWPARAGRAPH) {
+	  //process the buffer
+	  parCount = outputTokensXML( root, buffer, parCount );
+	  buffer.clear();
 	}
-    } while ( IN );
+	buffer.push_back( token );
+      }
+    }
+    while ( IN );
+    if ( tokDebug > 0 ){
+      LOG << "[tokenize] end of stream reached" << endl;
+    }
     if (!buffer.empty()){
       outputTokensXML( root, buffer, parCount);
     }
@@ -427,8 +511,8 @@ namespace Tokenizer {
       else {
 	IN = new ifstream( ifile );
 	if ( !IN || !IN->good() ){
-	  cerr << "Error: problems opening inputfile " << ifile << endl;
-	  cerr << "Courageously refusing to start..."  << endl;
+	  cerr << "ucto: problems opening inputfile " << ifile << endl;
+	  cerr << "ucto: Courageously refusing to start..."  << endl;
 	  throw runtime_error( "unable to find or read file: '" + ifile + "'" );
 	}
       }
@@ -437,6 +521,11 @@ namespace Tokenizer {
     else {
       folia::Document doc;
       doc.readFromFile(ifile);
+      if ( xmlin && inputclass == outputclass ){
+	LOG << "ucto: --filter=NO is automatically set. inputclass equals outputclass!"
+	    << endl;
+	setFiltering(false);
+      }
       this->tokenize(doc);
       *OUT << doc << endl;
     }
@@ -490,12 +579,18 @@ namespace Tokenizer {
       int i = 0;
       inputEncoding = checkBOM( IN );
       do {
+	if ( tokDebug > 0 ){
+	  LOG << "[tokenize] looping on stream" << endl;
+	}
 	vector<Token> v = tokenizeStream( IN );
 	if ( !v.empty() ) {
 	  outputTokens( OUT, v , (i>0) );
 	}
 	++i;
       } while ( IN );
+      if ( tokDebug > 0 ){
+	LOG << "[tokenize] end_of_stream" << endl;
+      }
       OUT << endl;
     }
   }
@@ -504,16 +599,29 @@ namespace Tokenizer {
     if ( tokDebug >= 2 ){
       LOG << "tokenize doc " << doc << endl;
     }
-    string lan = doc.doc()->language();
-    if ( lan.empty() && default_language != "none" ){
-      if ( tokDebug > 1 ){
-	LOG << "[tokenize](FoLiA) SET document language=" << default_language << endl;
-      }
-      doc.set_metadata( "language", default_language );
+    if ( xmlin && inputclass == outputclass ){
+      LOG << "ucto: --filter=NO is automatically set. inputclass equals outputclass!"
+	  << endl;
+      setFiltering(false);
     }
-    else {
-      if ( tokDebug >= 2 ){
-	LOG << "[tokenize](FoLiA) Document has language " << lan << endl;
+    if ( true /*doDetectLang*/ ){
+      string lan = doc.doc()->language();
+      if ( lan.empty() && default_language != "none" ){
+	if ( tokDebug > 1 ){
+	  LOG << "[tokenize](FoLiA) SET document language=" << default_language << endl;
+	}
+	if ( doc.metadatatype() == "native" ){
+	  doc.set_metadata( "language", default_language );
+	}
+	else {
+	  LOG << "[WARNING] cannot set the language on FoLiA documents of type "
+	      << doc.metadatatype() << endl;
+	}
+      }
+      else {
+	if ( tokDebug >= 2 ){
+	  LOG << "[tokenize](FoLiA) Document has language " << lan << endl;
+	}
       }
     }
     for ( size_t i = 0; i < doc.doc()->size(); i++) {
@@ -527,25 +635,55 @@ namespace Tokenizer {
 
   void appendText( folia::FoliaElement *root,
 		   const string& outputclass  ){
-    //    cerr << endl << "appendText:" << root->id() << endl;
+    // set the textcontent of root to that of its children
     if ( root->hastext( outputclass ) ){
+      // there is already text, bail out.
       return;
     }
     UnicodeString utxt = root->text( outputclass, false, false );
-    // cerr << "untok: '" << utxt << "'" << endl;
-    // UnicodeString txt = root->text( outputclass, true );
-    // cerr << "  tok: '" << txt << "'" << endl;
+    // so get Untokenized text from the children, and set it
     root->settext( folia::UnicodeToUTF8(utxt), outputclass );
   }
 
+  void removeText( folia::FoliaElement *root,
+		   const string& outputclass  ){
+    // remove the textcontent in outputclass of root
+    root->cleartextcontent( outputclass );
+  }
+
+  const string get_language( folia::FoliaElement *f ) {
+    // get the language of this element, if any; don't look upward.
+    // we search in ALL possible sets!
+    string st = "";
+    std::set<folia::ElementType> exclude;
+    vector<folia::LangAnnotation*> v
+      = f->select<folia::LangAnnotation>( st, exclude, false );
+    string result;
+    if ( v.size() > 0 ){
+      result = v[0]->cls();
+    }
+    return result;
+  }
+
+  void set_language( folia::FoliaElement* e, const string& lan ){
+    // set or reset the language: attach a LangAnnotation of class 'lan' (replaces an existing one)
+    folia::KWargs args;
+    args["class"] = lan;
+    args["set"] = ISO_SET;
+    folia::LangAnnotation *node = new folia::LangAnnotation( e->doc() );
+    node->setAttributes( args );
+    e->replace( node );
+  }
 
-  void TokenizerClass::tokenizeElement(folia::FoliaElement * element) {
+  void TokenizerClass::tokenizeElement( folia::FoliaElement * element) {
     if ( element->isinstance(folia::Word_t)
 	 || element->isinstance(folia::TextContent_t))
       // shortcut
       return;
     if ( tokDebug >= 2 ){
-      LOG << "[tokenizeElement] Processing FoLiA element " << element->id() << endl;
+      LOG << "[tokenizeElement] Processing FoLiA element " << element->xmltag()
+	  << "(" << element->id() << ")" << endl;
+      LOG << "[tokenizeElement] inputclass=" << inputclass << " outputclass=" << outputclass << endl;
     }
     if ( element->hastext( inputclass ) ) {
       // We have an element which contains text. That's nice
@@ -597,15 +735,36 @@ namespace Tokenizer {
 	}
       }
       // now let's check our language
-      string lan = element->language(); // remember thus recurses upward
-      // to get a language from the node, it's parents OR the doc
-      if ( lan.empty() || default_language == "none" ){
-	lan = "default";
+      string lan;
+      if ( doDetectLang ){
+	lan = get_language( element ); // is there a local element language?
+	if ( lan.empty() ){
+	  // no, so try to detect it!
+	  UnicodeString temp = element->text( inputclass );
+	  temp.toLower();
+	  lan = tc->get_language( folia::UnicodeToUTF8(temp) );
+	  if ( lan.empty() ){
+	    // too bad
+	    lan = "default";
+	  }
+	  else {
+	    if ( tokDebug >= 2 ){
+	      LOG << "[tokenizeElement] textcat found a supported language: " << lan << endl;
+	    }
+	  }
+	}
+      }
+      else {
+	lan = element->language(); // remember: this recurses upward
+	// to get a language from the node, its parents OR the doc
+	if ( lan.empty() || default_language == "none" ){
+	  lan = "default";
+	}
       }
       auto const it = settings.find(lan);
       if ( it != settings.end() ){
 	if ( tokDebug >= 2 ){
-	  LOG << "[tokenizeElement] Found a supported language! " << lan << endl;
+	  LOG << "[tokenizeElement] Found a supported language: " << lan << endl;
 	}
       }
       else if ( !default_language.empty() ){
@@ -630,12 +789,7 @@ namespace Tokenizer {
 	if ( tokDebug >= 2 ){
 	  LOG << "[tokenizeElement] set language to " << lan << endl;
 	}
-	folia::KWargs args;
-	args["class"] = lan;
-	args["set"] = ISO_SET;
-	folia::LangAnnotation *node = new folia::LangAnnotation( element->doc() );
-	node->setAttributes( args );
-	element->append( node );
+	set_language( element, lan );
       }
       tokenizeSentenceElement( element, lan );
       return;
@@ -647,9 +801,27 @@ namespace Tokenizer {
     for ( size_t i = 0; i < element->size(); i++) {
       tokenizeElement( element->index(i));
     }
+    if ( text_redundancy == "full" ){
+      if ( tokDebug > 0 ) {
+	LOG << "[tokenizeElement] Creating text on " << element->id() << endl;
+      }
+      appendText( element, outputclass );
+    }
+    else if ( text_redundancy == "none" ){
+      if ( tokDebug > 0 ) {
+	LOG << "[tokenizeElement] Removing text from: " << element->id() << endl;
+      }
+      removeText( element, outputclass );
+    }
     return;
   }
 
+  int split_nl( const UnicodeString& line,
+		   vector<UnicodeString>& parts ){
+    static UnicodeRegexMatcher nl_split( "\\n", "newline_splitter" );
+    return nl_split.split( line, parts );
+  }
+
   void TokenizerClass::tokenizeSentenceElement( folia::FoliaElement *element,
 						const string& lang ){
     folia::Document *doc = element->doc();
@@ -662,7 +834,7 @@ namespace Tokenizer {
 		    "annotator='ucto', annotatortype='auto', datetime='now()'" );
     }
     if  ( tokDebug > 0 ){
-      cerr << "tokenize sentence element: " << element->id() << endl;
+      LOG << "[tokenizeSentenceElement] " << element->id() << endl;
     }
     UnicodeString line = element->stricttext( inputclass );
     if ( line.isEmpty() ){
@@ -679,17 +851,32 @@ namespace Tokenizer {
       passthruLine( line, bos );
     }
     else {
-      tokenizeLine( line, lang );
+      // FoLiA may encode newlines. These should be converted to <br/> nodes,
+      // but Linebreak and newline handling is very dangerous and complicated,
+      // so for now it is disabled!
+      vector<UnicodeString> parts;
+      parts.push_back( line ); // just one part
+      //split_nl( line, parts ); // disabled multipart
+      for ( auto const& l : parts ){
+	if ( tokDebug >= 1 ){
+	  LOG << "[tokenizeSentenceElement] tokenize part: " << l << endl;
+	}
+	tokenizeLine( l, lang, element->id() );
+	if ( &l != &parts.back() ){
+	  // append '<br/>'
+	  Token T( "type_linebreak", "\n", LINEBREAK, "" );
+	  if ( tokDebug >= 1 ){
+	    LOG << "[tokenizeSentenceElement] added LINEBREAK token " << endl;
+	  }
+	  tokens.push_back( T );
+	}
+      }
     }
     //ignore EOL data, we have by definition only one sentence:
     int numS = countSentences(true); //force buffer to empty
     vector<Token> outputTokens;
-    for (int i = 0; i < numS; i++) {
-      vector<Token> v = getSentence( i );
-      outputTokens.insert( outputTokens.end(), v.begin(), v.end() );
-    }
+    extractSentencesAndFlush( numS, outputTokens, lang );
     outputTokensXML( element, outputTokens, 0 );
-    flushSentences( numS, lang );
   }
 
   void TokenizerClass::outputTokensDoc_init( folia::Document& doc ) const {
@@ -707,25 +894,6 @@ namespace Tokenizer {
     doc.append( text );
   }
 
-  void TokenizerClass::outputTokensDoc( folia::Document& doc,
-					const vector<Token>& tv ) const {
-    folia::FoliaElement *root = doc.doc()->index(0);
-    string lan = doc.doc()->language();
-    if ( lan.empty() ){
-      if ( tokDebug >= 1 ){
-	LOG << "[outputTokensDoc] SET docuemnt language="
-	    << default_language << endl;
-      }
-      doc.set_metadata( "language", default_language );
-    }
-    else {
-      if ( tokDebug >= 2 ){
-	LOG << "[outputTokensDoc] Document has language " << lan << endl;
-      }
-    }
-    outputTokensXML(root, tv );
-  }
-
   int TokenizerClass::outputTokensXML( folia::FoliaElement *root,
 				       const vector<Token>& tv,
 				       int parCount ) const {
@@ -741,11 +909,12 @@ namespace Tokenizer {
     if ( root->isinstance( folia::Sentence_t ) ){
       root_is_sentence = true;
     }
-    else if ( root->isinstance( folia::Paragraph_t )
+    else if ( root->isinstance( folia::Paragraph_t ) //TODO: can't we do this smarter?
 	      || root->isinstance( folia::Head_t )
 	      || root->isinstance( folia::Note_t )
 	      || root->isinstance( folia::ListItem_t )
 	      || root->isinstance( folia::Part_t )
+	      || root->isinstance( folia::Utterance_t )
 	      || root->isinstance( folia::Caption_t )
 	      || root->isinstance( folia::Event_t ) ){
       root_is_structure_element = true;
@@ -753,16 +922,27 @@ namespace Tokenizer {
 
     bool in_paragraph = false;
     for ( const auto& token : tv ) {
-      if ( ( !root_is_structure_element && !root_is_sentence )
+      if ( ( !root_is_structure_element && !root_is_sentence ) //TODO: instead of !root_is_structure_element check if is_structure and accepts paragraphs?
 	   &&
 	   ( (token.role & NEWPARAGRAPH) || !in_paragraph ) ) {
-	if ( in_paragraph ){
-	  appendText( root, outputclass );
-	  root = root->parent();
-	}
 	if ( tokDebug > 0 ) {
 	  LOG << "[outputTokensXML] Creating paragraph" << endl;
 	}
+	if ( in_paragraph ){
+	  if ( text_redundancy == "full" ){
+	    if ( tokDebug > 0 ) {
+	      LOG << "[outputTokensXML] Creating text on root: " << root->id() << endl;
+	    }
+	    appendText( root, outputclass );
+	  }
+	  else if ( text_redundancy == "none" ){
+	    if ( tokDebug > 0 ) {
+	      LOG << "[outputTokensXML] Removing text from root: " << root->id() << endl;
+	    }
+	    removeText( root, outputclass );
+	  }
+	  root = root->parent();
+	}
 	folia::KWargs args;
 	args["id"] = root->doc()->id() + ".p." +  toString(++parCount);
 	folia::FoliaElement *p = new folia::Paragraph( args, root->doc() );
@@ -782,12 +962,27 @@ namespace Tokenizer {
 	  LOG << "[outputTokensXML] back to " << root->classname() << endl;
 	}
       }
-      if (( token.role & BEGINOFSENTENCE) && (!root_is_sentence)) {
+      if ( ( token.role & LINEBREAK) ){
+	if  (tokDebug > 0) {
+	  LOG << "[outputTokensXML] LINEBREAK!" << endl;
+	}
+	folia::FoliaElement *lb = new folia::Linebreak();
+	root->append( lb );
+	if  (tokDebug > 0){
+	  LOG << "[outputTokensXML] back to " << root->classname() << endl;
+	}
+      }
+      if ( ( token.role & BEGINOFSENTENCE)
+	   && !root_is_sentence
+	   && !root->isinstance( folia::Utterance_t ) ) {
 	folia::KWargs args;
-	if ( root->id().empty() )
-	  args["generate_id"] = root->parent()->id();
-	else
-	  args["generate_id"] = root->id();
+	string id = root->id();
+	if ( id.empty() ){
+	  id = root->parent()->id();
+	}
+	if ( !id.empty() ){
+	  args["generate_id"] = id;
+	}
 	if ( tokDebug > 0 ) {
 	  LOG << "[outputTokensXML] Creating sentence in '"
 			  << args["generate_id"] << "'" << endl;
@@ -807,62 +1002,86 @@ namespace Tokenizer {
 	  }
 	  s->doc()->declare( folia::AnnotationType::LANG,
 			     ISO_SET, "annotator='ucto'" );
-	  folia::KWargs args;
-	  args["class"] = tok_lan;
-	  args["set"] = ISO_SET;
-	  folia::LangAnnotation *node = new folia::LangAnnotation( s->doc() );
-	  node->setAttributes( args );
-	  s->append( node );
+	  set_language( s, tok_lan );
 	}
 	root = s;
 	lastS = root;
       }
-      if  (tokDebug > 0) {
-	LOG << "[outputTokensXML] Creating word element for " << token.us << endl;
-      }
-      folia::KWargs args;
-      args["generate_id"] = lastS->id();
-      args["class"] = folia::UnicodeToUTF8( token.type );
-      if ( passthru ){
-	args["set"] = "passthru";
-      }
-      else {
-	auto it = settings.find(token.lc);
-	if ( it == settings.end() ){
-	  it = settings.find("default");
+      if ( !(token.role & LINEBREAK) ){
+	if  (tokDebug > 0) {
+	  LOG << "[outputTokensXML] Creating word element for " << token.us << endl;
+	}
+	folia::KWargs args;
+	string id = lastS->id();
+	if ( id.empty() ){
+	  id = lastS->parent()->id();
+	}
+	if ( !id.empty() ){
+	  args["generate_id"] = id;
+	}
+	args["class"] = folia::UnicodeToUTF8( token.type );
+	if ( passthru ){
+	  args["set"] = "passthru";
+	}
+	else {
+	  auto it = settings.find(token.lc);
+	  if ( it == settings.end() ){
+	    it = settings.find("default");
+	  }
+	  args["set"] = it->second->set_file;
+	}
+	if ( token.role & NOSPACE) {
+	  args["space"]= "no";
+	}
+	if ( outputclass != inputclass ){
+	  args["textclass"] = outputclass;
+	}
+	folia::FoliaElement *w = new folia::Word( args, root->doc() );
+	root->append( w );
+	UnicodeString out = token.us;
+	if (lowercase) {
+	  out.toLower();
+	}
+	else if (uppercase) {
+	  out.toUpper();
+	}
+	w->settext( folia::UnicodeToUTF8( out ), outputclass );
+	if ( tokDebug > 1 ) {
+	  LOG << "created " << w << " text= " <<  token.us  << "(" << outputclass << ")" << endl;
 	}
-	args["set"] = it->second->set_file;
-      }
-      if ( token.role & NOSPACE) {
-	args["space"]= "no";
-      }
-      folia::FoliaElement *w = new folia::Word( args, root->doc() );
-      UnicodeString out = token.us;
-      if (lowercase) {
-	out.toLower();
-      }
-      else if (uppercase) {
-	out.toUpper();
       }
-      w->settext( folia::UnicodeToUTF8( out ), outputclass );
-      //      LOG << "created " << w << " text= " <<  token.us << endl;
-      root->append( w );
       if ( token.role & BEGINQUOTE) {
 	if  (tokDebug > 0) {
 	  LOG << "[outputTokensXML] Creating quote element" << endl;
 	}
-	folia::FoliaElement *q = new folia::Quote( folia::getArgs( "generate_id='" + root->id() + "'"),
-						    root->doc() );
+	folia::KWargs args;
+	string id = root->id();
+	if ( id.empty() ){
+	  id = root->parent()->id();
+	}
+	if ( !id.empty() ){
+	  args["generate_id"] = id;
+	}
+	folia::FoliaElement *q = new folia::Quote( args, root->doc() );
 	//	LOG << "created " << q << endl;
 	root->append( q );
 	root = q;
 	quotelevel++;
       }
-      if ( ( token.role & ENDOFSENTENCE) && (!root_is_sentence) ) {
+      if ( ( token.role & ENDOFSENTENCE ) && (!root_is_sentence) && (!root->isinstance(folia::Utterance_t))) {
 	if  (tokDebug > 0) {
 	  LOG << "[outputTokensXML] End of sentence" << endl;
 	}
-	appendText( root, outputclass );
+	if ( text_redundancy == "full" ){
+	  appendText( root, outputclass );
+	}
+	else if ( text_redundancy == "none" ){
+	  removeText( root, outputclass );
+	}
+	if ( token.role & LINEBREAK ){
+	  folia::FoliaElement *lb = new folia::Linebreak();
+	  root->append( lb );
+	}
 	root = root->parent();
 	lastS = root;
 	if  (tokDebug > 0){
@@ -872,7 +1091,21 @@ namespace Tokenizer {
       in_paragraph = true;
     }
     if ( tv.size() > 0 ){
-      appendText( root, outputclass );
+      if ( text_redundancy == "full" ){
+	if ( tokDebug > 0 ) {
+	  LOG << "[outputTokensXML] Creating text on root: " << root->id() << endl;
+	}
+	appendText( root, outputclass );
+      }
+      else if ( text_redundancy == "none" ){
+	if ( tokDebug > 0 ) {
+	  LOG << "[outputTokensXML] Removing text from root: " << root->id() << endl;
+	}
+	removeText( root, outputclass );
+      }
+    }
+    if ( tokDebug > 0 ) {
+      LOG << "[outputTokensXML] Done. parCount= " << parCount << endl;
     }
     return parCount;
   }
@@ -949,7 +1182,7 @@ namespace Tokenizer {
     }
   }
 
-  int TokenizerClass::countSentences(bool forceentirebuffer) {
+  int TokenizerClass::countSentences( bool forceentirebuffer ) {
     //Return the number of *completed* sentences in the token buffer
 
     //Performs  extra sanity checks at the same time! Making sure
@@ -1053,14 +1286,21 @@ namespace Tokenizer {
     short quotelevel = 0;
     size_t begin = 0;
     size_t end = 0;
-    for ( int i = 0; i < size; i++) {
-      if (tokens[i].role & NEWPARAGRAPH) quotelevel = 0;
-      if (tokens[i].role & ENDQUOTE) quotelevel--;
-      if ((tokens[i].role & BEGINOFSENTENCE) && (quotelevel == 0)) {
+    for ( int i = 0; i < size; ++i ) {
+      if (tokens[i].role & NEWPARAGRAPH) {
+	quotelevel = 0;
+      }
+      else if (tokens[i].role & ENDQUOTE) {
+	--quotelevel;
+      }
+      if ( (tokens[i].role & BEGINOFSENTENCE)
+	   && (quotelevel == 0)) {
 	begin = i;
       }
       //FBK: QUOTELEVEL GOES UP BEFORE begin IS UPDATED... RESULTS IN DUPLICATE OUTPUT
-      if (tokens[i].role & BEGINQUOTE) quotelevel++;
+      if (tokens[i].role & BEGINQUOTE) {
+	++quotelevel;
+      }
 
       if ((tokens[i].role & ENDOFSENTENCE) && (quotelevel == 0)) {
 	if (count == index) {
@@ -1074,7 +1314,7 @@ namespace Tokenizer {
 	  }
 	  return outToks;
 	}
-	count++;
+	++count;
       }
     }
     throw uRangeError( "No sentence exists with the specified index: "
@@ -1654,7 +1894,13 @@ namespace Tokenizer {
   int TokenizerClass::tokenizeLine( const string& s,
 				    const string& lang ){
     UnicodeString uinputstring = convert( s, inputEncoding );
-    return tokenizeLine( uinputstring, lang );
+    return tokenizeLine( uinputstring, lang, "" );
+  }
+
+  // UnicodeString wrapper
+  int TokenizerClass::tokenizeLine( const UnicodeString& u,
+				    const string& lang ){
+    return tokenizeLine( u, lang, "" );
   }
 
   bool u_isemo( UChar32 c ){
@@ -1769,7 +2015,8 @@ namespace Tokenizer {
   }
 
   int TokenizerClass::tokenizeLine( const UnicodeString& originput,
-				    const string& _lang ){
+				    const string& _lang,
+				    const string& id ){
     string lang = _lang;
     if ( lang.empty() ){
       lang = "default";
@@ -1791,7 +2038,14 @@ namespace Tokenizer {
       input = settings[lang]->filter.filter( input );
     }
     if ( input.isBogus() ){ //only tokenize valid input
-      *theErrLog << "ERROR: Invalid UTF-8 in line!:" << input << endl;
+      if ( id.empty() ){
+	LOG << "ERROR: Invalid UTF-8 in line:" << linenum << endl
+	    << "   '" << input << "'" << endl;
+      }
+      else {
+	LOG << "ERROR: Invalid UTF-8 in element:" << id << endl
+	    << "   '" << input << "'" << endl;
+      }
       return 0;
     }
     int32_t len = input.countChar32();
@@ -1811,16 +2065,18 @@ namespace Tokenizer {
     UnicodeString word;
     StringCharacterIterator sit(input);
     long int i = 0;
+    long int tok_size = 0;
     while ( sit.hasNext() ){
       UChar32 c = sit.current32();
       if ( tokDebug > 8 ){
 	UnicodeString s = c;
 	int8_t charT = u_charType( c );
 	LOG << "examine character: " << s << " type= "
-			<< toString( charT  ) << endl;
+	    << toString( charT  ) << endl;
       }
       if (reset) { //reset values for new word
 	reset = false;
+	tok_size = 0;
 	if (!u_isspace(c))
 	  word = c;
 	else
@@ -1912,6 +2168,22 @@ namespace Tokenizer {
       }
       sit.next32();
       ++i;
+      ++tok_size;
+      if ( tok_size > 2500 ){
+	if ( id.empty() ){
+	  LOG << "Ridiculously long word/token (over 2500 characters) detected "
+	      << "in line: " << linenum << ". Skipped ..." << endl;
+	  LOG << "The line starts with " << UnicodeString( word, 0, 75 )
+	      << "..." << endl;
+	}
+	else {
+	  LOG << "Ridiculously long word/token (over 2500 characters) detected "
+	      << "in element: " << id << ". Skipped ..." << endl;
+	  LOG << "The text starts with " << UnicodeString( word, 0, 75 )
+	      << "..." << endl;
+	}
+	return 0;
+      }
     }
     int numNewTokens = tokens.size() - begintokencount;
     if ( numNewTokens > 0 ){
@@ -2113,7 +2385,7 @@ namespace Tokenizer {
 	  break;
 	}
       }
-      if ( ! a_rule_matched ){
+      if ( !a_rule_matched ){
 	// no rule matched
 	if ( tokDebug >=4 ){
 	  LOG << "\tthere's no match at all" << endl;
@@ -2174,7 +2446,7 @@ namespace Tokenizer {
       }
     }
     if ( settings.empty() ){
-      cerr << "No useful settingsfile(s) could be found." << endl;
+      cerr << "ucto: No useful settingsfile(s) could be found." << endl;
       return false;
     }
     return true;
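Note: the outputTokensXML hunks above implement the text-redundancy levels that the
usage text in src/ucto.cxx (below) describes. As a rough illustration only, with
placeholder file names that are not part of this commit:

    # -F/-X are implied by the .xml extensions (see the src/ucto.cxx changes below)
    ucto -L generic --textredundancy=full  in.folia.xml out.folia.xml   # text on <p>, <s>, <w>, ...
    ucto -L generic --textredundancy=none  in.folia.xml out.folia.xml   # text only on <w>
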
diff --git a/src/ucto.cxx b/src/ucto.cxx
index 3d2ad72..05fb844 100644
--- a/src/ucto.cxx
+++ b/src/ucto.cxx
@@ -45,38 +45,53 @@ using namespace std;
 using namespace Tokenizer;
 
 void usage(){
+  set<string> languages = Setting::installed_languages();
   cerr << "Usage: " << endl;
   cerr << "\tucto [[options]] [input-file] [[output-file]]"  << endl
        << "Options:" << endl
-       << "\t-c <configfile>  - Explicitly specify a configuration file" << endl
-       << "\t-d <value>       - set debug level" << endl
-       << "\t-e <string>      - set input encoding (default UTF8)" << endl
-       << "\t-N <string>      - set output normalization (default NFC)" << endl
-       << "\t-f               - Disable filtering of special characters" << endl
-       << "\t-h or --help     - this message" << endl
-       << "\t-L <language>    - Automatically selects a configuration file by language code. (default 'generic')" << endl
-       << "\t-l               - Convert to all lowercase" << endl
-       << "\t-u               - Convert to all uppercase" << endl
-       << "\t-n               - One sentence per line (output)" << endl
-       << "\t-m               - One sentence per line (input)" << endl
-       << "\t-v               - Verbose mode" << endl
-       << "\t-s <string>      - End-of-Sentence marker (default: <utt>)" << endl
-       << "\t--passthru       - Don't tokenize, but perform input decoding and simple token role detection" << endl
+       << "\t-c <configfile>   - Explicitly specify a configuration file" << endl
+       << "\t-d <value>        - set debug level" << endl
+       << "\t-e <string>       - set input encoding (default UTF8)" << endl
+       << "\t-N <string>       - set output normalization (default NFC)" << endl
+       << "\t--filter=[YES|NO] - Disable filtering of special characters" << endl
+       << "\t-f                - OBSOLETE. use --filter=NO" << endl
+       << "\t-h or --help      - this message" << endl
+       << "\t-L <language>     - Automatically selects a configuration file by language code." << endl
+       << "\t                  - Available Languages:" << endl
+       << "\t                    ";
+  for ( const auto& l : languages ){
+    cerr << l << ",";
+  }
+  cerr << endl;
+  cerr << "\t-l                - Convert to all lowercase" << endl
+       << "\t-u                - Convert to all uppercase" << endl
+       << "\t-n                - One sentence per line (output)" << endl
+       << "\t-m                - One sentence per line (input)" << endl
+       << "\t-v                - Verbose mode" << endl
+       << "\t-s <string>       - End-of-Sentence marker (default: <utt>)" << endl
+       << "\t--passthru        - Don't tokenize, but perform input decoding and simple token role detection" << endl
        << "\t--normalize=<class1>,class2>,... " << endl
-       << "\t                 - For class1, class2, etc. output the class tokens instead of the tokens itself." << endl
-       << "\t--filterpunct    - remove all punctuation from the output" << endl
-       << "\t--detectlanguages=<lang1,lang2,..langn> - try to detect languages. Default = 'lang1'" << endl
-       << "\t-P               - Disable paragraph detection" << endl
-       << "\t-S               - Disable sentence detection!" << endl
-       << "\t-Q               - Enable quote detection (experimental)" << endl
-       << "\t-V or --version  - Show version information" << endl
-       << "\t-x <DocID>       - Output FoLiA XML, use the specified Document ID (obsolete)" << endl
-       << "\t-F               - Input file is in FoLiA XML. All untokenised sentences will be tokenised." << endl
-       << "\t-X               - Output FoLiA XML, use the Document ID specified with --id=" << endl
-       << "\t--id <DocID>     - use the specified Document ID to label the FoLia doc." << endl
-       << "\t--textclass <class> - use the specified class to search text in the FoLia doc. (deprecated. use --inputclass)" << endl
-       << "\t--inputclass <class> - use the specified class to search text in the FoLia doc." << endl
-       << "\t--outputclass <class> - use the specified class to output text in the FoLia doc. (default is 'current'. changing this is dangerous!)" << endl
+       << "\t                  - For class1, class2, etc. output the class tokens instead of the tokens itself." << endl
+       << "\t-T or --textredundancy=[full|minimal|none]  - set text redundancy level for text nodes in FoLiA output: " << endl
+       << "\t                    'full' - add text to all levels: <p> <s> <w> etc." << endl
+       << "\t                    'minimal' - don't introduce text on higher levels, but retain what is already there." << endl
+       << "\t                    'none' - only introduce text on <w>, AND remove all text from higher levels" << endl
+       << "\t--filterpunct     - remove all punctuation from the output" << endl
+       << "\t--uselanguages=<lang1,lang2,..langn> - only tokenize strings in these languages. Default = 'lang1'" << endl
+       << "\t--detectlanguages=<lang1,lang2,..langn> - try to assignlanguages before using. Default = 'lang1'" << endl
+       << "\t-P                - Disable paragraph detection" << endl
+       << "\t-S                - Disable sentence detection!" << endl
+       << "\t-Q                - Enable quote detection (experimental)" << endl
+       << "\t-V or --version   - Show version information" << endl
+       << "\t-x <DocID>        - Output FoLiA XML, use the specified Document ID (obsolete)" << endl
+       << "\t-F                - Input file is in FoLiA XML. All untokenised sentences will be tokenised." << endl
+       << "\t                    -F is automatically set when inputfile has extension '.xml'" << endl
+       << "\t-X                - Output FoLiA XML, use the Document ID specified with --id=" << endl
+       << "\t--id <DocID>      - use the specified Document ID to label the FoLia doc." << endl
+       << "                      -X is automatically set when inputfile has extension '.xml'" << endl
+       << "\t--inputclass <class> - use the specified class to search text in the FoLia doc.(default is 'current')" << endl
+       << "\t--outputclass <class> - use the specified class to output text in the FoLia doc. (default is 'current')" << endl
+       << "\t--textclass <class> - use the specified class for both input and output of text in the FoLia doc. (default is 'current'). Implies --filter=NO." << endl
        << "\t                  (-x and -F disable usage of most other options: -nPQVsS)" << endl;
 }
 
@@ -88,18 +103,20 @@ int main( int argc, char *argv[] ){
   bool sentenceperlineinput = false;
   bool paragraphdetection = true;
   bool quotedetection = false;
+  bool do_language_detect = false;
   bool dofiltering = true;
   bool dopunctfilter = false;
   bool splitsentences = true;
   bool xmlin = false;
   bool xmlout = false;
   bool verbose = false;
+  string redundancy = "minimal";
   string eosmarker = "<utt>";
   string docid = "untitleddoc";
-  string inputclass = "current";
-  string outputclass = "current";
   string normalization = "NFC";
   string inputEncoding = "UTF-8";
+  string inputclass  = "current";
+  string outputclass = "current";
   vector<string> language_list;
   string cfile;
   string ifile;
@@ -109,8 +126,8 @@ int main( int argc, char *argv[] ){
   string norm_set_string;
 
   try {
-    TiCC::CL_Options Opts( "d:e:fhlPQunmN:vVSL:c:s:x:FX",
-			   "filterpunct,passthru,textclass:,inputclass:,outputclass:,normalize:,id:,version,help,detectlanguages:");
+    TiCC::CL_Options Opts( "d:e:fhlPQunmN:vVSL:c:s:x:FXT:",
+			   "filter:,filterpunct,passthru,textclass:,inputclass:,outputclass:,normalize:,id:,version,help,detectlanguages:,uselanguages:,textredundancy:");
     Opts.init(argc, argv );
     if ( Opts.extract( 'h' )
 	 || Opts.extract( "help" ) ){
@@ -120,13 +137,13 @@ int main( int argc, char *argv[] ){
     if ( Opts.extract( 'V' ) ||
 	 Opts.extract( "version" ) ){
       cout << "Ucto - Unicode Tokenizer - version " << Version() << endl
-	   << "(c) ILK 2009 - 2014, Induction of Linguistic Knowledge Research Group, Tilburg University" << endl
+	   << "(c) CLST 2015 - 2017, Centre for Language and Speech Technology, Radboud University Nijmegen" << endl
+	   << "(c) ILK 2009 - 2015, Induction of Linguistic Knowledge Research Group, Tilburg University" << endl
 	   << "Licensed under the GNU General Public License v3" << endl;
       cout << "based on [" << folia::VersionName() << "]" << endl;
       return EXIT_SUCCESS;
     }
     Opts.extract('e', inputEncoding );
-    dofiltering = !Opts.extract( 'f' );
     dopunctfilter = Opts.extract( "filterpunct" );
     paragraphdetection = !Opts.extract( 'P' );
     splitsentences = !Opts.extract( 'S' );
@@ -137,6 +154,13 @@ int main( int argc, char *argv[] ){
     tolowercase = Opts.extract( 'l' );
     sentenceperlineoutput = Opts.extract( 'n' );
     sentenceperlineinput = Opts.extract( 'm' );
+    Opts.extract( 'T', redundancy );
+    Opts.extract( "textredundancy", redundancy );
+    if ( redundancy != "full"
+	 && redundancy != "minimal"
+	 && redundancy != "none" ){
+      throw TiCC::OptionError( "unknown textredundancy level: " + redundancy );
+    }
     Opts.extract( 'N', normalization );
     verbose = Opts.extract( 'v' );
     if ( Opts.extract( 'x', docid ) ){
@@ -153,9 +177,38 @@ int main( int argc, char *argv[] ){
       Opts.extract( "id", docid );
     }
     passThru = Opts.extract( "passthru" );
-    Opts.extract( "textclass", inputclass );
+    string textclass;
+    Opts.extract( "textclass", textclass );
     Opts.extract( "inputclass", inputclass );
     Opts.extract( "outputclass", outputclass );
+    if ( !textclass.empty() ){
+      if ( inputclass != "current" ){
+	throw TiCC::OptionError( "--textclass conflicts with --inputclass" );
+      }
+      if ( outputclass != "current" ){
+	throw TiCC::OptionError( "--textclass conflicts with --outputclass");
+      }
+      inputclass = textclass;
+      outputclass = textclass;
+    }
+    if ( Opts.extract( 'f' ) ){
+      cerr << "ucto: The -f option is used.  Please consider using --filter=NO" << endl;
+      dofiltering = false;
+    }
+    string value;
+    if ( Opts.extract( "filter", value ) ){
+      bool result;
+      if ( !TiCC::stringTo( value, result ) ){
+	throw TiCC::OptionError( "illegal value for '--filter' option. (boolean expected)" );
+      }
+      dofiltering = result;
+    }
+    if ( dofiltering && xmlin && outputclass == inputclass ){
+      // we cannot mangle the original inputclass, so disable filtering
+      cerr << "ucto: --filter=NO is automatically set. inputclass equals outputclass!"
+	   << endl;
+      dofiltering = false;
+    }
     if ( xmlin && outputclass.empty() ){
       if ( dopunctfilter ){
 	throw TiCC::OptionError( "--outputclass required for --filterpunct on FoLiA input ");
@@ -167,7 +220,6 @@ int main( int argc, char *argv[] ){
 	throw TiCC::OptionError( "--outputclass required for -l on FoLiA input ");
       }
     }
-    string value;
     if ( Opts.extract('d', value ) ){
       if ( !TiCC::stringTo(value,debug) ){
 	throw TiCC::OptionError( "invalid value for -d: " + value );
@@ -175,30 +227,44 @@ int main( int argc, char *argv[] ){
     }
     if ( Opts.is_present('L') ) {
       if ( Opts.is_present('c') ){
-	cerr << "Error: -L and -c options conflict. Use only one of them." << endl;
-	return EXIT_FAILURE;
+	throw TiCC::OptionError( "-L and -c options conflict. Use only one of these." );
       }
       else if ( Opts.is_present( "detectlanguages" ) ){
-	cerr << "Error: -L and --detectlanguages options conflict. Use only one of them." << endl;
-	return EXIT_FAILURE;
+	throw TiCC::OptionError( "-L and --detectlanguages options conflict. Use only one of these." );
+      }
+      else if ( Opts.is_present( "uselanguages" ) ){
+	throw TiCC::OptionError( "-L and --uselanguages options conflict. Use only one of these." );
       }
     }
-    else if ( Opts.is_present( 'c' )
-	      && Opts.is_present( "detectlanguages" ) ){
-      cerr << "Error: -c and --detectlanguages options conflict. Use only one of them." << endl;
-      return EXIT_FAILURE;
+    else if ( Opts.is_present( 'c' ) ){
+      if ( Opts.is_present( "detectlanguages" ) ){
+	throw TiCC::OptionError( "-c and --detectlanguages options conflict. Use only one of these." );
+      }
+      else if ( Opts.is_present( "uselanguages" ) ){
+	throw TiCC::OptionError( "-c and --uselanguages options conflict. Use only one of these." );
+      }
+    }
+    if ( Opts.is_present( "detectlanguages" ) &&
+	 Opts.is_present( "uselanguages" ) ){
+      throw TiCC::OptionError( "--detectlanguages and --uselanguages options conflict. Use only one of these." );
     }
-
     Opts.extract( 'c', c_file );
+
     string languages;
     Opts.extract( "detectlanguages", languages );
-    bool do_language_detect = !languages.empty();
-    if ( do_language_detect ){
+    if ( languages.empty() ){
+      Opts.extract( "uselanguages", languages );
+    }
+    else {
+      do_language_detect = true;
+    }
+    if ( !languages.empty() ){
       if ( TiCC::split_at( languages, language_list, "," ) < 1 ){
 	throw TiCC::OptionError( "invalid language list: " + languages );
       }
     }
     else {
+      // so neither --detectlanguages nor --uselanguages was given
       string language;
       if ( Opts.extract('L', language ) ){
 	// support some backward compatability to old ISO 639-1 codes
@@ -248,56 +314,113 @@ int main( int argc, char *argv[] ){
     vector<string> files = Opts.getMassOpts();
     if ( files.size() > 0 ){
       ifile = files[0];
+      if ( TiCC::match_back( ifile, ".xml" ) ){
+	xmlin = true;
+      }
     }
-    if ( files.size() > 1 ){
+    if ( files.size() == 2 ){
       ofile = files[1];
+      if ( TiCC::match_back( ofile, ".xml" ) ){
+	xmlout = true;
+      }
+    }
+    if ( files.size() > 2 ){
+      cerr << "found additional arguments on the commandline: " << files[2]
+	   << "...." << endl;
     }
+
   }
   catch( const TiCC::OptionError& e ){
     cerr << "ucto: " << e.what() << endl;
     usage();
     return EXIT_FAILURE;
   }
-
   if ( !passThru ){
+    set<string> available_languages = Setting::installed_languages();
     if ( !c_file.empty() ){
       cfile = c_file;
     }
     else if ( language_list.empty() ){
-      cfile = "tokconfig-generic";
+      cerr << "ucto: missing a language specification (-L or --detectlanguages or --uselanguages option)" << endl;
+      if ( available_languages.size() == 1
+	   && *available_languages.begin() == "generic" ){
+	cerr << "ucto: The uctodata package seems not to be installed." << endl;
+	cerr << "ucto: You can use '-L generic' to run a simple default tokenizer."
+	     << endl;
+	cerr << "ucto: Installing uctodata is highly recommended." << endl;
+      }
+      else {
+	cerr << "ucto: Available Languages: ";
+	for( const auto& l : available_languages ){
+	  cerr << l << ",";
+	}
+	cerr << endl;
+      }
+      return EXIT_FAILURE;
+    }
+    else {
+      for ( const auto& l : language_list ){
+	if ( available_languages.find(l) == available_languages.end() ){
+	  cerr << "ucto: unsupported language '" << l << "'" << endl;
+	  if ( available_languages.size() == 1
+	       && *available_languages.begin() == "generic" ){
+	    cerr << "ucto: The uctodata package seems not to be installed." << endl;
+	    cerr << "ucto: You can use '-L generic' to run a simple default tokenizer."
+		 << endl;
+	    cerr << "ucto: Installing uctodata is highly recommended." << endl;
+	  }
+	  else {
+	    cerr << "ucto: Available Languages: ";
+	    for( const auto& l : available_languages ){
+	      cerr << l << ",";
+	    }
+	    cerr << endl;
+	  }
+	  return EXIT_FAILURE;
+	}
+      }
     }
   }
 
   if ((!ifile.empty()) && (ifile == ofile)) {
-    cerr << "Error: Output file equals input file! Courageously refusing to start..."  << endl;
+    cerr << "ucto: Output file equals input file! Courageously refusing to start..."  << endl;
     return EXIT_FAILURE;
   }
 
-  if ( !passThru ){
-    cerr << "configfile = " << cfile << endl;
-  }
-  cerr << "inputfile = "  << ifile << endl;
-  cerr << "outputfile = " << ofile << endl;
+  cerr << "ucto: inputfile = "  << ifile << endl;
+  cerr << "ucto: outputfile = " << ofile << endl;
 
   istream *IN = 0;
   if (!xmlin) {
-    if ( ifile.empty() )
+    if ( ifile.empty() ){
       IN = &cin;
+    }
     else {
       IN = new ifstream( ifile );
       if ( !IN || !IN->good() ){
-	cerr << "Error: problems opening inputfile " << ifile << endl;
-	cerr << "Courageously refusing to start..."  << endl;
+	cerr << "ucto: problems opening inputfile " << ifile << endl;
+	cerr << "ucto: Courageously refusing to start..."  << endl;
+	delete IN;
 	return EXIT_FAILURE;
       }
     }
   }
 
   ostream *OUT = 0;
-  if ( ofile.empty() )
+  if ( ofile.empty() ){
     OUT = &cout;
+  }
   else {
     OUT = new ofstream( ofile );
+    if ( !OUT || !OUT->good() ){
+      cerr << "ucto: problems opening outputfile " << ofile << endl;
+      cerr << "ucto: Courageously refusing to start..."  << endl;
+      delete OUT;
+      if ( IN != &cin ){
+	delete IN;
+      }
+      return EXIT_FAILURE;
+    }
   }
 
   try {
@@ -309,15 +432,24 @@ int main( int argc, char *argv[] ){
     }
     else {
       // init exept for passthru mode
-      if ( !cfile.empty() ){
-	if ( !tokenizer.init( cfile ) ){
-	  return EXIT_FAILURE;
+      if ( !cfile.empty()
+	   && !tokenizer.init( cfile ) ){
+	if ( IN != &cin ){
+	  delete IN;
 	}
+	if ( OUT != &cout ){
+	  delete OUT;
+	}
+	return EXIT_FAILURE;
       }
-      else {
-	if ( !tokenizer.init( language_list ) ){
-	  return EXIT_FAILURE;
+      else if ( !tokenizer.init( language_list ) ){
+	if ( IN != &cin ){
+	  delete IN;
 	}
+	if ( OUT != &cout ){
+	  delete OUT;
+	}
+	return EXIT_FAILURE;
       }
     }
 
@@ -334,11 +466,13 @@ int main( int argc, char *argv[] ){
     tokenizer.setNormalization( normalization );
     tokenizer.setInputEncoding( inputEncoding );
     tokenizer.setFiltering(dofiltering);
+    tokenizer.setLangDetection(do_language_detect);
     tokenizer.setPunctFilter(dopunctfilter);
     tokenizer.setInputClass(inputclass);
     tokenizer.setOutputClass(outputclass);
     tokenizer.setXMLOutput(xmlout, docid);
     tokenizer.setXMLInput(xmlin);
+    tokenizer.setTextRedundancy(redundancy);
 
     if (xmlin) {
       folia::Document doc;
@@ -354,7 +488,7 @@ int main( int argc, char *argv[] ){
     }
   }
   catch ( exception &e ){
-    cerr << e.what() << endl;
+    cerr << "ucto: " << e.what() << endl;
     return EXIT_FAILURE;
   }
 
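Note: with the reworked option handling above, typical invocations look as follows.
The 'nld' and 'eng' codes are examples that assume the corresponding uctodata
profiles are installed; file names are placeholders:

    # --filter=NO replaces the obsolete -f flag
    ucto -L generic --filter=NO in.txt out.txt
    # try to detect and assign languages before tokenizing (--detectlanguages)
    ucto --detectlanguages=nld,eng in.txt out.txt
    # only tokenize material in the given languages, no detection (--uselanguages)
    ucto --uselanguages=nld,eng in.txt out.txt
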
diff --git a/src/unicode.cxx b/src/unicode.cxx
index 72a25a9..e0d5d81 100644
--- a/src/unicode.cxx
+++ b/src/unicode.cxx
@@ -172,10 +172,10 @@ namespace Tokenizer {
     return true;
   }
 
-  class uConfigError: public std::invalid_argument {
+  class uRegexError: public std::invalid_argument {
   public:
-    uConfigError( const string& s ): invalid_argument( "ucto: config file:" + s ){};
-    uConfigError( const UnicodeString& us ): invalid_argument( "ucto: config file:" + folia::UnicodeToUTF8(us) ){};
+    explicit uRegexError( const string& s ): invalid_argument( "Invalid regular expression: " + s ){};
+    explicit uRegexError( const UnicodeString& us ): invalid_argument( "Invalid regular expression: " + folia::UnicodeToUTF8(us) ){};
   };
 
 
@@ -196,21 +196,20 @@ namespace Tokenizer {
       string spat = folia::UnicodeToUTF8(pat);
       failString = folia::UnicodeToUTF8(_name);
       if ( errorInfo.offset >0 ){
-	failString += " Invalid regular expression at position " + TiCC::toString( errorInfo.offset ) + "\n";
+	failString += " at position " + TiCC::toString( errorInfo.offset ) + "\n";
 	UnicodeString pat1 = UnicodeString( pat, 0, errorInfo.offset -1 );
 	failString += folia::UnicodeToUTF8(pat1) + " <== HERE\n";
       }
       else {
-	failString += " Invalid regular expression '" + spat + "' ";
+	failString += "'" + spat + "' ";
       }
-      throw uConfigError(failString);
+      throw uRegexError(failString);
     }
     else {
       matcher = pattern->matcher( u_stat );
       if (U_FAILURE(u_stat)){
-	failString = "unable to create PatterMatcher with pattern '" +
-	  folia::UnicodeToUTF8(pat) + "'";
-	throw uConfigError(failString);
+	failString = "'" + folia::UnicodeToUTF8(pat) + "'";
+	throw uRegexError(failString);
       }
     }
   }
diff --git a/tests/Makefile.in b/tests/Makefile.in
index 2e11062..928e8c3 100644
--- a/tests/Makefile.in
+++ b/tests/Makefile.in
@@ -89,8 +89,7 @@ build_triplet = @build@
 host_triplet = @host@
 subdir = tests
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
-am__aclocal_m4_deps = $(top_srcdir)/m4/ax_icu_check.m4 \
-	$(top_srcdir)/m4/ax_lib_readline.m4 \
+am__aclocal_m4_deps = $(top_srcdir)/m4/ax_lib_readline.m4 \
 	$(top_srcdir)/m4/libtool.m4 $(top_srcdir)/m4/ltoptions.m4 \
 	$(top_srcdir)/m4/ltsugar.m4 $(top_srcdir)/m4/ltversion.m4 \
 	$(top_srcdir)/m4/lt~obsolete.m4 $(top_srcdir)/m4/pkg.m4 \
@@ -155,13 +154,7 @@ EXEEXT = @EXEEXT@
 FGREP = @FGREP@
 GREP = @GREP@
 ICU_CFLAGS = @ICU_CFLAGS@
-ICU_CONFIG = @ICU_CONFIG@
-ICU_CPPSEARCHPATH = @ICU_CPPSEARCHPATH@
-ICU_CXXFLAGS = @ICU_CXXFLAGS@
-ICU_IOLIBS = @ICU_IOLIBS@
-ICU_LIBPATH = @ICU_LIBPATH@
 ICU_LIBS = @ICU_LIBS@
-ICU_VERSION = @ICU_VERSION@
 INSTALL = @INSTALL@
 INSTALL_DATA = @INSTALL_DATA@
 INSTALL_PROGRAM = @INSTALL_PROGRAM@
@@ -252,6 +245,7 @@ pdfdir = @pdfdir@
 prefix = @prefix@
 program_transform_name = @program_transform_name@
 psdir = @psdir@
+runstatedir = @runstatedir@
 sbindir = @sbindir@
 sharedstatedir = @sharedstatedir@
 srcdir = @srcdir@
diff --git a/ucto.pc.in b/ucto.pc.in
index fe99841..791e04e 100644
--- a/ucto.pc.in
+++ b/ucto.pc.in
@@ -6,7 +6,6 @@ includedir=@includedir@
 Name: ucto
 Version: @VERSION@
 Description: Unicode Tokenizer
-Requires.private: ucto-icu >= 3.6 folia >= 0.3
 Libs: -L${libdir} -lucto
 Libs.private: @LIBS@
 Cflags: -I${includedir}

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/ucto.git