[irstlm] 08/126: added documentation of trunk

Giulio Paci giuliopaci-guest at moszumanska.debian.org
Tue May 17 07:46:39 UTC 2016


This is an automated email from the git hooks/post-receive script.

giuliopaci-guest pushed a commit to annotated tag adaptiveLM.v0.1
in repository irstlm.

commit 92f57039a33c31d9a8ab50c38168c339f9f82fb0
Author: Marcello Federico <mrcfdr at gmail.com>
Date:   Mon Jul 20 09:32:46 2015 +0200

    added documentation of trunk
---
 doc/CMakeLists.txt          |   37 ++
 doc/ClassAndChunkLMs.tex    |  210 +++++++
 doc/LMAdaptation.tex        |   91 +++
 doc/LMCompilation.tex       |   54 ++
 doc/LMFileFormats.tex       |  247 ++++++++
 doc/LMFiltering.tex         |   16 +
 doc/LMInterface.tex         |  112 ++++
 doc/LMInterpolation.tex     |   75 +++
 doc/LMPrune.tex             |    0
 doc/LMPruning.tex           |   47 ++
 doc/LMQuantization.tex      |   14 +
 doc/LMSmoothing.tex         |    0
 doc/RELEASE                 |    1 +
 doc/compileLM.tex           |    0
 doc/dict.tex                |  115 ++++
 doc/gettingStarted.tex      |   85 +++
 doc/giganticLM.tex          |   79 +++
 doc/installation.tex        |   87 +++
 doc/interpolateLM.tex       |    0
 doc/interpolatedLM.tex      |    0
 doc/introduction.tex        |   35 ++
 doc/irstlm-manual.log       |  369 ++++++++++++
 doc/irstlm-manual.tex       |  229 ++++++++
 doc/mdframed.sty            | 1309 +++++++++++++++++++++++++++++++++++++++++++
 doc/mixtureLM.tex           |    0
 doc/ngt.tex                 |   33 ++
 doc/parallelComputation.tex |   18 +
 doc/pruneLM.tex             |    0
 doc/quantizeLM.tex          |    0
 doc/referenceMaterial.tex   |   34 ++
 doc/regressionTests.tex     |    0
 doc/releaseNotes.tex        |  224 ++++++++
 doc/tlm.tex                 |   88 +++
 33 files changed, 3609 insertions(+)

diff --git a/doc/CMakeLists.txt b/doc/CMakeLists.txt
new file mode 100644
index 0000000..1779345
--- /dev/null
+++ b/doc/CMakeLists.txt
@@ -0,0 +1,37 @@
+# include specific modules
+
+INCLUDE(UseLATEX OPTIONAL)
+
+if (PDFLATEX_COMPILER AND BIBTEX_COMPILER AND MAKEINDEX_COMPILER)
+message("PDFLATEX_COMPILER exists (${PDFLATEX_COMPILER})")
+message("BIBTEX_COMPILER exists (${BIBTEX_COMPILER})")
+message("MAKEINDEX_COMPILER exists (${MAKEINDEX_COMPILER})")
+
+
+SET(LATEX_OUTPUT_PATH build)
+
+PROJECT(irstlm-manual NONE)
+cmake_minimum_required(VERSION 2.8)
+
+SET(IRSTLM_INPUT_TEX ClassAndChunkLMs.tex LMFileFormats.tex LMFiltering.tex LMInterface.tex LMInterpolation.tex LMPruning.tex LMQuantization.tex LMAdaptation.tex LMCompilation.tex LMPrune.tex LMSmoothing.tex compileLM.tex dict.tex gettingStarted.tex giganticLM.tex installation.tex interpolateLM.tex interpolatedLM.tex introduction.tex mixtureLM.tex ngt.tex parallelComputation.tex pruneLM.tex quantizeLM.tex referenceMaterial.tex regressionTests.tex releaseNotes.tex tlm.tex )
+
+ADD_LATEX_DOCUMENT(
+    ./irstlm-manual.tex
+    INPUTS ${IRSTLM_INPUT_TEX}
+    DEFAULT_PDF
+)
+
+add_custom_command(TARGET pdf POST_BUILD
+    COMMAND ${CMAKE_COMMAND} -E copy
+       ${CMAKE_CURRENT_BINARY_DIR}/${LATEX_OUTPUT_PATH}/irstlm-manual.pdf
+       ${CMAKE_BINARY_DIR}/doc/irstlm-${IRSTLM_VERSION}-manual.pdf)
+
+INSTALL(PROGRAMS irstlm-${IRSTLM_VERSION}-manual.pdf
+    DESTINATION doc
+    PERMISSIONS OWNER_READ
+    )
+
+ELSE()
+message("PDFLATEX_COMPILER, BIBTEX_COMPILER, or MAKEINDEX_COMPILER not found")
+ENDIF()
+
diff --git a/doc/ClassAndChunkLMs.tex b/doc/ClassAndChunkLMs.tex
new file mode 100644
index 0000000..c38eacc
--- /dev/null
+++ b/doc/ClassAndChunkLMs.tex
@@ -0,0 +1,210 @@
+{\IRSTLM} allows the use of class and chunk LMs, and a special
+handling of input tokens that are concatenations of $N \ge 1$ fields separated
+by the character \#, e.g.
+
+\begin{verbatim}
+   word#lemma#part-of-speech#word-class
+\end{verbatim}
+
+\noindent The processing is guided by the format of the file passed to
+Moses or {\tt compile-lm}: if it contains just the LM, either in textual or
+binary format, it is treated as usual; otherwise, it is supposed to have
+the following format:
+
+\begin{verbatim}
+LMMACRO <lmmacroSize> <selectedField> <collapse>
+<lmfilename>
+<mapfilename>
+\end{verbatim}
+
+\noindent where:
+\begin{verbatim}
+ LMMACRO is a reserved keyword
+ <lmmacroSize> is a positive integer
+ <selectedField> is an integer >=-1
+ <collapse> is a boolean value (true, false)
+ <lmfilename> is a file containing an LM (format compatible with IRSTLM)
+ <mapfilename> is an (optional) file with a (one|many)-to-one map
+\end{verbatim}
+
+\noindent The various cases are discussed with examples in the
+following. Data used in those examples can be found in the directory {\tt
+example/chunkLM/} which represents the relative path for all the parameters
+of the referred commands.  Note that texts with different tokens (words,
+POS, word\#POS pairs...) used either as input or for training LMs are all
+derived from the same multifield texts in order to allow direct comparison
+of results.
+
+\subsection{Field selection}
+
+The simplest case is that of the LM in {\tt <lmfilename>} referring just to
+one specific field of the input tokens. In this case, it is possible to
+specify the field to be selected before querying the LM through the integer
+{\tt <selectedField>} ($0$ for the first field, $1$ for the
+second...). With the value $-1$, no selection is applied and the LM is
+queried with n-grams of whole strings.  The other parameters are set as:
+
+\begin{verbatim}
+ <lmmacroSize> : set to the size of the LM in <lmfilename>
+ <collapse>    : false
+\end{verbatim}
+
+\noindent The optional third line, reserved for {\tt <mapfilename>}, is omitted.
+
+\bigskip
+\noindent
+Examples:
+
+\bigskip
+\noindent
+\thesubsection.a) selection of the second field:
+\begin{verbatim}
+$> compile-lm --eval test/test.w-micro cfgfile/cfg.2ndfield
+%% Nw=126 PP=2.68 PPwp=0.00 Nbo=0 Noov=0 OOV=0.00%
+\end{verbatim}
+
+\noindent
+\thesubsection.b) selection of the first field:
+\begin{verbatim}
+$> compile-lm --eval test/test.w-micro cfgfile/cfg.1stfield
+%% Nw=126 PP=9.71 PPwp=0.00 Nbo=76 Noov=0 OOV=0.00%
+\end{verbatim}
+
+\noindent The result of the latter case is identical to that obtained with
+the standard configuration involving just words:
+
+\bigskip
+\noindent
+\thesubsection.c) usual case on words:
+\begin{verbatim}
+$> compile-lm --eval test/test.w lm/train.en.blm 
+%% Nw=126 PP=9.71 PPwp=0.00 Nbo=76 Noov=0 OOV=0.00%
+\end{verbatim}
+
+
+\subsection{Class LMs}
+
+Possibly, a many-to-one or one-to-one map can be passed through the
+{\tt <mapfilename>} parameter which has the simple format:
+
+\begin{verbatim}
+w1 class(w1)
+w2 class(w2)
+ ...
+wM class(wM)
+\end{verbatim}
+
+
+\noindent The map is applied to each component of ngrams before the LM
+query. Examples:
+\bigskip
+
+\noindent \thesubsection.a) map applied to the second field:
+\begin{verbatim}
+$> compile-lm --eval test/test.w-micro cfgfile/cfg.2ndfld-map
+%% Nw=126 PP=16.40 PPwp=0.00 Nbo=33 Noov=0 OOV=0.00%
+\end{verbatim}
+
+
+
+\noindent \thesubsection.b) just to assess the correctness of the (16.2.a) result:
+\begin{verbatim}
+$> compile-lm --eval test/test.macro lm/train.macro.blm
+%% Nw=126 PP=16.40 PPwp=0.00 Nbo=33 Noov=0 OOV=0.00%
+
+
+\end{verbatim}
+
+
+\subsection{Chunk LMs}
+
+Special processing is performed whenever fields are supposed to
+correspond to microtags, i.e. the per-word projections of chunk labels. By
+means of the {\tt <collapse>} parameter, it is possible to activate
+processing that collapses the sequence of microtags defining a
+chunk. The chunk LM is then queried with ngrams of chunk labels,
+asynchronously with respect to the sequence of words, since chunks
+generally consist of several words.
+
+\noindent
+The collapsing operation is automatically activated if the sequence of
+microtags is:
+
+\begin{verbatim}
+ TAG( TAG+ TAG+ ... TAG+ TAG)
+\end{verbatim}
+
+\noindent
+Such a sequence is collapsed into a single chunk label (let us say {\tt
+CHNK}) as long as {\tt TAG(}, {\tt TAG+} and {\tt TAG)} are all mapped into
+the same label {\tt CHNK}. Mapping them into different labels, or a different
+use or position of the characters $($, $+$ and $)$ in the lexicon of tags, prevents
+the collapsing operation even if {\tt <collapse>} is set to {\tt true}. Of
+course, if {\tt <collapse>} is {\tt false}, no collapse is attempted.
+
+\paragraph{Warning:} In this context, the parameter {\tt
+<lmmacroSize>} plays an important role: it defines the size of the n-gram before the collapsing
+operation, that is, the number of microtags of the actually processed
+sequence. {\tt <lmmacroSize>} should be large enough to ensure that after
+the collapsing operation, the resulting n-gram of chunks is at least of the
+size of the LM to be queried (the {\tt <lmfilename>}). As an example,
+assuming {\tt <lmmacroSize>=6}, {\tt <selectedField>=1}, {\tt
+<collapse>=true} and 3 the size of the chunk LM, the following input
+
+\begin{verbatim}
+ on#PP average#NP( 30#NP+ -#NP+ 40#NP+ cm#NP)
+\end{verbatim}
+
+\noindent causes the LM to be queried with just the bigram {\tt (PP,NP)},
+instead of a more informative trigram; for this particular case, the value
+6 for {\tt <lmmacroSize>} is not enough.  On the other hand, for efficiency
+reasons, it cannot be set to an arbitrarily large value. A reasonable value can be
+derived from the average number of microtags per chunk (2-3), which means
+setting {\tt <lmmacroSize>} to two or three times the size of the LM in {\tt
+<lmfilename>}.  Examples:
+\bigskip
+
+\noindent \thesubsection.a) second field, micro$\rightarrow$macro map, collapse:
+\begin{verbatim}
+$> compile-lm --eval test/test.w-micro cfgfile/cfg.2ndfld-map-cllps
+%% Nw=126 PP=1.84 PPwp=0.00 Nbo=0 Noov=0 OOV=0.00%
+
+$> compile-lm --eval test/test.w-micro cfgfile/cfg.2ndfld-map-cllps -d=1
+%% Nw=126 PP=1.83774013 ... OOV=0.00% logPr=-33.29979642
+
+\end{verbatim}
+
+\noindent
+\thesubsection.b) whole token,  micro$\rightarrow$macro map, collapse:
+\begin{verbatim}
+$> compile-lm --eval test/test.micro cfgfile/cfg.token-map-cllps
+%% Nw=126 PP=1.84 PPwp=0.00 Nbo=0 Noov=0 OOV=0.00%
+\end{verbatim}
+
+\noindent
+\thesubsection.c)  whole token,  micro$\rightarrow$macro map, NO collapse:
+\begin{verbatim}
+$> compile-lm --eval test/test.micro cfgfile/cfg.token-map
+%% Nw=126 PP=16.40 PPwp=0.00 Nbo=0 Noov=0 OOV=0.00%
+\end{verbatim}
+\noindent Note that the configuration (16.3.c) gives the same result as
+example (16.2.b), as they are equivalent.
+
+\bigskip
+\noindent
+\thesubsection.d) As an actual example related to the ``warning'' note
+reported above, the following configuration with usual LM:
+
+\begin{verbatim}
+$> compile-lm --eval test/test.chunk lm/train.macro.blm -d=1
+Nw=73 PP=2.85754443 ... OOV=0.00000000% logPr=-33.28748842
+\end{verbatim}
+
+\noindent does not necessarily yield the same log-likelihood ({\tt logPr}) or the same perplexity ({\tt PP}) as case (16.3.a).
+In fact, concerning {\tt PP}, the length of the input sequence is definitely different (126 tokens before collapsing, 73 after).
+Even the {\tt logPr} is different (-33.29979642 vs. -33.28748842) because in (16.3.a) some 6-grams ({\tt
+<lmmacroSize>} is set to 6) after collapsing reduce to $n$-grams of size less
+than 3 (the size of lm/train.macro.blm). By setting {\tt <lmmacroSize>} to
+a larger value (e.g. 8), the same {\tt logPr} will be computed.
+
+
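Note on doc/ClassAndChunkLMs.tex: the field selection, micro-to-macro map, and collapsing steps described in the chunk LM section can be pictured with the small Python sketch below. It is not IRSTLM code. The function names `select_field` and `collapse` are hypothetical, and the handling of incomplete runs (each microtag is mapped on its own) is an assumption made only for illustration.

```python
def select_field(token, selected_field):
    """Return the chosen '#'-separated field, or the whole token for -1."""
    if selected_field == -1:
        return token
    return token.split("#")[selected_field]


def collapse(microtags, micro_to_macro):
    """Collapse runs of the form TAG( TAG+ ... TAG) into one chunk label.

    A run is collapsed only when it opens with a tag ending in '(',
    continues with tags ending in '+', closes with a tag ending in ')',
    and all of its microtags map to the same macro label.
    Assumption: tags outside a complete run are mapped one by one.
    """
    chunks = []
    i = 0
    while i < len(microtags):
        tag = microtags[i]
        if tag.endswith("("):
            j = i + 1
            while j < len(microtags) and microtags[j].endswith("+"):
                j += 1
            if j < len(microtags) and microtags[j].endswith(")"):
                labels = {micro_to_macro.get(t, t) for t in microtags[i:j + 1]}
                if len(labels) == 1:
                    chunks.append(labels.pop())
                    i = j + 1
                    continue
        chunks.append(micro_to_macro.get(tag, tag))
        i += 1
    return chunks


# The six-token window from the warning paragraph (lmmacroSize = 6).
window = "on#PP average#NP( 30#NP+ -#NP+ 40#NP+ cm#NP)".split()
microtags = [select_field(tok, 1) for tok in window]
mapping = {"PP": "PP", "NP(": "NP", "NP+": "NP", "NP)": "NP"}
print(collapse(microtags, mapping))
```

With this window only two chunk labels survive, which is why a trigram chunk LM ends up being queried with a bigram in the manual's example.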
diff --git a/doc/LMAdaptation.tex b/doc/LMAdaptation.tex
new file mode 100644
index 0000000..8102858
--- /dev/null
+++ b/doc/LMAdaptation.tex
@@ -0,0 +1,91 @@
+Language model adaptation can be  applied when little training data is given for the 
+task at hand, but much more data from other less related sources is available.  {\tt tlm} supports two adaptation methods.
+
+\subsection{Minimum Discriminative Information Adaptation}
+MDI adaptation is used when domain-related data is scarce but
+sufficient to estimate a unigram LM.  Basically, the n-gram probabilities of a
+general-purpose (background) LM are scaled so that they match the
+target unigram distribution.
+	
+\noindent	 
+Relevant parameters:
+\begin{itemize}
+\item {\tt -ar=value}: the adaptation {\tt rate},  a real number ranging 
+ from 0 (=no adaptation) to 1 (=strong adaptation).
+
+\item {\tt -ad=file}: the  adaptation file,  either a text  or a
+  unigram table.
+
+\item {\tt -ao=y}: open vocabulary mode, which  must be set if the adaptation file
+ might contain new words to be added to the basic dictionary.
+\end{itemize}
+
+\noindent
+As an example, we apply MDI adaptation on the ``adapt'' file:
+\begin{small}
+\begin{verbatim}
+$> tlm -tr=train.www -lm=wb -n=3 -te=test -dub=1000000 -ad=adapt -ar=0.8 -ao=yes
+   n=49984 LP=326327.8053 PP=684.470312 OVVRate=0.04193341869
+\end{verbatim}
+\end{small}
+
+\noindent
+\paragraph{Warning:}  modified shift-beta  smoothing  cannot  be applied  in  open
+vocabulary mode  ({\tt -ao=yes}).  If  this is the  case, you  should either
+change  smoothing method  or simply  add  the adaptation  text to  the
+background LM (use the {\tt -aug} parameter of {\tt ngt}). In
+general, the latter solution should provide better performance.
+\begin{small}
+\begin{verbatim}
+$> ngt -i=train.www -aug=adapt -o=train-adapt.www -n=3 -b=yes
+$> tlm -tr=train-adapt.www -lm=msb -n=3 -te=test -dub=1000000 -ad=adapt -ar=0.8
+  n=49984 LP=312276.1746 PP=516.7311396 OVVRate=0.04193341869
+\end{verbatim}
+\end{small}
+
+\subsection{Mixture Adaptation}
+
+\noindent
+Mixture adaptation  is useful  when you have  enough training  data to
+estimate a  bigram or  trigram LM and  you also have  data collections
+from other domains.
+
+\noindent
+Relevant parameters:
+\begin{itemize}
+\item {\tt -lm=mix}: specifies the mixture smoothing method
+\item {\tt -slmi=<filename>}: specifies filename with information about LMs to combine.
+\end{itemize}
+
+\noindent
+In the example directory, the file {\tt sublmi} contains the following lines:
+\begin{verbatim}
+2
+-slm=msb -str=adapt -sp=0
+-slm=msb -str=train.www -sp=0
+\end{verbatim}
+
+\noindent
+This means that we train a mixture model on the {\tt adapt} data set and
+combine it with the train data. For each data set the desired
+smoothing method is specified (disregard the parameter {\tt -sp}). The file
+used for adaptation is the one in the FIRST position.
+
+\begin{verbatim}
+$> tlm -tr=train.www -lm=mix -slmi=sublmi -n=3 -te=test -dub=1000000
+  n=49984 LP=307199.3273 PP=466.8244383 OVVRate=0.04193341869
+\end{verbatim}
+
+\noindent
+{\bf Warning}: for  computational reasons it  is expected that  the $n$-gram
+table  specified by {\tt -tr}  contains AT  LEAST the  $n$-grams of  the last
+table specified in the slmi file, i.e. {\tt train.www} in  the example.
+Faster computations are achieved by putting the largest dataset as the
+last sub-model in the list and the union of all data sets as training
+file.
+
+\noindent
+It is  also IMPORTANT  that a  large {\tt -dub} value  is specified  so that
+probabilities  of  sub-LMs  can  be  correctly  computed  in  case  of
+out-of-vocabulary words.
+
diff --git a/doc/LMCompilation.tex b/doc/LMCompilation.tex
new file mode 100644
index 0000000..2230f63
--- /dev/null
+++ b/doc/LMCompilation.tex
@@ -0,0 +1,54 @@
+LMs in ARPA, iARPA, and qARPA format can be stored in a compact binary table through the command:
+
+\begin{verbatim}
+$> compile-lm train.lm train.blm
+\end{verbatim}
+
+\noindent
+which generates the binary file {\tt train.blm} that can be quickly loaded in memory.  If the LM
+is very large, {\tt compile-lm} can avoid creating the binary LM directly in memory through the 
+option {\tt --memmap 1}, which exploits the {\em Memory Mapping} mechanism in order to work as 
+much as possible on disk rather than in RAM. \\
+
+\begin{verbatim}
+$> compile-lm --memmap 1 train.lm train.blm
+\end{verbatim}
+\noindent
+This option clearly comes at a cost in speed, but is often the only way to proceed. It is also recommended 
+that the hard disk used for LM storage be local to the computer on which the compilation is performed.
+
+\noindent
+Notice that most of the functionalities of {\tt compile-lm} (see below) apply to binary and quantized models. 
+
+\noindent
+By default, the command uses the directory ``/tmp'' for storing
+intermediate results.  For huge LMs, the temporary files can grow
+dramatically causing a ``disk full'' system error.  It is possible to
+explicitly set the directory used for temporary computation through the
+parameter ``--tmpdir''.
+\begin{verbatim}
+$> compile-lm --tmpdir=<mytmpdir> train.lm train.blm
+\end{verbatim}
+
+
+\subsection{Inverted order of ngrams}
+\label{sec:inverted-lm}
+For a faster access, the ngrams can be stored in inverted order with the following two commands:
+\begin{verbatim}
+$> sort-lm.pl -inv -ilm train.lm -olm train.inv.lm
+$> compile-lm train.inv.lm train.inv.blm --invert yes
+\end{verbatim}
+
+\paragraph{Warning:} The following pipeline is no longer allowed.
+
+\COMMENT{
+or with the following pipeline:
+}
+\begin{verbatim}
+$> cat train.lm | sort-lm.pl -inv | \
+   compile-lm /dev/stdin train.inv.blm --invert yes
+\end{verbatim}
+
+
+
+
diff --git a/doc/LMFileFormats.tex b/doc/LMFileFormats.tex
new file mode 100644
index 0000000..afe7e84
--- /dev/null
+++ b/doc/LMFileFormats.tex
@@ -0,0 +1,247 @@
+{\IRSTLM} supports several types of input and output formats for handling LMs, $n$-gram counts, dictionaries.
+
+%{\IRSTLM} supports three output formats of LMs. These formats have the
+%purpose of permitting  the use of LMs by  external programs.
+
+
+\subsection{File Formats for Dictionary}
+The dictionary is the data structure exploited by  {\IRSTLM} to store a set of terms. 
+
+{\IRSTLM} saves a dictionary in textual file format consisting of:
+\begin{itemize}
+\item a header line specifying the most important information about the file itself: the keyword "dictionary", a fixed value 0, and the number of terms the dictionary contains;
+\item a set of terms listed according to either their occurrence or their frequency in the data.
+\end{itemize}
+Here is an excerpt.
+\begin{verbatim}
+dictionary 0 7893
+<s>
+</s>
+solemn
+ceremony
+marks
+....
+\end{verbatim}
+
+\noindent
+Optionally, the occurrence frequencies of each term can be stored as well; in this case the keyword is "DICTIONARY".
+
+\noindent
+Here is an excerpt.
+\begin{verbatim}
+DICTIONARY 0 7893
+<s> 5000
+</s> 5001
+solemn 7
+ceremony 59
+....
+\end{verbatim}
+
+\IMPORTANT{The list order is used by {\IRSTLM} to define the internal codes of the terms.
+In the vast majority of cases, it is completely transparent and irrelevant to the user.
+This order is crucial only in the very few cases highlighted in this manual.}
+
+\subsection{File Formats for $n$-gram Table}
+The $n$-gram table is the data structure exploited by  {\IRSTLM} to store a set of $n$-grams. {\IRSTLM} stores an $n$-gram table either in textual or binary formats.
+
+\subsubsection{Textual format}
+The textual format consists of:
+\begin{itemize}
+\item a header line specifying the most important information about the file itself: the keyword "nGrAm", the order $n$ of the $n$-grams, the number of $n$-grams the $n$-gram table contains, and a second keyword representing the table type;
+\item a second line reporting the size of the dictionary associated to the $n$-grams;
+\item the terms of the dictionary (one term per line) with their frequency;
+\item the list of all $n$-grams with their counts.
+\end{itemize}
+Here is an excerpt.
+\begin{verbatim}
+nGrAm 3 76857 ngram
+7893
+<s> 5000
+</s> 5001
+solemn 7
+ceremony 59
+...
+<s> <s> <s>     2
+<s> <s> </s>    1
+<s> </s> </s>   1
+<s> solemn ceremony     1
+<s> a solemn    1
+\end{verbatim}
+
+\subsubsection{Binary format}
+The binary format is similar, but its main keyword is "{\tt NgRaM}" (note the different casing), and the list of $n$-grams is binarized; hence, the last portion of the binary $n$-gram table is not user-readable.
+
+\subsubsection{Google $n$-gram format}
+{\IRSTLM} also supports the Google $n$-gram format, for both input and output. This format, always textual, simply consists of the list of all $n$-grams with their counts.
+Here is an excerpt.
+\begin{verbatim}
+<s> <s> <s>     2
+<s> <s> </s>    1
+<s> </s> </s>   1
+<s> solemn ceremony     1
+<s> a solemn    1
+...
+\end{verbatim}
+
+
+
+
+\subsubsection{Table types}
+The table type keyword represents the way the $n$-grams are collected and the way they are exploited for further computation:
+\begin{itemize}
+\item{\tt ngram}: each entry is a standard $n$-gram, i.e. a contiguous sequence of $n$ terms; they are usually used to estimate a standard $n$-gram LM;
+\item{\tt co-occ$K$}: each entry is a xxxxxx, where $K$ is XXXXXX;
+\item{\tt hm$S$}: each entry is a xxxxxx, where $K$ is XXXXXX;
+\end{itemize}
+
+\subsection{File Formats for LM}
+{\IRSTLM} handles LMs in both textual and binary formats.
+It provides facilities to save disk space by storing probabilities as quantized values instead of floating point values, and to reduce access time by saving $n$-grams in inverted order.
+ 
+\subsubsection{Textual Format}
+The textual format is the well-known ARPA format introduced in DARPA ASR evaluations to exchange LMs.
+The ARPA format is supported by most third-party LM toolkits, such as SRILM and KenLM.
+
+The ARPA format consists of:
+\begin{itemize}
+\item one block reporting the number of $m$-grams stored for each level $m$ of the LM ($m \le n$); this block starts with the keyword "{\tt \textbackslash{}data}";
+\item one block per level reporting the set of $m$-grams for that level together with their log-probability (first field), and the backoff log-probability (last field); each block starts with the keyword "{\tt \textbackslash{}$m$-grams}", with the correct value for $m$;
+\item the keyword  "{\tt \textbackslash{}end\textbackslash{}}" closes the file.
+\end{itemize}
+
+\noindent
+Here is an excerpt.
+\begin{verbatim}
+\data\
+ngram  1=      7894
+ngram  2=     46269
+ngram  3=     12188
+\1-grams:
+-4.79871        <s>     -0.826378
+....
+-1.20244        <unk>
+\2-grams:
+-3.29024        <s> <s> -0.221849
+...
+-0.289359       restructuring of
+\3-grams:
+-0.397606       <s> <s> <s>
+-1.67881        <s> a hong_kong
+...
+-0.420213       seymf council ,
+\end\
+\end{verbatim}
+
+\noindent
+Empty lines can occur before and after each block.
+
+\noindent
+There is no limit to the order $n$ of $n$-grams.
+
+\IMPORTANT{Backoff log-probabilities are not reported if equal to 0; backoff log-probabilities do not exist for the largest order.}
+
+
+\subsubsection{Quantized Textual Format}
+This textual format extends the ARPA textual format including codebooks that quantize 
+probabilities and back-off weights of each $n$-gram level.
+
+The quantized ARPA format consists of:
+\begin{itemize}
+\item a header line specifying the most important information about the file itself: the keyword "qARPA", the order $n$ of the LM, and the sizes of the $n$ codebooks;
+\item one block reporting the number of $m$-grams stored for each level $m$ of the LM ($m \le n$); this block starts with the keyword "{\tt \textbackslash{}data}";
+\item one block per level reporting first the codebooks for the level and then the set of $m$-grams for that level together with their quantized log probability (first field), and the backoff quantized  log-probability (last field); each block starts with the keyword "{\tt \textbackslash{}$m$-grams}", with the correct value for $m$;
+\item the keyword  "{\tt \textbackslash{}end\textbackslash{}}" closes the file.
+\end{itemize}
+
+
+\noindent
+Here is an excerpt.
+\begin{verbatim}
+qARPA 3 256 256 256
+\data\
+ngram 1= 7894
+ngram 2= 46269
+ngram 3= 12188
+\1-grams:
+256
+-4.79885 -99
+-4.62261 -1.75587
+....
+0       <s>     53
+186     </s>    0
+....
+\2-grams:
+256
+-3.79901 -99
+-3.62278 -3.01953
+...
+7       <s>     <s>     255
+65      <s>     </s>    255
+....
+\end\
+\end{verbatim}
+
+
+\subsubsection{Intermediate Textual Format}
+This is an {\em intermediate} ARPA format used by {\IRSTLM} for optimizing the computation of huge LMs. It differs from the ARPA format in two aspects:
+\begin{itemize}
+\item the header line contains only the keyword {\tt iARPA};
+\item the first field of each $n$-gram entry is its smoothed frequency instead of its log-probability.
+\end{itemize}
+
+\COMMENT{
+\noindent
+Nevertheless, iARPA format is properly managed by the {\tt compile-lm} command
+in order to generate a binary version or a standard ARPA version.
+}
+
+\subsubsection{Binary Format}
+The binary format supported by {\IRSTLM} saves disk space and allows the LM to be loaded more quickly.
+
+\noindent 
+The binary format consists of:
+\begin{itemize}
+\item a header line specifying the most important information about the file itself: the keyword "blmt", the order $n$ of the LM, and the number of $m$-grams stored for each level $m$ of the LM ($m \le n$);
+\item a second line reporting the size of the dictionary associated to the $n$-grams;
+\item the terms of the dictionary (one term per line) with their frequency, if available;
+\item a binary section containing $n$-grams and their probabilities; this portion is not user-readable.
+\end{itemize}
+
+\begin{verbatim}
+blmt 3       7894      46269      12188
+7894
+<s> 1
+</s> 5001
+solemn 7
+...
+_binary_data_
+\end{verbatim}
+
+\subsubsection{Quantized Binary Format}
+The quantized binary format stores the quantized version of an LM.
+
+\noindent 
+It consists of:
+\begin{itemize}
+\item a header line specifying the most important information about the file itself: the keyword "Qblmt", the order $n$ of the LM, and the number of $m$-grams stored for each level $m$ of the LM ($m \le n$);
+\item a second line specifying the most important information about the codebooks of each level: the keyword "NumCenters", and the sizes of the $n$ codebooks;
+\item a third line reporting the size of the dictionary associated to the $n$-grams;
+\item the terms of the dictionary (one term per line) with their frequency, if available;
+\item a binary section containing the codebooks, the $n$-grams and their quantized probabilities; this portion is not user-readable.
+\end{itemize}
+
+\begin{verbatim}
+Qblmt 3 7894 46269 12188
+NumCenters 256 256 256
+7894
+<s> 1
+</s> 5001
+solemn 7
+...
+_binary_data_
+\end{verbatim}
+
+
+
+\subsubsection{Inverted Binary Format}
+{\IRSTLM} can store the $n$-grams in inverted order to speed up access time. This applies to both standard and quantized binary formats, namely {\tt blmt} or {\tt Qblmt}. The keywords are {\tt blmtI} or {\tt QblmtI}, respectively.
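Note on doc/LMFileFormats.tex: the rule that an ARPA entry carries the log-probability first, the words next, and an optional back-off weight last (omitted when it is 0) can be made concrete with the Python sketch below. It is a minimal illustration, not the IRSTLM reader. The function name `parse_arpa_entry` is hypothetical, and it assumes the caller already knows the order m of the current block from its `\m-grams:` header.

```python
def parse_arpa_entry(line, order):
    """Split one entry of an ARPA m-gram block into its three parts.

    Returns (log_probability, words, backoff_weight). A missing back-off
    field is read as 0.0, following the convention that back-off
    log-probabilities equal to 0 are not written.
    """
    fields = line.split()
    if len(fields) not in (order + 1, order + 2):
        raise ValueError("unexpected number of fields: %r" % line)
    log_prob = float(fields[0])
    words = tuple(fields[1:order + 1])
    backoff = float(fields[order + 1]) if len(fields) == order + 2 else 0.0
    return log_prob, words, backoff


# Entries taken from the ARPA excerpt in the manual.
print(parse_arpa_entry("-4.79871        <s>     -0.826378", 1))
print(parse_arpa_entry("-0.289359       restructuring of", 2))
```

Counting fields against the known order, rather than testing whether the last field looks numeric, avoids misreading an entry whose final word happens to be a number.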
diff --git a/doc/LMFiltering.tex b/doc/LMFiltering.tex
new file mode 100644
index 0000000..4600a87
--- /dev/null
+++ b/doc/LMFiltering.tex
@@ -0,0 +1,16 @@
+A large LM can be filtered according to a word list through the command:
+
+\begin{verbatim}
+$> compile-lm  train.lm --filter list filtered.lm
+\end{verbatim}
+The resulting LM will only contain n-grams whose words appear in the provided list,
+with the exception of the 1-gram level, which by default is preserved identical
+to the original LM. This behavior can be changed by setting the option 
+{\tt --keepunigrams no}.  LM filtering is useful when a very large LM can
+be specialized in advance to work on a particular portion of language.
+\noindent
+If the original LM is in binary format and is very large, {\tt compile-lm} can avoid loading it into memory
+through the memory mapping option {\tt --memmap 1}.
+
+
+
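Note on doc/LMFiltering.tex: the selection rule described above (keep an n-gram only if its words are in the list, but leave the 1-gram level untouched by default) can be sketched in Python as follows. This is only an illustration of the rule as stated in the text. The function name `filter_ngrams` and its `keep_unigrams` argument are hypothetical, and any special treatment that `compile-lm` may give to symbols such as `<s>`, `</s>`, or `<unk>` is not modelled.

```python
def filter_ngrams(ngrams, word_list, keep_unigrams=True):
    """Keep the n-grams whose words all belong to word_list.

    With keep_unigrams=True (mirroring the default described in the manual)
    every 1-gram is kept regardless of the list.
    """
    allowed = set(word_list)
    kept = []
    for ngram in ngrams:
        if len(ngram) == 1 and keep_unigrams:
            kept.append(ngram)
        elif all(word in allowed for word in ngram):
            kept.append(ngram)
    return kept


ngrams = [("solemn",), ("council",), ("solemn", "ceremony"), ("ceremony", "marks")]
print(filter_ngrams(ngrams, ["solemn", "ceremony"]))
print(filter_ngrams(ngrams, ["solemn", "ceremony"], keep_unigrams=False))
```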
diff --git a/doc/LMInterface.tex b/doc/LMInterface.tex
new file mode 100644
index 0000000..49073c2
--- /dev/null
+++ b/doc/LMInterface.tex
@@ -0,0 +1,112 @@
+LMs are useful when they can be queried through another application in order to compute 
+perplexity scores or n-gram probabilities. {\IRSTLM} provides two possible interfaces: 
+\begin{itemize}
+\item at the command level, through  {\tt compile-lm}
+\item at the C++ library level, mainly through methods of the class {\tt lmtable}
+\end{itemize}
+
+\noindent
+In the following, we will only focus on the command level interface. Details about
+the C++ library interface will be provided in a future version of this manual.  
+
+\subsection{Perplexity Computation}
+Assume we have estimated and saved the following LM:
+
+\begin{verbatim}
+$> tlm -tr=train.www -n=3 -lm=wb -te=test -o=train.lm -ps=no
+ n=49984 LP=308057.0419 PP=474.9041687 OVVRate=0.05007602433
+\end{verbatim}
+
+\noindent
+To compute the perplexity directly from the LM on disk, we can use the command:
+
+\begin{verbatim}
+$> compile-lm train.lm  --eval test
+ %% Nw=49984 PP=1064.40 PPwp=589.50 Nbo=38071 Noov=2503 OOV=5.01%
+\end{verbatim}
+Notice that {\tt PPwp} reports the contribution of OOV words to the perplexity. Each OOV word is indeed penalized by dividing the
+LM probability of the {\tt unk} word by  the quantity
+
+
+\centerline{{\tt DictionaryUpperBound} - {\tt SizeOfDictionary}}
+
+\noindent
+The OOV penalty can be modified by changing the {\tt DictionaryUpperBound} with the parameter {\tt --dub} (whose default value is set to $10^7$). \\
+
+\noindent
+The perplexity of the pruned LM can be computed with the command:
+\begin{verbatim}
+$> compile-lm train.plm --eval test --dub 10000000
+%% Nw=49984 PP=1019.69 PPwp=564.73 Nbo=39907 Noov=2503 OOV=5.01%
+\end{verbatim}
+Interestingly, a slightly better value is obtained, which could be explained by the 
+fact that pruning has removed many infrequent trigrams and has redistributed 
+their probabilities over more frequent bigrams.
+
+\noindent
+Notice that {\tt PP} reports the perplexity with a fixed dictionary upper-bound of 10 million words. Indeed:
+\begin{verbatim}
+$> tlm -tr=train.www -n=3 -lm=wb -te=test -o=train.lm -ps=no -dub=10000000 
+n=49984 LP=348396.8632 PP=1064.401254 OVVRate=0.05007602433
+\end{verbatim}
+
+\bigskip
+\noindent
+Again, if the LM is in binary format and is very large, {\tt compile-lm} can avoid loading it into memory
+through the memory mapping option {\tt --memmap 1}.
+
+\bigskip
+\noindent
+By enabling the option ``{\tt --sentence yes}'', {\tt compile-lm} computes perplexity and related figures (OOV rate, number of backoffs, etc.) for each input sentence. The end of a sentence is identified by a given symbol ({\tt </s>} by default).
+\begin{verbatim}
+$> compile-lm train.plm --eval test --dub 10000000 --sentence yes	
+\end{verbatim}
+{\small 
+\begin{verbatim}
+%% sent_Nw=1 sent_PP=23.22 sent_PPwp=0.00 sent_Nbo=0 sent_Noov=0 sent_OOV=0.00%
+%% sent_Nw=8 sent_PP=7489.50 sent_PPwp=7356.27 sent_Nbo=7 sent_Noov=2 sent_OOV=25.00%
+%% sent_Nw=9 sent_PP=1231.44 sent_PPwp=0.00 sent_Nbo=14 sent_Noov=0 sent_OOV=0.00%
+%% sent_Nw=6 sent_PP=27759.10 sent_PPwp=25867.42 sent_Nbo=19 sent_Noov=1 sent_OOV=16.67%
+.....
+%% sent_Nw=5 sent_PP=378.38 sent_PPwp=0.00 sent_Nbo=39893 sent_Noov=0 sent_OOV=0.00%
+%% sent_Nw=15 sent_PP=4300.44 sent_PPwp=2831.89 sent_Nbo=39907 sent_Noov=1 sent_OOV=6.67%
+%% Nw=49984 PP=1019.69 PPwp=564.73 Nbo=39907 Noov=2503 OOV=5.01%
+\end{verbatim}
+}
+
+\bigskip
+\noindent
+Finally, tracing information with the {\tt --eval} option is shown by setting 
+debug levels from 1 to 4 ({\tt --debug}):
+\begin{enumerate}
+\item reports the back-off level for each word
+\item adds the log-prob 
+\item adds the back-off weight
+\item checks whether probabilities sum up to 1.
+\end{enumerate}
+
+
+\subsection{Probability Computations}
+Word-by-word log-probabilities  can be computed as well from standard input with the command:
+\begin{verbatim}
+$> compile-lm train.lm --score yes < test
+
+> </s>  1 p= NULL
+> <s> <unk>     1 p= NULL
+> <s> <unk> of  1 p= -3.530047e+00 bo= 2
+> <unk> of the  1 p= -1.250668e+00 bo= 1
+> of the senate 1 p= -1.170901e+01 bo= 1
+> the senate (  1 p= -5.457265e+00 bo= 2
+> senate ( <unk>        1 p= -2.166440e+01 bo= 2
+....
+....
+\end{verbatim}
+
+\noindent
+The command reports the currently observed n-gram, 
+including {\tt\_unk\_} words, a dummy
+constant frequency 1, the log-probability of the n-gram, and the 
+number of back-offs performed by the LM.  
+
+\paragraph{Warning:} All cross-sentence $n$-grams are skipped. The 1-grams with the sentence start symbol are also skipped. In an $n$-gram, all words before the sentence start symbol are removed. For $n$-grams whose size is smaller than the LM order, the probability is not computed, and a {\tt NULL} value is returned instead.
+
diff --git a/doc/LMInterpolation.tex b/doc/LMInterpolation.tex
new file mode 100644
index 0000000..a2e5ae7
--- /dev/null
+++ b/doc/LMInterpolation.tex
@@ -0,0 +1,75 @@
+We provide a convenient tool to estimate mixtures of LMs that have been already 
+created in one of the available formats.  The tool makes it possible to estimate interpolation
+weights through the EM algorithm, to compute the perplexity, and to query the interpolated
+LM. 
+
+\noindent
+Data used in these examples can be found in the directory {\tt example/interpolateLM/},
+which represents the relative path for all the parameters of the referred commands.
+
+\noindent
+Interpolated LMs are defined by a configuration file in the following format:
+\begin{verbatim}
+3
+0.3 lm-file1
+0.3 lm-file2
+0.4 lm-file3
+\end{verbatim}
+
+\noindent
+The first number indicates the number of LMs to be interpolated, then each LM is specified
+by its weight and its file (either in ARPA or binary format). Notice that you can interpolate
+LMs with different orders.\\
+
+\noindent
+Given an initial configuration file {\tt lmlist.init} (with arbitrary weights), new weights can be estimated
+through Expectation-Maximization on some text sample {\tt test} by running the command:
+\begin{verbatim}
+$> interpolate-lm lmlist.init --learn test
+\end{verbatim}
+\noindent
+New weights will be written in the updated configuration file, called by default {\tt lmlist.init.out}.
+You can also specify the name of the updated configuration file as follows:
+
+\begin{verbatim}
+$> interpolate-lm lmlist.init --learn test lmlist.final
+\end{verbatim}
+
+
+\noindent
+Similarly to {\tt compile-lm}, interpolated LMs can be queried through the option {\tt --score}:
+
+\begin{verbatim}
+$> interpolate-lm lmlist.final --score yes < test
+\end{verbatim}
+
+\noindent
+and can return the perplexity of a given input text (``{\tt --eval text-file}''), optionally at sentence level by enabling the option ``{\tt --sentence yes}'':
+
+\begin{verbatim}
+$> interpolate-lm lmlist.final --eval test 
+$> interpolate-lm lmlist.final --eval test --sentence yes
+\end{verbatim}
+
+\bigskip
+\noindent
+If there are binary LMs in the list,  {\tt interpolate-lm} can avoid loading them into memory by means of the memory 
+mapping option {\tt --memmap 1}.
+
+
+\noindent
+The full list of options is:
+
+\begin{verbatim}
+--learn text-file   learn optimal interpolation for text-file
+--order n           order of n-grams used in --learn (optional)
+--eval text-file    compute perplexity on text-file
+--dub dict-size     dictionary upper bound (default 10^7)
+--score [yes|no]    compute log-probs of n-grams from stdin
+--debug [1-3]       verbose output for --eval option (see compile-lm)
+--sentence [yes|no] compute perplexity at sentence level (identified
+                    through the end symbol)
+--memmap 1          use memory map to read a binary LM
+\end{verbatim}
+ 
+
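
The EM re-estimation of interpolation weights described in LMInterpolation.tex can be illustrated with a short sketch. This is not the code used by interpolate-lm: the function name, its arguments, and the assumption that the per-word probabilities of each component LM are already available as a table are all hypothetical, chosen only to show the update rule for a linear mixture.

```python
def em_interpolation_weights(probs, weights, iterations=20):
    """Re-estimate mixture weights with EM (illustrative sketch only).

    probs[t][i] is the probability that component LM i assigns to the
    t-th word of the tuning text given its history (assumed precomputed).
    weights is the initial weight vector, as in the configuration file.
    """
    k = len(weights)
    weights = list(weights)
    for _ in range(iterations):
        expected = [0.0] * k
        for row in probs:
            # Probability of the word under the current mixture.
            mix = sum(w * p for w, p in zip(weights, row))
            if mix == 0.0:
                continue  # no component can explain this word; skip it
            # E-step: posterior responsibility of each component.
            for i in range(k):
                expected[i] += weights[i] * row[i] / mix
        total = sum(expected)
        if total == 0.0:
            break
        # M-step: normalised expected counts become the new weights.
        weights = [e / total for e in expected]
    return weights
```

Starting from arbitrary weights, each iteration moves weight towards the components that explain the tuning text better, which matches the remark that the initial weights in lmlist.init can be arbitrary.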
diff --git a/doc/LMPrune.tex b/doc/LMPrune.tex
new file mode 100644
index 0000000..e69de29
diff --git a/doc/LMPruning.tex b/doc/LMPruning.tex
new file mode 100644
index 0000000..9e78b25
--- /dev/null
+++ b/doc/LMPruning.tex
@@ -0,0 +1,47 @@
+Large LM files can be pruned in a smart way by means of the command 
+{\tt prune-lm} that removes $n$-grams for which resorting to the back-off 
+results in a small loss. {\IRSTLM} implements a method similar to the 
+Weighted Difference Method described in the paper {\em Scalable Backoff
+Language Models} by Seymore and Rosenfeld.
+
+\noindent
+The syntax is as follows:
+\begin{verbatim}
+$> prune-lm --threshold=1e-6,1e-6  train.lm.gz  train.plm
+\end{verbatim}
+Thresholds for each n-gram level, up from 2-grams, are based on empirical 
+evidence. Threshold zero results in no pruning. If fewer thresholds are specified,
+the rightmost is applied to the higher levels. Hence, in the above example we
+could have just specified one threshold, namely {\tt --threshold=1e-6}. 
+The effect of pruning is shown in the following messages of {\tt prune-lm}:
+
+\begin{verbatim}1-grams: reading 15059 entries
+2-grams: reading 142684 entries
+3-grams: reading 293685 entries
+done
+OOV code is 15058
+OOV code is 15058
+pruning LM with thresholds: 
+ 1e-06 1e-06
+savetxt: train.plm
+save: 15059 1-grams
+save: 138252 2-grams
+save: 194194 3-grams
+\end{verbatim}
+
+\noindent
+The saved LM table {\tt train.plm}  contains about 3\% fewer bigrams, and 34\%  
+fewer trigrams.
+Notice that the output of {\tt prune-lm} is an ARPA LM file, while the input can be 
+either an ARPA or binary LM. 
+In order to measure the loss in accuracy introduced
+by pruning, perplexity of the resulting LM can be computed (see below).
+
+\paragraph{Warning:} Quantization, if applied, should be performed after pruning.
+
+\paragraph{Warning:} 
+{\IRSTLM} does not provide a reliable probability for the special
+1-gram composed of the ``sentence start symbol'' ({\tt <s>}), because no one
+should ever ask for it.  However, this pruning method requires the
+computation of the probability of this 1-gram.  Hence, (only) in this case
+the probability of this special 1-gram is arbitrarily set to 1.
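
The reduction figures quoted in LMPruning.tex can be checked directly against the prune-lm messages. The helper below is only an illustration of that arithmetic; its name is hypothetical and it is not part of the toolkit.

```python
def reduction_percent(read_entries, saved_entries):
    """Percentage of n-grams removed by pruning at one n-gram level."""
    return 100.0 * (read_entries - saved_entries) / read_entries


# Counts taken from the "reading" and "save" lines of the prune-lm output.
bigram_reduction = reduction_percent(142684, 138252)
trigram_reduction = reduction_percent(293685, 194194)
print(f"2-grams: {bigram_reduction:.1f}% fewer")
print(f"3-grams: {trigram_reduction:.1f}% fewer")
```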
diff --git a/doc/LMQuantization.tex b/doc/LMQuantization.tex
new file mode 100644
index 0000000..255761c
--- /dev/null
+++ b/doc/LMQuantization.tex
@@ -0,0 +1,14 @@
+A language model file in ARPA  format, created with the IRST LM toolkit or
+with other tools, can be quantized and stored in a compact data structure, 
+called language model table.  Quantization can be performed by the command:
+
+\begin{verbatim}
+$> quantize-lm  train.lm train.qlm
+\end{verbatim}
+
+\noindent
+which  generates   the  quantized  version  {\tt train.qlm} that  encodes all probabilities and back-off 
+weights in 8 bits. The  output is a  modified ARPA format, called qARPA. Notice that quantized
+LMs reduce memory consumption at the cost of some loss in performance. Moreover, probabilities
+of quantized LMs are not supposed to be properly normalized.
+
diff --git a/doc/LMSmoothing.tex b/doc/LMSmoothing.tex
new file mode 100644
index 0000000..e69de29
diff --git a/doc/RELEASE b/doc/RELEASE
new file mode 100644
index 0000000..d2297d5
--- /dev/null
+++ b/doc/RELEASE
@@ -0,0 +1 @@
+5.80.08
diff --git a/doc/compileLM.tex b/doc/compileLM.tex
new file mode 100644
index 0000000..e69de29
diff --git a/doc/dict.tex b/doc/dict.tex
new file mode 100644
index 0000000..4365624
--- /dev/null
+++ b/doc/dict.tex
@@ -0,0 +1,115 @@
+{\tt dict} is the command that handles dictionaries.
+
+\begin{itemize}
+\item It extracts the dictionary from a corpus or a dictionary;
+\item It computes and shows the dictionary growth curve;
+\item It computes and shows the out-of-vocabulary rate on a test corpus.
+\end{itemize}
+
+\subsubsection{Synopsis}
+
+\begin{tabular}{llll}
+\multicolumn{4}{l}{USAGE}\\
+    & \multicolumn{3}{l}{\tt dict -i=$<$inputfile$>$ [options]} \\
+    \\
+\multicolumn{4}{l}{OPTIONS} \\
+    & {\tt Curve}& {\tt c} &      show dictionary growth curve; default is false\\
+    & {\tt CurveSize} & {\tt cs} &    default 10\\
+    & {\tt Freq} & {\tt f} &    output word frequencies; default is false\\
+    & {\tt Help} & {\tt h} &    print this help\\
+    & {\tt InputFile} & {\tt i} &    input file (Mandatory)\\
+    & {\tt IntSymb} & {\tt is} &    interruption symbol\\
+    & {\tt ListOOV} & {\tt oov} &    print OOV words to stderr; default is false\\
+    & {\tt LoadFactor} & {\tt lf} &    set the load factor for cache; it should be a positive real value; default is 0\\
+    & {\tt OutputFile} & {\tt o} &    output file\\
+    & {\tt PruneFreq} & {\tt pf} &    prune words with frequency below the specified value\\
+    & {\tt PruneRank} & {\tt pr} &    prune words with frequency rank above the specified value\\
+    & {\tt Size} & {\tt s} &    initial dictionary size; default is $10^6$\\
+    & {\tt sort} & & sort dictionary by frequency; default is false\\
+    & {\tt TestFile} & {\tt t} &    compute OOV rates on the specified test corpus\\
+\end{tabular}
+
+
+\subsubsection{Extraction of a dictionary}
+To extract the dictionary from a given text and store it in a file, run the following command:
+
+\begin{verbatim}
+$> dict -i=train.txt.se -o=train.dict -f=true
+\end{verbatim}
+
+The input text can also be generated on the fly by passing a command as the value of the parameter {\tt InputFile}; in this case, single or double quotation marks are required.  
+\begin{verbatim}
+$> dict -i="cat train.txt | add-start-end.sh" -o=train.dict -f=true
+\end{verbatim}
+
+\noindent
+For some applications like speech recognition, it  can be useful to limit the LM dictionary.
+You can obtain such a pruned list either by means of the parameter {\tt PruneRank}, which only stores the most frequent words, for instance the top 10K:
+\begin{verbatim}
+$> dict -i=train.txt.se -o=train.dict.pr10k -pr=10000
+\end{verbatim}
+
+\noindent
+or by means of the parameter {\tt PruneFreq}, which only stores the terms occurring more than a given number of times, for instance 5:
+\begin{verbatim}
+$> dict -i=train.txt.se -o=train.dict.pf5 -pf=5
+\end{verbatim}
+
+\noindent
+The two pruning strategies can be combined.
+
+
+
+\subsubsection{Dictionary growth curve}
+{\tt dict} can display the distribution of the terms according to their frequency in a text or in a pre-computed dictionary. This facility is enabled by the parameter {\tt Curve}; the maximum frequency taken into account is specified by the parameter {\tt CurveSize}.
+ 
+\begin{verbatim}
+$> dict -i=train.dict -c=yes -cs=50
+\end{verbatim}
+
+\noindent
+The output looks as follows
+\begin{verbatim}
+Dict size: 7893
+**************** DICTIONARY GROWTH CURVE ****************
+Freq  Entries  Percent
+>0    7893     100.00%
+>1    4880      61.83%
+>2    3721      47.14%
+>3    2990      37.88%
+...
+>47   271        3.43%
+>48   264        3.34%
+>49   258        3.27%
+*********************************************************
+\end{verbatim}
+\noindent
+Each row of the table reports the number of terms (second column) whose frequency is greater than the value in the first column, and the corresponding percentage (third column) of the total number of entries.
+
+
+
+\subsubsection{Out-of-vocabulary rate statistics}
+{\tt dict} can compute the out-of-vocabulary rate on a test corpus, given by the parameter {\tt TestFile}, with respect to a text or a pre-computed dictionary; the maximum frequency taken into account is specified by the parameter {\tt CurveSize}.
+\begin{verbatim}
+$> dict -i=train.dict -t=test.txt.se -cs=50
+\end{verbatim}
+
+\noindent
+The output looks as follows
+\begin{verbatim}
+Dict size: 7893
+Words of test: 1009
+**************** OOV RATE STATISTICS ****************
+Freq  OOV_Entries  OOV_Rate
+<1    119          11.79%
+<2    151          14.97%
+<3    191          18.93%
+...
+<48   457          45.29%
+<49   457          45.29%
+<50   457          45.29%
+*********************************************************
+
+\end{verbatim}
+\noindent
+Each row of the table reports the out-of-vocabulary rate on the test set that would result from pruning the dictionary at the frequency given in the first column. For example, 191 (18.93\%) of the running terms in the test set have a frequency smaller than 3 in the dictionary.
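
The two tables documented in dict.tex can be reproduced on toy data with the sketch below. It only mirrors the row definitions given in the text (entries with frequency greater than a value, and test tokens whose dictionary frequency is smaller than a value); the function names and the in-memory frequency table are assumptions made for illustration and do not reflect how dict is implemented.

```python
from collections import Counter


def growth_curve(freq, curve_size):
    """Rows (f, entries, percent): dictionary entries with frequency > f."""
    total = len(freq)
    rows = []
    for f in range(curve_size):
        entries = sum(1 for count in freq.values() if count > f)
        percent = 100.0 * entries / total if total else 0.0
        rows.append((f, entries, percent))
    return rows


def oov_rates(freq, test_words, curve_size):
    """Rows (f, oov, rate): test tokens whose dictionary frequency is < f."""
    n_words = len(test_words)
    rows = []
    for f in range(1, curve_size + 1):
        oov = sum(1 for word in test_words if freq.get(word, 0) < f)
        rate = 100.0 * oov / n_words if n_words else 0.0
        rows.append((f, oov, rate))
    return rows


# Toy data: a six-token training text and a four-token test text.
train_freq = Counter("a a a b b c".split())
test_tokens = "a b d c".split()
for f, entries, percent in growth_curve(train_freq, 3):
    print(f">{f}\t{entries}\t{percent:.2f}%")
for f, oov, rate in oov_rates(train_freq, test_tokens, 3):
    print(f"<{f}\t{oov}\t{rate:.2f}%")
```

A word that never occurs in the training text has frequency 0, so it is counted in every row of the OOV table, starting from the row labelled <1.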
diff --git a/doc/gettingStarted.tex b/doc/gettingStarted.tex
new file mode 100644
index 0000000..399918e
--- /dev/null
+++ b/doc/gettingStarted.tex
@@ -0,0 +1,85 @@
+After a successful installation, you are ready to use {\IRSTLM}.
+
+\noindent
+In this Section, a basic 4-step procedure is given to estimate a LM and to compute its perplexity on a text.
+Many changes to this procedure can be made in order to optimize effectiveness and efficiency according to your needs.
+
+\noindent 
+Please refer to Section~\ref{sec:commands} to learn more about each {\IRSTLM} command, and 
+to Section~\ref{sec:functions} to get hints about IRSTLM functionalities.
+
+
+
+
+\IMPORTANT{All programs assume that the environment variable {\bf IRSTLM} is correctly set to {\tt /path/to/install}, and that the environment variable {\bf PATH} includes the command directory {\tt /path/to/install/bin}.}
+
+\noindent
+Data used in the following usage examples can be found in an archive you can download from the official website of {\IRSTLM}.
+Most of these data sets are very small, so the reported figures are not reliable.
+
+\subsection{Preparation of Training Data}
+In order to estimate a Language Model, you first need to prepare your training corpus. The corpus just consists of a text.
+We assume that the text is already preprocessed according to the user needs; this means that lowercasing, uppercasing, tokenization, and any other text transformation has to be performed beforehand with other tools.
+
+\noindent
+The only choice left to you is whether {\IRSTLM} should be aware of sentence boundaries, i.e. where a sentence starts and ends. Otherwise, it considers the corpus as a continuous stream of text, and does not identify sentence splits. 
+
+\noindent
+The following script adds start and end symbols ({\tt <s>} and {\tt </s>}, respectively) to all lines in your training corpus.
+\begin{verbatim}
+$> cat train.txt | add-start-end.sh > train.txt.se
+\end{verbatim}
+
+\noindent
+{\IRSTLM} does not compute probabilities for cross-sentence $n$-grams, i.e. $n$-grams including the pair {\tt </s>  <s>}.
+
+
+\IMPORTANT{{\IRSTLM} assumes that each line corresponds to a sentence, regardless of the presence of punctuation inside or at the end of the line.}
+\IMPORTANT{Start and end symbols ({\tt <s>} and {\tt </s>}) should be considered reserved symbols, and used only as sentence boundaries.}
+
+
+\subsection{Computation of $n$-gram statistics}
+\noindent
+You can now collect $n$-gram statistics for your training data (3-gram in this example) by running the command:
+
+\begin{verbatim}
+$> ngt -i=train.txt.se -n=3 -o=train.www -b=yes
+\end{verbatim}
+
+\noindent
+The $n$-gram counts are saved in the binary file {\tt train.www}.
+
+
+\subsection{Estimation of the LM}
+\noindent
+You can now estimate an $n$-gram LM (3-gram LM in this example) smoothed according to the Linear Witten Bell method by running the command:
+
+\begin{verbatim}
+$> tlm -tr=train.www -n=3 -lm=LinearWittenBell -obin=train.blm
+\end{verbatim}
+\noindent
+The estimated LM is saved in the binary file "train.blm".
+
+
+\subsection{Computation of the Perplexity}
+\noindent
+With the estimated LM, you can now compute the perplexity of any text contained in "test.txt" by running the commands below.
+
+\noindent
+To be compliant with the training data actually used to estimate the LM, start and end symbols are added to the text as well.
+
+\begin{verbatim}
+$> cat test.txt | add-start-end.sh > test.txt.se
+$> compile-lm  train.blm  --eval=test.txt.se
+\end{verbatim}
+
+\noindent
+which produces the output:
+\begin{verbatim}
+%% Nw=1009 PP=8547.90 PPwp=6870.51 Nbo=983 Noov=119 OOV=11.79%
+\end{verbatim}
+
+\noindent
+The output shows the number of words ({\tt Nw}), the LM perplexity ({\tt PP}), the portion of PP due to the out-of-vocabulary words ({\tt PPwp}), the number of back-off calls ({\tt Nbo}) required for computing PP, the number of out-of-vocabulary words ({\tt Noov}), and the out-of-vocabulary rate ({\tt OOV}).
+
+
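
The relation between word log-probabilities, PP, and the OOV rate reported by compile-lm can be sketched as follows. This uses the standard definition of perplexity and assumes base-10 log-probabilities (the usual ARPA convention); both function names are hypothetical, and the sketch does not reproduce how compile-lm computes PPwp.

```python
def perplexity(log10_probs):
    """Perplexity of a text from per-word base-10 log-probabilities."""
    n_words = len(log10_probs)
    if n_words == 0:
        raise ValueError("cannot compute perplexity of an empty text")
    # PP = 10 ** (-(1 / Nw) * sum of log10 p(w | history))
    return 10.0 ** (-sum(log10_probs) / n_words)


def oov_rate(n_oov, n_words):
    """Out-of-vocabulary rate in percent (Noov / Nw * 100)."""
    return 100.0 * n_oov / n_words


# Noov and Nw taken from the compile-lm output line shown in gettingStarted.tex.
print(f"OOV={oov_rate(119, 1009):.2f}%")
# Three words, each with probability 0.1, give a perplexity of 10.
print(f"PP={perplexity([-1.0, -1.0, -1.0]):.2f}")
```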
diff --git a/doc/giganticLM.tex b/doc/giganticLM.tex
new file mode 100644
index 0000000..c3a82c2
--- /dev/null
+++ b/doc/giganticLM.tex
@@ -0,0 +1,79 @@
+LM estimation starts with the collection of n-grams and their frequency counters. Then, 
+smoothing parameters are estimated for each n-gram level; infrequent n-grams are
+possibly pruned and, finally, a LM file is created containing n-grams with probabilities and 
+back-off weights.  This procedure can be very demanding in terms of memory and
+time if it is applied to huge corpora.   We provide here a way to split LM training  into smaller and independent steps, that can be easily distributed among independent processes. The  
+procedure relies on a training script that makes little use of computer RAM and implements 
+the  Witten-Bell smoothing method in an exact way.  
+
+\noindent
+Before starting, let us create a working directory under {\tt examples}, as many files will be created:
+
+\begin{verbatim}
+$> mkdir stat
+\end{verbatim}
+
+The script to generate the LM is:
+
+\begin{verbatim}
+$> build-lm.sh -i "gunzip -c train.gz" -n 3  -o train.ilm.gz -k 5
+\end{verbatim}
+where the available options are:
+
+\begin{verbatim}
+-i    Input training file e.g. 'gunzip -c train.gz'
+-o    Output gzipped LM, e.g. lm.gz
+-k    Number of splits (default 5)
+-n    Order of language model (default 3)
+-t    Directory for temporary files (default ./stat)
+-p    Prune singleton n-grams (default false)
+-s    Smoothing: witten-bell (default), kneser-ney, improved-kneser-ney 
+-b    Include sentence boundary n-grams (optional)
+-d    Define subdictionary for n-grams (optional)
+-v    Verbose
+\end{verbatim}
+
+\noindent
+The script splits the estimation procedure into 5 distinct jobs, that are explained in
+the following section. There are other options that can be used. We recommend for instance to use pruning of singletons to get smaller LM files. 
+Notice that {\tt build-lm.sh} produces a LM file {\tt train.ilm.gz} that is NOT in the final ARPA format, but in
+an intermediate format called {\tt iARPA}, that is recognized by the {\tt compile-lm} 
+command and by the Moses SMT decoder running with {\IRSTLM}. 
+To convert the file into the standard ARPA format you can use the command:
+
+\begin{verbatim}
+$> compile-lm train.ilm.gz --text yes train.lm 
+\end{verbatim}
+This will create the proper ARPA file {\tt train.lm}.
+To create a gzipped file you might also use:
+\begin{verbatim}
+$> compile-lm train.ilm.gz --text yes /dev/stdout | gzip -c > train.lm.gz
+\end{verbatim}
+
+
+\noindent
+In the following sections, we will discuss LM file formats, compiling LMs into a 
+more compact and efficient binary format, and querying LMs.
+
+\subsection{Estimating a LM with a Partial Dictionary}
+
+A sub-dictionary can be defined by just taking words occurring more than 5 times ({\tt -pf=5})
+and at most the 5000 most frequent words ({\tt -pr=5000}):
+\begin{verbatim}
+$> dict -i="gunzip -c train.gz" -o=sdict -pr=5000 -pf=5 
+\end{verbatim}
+
+
+\noindent
+The LM can be restricted to the defined sub-dictionary with the 
+command {\tt build-lm.sh} by using the option {\tt -d}: 
+\begin{verbatim}
+$> build-lm.sh -i "gunzip -c train.gz" -n 3  -o  train.ilm.gz -k 5 -p -d sdict 
+\end{verbatim}
+
+\noindent
+Notice that all words outside the sub-dictionary will be mapped into the {\tt <unk>}
+class, the probability of which will be directly estimated from the corpus statistics.
+A preferable alternative to this approach is to estimate a large LM and then to filter
+it according to a list of words (see Filtering a LM).
+
diff --git a/doc/installation.tex b/doc/installation.tex
new file mode 100644
index 0000000..166b61e
--- /dev/null
+++ b/doc/installation.tex
@@ -0,0 +1,87 @@
+\IMPORTANT{The installation procedure has been tested using the {\tt bash} shell on the following operating systems: Mac OS X 10.6.8 (Snow Leopard), Ubuntu 14.04 LTS (trusty), Scientific Linux release 6.3 (carbon).}
+
+\noindent In order to install {\IRSTLM} on your machine, please perform the following steps.
+
+
+\subsection{Step 0: Preparation of the Configuration Scripts}
+Run the following command to prepare up-to-date configuration scripts.
+\begin{verbatim}
+$> ./regenerate-makefiles.sh [--force]
+\end{verbatim}
+
+\WARNING{Run with the "--force" parameter if you want to recreate all links to the autotools.}
+
+\subsection{Step 1: Configuration of the Compilation}
+Run the following command to prepare up-to-date compilation scripts, and to optionally set the installation directory (parameter {\tt --prefix}).
+\begin{verbatim}
+$> ./configure [--prefix=/path/to/install] [optional-parameters]
+\end{verbatim}
+
+You can set other optional parameters to modify the standard compilation behavior.
+\begin{verbatim}
+  --enable-doc|--disable-doc
+    Enable or Disable (default) creation of documentation
+  --enable-trace|--disable-trace
+    Enable (default) or Disable trace info at run-time
+  --enable-debugging|--disable-debugging
+    Enable or Disable (default) debugging info ("-g -O2")
+  --enable-profiling|--disable-profiling
+    Enable or Disable (default) profiling info
+  --enable-caching|--disable-caching
+    Enable or Disable (default) internal caches
+    to store probs and other info
+  --enable-interpolatedsearch|--disable-interpolatedsearch
+    Enable or Disable (default) interpolated search for n-grams
+  --enable-optimization|--disable-optimization
+    Enable or Disable (default) C++ optimization info ("-O3")
+\end{verbatim}
+
+
+\noindent
+Run the following command to get more details on the compilation options.
+\begin{verbatim}
+$> ./configure --help
+\end{verbatim}
+
+\subsection{Step 2: Compilation}
+
+\begin{verbatim}
+$> make clean
+$> make
+\end{verbatim}
+
+\subsection{Step 3: Installation}
+\begin{verbatim}
+$> make install
+\end{verbatim}
+
+\noindent
+Libraries and commands are generated,  respectively, under the directories\newline {\tt /path/to/install/lib} and {\tt /path/to/install/bin}.
+
+\noindent
+If enabled and PdfLatex is installed, this user manual (in pdf) is generated under the directory\newline {\tt /path/to/install/doc}.
+
+\noindent
+Although caching is not enabled by default, it is highly recommended to activate it through its compilation flag "{\tt --enable-caching}".
+%See Section~\ref{sec:caching} to learn more.
+
+
+\subsection{Step 4: Environment Settings}
+Set the environment variable {\tt IRSTLM} to {\tt /path/to/install}.
+
+\noindent
+Include the command directory {\tt /path/to/install/bin} into your environment variable {\tt PATH}.
+For instance, you can run the following commands
+
+\begin{verbatim}
+$> export IRSTLM=/path/to/install/
+$> export PATH=${IRSTLM}/bin:${PATH}
+\end{verbatim}
+
+
+
+\subsection{Step 5: Regression Tests}
+If the installation procedure succeeds, you can also run the regression tests to double-check the integrity of the software.
+Please go to Section~\ref{sec:regressionTests} to learn how to run the regression tests.
+
+\noindent Regression tests should be run also in the case of any change made in the source code.
diff --git a/doc/interpolateLM.tex b/doc/interpolateLM.tex
new file mode 100644
index 0000000..e69de29
diff --git a/doc/interpolatedLM.tex b/doc/interpolatedLM.tex
new file mode 100644
index 0000000..e69de29
diff --git a/doc/introduction.tex b/doc/introduction.tex
new file mode 100644
index 0000000..92e0cc5
--- /dev/null
+++ b/doc/introduction.tex
@@ -0,0 +1,35 @@
+This manual illustrates the functionalities of  the IRST Language  Modeling (LM)  toolkit ({\IRSTLM}). It  should  
+put you quickly  in  the condition of:
+\begin{itemize}
+\item extracting the dictionary from a corpus
+\item extracting n-gram statistics from it
+\item estimating n-gram LMs using different smoothing criteria
+\item saving a LM in several textual and binary file formats
+\item adapting a LM on task-specific data
+\item estimating and handling gigantic LMs
+\item pruning a LM
+\item reducing LM size through quantization
+\item querying a LM through a command or script
+\end{itemize}
+
+\noindent
+{\IRSTLM} features very efficient algorithms and data structures suitable to estimate, store, and access very  large LMs. 
+
+\noindent
+{\IRSTLM} provides adaptation methods to effectively adapt a generic LM to a specific task when only little task-related data is available. 
+
+\noindent
+{\IRSTLM} provides standalone programs for all its functionalities, as well as a library for use in other software, such as speech recognizers, machine translation decoders, and POS taggers.
+
+\noindent
+{\IRSTLM} has been integrated into a popular open source SMT decoder  called {\tt Moses}\footnote{http://www.statmt.org/moses/}, and is compatible with LMs created with other tools, such as the SRILM Toolkit\footnote{http://www.speech.sri.com/projects/srilm}.
+
+
+\paragraph{Acknowledgments.}Users of this toolkit  might cite in their publications:
+\begin{quote}
+M. Federico,  N. Bertoldi,  M. Cettolo, {\em IRSTLM: an Open Source Toolkit for Handling Large Scale Language Models}, Proceedings of Interspeech, Brisbane, Australia, pp. 1618-1621, 2008.
+\end{quote}
+
+\noindent
+References to introductory material on $n$-gram LMs are given in Appendix~\ref{sec:ReferenceMaterial}. 
+
diff --git a/doc/irstlm-manual.log b/doc/irstlm-manual.log
new file mode 100644
index 0000000..70c1007
--- /dev/null
+++ b/doc/irstlm-manual.log
@@ -0,0 +1,369 @@
+This is pdfTeXk, Version 3.1415926-1.40.9 (Web2C 7.5.7) (format=pdflatex 2009.2.11)  11 JAN 2015 20:10
+entering extended mode
+ %&-line parsing enabled.
+**irstlm-manual.tex
+(./irstlm-manual.tex
+LaTeX2e <2005/12/01>
+Babel <v3.8l> and hyphenation patterns for english, usenglishmax, dumylang, noh
+yphenation, german-x-2008-06-18, ngerman-x-2008-06-18, ancientgreek, ibycus, ar
+abic, basque, bulgarian, catalan, pinyin, coptic, croatian, czech, danish, dutc
+h, esperanto, estonian, farsi, finnish, french, galician, german, ngerman, mono
+greek, greek, hungarian, icelandic, indonesian, interlingua, irish, italian, la
+tin, lithuanian, mongolian, mongolian2a, bokmal, nynorsk, polish, portuguese, r
+omanian, russian, sanskrit, serbian, slovak, slovenian, spanish, swedish, turki
+sh, ukenglish, ukrainian, uppersorbian, welsh, loaded.
+(/usr/local/texlive/2008/texmf-dist/tex/latex/base/article.cls
+Document Class: article 2005/09/16 v1.4f Standard LaTeX document class
+(/usr/local/texlive/2008/texmf-dist/tex/latex/base/size11.clo
+File: size11.clo 2005/09/16 v1.4f Standard LaTeX file (size option)
+)
+\c at part=\count79
+\c at section=\count80
+\c at subsection=\count81
+\c at subsubsection=\count82
+\c at paragraph=\count83
+\c at subparagraph=\count84
+\c at figure=\count85
+\c at table=\count86
+\abovecaptionskip=\skip41
+\belowcaptionskip=\skip42
+\bibindent=\dimen102
+)
+(/usr/local/texlive/2008/texmf-dist/tex/latex/preprint/fullpage.sty
+Package: fullpage 1999/02/23 1.1 (PWD)
+\FP at margin=\skip43
+)
+(/usr/local/texlive/2008/texmf-dist/tex/latex/psnfss/times.sty
+Package: times 2005/04/12 PSNFSS-v9.2a (SPQR) 
+)
+(/usr/local/texlive/2008/texmf-dist/tex/latex/base/latexsym.sty
+Package: latexsym 1998/08/17 v2.2e Standard LaTeX package (lasy symbols)
+\symlasy=\mathgroup4
+LaTeX Font Info:    Overwriting symbol font `lasy' in version `bold'
+(Font)                  U/lasy/m/n --> U/lasy/b/n on input line 47.
+)
+(/usr/local/texlive/2008/texmf-dist/tex/generic/epsf/epsf.sty
+This is `epsf.tex' v2.7.3 <23 July 2005>
+\epsffilein=\read1
+\epsfframemargin=\dimen103
+\epsfframethickness=\dimen104
+\epsfrsize=\dimen105
+\epsftmp=\dimen106
+\epsftsize=\dimen107
+\epsfxsize=\dimen108
+\epsfysize=\dimen109
+\pspoints=\dimen110
+) (/usr/local/texlive/2008/texmf-dist/tex/latex/graphics/graphicx.sty
+Package: graphicx 1999/02/16 v1.0f Enhanced LaTeX Graphics (DPC,SPQR)
+
+(/usr/local/texlive/2008/texmf-dist/tex/latex/graphics/keyval.sty
+Package: keyval 1999/03/16 v1.13 key=value parser (DPC)
+\KV at toks@=\toks14
+)
+(/usr/local/texlive/2008/texmf-dist/tex/latex/graphics/graphics.sty
+Package: graphics 2006/02/20 v1.0o Standard LaTeX Graphics (DPC,SPQR)
+
+(/usr/local/texlive/2008/texmf-dist/tex/latex/graphics/trig.sty
+Package: trig 1999/03/16 v1.09 sin cos tan (DPC)
+)
+(/usr/local/texlive/2008/texmf/tex/latex/config/graphics.cfg
+File: graphics.cfg 2007/01/18 v1.5 graphics configuration of teTeX/TeXLive
+)
+Package graphics Info: Driver file: pdftex.def on input line 90.
+
+(/usr/local/texlive/2008/texmf-dist/tex/latex/pdftex-def/pdftex.def
+File: pdftex.def 2008/09/08 v0.04l Graphics/color for pdfTeX
+\Gread at gobject=\count87
+))
+\Gin at req@height=\dimen111
+\Gin at req@width=\dimen112
+)
+(/usr/local/texlive/2008/texmf-dist/tex/latex/ltxmisc/version.sty)
+(/usr/local/texlive/2008/texmf-dist/tex/latex/graphics/color.sty
+Package: color 2005/11/14 v1.0j Standard LaTeX Color (DPC)
+
+(/usr/local/texlive/2008/texmf/tex/latex/config/color.cfg
+File: color.cfg 2007/01/18 v1.5 color configuration of teTeX/TeXLive
+)
+Package color Info: Driver file: pdftex.def on input line 130.
+
+(/usr/local/texlive/2008/texmf-dist/tex/latex/graphics/dvipsnam.def
+File: dvipsnam.def 1999/02/16 v3.0i Driver-dependant file (DPC,SPQR)
+))
+(/usr/local/texlive/2008/texmf-dist/tex/latex/ltxmisc/framed.sty
+Package: framed 2007/10/04 v 0.95: framed or shaded text with page breaks
+\fb at frw=\dimen113
+\fb at frh=\dimen114
+\FrameRule=\dimen115
+\FrameSep=\dimen116
+)
+No file irstlm-manual.aux.
+\openout1 = `irstlm-manual.aux'.
+
+LaTeX Font Info:    Checking defaults for OML/cmm/m/it on input line 27.
+LaTeX Font Info:    ... okay on input line 27.
+LaTeX Font Info:    Checking defaults for T1/cmr/m/n on input line 27.
+LaTeX Font Info:    ... okay on input line 27.
+LaTeX Font Info:    Checking defaults for OT1/cmr/m/n on input line 27.
+LaTeX Font Info:    ... okay on input line 27.
+LaTeX Font Info:    Checking defaults for OMS/cmsy/m/n on input line 27.
+LaTeX Font Info:    ... okay on input line 27.
+LaTeX Font Info:    Checking defaults for OMX/cmex/m/n on input line 27.
+LaTeX Font Info:    ... okay on input line 27.
+LaTeX Font Info:    Checking defaults for U/cmr/m/n on input line 27.
+LaTeX Font Info:    ... okay on input line 27.
+LaTeX Font Info:    Try loading font information for OT1+ptm on input line 27.
+(/usr/local/texlive/2008/texmf-dist/tex/latex/psnfss/ot1ptm.fd
+File: ot1ptm.fd 2001/06/04 font definitions for OT1/ptm.
+)
+(/usr/local/texlive/2008/texmf-dist/doc/pdftex/manual/samplepdf/supp-pdf.tex
+(/usr/local/texlive/2008/texmf-dist/doc/pdftex/manual/samplepdf/supp-mis.tex
+loading : Context Support Macros / Miscellaneous (2004.10.26)
+\protectiondepth=\count88
+\scratchcounter=\count89
+\scratchtoks=\toks15
+\scratchdimen=\dimen117
+\scratchskip=\skip44
+\scratchmuskip=\muskip10
+\scratchbox=\box26
+\scratchread=\read2
+\scratchwrite=\write3
+\zeropoint=\dimen118
+\onepoint=\dimen119
+\onebasepoint=\dimen120
+\minusone=\count90
+\thousandpoint=\dimen121
+\onerealpoint=\dimen122
+\emptytoks=\toks16
+\nextbox=\box27
+\nextdepth=\dimen123
+\everyline=\toks17
+\!!counta=\count91
+\!!countb=\count92
+\recursecounter=\count93
+)
+loading : Context Support Macros / PDF (2004.03.26)
+\nofMPsegments=\count94
+\nofMParguments=\count95
+\MPscratchCnt=\count96
+\MPscratchDim=\dimen124
+\MPnumerator=\count97
+\everyMPtoPDFconversion=\toks18
+)
+LaTeX Font Info:    External font `cmex10' loaded for size
+(Font)              <12> on input line 34.
+LaTeX Font Info:    External font `cmex10' loaded for size
+(Font)              <8> on input line 34.
+LaTeX Font Info:    External font `cmex10' loaded for size
+(Font)              <6> on input line 34.
+LaTeX Font Info:    Try loading font information for U+lasy on input line 34.
+ (/usr/local/texlive/2008/texmf-dist/tex/latex/base/ulasy.fd
+File: ulasy.fd 1998/08/17 v2.2e LaTeX symbol font definitions
+) (..//RELEASE)
+LaTeX Font Info:    Font shape `OT1/ptm/bx/n' in size <10.95> not available
+(Font)              Font shape `OT1/ptm/b/n' tried instead on input line 40.
+LaTeX Font Info:    External font `cmex10' loaded for size
+(Font)              <10.95> on input line 51.
+LaTeX Font Info:    External font `cmex10' loaded for size
+(Font)              <9> on input line 51.
+LaTeX Font Info:    External font `cmex10' loaded for size
+(Font)              <5> on input line 51.
+LaTeX Font Info:    Try loading font information for OT1+pcr on input line 51.
+
+(/usr/local/texlive/2008/texmf-dist/tex/latex/psnfss/ot1pcr.fd
+File: ot1pcr.fd 2001/06/04 font definitions for OT1/pcr.
+) [1
+
+{/usr/local/texlive/2008/texmf-var/fonts/map/pdftex/updmap/pdftex.map}]
+LaTeX Font Info:    Font shape `OT1/ptm/bx/n' in size <14.4> not available
+(Font)              Font shape `OT1/ptm/b/n' tried instead on input line 64.
+
+No file irstlm-manual.toc.
+\tf@toc=\write4
+\openout4 = `irstlm-manual.toc'.
+
+[2] (./introduction.tex
+LaTeX Font Info:    Try loading font information for OMS+ptm on input line 4.
+
+(/usr/local/texlive/2008/texmf-dist/tex/latex/psnfss/omsptm.fd
+File: omsptm.fd 
+)
+LaTeX Font Info:    Font shape `OMS/ptm/m/n' in size <10.95> not available
+(Font)              Font shape `OMS/cmsy/m/n' tried instead on input line 4.
+
+
+LaTeX Warning: Reference `sec:ReferenceMaterial' on page 3 undefined on input l
+ine 34.
+
+) [3] (./installation.tex
+LaTeX Font Info:    Font shape `OT1/ptm/bx/n' in size <12> not available
+(Font)              Font shape `OT1/ptm/b/n' tried instead on input line 6.
+ [4]
+
+LaTeX Warning: Reference `sec:caching' on page 5 undefined on input line 65.
+
+
+LaTeX Warning: Reference `sec:regressionTests' on page 5 undefined on input lin
+e 84.
+
+) [5] (./gettingStarted.tex
+
+LaTeX Warning: Reference `sec:commands' on page 6 undefined on input line 8.
+
+
+LaTeX Warning: Reference `sec:functions' on page 6 undefined on input line 9.
+
+
+Underfull \hbox (badness 10000) in paragraph at lines 14--14
+\OT1/ptm/m/n/10.95 All pro-grams as-sume that the en-vi-ron-ment vari-able \OT1
+/ptm/b/n/10.95 IRSTLM \OT1/ptm/m/n/10.95 is cor-rectly set to
+ []
+
+[6]) [7] (./LMFileFormats.tex [8] [9]
+LaTeX Font Info:    Try loading font information for OMS+pcr on input line 107.
+
+
+(/usr/local/texlive/2008/texmf-dist/tex/latex/psnfss/omspcr.fd
+File: omspcr.fd 
+)
+LaTeX Font Info:    Font shape `OMS/pcr/m/n' in size <10.95> not available
+(Font)              Font shape `OMS/cmsy/m/n' tried instead on input line 107.
+ [10] [11]
+[12]) [13] (./LMsmoothing.tex) (./mixtureLM.tex) (./interpolatedLM.tex)
+[14] (./ClassAndChunkLMs.tex
+Overfull \hbox (16.4244pt too wide) in paragraph at lines 28--28
+[] \OT1/pcr/m/n/10.95 <lmfilename> is a file containing a LM (format compatible
+ with {\IRSTLM})[] 
+ []
+
+[15] [16]
+Overfull \hbox (3.28441pt too wide) in paragraph at lines 175--175
+[]\OT1/pcr/m/n/10.95 $> compile-lm --eval test/test.w-micro cfgfile/cfg.2ndfld-
+map-cllps -d=1[] 
+ []
+
+) [17] (./dict.tex
+Overfull \hbox (11.94783pt too wide) in paragraph at lines 40--41
+[]\OT1/ptm/m/n/10.95 The in-put text can be also gen-er-ated on the fly by pass
+-ing a com-mand as value of the pa-ram-e-ter\OT1/pcr/m/n/10.95 InputFile
+ []
+
+[18] [19]) (./ngt.tex) (./tlm.tex [20] [21]) (./compileLM.tex)
+(./interpolateLM.tex) (./pruneLM.tex) (./quantizeLM.tex) (./LMAdaptation.tex
+Overfull \hbox (10.24498pt too wide) in paragraph at lines 29--29
+[]\OT1/pcr/m/n/10 $> tlm -tr=train.www -lm=wb -n=3 -te=test -dub=1000000 -ad=ad
+apt -ar=0.8 -ao=yes[] 
+ []
+
+
+Overfull \hbox (4.24498pt too wide) in paragraph at lines 43--43
+[]\OT1/pcr/m/n/10 $> tlm -tr=train-adapt.www -lm=msb -n=3 -te=test -dub=1000000
+ -ad=adapt -ar=0.8[] 
+ []
+
+[22]) (./giganticLM.tex [23]
+Overfull \hbox (9.8544pt too wide) in paragraph at lines 51--51
+[]\OT1/pcr/m/n/10.95 $> compile-lm train.ilm.gz --text yes /dev/stdout | gzip -
+c > train.lm.gz[] 
+ []
+
+
+Overfull \hbox (42.70436pt too wide) in paragraph at lines 72--72
+[]\OT1/pcr/m/n/10.95 $> build-lm.sh -i "gunzip -c train.gz" -n 3  -o  train.ilm
+.gz -k 5 -p -d sdict[] 
+ []
+
+) [24] (./LMPruning.tex) [25] (./LMQuantization.tex) [26] (./LMCompilation.tex
+Underfull \hbox (badness 10000) in paragraph at lines 7--12
+
+ []
+
+) [27] (./LMInterpolation.tex
+Underfull \hbox (badness 10000) in paragraph at lines 19--23
+
+ []
+
+) [28] (./LMFiltering.tex) [29] (./ParallelComputation.tex
+
+LaTeX Warning: Reference `sec:giganticLM' on page 30 undefined on input line 17
+.
+
+) [30] (./LMInterface.tex
+Underfull \hbox (badness 10000) in paragraph at lines 33--35
+
+ []
+
+
+Overfull \hbox (16.4244pt too wide) in paragraph at lines 51--51
+[]\OT1/pcr/m/n/10.95 $> tlm -tr=train.www -n=3 -lm=wb -te=test -o=train.lm -ps=
+no -dub=10000000[] 
+ []
+
+
+Overfull \hbox (4.24498pt too wide) in paragraph at lines 74--74
+[]\OT1/pcr/m/n/10 %% sent_Nw=1 sent_PP=23.22 sent_PPwp=0.00 sent_Nbo=0 sent_Noo
+v=0 sent_OOV=0.00%[] 
+ []
+
+
+Overfull \hbox (40.24498pt too wide) in paragraph at lines 74--74
+[]\OT1/pcr/m/n/10 %% sent_Nw=8 sent_PP=7489.50 sent_PPwp=7356.27 sent_Nbo=7 sen
+t_Noov=2 sent_OOV=25.00%[] 
+ []
+
+
+Overfull \hbox (22.24498pt too wide) in paragraph at lines 74--74
+[]\OT1/pcr/m/n/10 %% sent_Nw=9 sent_PP=1231.44 sent_PPwp=0.00 sent_Nbo=14 sent_
+Noov=0 sent_OOV=0.00%[] 
+ []
+
+
+Overfull \hbox (58.24498pt too wide) in paragraph at lines 74--74
+[]\OT1/pcr/m/n/10 %% sent_Nw=6 sent_PP=27759.10 sent_PPwp=25867.42 sent_Nbo=19 
+sent_Noov=1 sent_OOV=16.67%[] 
+ []
+
+[31]
+Overfull \hbox (34.24498pt too wide) in paragraph at lines 74--74
+[]\OT1/pcr/m/n/10 %% sent_Nw=5 sent_PP=378.38 sent_PPwp=0.00 sent_Nbo=39893 sen
+t_Noov=0 sent_OOV=0.00%[] 
+ []
+
+
+Overfull \hbox (64.24498pt too wide) in paragraph at lines 74--74
+[]\OT1/pcr/m/n/10 %% sent_Nw=15 sent_PP=4300.44 sent_PPwp=2831.89 sent_Nbo=3990
+7 sent_Noov=1 sent_OOV=6.67%[] 
+ []
+
+) [32] (./regressionTests.tex) [33] (./referenceMaterial.tex) [34]
+(./releaseNotes.tex [35] [36] [37] [38]) [39] (./irstlm-manual.aux)
+
+LaTeX Warning: There were undefined references.
+
+
+LaTeX Warning: Label(s) may have changed. Rerun to get cross-references right.
+
+ ) 
+Here is how much of TeX's memory you used:
+ 1619 strings out of 493876
+ 22592 string characters out of 1150568
+ 72741 words of memory out of 3000000
+ 4790 multiletter control sequences out of 10000+50000
+ 27471 words of font info for 63 fonts, out of 3000000 for 5000
+ 714 hyphenation exceptions out of 8191
+ 25i,8n,19p,394b,297s stack positions out of 5000i,500n,10000p,200000b,50000s
+{/usr/local/texlive/2008/texmf-dist/fonts/enc/dvips/base/8r.enc}</usr/local/t
+exlive/2008/texmf-dist/fonts/type1/bluesky/cm/cmmi10.pfb></usr/local/texlive/20
+08/texmf-dist/fonts/type1/bluesky/cm/cmmi12.pfb></usr/local/texlive/2008/texmf-
+dist/fonts/type1/bluesky/cm/cmr10.pfb></usr/local/texlive/2008/texmf-dist/fonts
+/type1/bluesky/cm/cmr8.pfb></usr/local/texlive/2008/texmf-dist/fonts/type1/blue
+sky/cm/cmsy10.pfb></usr/local/texlive/2008/texmf-dist/fonts/type1/urw/courier/u
+crr8a.pfb></usr/local/texlive/2008/texmf-dist/fonts/type1/urw/times/utmb8a.pfb>
+</usr/local/texlive/2008/texmf-dist/fonts/type1/urw/times/utmr8a.pfb></usr/loca
+l/texlive/2008/texmf-dist/fonts/type1/urw/times/utmri8a.pfb>
+Output written on irstlm-manual.pdf (39 pages, 155611 bytes).
+PDF statistics:
+ 166 PDF objects out of 1000 (max. 8388607)
+ 0 named destinations out of 1000 (max. 131072)
+ 1 words of extra memory for PDF output out of 10000 (max. 10000000)
+
diff --git a/doc/irstlm-manual.tex b/doc/irstlm-manual.tex
new file mode 100644
index 0000000..c566d9f
--- /dev/null
+++ b/doc/irstlm-manual.tex
@@ -0,0 +1,229 @@
+\documentclass[11pt]{article}
+\usepackage{fullpage}
+\usepackage{times}
+\usepackage{latexsym}
+\usepackage{epsf}
+\usepackage{graphicx}
+\usepackage{version}
+\usepackage[usenames,dvipsnames]{color}
+
+%\usepackage{mdframed}
+\usepackage{framed}
+
+\newcommand{\IRSTLM}{{\bf IRSTLM Toolkit}}
+
+
+\newcommand*{\MyPath}{../}
+\newcommand{\versionnumber}{\input{\MyPath/RELEASE}}
+
+%\newcommand{\IMPORTANT}[1]{\begin{mdframed}[linecolor=red]\noindent #1\end{mdframed}}
+\newcommand{\IMPORTANT}[1]{\begin{framed}\noindent #1\end{framed}}
+\newcommand{\WARNING}[1]{\paragraph{Warning:} #1}
+\newcommand{\NOTE}[1]{\textcolor{red}{\bf Note}: #1}
+\newcommand{\COMMENT}[1]{}
+
+\def\thesubsubsection{\thesubsection.\alph{subsubsection}} 
+
+\begin{document}   
+
+\title{IRST Language Modeling Toolkit \\USER MANUAL}
+			         
+\author{M. Federico, N. Bertoldi, M. Cettolo\\FBK-irst, Trento, Italy}			       
+\date{\today}
+ 
+\maketitle
+\centerline{Version \versionnumber}
+%% TITLE PAGE %%%%
+
+\vspace*{3cm}
+\noindent
+The official website of {\IRSTLM} is
+
+\bigskip
+
+{\bf http://hlt.fbk.eu/en/irstlm}
+
+\bigskip
+\noindent It contains this manual, source code, examples and regression tests.
+
+\vspace*{1cm}
+\noindent
+{\IRSTLM} is distributed under the GNU General Public License version 3 (GPLv3).\footnote{\tt http://www.gnu.org/licenses/gpl-3.0.html}
+
+\vspace*{1cm}
+\noindent
+Users of {\IRSTLM} may cite the following paper in their publications:
+\begin{quote}
+M. Federico, N. Bertoldi, M. Cettolo, {\em IRSTLM: an Open Source Toolkit for Handling Large Scale Language Models}, Proceedings of Interspeech, Brisbane, Australia, pp. 1618--1621, 2008.
+\end{quote}
+
+
+
+\newpage
+\setcounter{tocdepth}{2}  
+\tableofcontents
+
+
+%%%% INTRODUCTION %%%%%%%%%
+\newpage
+\section{Introduction}
+\label{sec:introduction}
+\input{introduction}
+
+
+%%%% INSTALLATION %%%%%%%%%
+\newpage
+\section{Installation}
+\label{sec:installation}
+\input{installation}
+
+%%%% GETTING STARTED %%%%%%%%%
+\newpage
+\section{Getting started}
+\label{sec:gettingStarted}
+\input{gettingStarted}
+
+%%%% LM FORMATS %%%%%%%%%%%
+\newpage
+\section{LM File Formats}
+\label{sec:LMFileFormats}
+\input{LMFileFormats}
+
+%%%% LM TYPES %%%%%%%%%
+\newpage
+\section{LM Types}
+\label{sec:LMTypes}
+
+%%%% LM SMOOTHING %%%%%%%%%
+\subsection{LM smoothing}
+\label{sec:LMSmoothing}
+\input{LMSmoothing}
+
+%%%% MIXTURE LM %%%%%%%%%
+\subsection{Mixture LM}
+\label{sec:mixtureLM}
+\input{mixtureLM}
+
+%%%% INTERPOLATED LM %%%%%%%%%
+\subsection{Interpolated LM}
+\label{sec:InterpolatedLM}
+\input{interpolatedLM}
+
+%%%% CLASS AND CHUNK LMs %%%%%%%%%
+\newpage
+\subsection{Class and Chunk LMs}
+\label{sec:ClassAndChunkLMs}
+\input{ClassAndChunkLMs}
+
+%%%% IRSTLM COMMANDS %%%%%%%%%
+\newpage
+\section{IRSTLM commands}
+\label{sec:commands}
+
+\subsection{dict}
+\label{sec:dict}
+\input{dict}
+
+\subsection{ngt}
+\label{sec:ngt}
+\input{ngt}
+
+\subsection{tlm}
+\label{sec:tlm}
+\input{tlm}
+
+\subsection{compile-lm}
+\label{sec:compileLM}
+\input{compileLM}
+
+\subsection{interpolate-lm}
+\label{sec:interpolateLM}
+\input{interpolateLM}
+
+\subsection{prune-lm}
+\label{sec:pruneLM}
+\input{pruneLM}
+
+\subsection{quantize-lm}
+\label{sec:quantizeLM}
+\input{quantizeLM}
+
+
+%%%% IRSTLM FUNCTIONS %%%%%%%%%
+\section{IRSTLM functions}
+\label{sec:functions}
+
+\subsection{LM Adaptation}
+\label{sec:LMAdaptation}
+\input{LMAdaptation}
+
+
+%% ESTIMATING GIGANTIC LMs %%%%
+\subsection{Estimating Gigantic LMs}
+\label{sec:giganticLM}
+\input{giganticLM}
+
+
+%%%% LM PRUNING %%%%%
+\newpage
+\subsection{LM Pruning}
+\label{sec:LMPruning}
+\input{LMPruning}
+
+%%%% LM QUANTIZATION %%%%%
+\newpage
+\subsection{LM Quantization}
+\label{sec:LMQuantization}
+\input{LMQuantization}
+
+%%%% LM COMPILATION %%%%%
+\newpage
+\subsection{LM Compilation}
+\label{sec:LMCompilation}
+\input{LMCompilation}
+
+%%%% LM INTERPOLATION %%%%%%%%%
+\newpage
+\subsection{LM Interpolation}
+\label{sec:LMInterpolation}
+\input{LMInterpolation}
+
+\newpage
+\subsection{Filtering a LM}
+\label{sec:LMFiltering}
+\input{LMFiltering}
+
+%%%% PARALLEL COMPUTATION %%%%%%%%%
+\newpage
+\section{Parallel Computation}
+\label{sec:ParallelComputation}
+\input{parallelComputation}
+
+%%%% LM INTERFACE %%%%%%%%%
+\newpage
+\section{IRSTLM Interface}
+\label{sec:LMInterface}
+\input{LMInterface}
+
+%%%% REGRESSION TESTS %%%%%%%%%
+\newpage
+\section{Regression Tests}
+\label{sec:regressionTests}
+\input{regressionTests}
+
+%%%% APPENDIX %%%%%%%%%
+\appendix
+
+
+\newpage
+\section{Reference Material}
+\label{sec:ReferenceMaterial}
+\input{referenceMaterial}
+
+\newpage
+\section{Release Notes}
+\label{sec:releaseNotes}
+\input{releaseNotes}
+
+
+\end{document}
diff --git a/doc/mdframed.sty b/doc/mdframed.sty
new file mode 100644
index 0000000..9d41a69
--- /dev/null
+++ b/doc/mdframed.sty
@@ -0,0 +1,1309 @@
+%% This is file `mdframed.sty',
+%% generated with the docstrip utility.
+%%
+%% The original source files were:
+%%
+%% mdframed.dtx  (with options: `package')
+%% ----------------------------------------------------------------
+%% Working with the command fbox or fcolorbox, one has to
+%% handle page breaks by hand. The present package defines the
+%% environment mdframed which automatically deals with page breaks.
+%% 
+%% Author's name: Marco Daniel and Elke Schubert (!new)
+%% License type: lppl
+%% 
+%% ==================================================
+%% ========Is based on the idea of framed.sty========
+%% ==================================================
+%% ===== Currently the package has a beta-Status ====
+%% ==================================================
+%%  WITH THANKS TO (alphabetically):
+%%  ROLF NIEPRASCHK
+%%  HEIKO OBERDIEK
+%%  HERBERT VOSS
+%% 
+%%  Copyright (c) 2010 Marco Daniel
+%% 
+%%  This package may be distributed under the terms of the LaTeX Project
+%%  Public License, as described in lppl.txt in the base LaTeX distribution.
+%%  Either version 1.0 or, at your option, any later version.
+%% 
+%% 
+%% =================================================
+%%  Creates a frame that does not insert a
+%%  horizontal line at the end of a page
+%% >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
+%%       _______________
+%%       |    page 1   |
+%%       |    Text     |
+%%       |  __Text__   |
+%%       |  | Text |   |
+%%      P A G E B R E A K
+%%       |  | Text |   |
+%%       |  |_Text_|   |
+%%       |    Text     |
+%%       |____page 2___|
+%% 
+%% >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
+%% ==================================================
+%% 
+\def\mdversion{v1.6b}
+\def\mdframedpackagename{mdframed}
+\def\mdf@maindate@svn$#1: #2 #3 #4-#5-#6 #7 #8${#4/#5/#6\space }
+\NeedsTeXFormat{LaTeX2e}
+\ProvidesPackage{mdframed}%
+     [\mdf@maindate@svn$Id: mdframed.dtx 426 2012-06-02 12:18:56Z marco $%
+      \mdversion: \mdframedpackagename]
+\newcommand*\mdf@PackageError[1]{\PackageError{\mdframedpackagename}{#1}}
+\newcommand*\mdf@PackageWarning[1]{\PackageWarning{\mdframedpackagename}{#1}}
+\newcommand*\mdf@PackageInfo[1]{\PackageInfo{\mdframedpackagename}{#1}}
+\newcommand*\mdf@LoadFile@IfExist[1]{%
+ \IfFileExists{#1.sty}{%
+          \RequirePackage{#1}%
+        }{%
+        \mdf@PackageWarning{The file #1 does not exist\MessageBreak
+                           but is needed by \mdframedpackagename\MessageBreak
+                           see documentation for further information
+                           }%
+       }
+}
+\RequirePackage{kvoptions}
+\RequirePackage{xparse}
+\RequirePackage{etoolbox}[2011/01/03]
+\RequirePackage{zref-abspage}
+\RequirePackage{color}
+\SetupKeyvalOptions{family=mdf,prefix=mdf@}
+
+\newlength{\mdf@templength}
+\def\mdf@iflength#1{%
+  \afterassignment\mdf@iflength@check%
+  \mdf@templength=#1\mdf@defaultunit\relax\relax
+  \expandafter\endgroup\next
+}
+\def\mdf@iflength@check#1{%
+  \begingroup
+  \ifx\relax#1\@empty
+    \def\next{\@secondoftwo}
+  \else
+    \def\next{\@firstoftwo}
+    \expandafter\mdf@iflength@cleanup
+  \fi
+}
+\def\mdf@iflength@cleanup#1\relax{}
+\DeclareListParser*{\mdf@dolist}{,}
+\newrobustcmd*{\mdf@option@length}[2]{%
+ \expandafter\newlength\csname mdf@#1@length\endcsname%
+ \expandafter\setlength\csname mdf@#1@length\endcsname{#2}%
+ }
+\newrobustcmd*{\mdf@define@key@length}[1]{%
+   \define@key{mdf}{#1}{%
+      \def\@tempa{##1}
+      \mdf@iflength{\@tempa}%
+       {\csxdef{mdfl@#1}{\the\mdf@templength}}%
+       {\csxdef{mdfl@#1}{\the\mdf@templength}}%
+       \setlength{\csname mdf@#1@length\endcsname}{\csname mdfl@#1\endcsname}%
+   }%
+}
+\def\mdf@do@lengthoption#1{%
+  \mdf@lengthoption@doubledo#1\@nil%
+}
+\def\mdf@lengthoption@doubledo#1==#2\@nil{%
+   \mdf@option@length{#1}{#2}%
+   \mdf@define@key@length{#1}%
+}
+\def\mdf@do@stringoption#1{%
+   \mdf@stringoption@doubledo#1\@nil%
+}
+\def\mdf@stringoption@doubledo#1==#2\@nil{%
+   \expandafter\gdef\csname mdf@#1\endcsname{#2}%
+   \define@key{mdf}{#1}{%
+      \csdef{mdf@#1}{##1}%
+   }%
+}
+\def\mdf@do@booloption#1{%
+   \mdf@booloption@doubledo#1\@nil%
+}
+\def\mdf@booloption@doubledo#1==#2\@nil{%
+   \newbool{mdf@#1}\setbool{mdf@#1}{#2}%
+   \define@key{mdf}{#1}[#2]{%
+      \setbool{mdf@#1}{##1}%
+   }%
+}
+\def\mdf@do@alignoption#1{%
+   \mdf@alignoption@tripledo#1\@nil%
+}
+\def\mdf@alignoption@tripledo#1==#2==#3\@nil{%
+   \csdef{mdf@align@#1@left}{\null\hspace*{#2}}%
+   \csdef{mdf@align@#1@right}{\hspace*{#3}\null}%
+}
+\newcounter{mdf@globalstyle@cnt}
+\defcounter{mdf@globalstyle@cnt}{0}
+\newcommand*\mdfglobal@style{0}
+\define@key{mdf}{style}{%
+  \mdf@PackageWarning{package option style is deprecated^^J
+                      use framemethod instead\MessageBreak}%
+  \renewcommand*\mdfglobal@style{#1}%
+  \defcounter{mdf@globalstyle@cnt}{#1}%
+  \ifcase\value{mdf@globalstyle@cnt}\relax
+     \or\mdf@LoadFile@IfExist{tikz}%=1
+     \or\mdf@LoadFile@IfExist{pstricks-add}%=2
+     \or\defcounter{mdf@globalstyle@cnt}{2}%=3
+        \mdf@LoadFile@IfExist{pst-node}%
+     \or\mdf@LoadFile@IfExist{pst-node}%=4
+  \else%%>4
+     \mdf@PackageWarning{Unknown global style \value{mdf@globalstyle@cnt}}%
+  \fi%
+}
+\providecommand*\mdf@framemethod{}
+\def\mdf@framemethod@i{}%
+\def\mdf@framemethod@ii{}%
+\def\mdf@framemethod@iii{}%
+\define@key{mdf}{framemethod}[default]{%
+  \lowercase{\def\mdf@tempa{#1}}%lowercase not expandable
+  \forcsvlist{\listadd\mdf@framemethod@i}{default,tex,latex,none,0}
+  \forcsvlist{\listadd\mdf@framemethod@ii}{pgf,tikz,1}
+  \forcsvlist{\listadd\mdf@framemethod@iii}{pstricks,ps,2,postscript}
+  \xifinlist{\mdf@tempa}{\mdf@framemethod@i}%
+    {\def\mdf@@framemethod{default}\defcounter{mdf@globalstyle@cnt}{0}}%
+    {\xifinlist{\mdf@tempa}{\mdf@framemethod@ii}%
+       {\def\mdf@@framemethod{tikz}\defcounter{mdf@globalstyle@cnt}{1}}%
+       {\xifinlist{\mdf@tempa}{\mdf@framemethod@iii}%
+          {\def\mdf@@framemethod{pstricks}\defcounter{mdf@globalstyle@cnt}{2}}%
+          {\mdf@LoadFile@IfExist{#1}}%
+       }%
+    }%
+  \ifcase\value{mdf@globalstyle@cnt}\relax%
+     \or\mdf@LoadFile@IfExist{tikz}%=1
+     \or\mdf@LoadFile@IfExist{pst-node}%=2
+     \or\mdf@LoadFile@IfExist{pst-node}%=3
+  \fi%
+}
+\mdf@dolist{\mdf@do@lengthoption}{%
+   {skipabove==\z@},%
+   {skipbelow==\z@},%
+   {leftmargin==\z@},%
+   {rightmargin==\z@},%
+   {innerleftmargin==10pt},%
+   {innerrightmargin==10pt},%
+   {innertopmargin==0.4\baselineskip},%
+   {innerbottommargin==0.4\baselineskip},%
+   {splittopskip==\z@},%
+   {splitbottomskip==\z@},%
+   {outermargin==\z@},%
+   {innermargin==\z@},%
+   {linewidth==0.4pt},%
+   {innerlinewidth==\z@},%
+   {middlelinewidth==\expandafter\mdf@linewidth@length},%
+   {outerlinewidth==\z@},%
+   {roundcorner==\z@},%
+   {footenotedistance==\medskipamount},
+   {userdefinedwidth==\linewidth},
+   {frametitleaboveskip==5pt},
+   {frametitlebelowskip==5pt},
+   {frametitlerulewidth==.2pt},
+   {frametitleleftmargin==10pt},%
+   {frametitlerightmargin==10pt},%
+   {shadowsize==8pt},%
+   {extratopheight==\z@},%
+   {subtitleabovelinewidth==.8pt},%
+   {subtitlebelowlinewidth==.6pt},%
+   {subtitleaboveskip==\baselineskip},%
+   {subtitlebelowskip==1.2\baselineskip},%
+   {subtitleinneraboveskip==.5\baselineskip},%
+   {subtitleinnerbelowskip==.5\baselineskip},%
+   {subsubtitleabovelinewidth==.8pt},%
+   {subsubtitlebelowlinewidth==.6pt},%
+   {subsubtitleaboveskip==\baselineskip},%
+   {subsubtitlebelowskip==1.2\baselineskip},%
+   {subsubtitleinneraboveskip==.5\baselineskip},%
+   {subsubtitleinnerbelowskip==.5\baselineskip},%
+}
+\mdf@dolist{\mdf@do@stringoption}{%
+    {frametitle=={}},%
+    {defaultunit==pt},%
+    {linecolor==black},%
+    {backgroundcolor==white},%
+    {fontcolor==black},%
+    {frametitlefontcolor==black},%
+    {innerlinecolor==\mdf@linecolor},%
+    {outerlinecolor==\mdf@linecolor},%
+    {middlelinecolor==\mdf@linecolor},%
+    {psroundlinecolor==\mdf@backgroundcolor},%
+    {frametitlerulecolor==\mdf@linecolor},
+    {frametitlebackgroundcolor==\mdf@backgroundcolor},%
+    {shadowcolor==black!50},%
+    {settings=={}},%
+    {frametitlesettings=={}},%
+    {font=={}},%
+    {frametitlefont==\normalfont\bfseries},%
+    {printheight==none},%
+    {alignment=={}},%
+    {frametitlealignment=={}},%
+    {theoremseparator=={:}},%
+    {theoremcountersep=={.}},%
+    {theoremtitlefont=={}},%
+    {theoremspace=={\space}},%
+    {singleextra=={}},
+    {firstextra=={}},
+    {middleextra=={}},
+    {secondextra=={}},
+    {subtitlefont==\normalfont\bfseries},%
+    {subsubtitlefont==\normalfont},%
+    {subtitlebackgroundcolor==white},%
+    {subsubtitlebackgroundcolor==white},%
+    {subtitleabovelinecolor==black},%
+    {subtitlebelowlinecolor==black},%
+    {subsubtitleabovelinecolor==black},%
+    {subsubtitlebelowlinecolor==black},%
+}
+\mdf@dolist{\mdf@do@booloption}{%
+    {ntheorem==false},%
+    {topline==true},%
+    {leftline==true},%
+    {bottomline==true},%
+    {rightline==true},%
+    {frametitletopline==true},%
+    {frametitleleftline==true},%
+    {frametitlebottomline==true},%
+    {frametitlerightline==true},%
+    {frametitlerule==false},%
+    {nobreak==false},%
+    {footnoteinside==true},%
+    {usetwoside==true},%
+    {repeatframetitle==false},%not yet properly implemented
+    {shadow==false},%
+    {everyline==false},%
+    {ignorelastdescenders==false},%
+    {subtitleaboveline==false},
+    {subtitlebelowline==false},
+    {subsubtitleaboveline==false},
+    {subsubtitlebelowline==false},
+}
+%%special boolflag hidealllines:
+\newbool{mdf@hidealllines}%
+\define@key{mdf}{hidealllines}[false]{%
+\setbool{mdf@hidealllines}{#1}%
+ \ifbool{mdf@hidealllines}{%
+   \kvsetkeys{mdf}{leftline=false,topline=false,%
+                   rightline=false,bottomline=false}%
+ }{}%
+}
+\mdf@dolist{\mdf@do@alignoption}{%
+    {left==\mdf@leftmargin@length==\z@},%
+    {center==\fill==\fill},%
+    {right==\fill==\mdf@rightmargin@length},%
+    {outer==\fill==\mdf@rightmargin@length},%not supported yet
+    {outer==\mdf@leftmargin@length==\fill},%not supported yet
+}
+\newcommand*\mdf@align{}%
+\newcommand*\mdf@makeboxalign@left{\null\hspace*{\mdf@leftmargin@length}}%
+\newcommand*\mdf@makeboxalign@right{}%
+\define@key{mdf}{align}[left]{%
+   \ifcsundef{mdf@align@#1@left}{%
+       \mdf@PackageWarning{Unknown alignment #1\MessageBreak}%
+       \letcs\mdf@makeboxalign@left{mdf@align@left@left}%
+       \letcs\mdf@makeboxalign@right{mdf@align@left@right}%
+   }{%
+       \def\mdf@makeboxalign@left{\csuse{mdf@align@#1@left}}%
+       \def\mdf@makeboxalign@right{\csuse{mdf@align@#1@right}}%
+   }%
+}
+\def\mdf@tikzset@local{\tikzset{tikzsetting/.style={}}}
+\define@key{mdf}{tikzsetting}{%
+  \def\mdf@tikzset@local{\tikzset{tikzsetting/.style={#1}}}%
+}
+\define@key{mdf}{apptotikzsetting}{%
+  \appto\mdf@tikzset@local{#1}%
+}
+\def\mdf@psset@local{}
+\define@key{mdf}{pstrickssetting}{%
+  \def\mdf@psset@local{#1}
+}
+\def\mdfpstricks@appendsettings{}
+\define@key{mdf}{pstricksappsetting}{%
+  \def\mdfpstricks@appendsettings{#1}%
+}
+\def\mdf@xcolor{}
+\define@key{mdf}{xcolor}[]{%
+  \def\@tempa{#1}%
+  \@ifpackageloaded{xcolor}{%
+     \let\mdf@xcolor\@empty %ignore the options that were passed
+     \def\@tempa{}%
+     }{}%
+  \ifx\relax\@tempa\relax\else
+     \PassOptionsToPackage{\mdf@xcolor}{xcolor}%
+      \RequirePackage{xcolor}%
+  \fi%
+}%
+\define@key{mdf}{needspace}[\z@]{%
+     \begingroup%
+        \setlength{\dimen@}{#1}%
+        \vskip\z@\@plus\dimen@%
+        \penalty -100\vskip\z@\@plus -\dimen@%
+        \vskip\dimen@%
+        \penalty 9999%
+        \vskip -\dimen@%
+        \vskip\z@skip % hide the previous |\vskip| from |\addvspace|
+      \endgroup%
+}
+\DeclareDefaultOption{%
+   \mdf@PackageError{Unknown Option '\CurrentOption' for mdframed}}
+\ProcessKeyvalOptions*\relax
+\newrobustcmd*{\mdfsetup}{\kvsetkeys{mdf}}
+\define@key{mdf}{style}{%
+  \ifcsundef{mdf@definestyle@#1}{%
+    \mdf@PackageWarning{Unknown definedstyle #1^^J
+                       You have to define a style ^^J
+                       via \string\mdfdefinestyle\MessageBreak
+                      }%
+   }%
+   {\expandafter\expandafter\expandafter\mdfsetup\expandafter%
+    \expandafter\expandafter{\csname mdf@definestyle@#1\endcsname}}%
+}%
+\let\mdf@PackageNoInfo\@gobble
+\newrobustcmd*\mdf@ifstrequal@expand{%
+\expandafter\ifstrequal\expandafter{\mdf@printheight}%
+}
+\newrobustcmd*\mdf@print@space{%
+  %case "none"
+  \mdf@ifstrequal@expand{none}{\def\mdf@tempa{NoInfo}}{%
+      %case "info"
+      \mdf@ifstrequal@expand{info}{\def\mdf@tempa{Info}}{%
+         %case "warning"
+         \mdf@ifstrequal@expand{warning}{\def\mdf@tempa{Warning}}{%
+            %case "unknown"
+            \mdf@PackageWarning{Unknown key for printheight=\mdf@printheight^^J
+                               use none, info or warning}%
+             \def\mdf@tempa{none}%
+         }%
+      }%
+  }%
+\def\mdf@PackageInfoSpace{\csname mdf@Package\mdf@tempa\endcsname}%
+}
+\newsavebox\mdf@frametitlebox
+\newsavebox\mdf@footnotebox
+\newsavebox\mdf@splitbox@one
+\newsavebox\mdf@splitbox@two
+\newsavebox\mdf@splitbox@save
+\newlength\mdfsplitboxwidth
+\newlength\mdfsplitboxtotalwidth
+\newlength\mdfsplitboxheight
+\newlength\mdfsplitboxdepth
+\newlength\mdfsplitboxtotalheight
+\newlength\mdfframetitleboxwidth
+\newlength\mdfframetitleboxtotalwidth
+\newlength\mdfframetitleboxheight
+\newlength\mdfframetitleboxdepth
+\newlength\mdfframetitleboxtotalheight
+\newlength\mdffootnoteboxwidth
+\newlength\mdffootnoteboxtotalwidth
+\newlength\mdffootnoteboxheight
+\newlength\mdffootnoteboxdepth
+\newlength\mdffootnoteboxtotalheight
+
+\newlength\mdftotallinewidth
+
+\newlength\mdfboundingboxwidth
+\newlength\mdfboundingboxtotalwidth
+
+\newlength\mdfboundingboxheight
+\newlength\mdfboundingboxdepth
+\newlength\mdfboundingboxtotalheight
+
+\newlength\mdf@freevspace@length
+\newlength\mdf@horizontalwidthofbox@length
+\newlength\mdf@verticalmarginwhole@length
+
+\newtoggle{mdf@notfirstframetitle}%
+\togglefalse{mdf@notfirstframetitle}%
+
+\newrobustcmd\mdfcreateextratikz{}
+
+\def\mdf@lrbox#1{%
+%%patch to work with amsthm
+  \mdf@patchamsthm
+%%%end patch
+ \edef\mdf@restoreparams{%
+   \parindent=\the\parindent\relax \parskip=\the\parskip\relax}%
+ \setbox#1\vbox\bgroup%
+   \color@begingroup%
+     \mdf@horizontalmargin@equation%
+     \columnwidth=\hsize%
+     \textwidth=\hsize%
+     \let\if@nobreak\iffalse%
+     \let\if@noskipsec\iffalse%
+     \let\par\@@par%
+     \let\-\@dischyph%
+     \let\'\@acci\let\`\@accii\let\=\@acciii%
+     \parindent\z@ \parskip\z@skip%
+     \linewidth\hsize%
+     \@totalleftmargin\z@%
+     \leftskip\z@skip \rightskip\z@skip \@rightskip\z@skip%
+     \parfillskip\@flushglue \lineskip\normallineskip%
+     \baselineskip\normalbaselineskip%
+%%  \sloppy%
+     \let\\\@normalcr%
+     \mdf@restoreparams\relax%
+     \@afterindentfalse%
+     \@afterheading%
+}
+
+\def\endmdf@lrbox{\color@endgroup\egroup}
+
+\newrobustcmd*\mdf@ignorevbadness{%
+   \edef\mdf@currentvbadness{\the\vbadness}%
+   \vbadness=\@M%
+   \afterassignment\mdf@restorevbadness}
+\newrobustcmd*\mdf@restorevbadness{\vbadness=\mdf@currentvbadness\relax}
+\@ifpackageloaded{amsthm}%
+{%
+ \newrobustcmd\mdf@patchamsthm{%
+   \let\mdf@deferred@thm@head\deferred@thm@head
+   \patchcmd{\deferred@thm@head}{\indent}{}%
+      {\mdf@PackageInfo{mdframed detected package amsthm ^^J
+                        changed the theorem header of amsthm\MessageBreak}%
+      }{%
+       \mdf@PackageError{mdframed detected package amsthm ^^J
+                         changed the theorem header of amsthm
+                         failed\MessageBreak}%
+       }%
+     }%
+}{\let\mdf@patchamsthm\relax}%
+\def\mdf@trivlist#1{%
+  \setlength{\topsep}{#1}%
+  \partopsep\z@%
+  \parsep\z@%
+  \@nmbrlistfalse%
+  \@trivlist%
+  \labelwidth\z@%
+  \leftmargin\z@%
+  \itemindent\z@%
+  \let\@itemlabel\@empty%
+  \def\makelabel##1{##1}%
+%%  \item\leavevmode\hrule \@height\z@ \@width\linewidth\relax%
+%%  \item\mbox{}\relax% second version
+  \item\relax% first Version
+}
+\let\endmdf@trivlist\endtrivlist
+\patchcmd\endmdf@trivlist\@endparenv\mdf@endparenv{%
+  \immediate\typeout{^^J****** mdframed patching \string\endmdf@trivlist}%
+  \immediate\typeout{^^J****** -- success******^^J}%
+  }{%
+  \immediate\typeout{^^J****** mdframed patching \string\endmdf@trivlist}%
+  \immediate\typeout{^^J****** -- failed******^^J}%
+}
+\def\mdf@endparenv{%
+  \addpenalty\@endparpenalty\addvspace\mdf@skipbelow@length\@endpetrue}
+
+\newrobustcmd*\mdf@makebox@out[2][\linewidth]{%
+ \noindent\hb@xt@\z@{%
+    \noindent\makebox[\dimexpr #1\relax][l]{#2}%
+ \hss}%
+}%
+\newrobustcmd*\mdf@makebox@in[2][\mdf@userdefinedwidth@length]{%
+ \noindent\makebox[\dimexpr #1\relax][l]{#2}%
+}
+\newrobustcmd*\mdfdefinestyle[2]{%
+  \csdef{mdf@definestyle@#1}{#2}%
+}
+\newrobustcmd*\mdfapptodefinestyle[2]{%
+ \ifcsundef{mdf@definestyle@#1}%
+   {\mdf@PackageWarning{Unknown style #1}}%
+   {\csappto{mdf@definestyle@#1}{,#2}}%
+}
+\newrobustcmd*{\mdflength}[1]{\csuse{mdf@#1@length}}
+
+\newrobustcmd*{\surroundwithmdframed}[2][]{%
+  \BeforeBeginEnvironment{#2}{\begin{mdframed}[#1]}%
+  \AfterEndEnvironment{#2}{\end{mdframed}}%
+}
+\newrobustcmd*\newmdenv[2][]{%
+  \newenvironment{#2}{%
+     \mdfsetup{#1}%
+     \begin{mdframed}%
+    }{%
+     \end{mdframed}%
+  }%
+}
+\newrobustcmd*\renewmdenv[2][]{%
+  \expandafter\let\csname #2\endcsname\relax%
+  \expandafter\let\csname end#2\endcsname\relax%
+  \newmdenv[#1]{#2}%
+  }%
+\DeclareDocumentCommand\newmdtheoremenv{O{} m o m o }{%
+ \ifboolexpr{ test {\IfNoValueTF {#3}} and test {\IfNoValueTF {#5}} }%
+    {\newtheorem{#2}{#4}}{%
+     \IfValueTF{#3}{\newtheorem{#2}[#3]{#4}}{}%
+     \IfValueTF{#5}{\newtheorem{#2}{#4}[#5]}{}%
+    }%
+  \BeforeBeginEnvironment{#2}{%
+     \begin{mdframed}[#1]}%
+  \AfterEndEnvironment{#2}{%
+     \end{mdframed}}%
+}
+\newrobustcmd*\mdf@thm@caption[2]{}
+\AtBeginDocument{%
+ \@ifpackageloaded{ntheorem}%
+   {\renewrobustcmd*\mdf@thm@caption{\thm@thmcaption}}{}%
+ }
+\DeclareDocumentCommand{\mdtheorem}{ O{} m o m o }%
+ {\ifcsdef{#2}%
+   {\mdf@PackageWarning{Environment #2 already exists\MessageBreak}}%
+   {%
+    \IfNoValueTF {#3}%
+     {%#3 not given -- number relationship
+      \IfNoValueTF {#5}%
+        {%#3+#5 not given
+        \@definecounter{#2}%
+        \expandafter\xdef\csname the#2\endcsname{\@thmcounter{#2}}%
+        \newenvironment{#2}[1][]{%
+          \refstepcounter{#2}%
+          \ifstrempty{##1}%
+            {\let\@temptitle\relax}%
+            {%
+             \def\@temptitle{\mdf@theoremseparator%
+                             \mdf@theoremspace%
+                             \mdf@theoremtitlefont%
+                             ##1}%
+             \mdf@thm@caption{#2}{{#4}{\csname the#2\endcsname}{##1}}%
+             }%
+          \begin{mdframed}[#1,frametitle={\strut#4\ \csname the#2\endcsname%
+                                          \@temptitle}]}%
+          {\end{mdframed}}%
+        \newenvironment{#2*}[1][]{%
+          \ifstrempty{##1}{\let\@temptitle\relax}{\def\@temptitle{:\ ##1}}%
+          \begin{mdframed}[#1,frametitle={\strut#4\@temptitle}]}%
+          {\end{mdframed}}%
+        }%
+        {%#5 given -- reset counter
+        \@definecounter{#2}\@newctr{#2}[#5]%
+        \expandafter\xdef\csname the#2\endcsname{\@thmcounter{#2}}%
+        \expandafter\xdef\csname the#2\endcsname{%
+               \expandafter\noexpand\csname the#5\endcsname \@thmcountersep%
+                  \@thmcounter{#2}}%
+        \newenvironment{#2}[1][]{%
+          \refstepcounter{#2}%
+          \ifstrempty{##1}%
+            {\let\@temptitle\relax}%
+            {%
+             \def\@temptitle{\mdf at theoremseparator%
+                             \mdf at theoremspace%
+                             \mdf at theoremtitlefont%
+                             ##1}%
+             \mdf at thm@caption{#2}{{#4}{\csname the#2\endcsname}{##1}}%
+             }
+          \begin{mdframed}[#1,frametitle={\strut#4\ \csname the#2\endcsname%
+                                          \@temptitle}]}%
+          {\end{mdframed}}%
+        \newenvironment{#2*}[1][]{%
+          \ifstrempty{##1}%
+            {\let\@temptitle\relax}%
+            {%
+             \def\@temptitle{\mdf at theoremseparator%
+                             \mdf at theoremspace%
+                             \mdf at theoremtitlefont%
+                             ##1}%
+             \mdf at thm@caption{#2}{{#4}{\csname the#2\endcsname}{##1}}%
+             }%
+          \begin{mdframed}[#1,frametitle={\strut#4\@temptitle}]}%
+          {\end{mdframed}}%
+        }%
+     }%
+     {%#3 given -- number relationship
+        \global\@namedef{the#2}{\@nameuse{the#3}}%
+        \newenvironment{#2}[1][]{%
+          \refstepcounter{#3}%
+          \ifstrempty{##1}%
+            {\let\@temptitle\relax}%
+            {%
+             \def\@temptitle{\mdf at theoremseparator%
+                             \mdf at theoremspace%
+                             \mdf at theoremtitlefont%
+                             ##1}%
+             \mdf at thm@caption{#2}{{#4}{\csname the#2\endcsname}{##1}}%
+             }
+          \begin{mdframed}[#1,frametitle={\strut#4\ \csname the#2\endcsname%
+                                          \@temptitle}]}%
+          {\end{mdframed}}%
+        \newenvironment{#2*}[1][]{%
+          \ifstrempty{##1}{\let\@temptitle\relax}{\def\@temptitle{:\ ##1}}%
+          \begin{mdframed}[#1,frametitle={\strut#4\@temptitle}]}%
+          {\end{mdframed}}%
+     }%
+   }%
+ }
+
+\newrobustcmd\mdfframedtitleenv[1]{%
+    \mdf at lrbox{\mdf at frametitlebox}%
+     \mdf at frametitlealignment%
+       \leavevmode\color{\mdf at frametitlefontcolor}%
+           \normalfont\mdf at frametitlefont{#1}
+       \ifbool{mdf at ignorelastdescenders}%
+         {%
+          \par\strut\par
+          \unskip\unskip\setbox0=\lastbox
+          \vspace*{\dimexpr\ht\strutbox-\baselineskip\relax}%
+         }{}%
+    \par\unskip\ifvmode\nointerlineskip\hrule \@height\z@ \@width\hsize\fi%%
+    \endmdf at lrbox\relax%
+   \mdf at ignorevbadness%
+   \setbox\mdf at frametitlebox=\vbox{\unvbox\mdf at frametitlebox}%
+   \mdfframetitleboxwidth=\wd\mdf at frametitlebox\relax%
+   \mdfframetitleboxheight=\ht\mdf at frametitlebox\relax%
+   \mdfframetitleboxdepth=\dp\mdf at frametitlebox\relax%
+   \mdfframetitleboxtotalheight=\dimexpr
+                                  \ht\mdf at frametitlebox
+                                  +\dp\mdf at frametitlebox%
+                                  +\mdf at frametitleaboveskip@length
+                                  +\mdf at frametitlebelowskip@length
+                                \relax%
+}
+
+\newrobustcmd*\mdf@@frametitle{%
+    \mdfframedtitleenv{\mdf at frametitle}%
+}
+
+\newrobustcmd*\mdf@@frametitle at use{%
+   \parskip\z@\relax%
+   \parindent\z@\relax%
+   \offinterlineskip\relax%
+   \mdf at ignorevbadness%
+   \setbox\mdf at splitbox@one=\vbox{%
+       \unvcopy\mdf at frametitlebox\relax%
+       \mdf@@frametitlerule\relax%
+       \unvbox\mdf at splitbox@one\relax%
+    }%
+   \mdf at ignorevbadness%
+   \setbox\mdf at splitbox@one=\vbox{\unvbox\mdf at splitbox@one}%
+   \mdfsetup{innertopmargin=\mdf at frametitleaboveskip@length}%
+}
+\newrobustcmd*\mdf at checkntheorem{%
+  \ifbool{mdf at ntheorem}%
+    {\ifundef{\theorempreskipamount}%
+          {\mdf at PackageWarning{You have not loaded ntheorem yet}}%
+          {\setlength{\theorempreskipamount}{\z@}%
+           \setlength{\theorempostskipamount}{\z@}%
+    }%
+  }{}%
+}
+\newrobustcmd*\mdf at footnoterule{%
+    \kern0\p@%
+    \hrule \@width 1in \kern 2.6\p@}
+\newrobustcmd*\mdf at footnoteoutput{%
+     \ifvoid\@mpfootins\else%
+          \nobreak%
+          \vskip\mdf at footenotedistance@length%
+          \normalcolor%
+          \mdf at footnoterule%
+          \unvbox\@mpfootins%
+     \fi%
+}
+\newrobustcmd*\mdf at footnoteinput{%
+   \def\@mpfn{mpfootnote}%
+   \def\thempfn{\thempfootnote}%
+   \c at mpfootnote\z@%
+   \let\@footnotetext\@mpfootnotetext%
+}
+\newrobustcmd*\mdf at load@style{%
+\ifcase\value{mdf at globalstyle@cnt}\relax%
+    \input{md-frame-0.mdf}%
+ \or\input{md-frame-1.mdf}%
+ \or\input{md-frame-2.mdf}%
+ \or\input{md-frame-3.mdf}%
+ \else%
+    \IfFileExists{md-frame-\value{mdf at globalstyle@cnt}.mdf}%
+    {\input{md-frame-\value{mdf at globalstyle@cnt}.mdf}}%
+    {%
+     \input{md-frame-0.mdf}%
+     \mdf at PackageWarning{The style number \value{mdf at globalstyle@cnt}
+                         does not exist^^J
+                         mdframed uses style=0 instead \mdframedpackagename}%
+    }%
+\fi%
+}%
+\mdf at load@style
+\newrobustcmd*\mdf at styledefinition{%AVOID!!!Needed for framemethod=default
+    \ifnumequal{\value{mdf at globalstyle@cnt}}{0}%
+    {\deflength{\mdf at innerlinewidth@length}{\z@}%
+     \deflength{\mdf at middlelinewidth@length}{\mdf at linewidth@length}%
+     \deflength{\mdf at outerlinewidth@length}{\z@}%
+     \let\mdf at innerlinecolor\mdf at linecolor%
+     \let\mdf at middlelinecolor\mdf at linecolor%
+     \let\mdf at outerlinecolor\mdf at linecolor%
+    }{}%
+}
+\let\mdf at reserved@a\@empty
+\newrobustcmd*\detected at mdf@put at frame{%
+  \ifmdf at nobreak%Option nobreak=true?
+     \def\mdf at reserved@a{\mdf at put@frame at standalone}%
+  \else
+     \def\mdf at reserved@a{\mdf at put@frame}%
+     \ifx\@captype\@undefined
+         \def\mdf at reserved@a{\mdf at put@frame}%
+     \else
+         \mdf at PackageInfo{mdframed inside float  ^^J
+                          mdframed uses option nobreak \mdframedpackagename}%
+         \def\mdf at reserved@a{\mdf at put@frame at standalone}%
+     \fi
+     \if at minipage%
+           \mdf at PackageInfo{mdframed inside minipage  ^^J
+                           mdframed uses option nobreak \mdframedpackagename}%
+           \def\mdf at reserved@a{\mdf at put@frame at standalone}%
+     \fi%
+     \ifinner%
+          \mdf at PackageInfo{mdframed inside a box ^^J
+                          mdframed uses option nobreak \mdframedpackagename}%
+          \def\mdf at reserved@a{\mdf at put@frame at standalone}%
+     \fi%
+  \fi%
+\mdf at reserved@a%
+}
+\newenvironment{mdframed}[1][]{%
+\color at begingroup%
+   \mdfsetup{userdefinedwidth=\linewidth,#1}%
+   \mdf at twoside@checklength%
+   \let\width\z@%
+   \let\height\z@%
+   \mdf at checkntheorem%
+   \mdf at styledefinition%
+   \mdf at footnoteinput%
+   \color{\mdf at fontcolor}%
+   \mdf at font%
+   \ifvmode\nointerlineskip\fi%
+   \mdf at trivlist{\mdf at skipabove@length}%%
+   \ifdefempty{\mdf at frametitle}{}{\mdf@@frametitle}%
+   \mdf at settings%
+   \mdf at lrbox{\mdf at splitbox@one}%
+  }%
+  {%
+   \ifbool{mdf at ignorelastdescenders}%
+     {%
+      \par\strut\par
+      \unskip\unskip\setbox0=\lastbox
+      \vspace*{\dimexpr\ht\strutbox-\baselineskip\relax}%
+     }{}%
+    \par\unskip\ifvmode\nointerlineskip\hrule \@height\z@ \@width\hsize\fi%%
+    \ifmdf at footnoteinside%
+      \def\mdf at reserveda{%
+        \mdf at footnoteoutput%
+        \endmdf at lrbox%
+        \ifdefempty{\mdf at frametitle}{}{\mdf@@frametitle at use}%
+        \detected at mdf@put at frame}%
+    \else%
+      \def\mdf at reserveda{%
+        \endmdf at lrbox%
+        \ifdefempty{\mdf at frametitle}{}{\mdf@@frametitle at use}%
+        \detected at mdf@put at frame%
+        \mdf at footnoteoutput%
+        }%
+    \fi%
+    \mdf at reserveda%
+    \endmdf at trivlist%
+\color at endgroup\@doendpe%
+}
+
+\newtoggle{md:checktwoside}
+\settoggle{md:checktwoside}{false}
+\newrobustcmd*\mdf at twoside@checklength{%
+ \if at twoside
+   \ifbool{mdf at usetwoside}%
+      {\mdf at PackageInfo{mdframed works in twoside mode}%
+       \settoggle{md:checktwoside}{true}%
+       \setlength\mdf at rightmargin@length{\mdf at outermargin@length}%
+       \setlength\mdf at leftmargin@length{\mdf at innermargin@length}%
+      }%
+      {\mdf at PackageInfo{mdframed inside twoside mode but\MessageBreak
+                       works with oneside mode}%
+       \settoggle{md:checktwoside}{false}%
+      }%
+ \fi%
+}
+
+\newcounter{mdf at zref@counter}%no duplicate labels
+\zref at newprop*{mdf at pagevalue}[0]{\number\value{page}}
+\zref at addprop{\ZREF at mainlist}{mdf at pagevalue}
+\newrobustcmd*\mdf at zref@label{%
+   \stepcounter{mdf at zref@counter}
+   \zref at label{mdf at pagelabel-\number\value{mdf at zref@counter}}%
+}
+\newrobustcmd*\if at mdf@pageodd{%
+ \zref at refused{mdf at pagelabel-\the\value{mdf at zref@counter}}%
+ \ifodd\zref at extract{mdf at pagelabel-\the\value{mdf at zref@counter}}%
+                    {mdf at pagevalue}%
+    \setlength\mdf at rightmargin@length{\mdf at outermargin@length}%
+    \setlength\mdf at leftmargin@length{\mdf at innermargin@length}%
+ \else
+    \setlength\mdf at rightmargin@length{\mdf at innermargin@length}%
+    \setlength\mdf at leftmargin@length{\mdf at outermargin@length}%
+ \fi%
+}
+\newrobustcmd*\mdf@@setzref{%
+ \iftoggle{md:checktwoside}{\mdf at zref@label\if at mdf@pageodd}{}%
+}
+\newrobustcmd*\mdf at freepagevspace{%
+     \bgroup\@nobreakfalse\addpenalty\z@\egroup%added 29.5.12
+     \penalty\@M\relax\vskip 2\baselineskip\relax%
+     \penalty9999\relax\vskip -2\baselineskip\relax%
+     \penalty9999%
+     \ifdimequal{\pagegoal}{\maxdimen}%
+          {\mdf at freevspace@length\vsize}%
+          {\mdf at freevspace@length=\pagegoal\relax%
+           \advance\mdf at freevspace@length by -\pagetotal\relax%
+           \addtolength\mdf at freevspace@length{\dimexpr-\parskip\relax}\relax%
+          }%
+}
+\newrobustcmd*\mdf at advancelength@horizontalmargin at sub[1]{%
+  \advance\mdf at horizontalspaceofbox by -\csname mdf@#1 at length\endcsname\relax%
+}
+\newlength\mdf at horizontalspaceofbox
+\newrobustcmd*\mdf at horizontalmargin@equation{%
+    \setlength{\mdf at horizontalspaceofbox}{\mdf at userdefinedwidth@length}%
+    \mdf at dolist{\mdf at advancelength@horizontalmargin at sub}{%
+             leftmargin,outerlinewidth,middlelinewidth,%
+             innerlinewidth,innerleftmargin,innerrightmargin,%
+             innerlinewidth,middlelinewidth,outerlinewidth,%
+             rightmargin}%
+    \notbool{mdf at leftline}%
+       {%
+        \advance\mdf at horizontalspaceofbox by \mdf at innerlinewidth@length\relax%
+        \advance\mdf at horizontalspaceofbox by \mdf at middlelinewidth@length\relax%
+        \advance\mdf at horizontalspaceofbox by \mdf at outerlinewidth@length\relax%
+       }{}%
+    \notbool{mdf at rightline}%
+       {%
+        \advance\mdf at horizontalspaceofbox by \mdf at innerlinewidth@length\relax%
+        \advance\mdf at horizontalspaceofbox by \mdf at middlelinewidth@length\relax%
+        \advance\mdf at horizontalspaceofbox by \mdf at outerlinewidth@length\relax%
+       }{}%
+    \ifdimless{\mdf at horizontalspaceofbox}{3cm}%
+      {\mdf at PackageWarning{You have a width of less than 3cm}}{}%
+    \hsize=\mdf at horizontalspaceofbox%
+}
+\newrobustcmd*\mdf at keeplines@single{%
+  \notbool{mdf at topline}%
+     {%
+      \advance\mdf at verticalmarginwhole@length %
+               by -\mdf at innerlinewidth@length\relax%
+      \advance\mdf at verticalmarginwhole@length %
+               by -\mdf at middlelinewidth@length\relax%
+      \advance\mdf at verticalmarginwhole@length %
+               by -\mdf at outerlinewidth@length\relax%
+     }{}%
+  \notbool{mdf at bottomline}%
+     {%
+      \advance\mdf at verticalmarginwhole@length %
+               by -\mdf at innerlinewidth@length\relax%
+      \advance\mdf at verticalmarginwhole@length %
+               by -\mdf at middlelinewidth@length\relax%
+      \advance\mdf at verticalmarginwhole@length %
+               by -\mdf at outerlinewidth@length\relax%
+     }{}%
+}
+\newrobustcmd*\mdf at advancelength@verticalmarginwhole[1]{%
+  \advance\mdf at verticalmarginwhole@length %
+           by \csname mdf@#1 at length\endcsname\relax%
+}
+\newrobustcmd*\mdf at advancelength@freevspace at sub[1]{%
+  \advance\dimen@ by -\csname mdf@#1 at length\endcsname\relax%
+}
+\newrobustcmd*\mdf at advancelength@freevspace at add[1]{%
+  \advance\dimen@ by \csname mdf@#1 at length\endcsname\relax%
+}
+\protected at edef\mdf at reset{\boxmaxdepth\the\boxmaxdepth
+                          \splittopskip\the\splittopskip}%
+\newrobustcmd*\mdf at put@frame at standalone{\relax%
+   \ifvoid\mdf at splitbox@one\relax
+      \mdf at PackageWarning{The environment is empty\MessageBreak}%
+      \let\mdf at reserved@a\relax%
+   \else
+      %Compute the box contents plus the top and bottom frame here
+      \setlength{\mdf at verticalmarginwhole@length}%
+                 {\dimexpr\ht\mdf at splitbox@one+\dp\mdf at splitbox@one\relax}%
+      \mdf at dolist{\mdf at advancelength@verticalmarginwhole}{%
+                  outerlinewidth,middlelinewidth,innerlinewidth,%
+                  innertopmargin,innerbottommargin,innerlinewidth,%
+                  middlelinewidth,outerlinewidth}%
+      \mdf at keeplines@single%
+      \def\mdf at reserved@a{\mdf at putbox@single}%
+   \fi
+   \mdf at reserved@a%
+}
+\def\mdf at put@frame{\relax%
+\ifvoid\mdf at splitbox@one\relax
+  \mdf at PackageWarning{The environment is empty\MessageBreak}%
+  \let\mdf at reserved@a\relax%
+\else
+  \setlength\mdfboundingboxwidth{\wd\mdf at splitbox@one}%
+  \mdf at print@space%
+  \mdf at freepagevspace%gives \mdf at freevspace@length
+  \mdf at PackageInfoSpace{\the\mdf at freevspace@length before the
+                        beginning of \MessageBreak
+                        the environment ending on input line \MessageBreak}%
+  \ifdimless{\mdf at freevspace@length}{2\baselineskip}
+    {%
+     \mdf at PackageInfo{Not enough space on this page}
+     \vfill\eject%
+     \def\mdf at reserved@a{\mdf at put@frame}%
+    }{%
+      %Compute the box contents plus the top and bottom frame here
+      \setlength{\mdf at verticalmarginwhole@length}%
+                {\dimexpr\ht\mdf at splitbox@one+\dp\mdf at splitbox@one\relax}%
+      \mdf at dolist{\mdf at advancelength@verticalmarginwhole}%
+                 {%
+                  outerlinewidth,middlelinewidth,innerlinewidth,%
+                  innertopmargin,innerbottommargin,%
+                  innerlinewidth,middlelinewidth,outerlinewidth}%
+      \mdf at keeplines@single%
+      \ifdimless{\mdf at verticalmarginwhole@length}{\mdf at freevspace@length}%
+         {%fits on the page%
+          \begingroup\mdf@@setzref\mdf at putbox@single\endgroup%Output no break
+          \let\mdf at reserved@a\relax%
+         }%
+         {%
+          \def\mdf at reserved@a{\mdf at put@frame at i}%does not fit on the page
+         }
+    }%
+\fi
+\mdf at reserved@a%
+}
+\def\mdf at put@frame at i{%Box must be split
+ \mdf at freepagevspace%gives \mdf at freevspace@length
+ \dimen@=\the\mdf at freevspace@length\relax%
+ \dimen at i=\mdf at innertopmargin@length\relax%
+ \advance\dimen at i by \mdf at innerlinewidth@length\relax%
+ \advance\dimen at i by \mdf at middlelinewidth@length\relax%
+ \advance\dimen at i by \mdf at outerlinewidth@length\relax%
+ \advance\dimen at i by 2\baselineskip\relax%
+ \ifdimless{\dimen@}{\dimen at i}%
+   {\hrule \@height\z@ \@width\hsize%
+    \vfill\eject%
+    \def\mdf at reserved@a{\mdf at put@frame}%
+   }%
+   {%
+    \mdf at dolist{\mdf at advancelength@freevspace at sub}{%calculate with \dimen@
+              outerlinewidth,middlelinewidth,innerlinewidth,%
+              innertopmargin,splitbottomskip}%
+    \ifbool{mdf at everyline}%
+      {%
+       \ifbool{mdf at bottomline}%
+          {%
+           \advance\dimen@ by -\mdf at innerlinewidth@length%
+           \advance\dimen@ by -\mdf at middlelinewidth@length%
+           \advance\dimen@ by -\mdf at outerlinewidth@length%
+          }{}%
+      }{}%
+    \notbool{mdf at topline}%
+       {%
+        \advance\dimen@ by \mdf at innerlinewidth@length%
+        \advance\dimen@ by \mdf at middlelinewidth@length%
+        \advance\dimen@ by \mdf at outerlinewidth@length%
+       }{}%
+    \advance\dimen at .8\pageshrink
+    \ifdimless{\ht\mdf at splitbox@one+\dp\mdf at splitbox@one}{\dimen@}%
+       {\mdf at PackageWarning{You got a bad break\MessageBreak
+                            because the last box will be empty\MessageBreak
+                           you have to change it manually\MessageBreak
+                           by changing the text, the space\MessageBreak
+                           or something else}%
+        \advance\dimen@ by -1.8\baselineskip\relax%needed????????????????????
+       }{}%
+    \setbox\mdf at splitbox@save=\vbox{\unvcopy\mdf at splitbox@one}%
+    \splitmaxdepth\z@ \splittopskip\mdf at splittopskip@length%
+    \mdf at ignorevbadness%
+    \setbox\mdf at splitbox@two\vsplit\mdf at splitbox@one to \dimen@
+    \setbox\mdf at splitbox@two\vbox{\unvbox\mdf at splitbox@two}%
+    \setbox\mdf at splitbox@one\vbox{\unvbox\mdf at splitbox@one}%
+    \ifdimgreater{\ht\mdf at splitbox@two+\dp\mdf at splitbox@two}{\dimen@}%
+      {%split wrongly
+       \mdf at PackageInfo{Box was split wrongly^^M starting loop to iterate
+                        the splitting point\MessageBreak}%
+       \setbox\mdf at splitbox@one=\vbox{\unvcopy\mdf at splitbox@save}%
+       \dimen at i=\dimen@%\relax
+       \@tempcnta=\z@\relax
+       \loop
+        \ifdim\dimexpr\ht\mdf at splitbox@two+\dp\mdf at splitbox@two\relax>\dimen@
+          \advance\dimen at i by -\p@\relax
+          \advance\@tempcnta by \@ne\relax
+          \ifnum\@tempcnta>100
+            \let\iterate\relax
+            \mdf at PackageWarning{correct box splitting failed^^M
+                                It seems you are using non-splittable
+                                contents\MessageBreak}
+          \fi
+          \mdf at ignorevbadness%
+          \setbox\mdf at splitbox@one=\vbox{\break\unvcopy\mdf at splitbox@save}%
+          \splitmaxdepth\z@ \splittopskip\mdf at splittopskip@length%
+          \mdf at ignorevbadness%
+          \setbox\mdf at splitbox@two\vsplit\mdf at splitbox@one to \dimen at i\relax%
+          \setbox\mdf at splitbox@two\vbox{\unvbox\mdf at splitbox@two}%
+          \setbox\mdf at splitbox@one\vbox{\unvbox\mdf at splitbox@one}%
+       \repeat%
+      }{}%
+    \ifvoid\mdf at splitbox@one\relax%
+      \mdf at PackageWarning{You got a bad break because the split box
+                          is empty^^M
+                          You have to change the page settings^^M
+                          like enlargethispage or something else^^M
+                          the package now calls
+                          \enlargethispage{\baselineskip}\MessageBreak}%
+      \setbox\mdf at splitbox@one=\vbox{\unvcopy\mdf at splitbox@save}
+      \enlargethispage{\baselineskip}%
+      \def\mdf at reserved@a{\mdf at put@frame}%
+    \fi%
+    \ifdim\wd\mdf at splitbox@two=\wd\mdf at splitbox@one\relax
+    \else%
+      \mdf at PackageInfo{Your first box width is too small^^M
+                       mdframed fixed it\MessageBreak}%
+      \setbox\mdf at splitbox@two=\vbox%
+                   {%
+                    \hrule \@height\z@ \@width\wd\mdf at splitbox@one\relax
+                    \unvcopy\mdf at splitbox@two%
+                   }%
+    \fi%
+    \ifvoid\mdf at splitbox@two\relax%
+        {\hrule \@height\f at size pt \@width\z@%
+         \hrule \@height\z@ \@width\hsize}%
+         \setbox\mdf at splitbox@one=\vbox{\unvcopy\mdf at splitbox@save}%
+         \def\mdf at reserved@a{\mdf at put@frame}%
+     \else%
+        \ifdimequal{\ht\mdf at splitbox@two}{0pt}%
+          {\hrule \@height\z@ \@width\hsize%
+           \vfill\eject%
+           \setbox\mdf at splitbox@one=\vbox{\unvcopy\mdf at splitbox@save}%
+           \def\mdf at reserved@a{\mdf at put@frame}%
+          }%
+          {%
+          \begingroup\mdf@@setzref\mdf at putbox@first\endgroup%
+          \hrule \@height\z@ \@width\hsize%
+          \vfill\eject%
+          \def\mdf at reserved@a{\mdf at put@frame at ii}%
+          }%
+     \fi%
+   }%
+\mdf at reserved@a%
+}
+\def\mdf at put@frame at ii{%
+  \setlength{\mdf at freevspace@length}{\vsize}%
+    \ifbool{mdf at repeatframetitle}%
+      {%
+       \toggletrue{mdf at notfirstframetitle}%
+       \splitmaxdepth\z@ \splittopskip\z@%
+       \setbox\mdf at splitbox@one=\vbox{\break\unvbox\mdf at splitbox@one}%
+       \mdf at ignorevbadness%
+       \setbox0=\vsplit\mdf at splitbox@one to \z@\relax%
+       \setbox\mdf at splitbox@one=\vbox{\unvbox\mdf at splitbox@one}
+       \setbox\mdf at splitbox@one\vbox%
+          {%
+           \vbox to \mdf at frametitleaboveskip@length{}
+           \unvcopy\mdf at frametitlebox\relax%
+           \mdf@@frametitlerule\relax%
+           \unvbox\mdf at splitbox@one\relax%
+          }%
+       \setbox\mdf at splitbox@one=\vbox{\unvbox\mdf at splitbox@one}%
+      }{}%
+  \setlength{\dimen@}{\dimexpr\ht\mdf at splitbox@one+\dp\mdf at splitbox@one\relax}%
+  \mdf at dolist{\mdf at advancelength@freevspace at add}%
+        {%used \dimen@
+         innerbottommargin,innerlinewidth,middlelinewidth,outerlinewidth,%
+        }%
+  \ifbool{mdf at everyline}%
+    {%
+     \ifbool{mdf at topline}%
+      {%
+       \advance\dimen@ by \mdf at innerlinewidth@length\relax%
+       \advance\dimen@ by \mdf at middlelinewidth@length\relax%
+       \advance\dimen@ by \mdf at outerlinewidth@length\relax%
+      }{}%
+    }{}%
+   \notbool{mdf at bottomline}%
+     {%
+      \advance\dimen@ by -\mdf at innerlinewidth@length\relax%
+      \advance\dimen@ by -\mdf at middlelinewidth@length\relax%
+      \advance\dimen@ by -\mdf at outerlinewidth@length\relax%
+      \relax%
+     }{}%
+   \ifdimgreater{\dimen@}{\mdf at freevspace@length}%
+    {%have a middle box
+     \advance\mdf at freevspace@length by -\mdf at splitbottomskip@length\relax%
+     \ifbool{mdf at everyline}%
+       {%
+        \ifbool{mdf at topline}%
+          {%
+          \advance\mdf at freevspace@length by -\mdf at innerlinewidth@length\relax%
+          \advance\mdf at freevspace@length by -\mdf at middlelinewidth@length\relax%
+          \advance\mdf at freevspace@length by -\mdf at outerlinewidth@length\relax%
+          }{}%
+        \ifbool{mdf at bottomline}%
+          {%
+          \advance\mdf at freevspace@length by -\mdf at innerlinewidth@length\relax%
+          \advance\mdf at freevspace@length by -\mdf at middlelinewidth@length\relax%
+          \advance\mdf at freevspace@length by -\mdf at outerlinewidth@length\relax%
+          \relax
+          }{}%
+       }{}%
+     \setbox\mdf at splitbox@save=\vbox{\unvcopy\mdf at splitbox@one}%
+     \splitmaxdepth\z@ \splittopskip\mdf at splittopskip@length%
+     \mdf at ignorevbadness%
+     \setbox\mdf at splitbox@two\vsplit\mdf at splitbox@one to \mdf at freevspace@length
+     \setbox\mdf at splitbox@two\vbox{\unvbox\mdf at splitbox@two}
+     \setbox\mdf at splitbox@one\vbox{\unvbox\mdf at splitbox@one}
+     \ifdimgreater{\ht\mdf at splitbox@two+\dp\mdf at splitbox@two}{\dimen@}%
+       {%split wrongly
+        \mdf at PackageInfo{Box was split wrongly^^M starting loop to iterate
+                         the splitting point\MessageBreak}%
+        \dimen at i=\mdf at freevspace@length%\relax
+        \@tempcnta=\z@\relax
+        \loop
+        \ifdim\dimexpr\ht\mdf at splitbox@two+\dp\mdf at splitbox@two\relax>%
+              \mdf at freevspace@length\relax
+          \advance\dimen at i by -\p@\relax%
+          \advance\@tempcnta by \@ne\relax%
+          \ifnum\@tempcnta>100
+            \let\iterate\relax%
+            \mdf at PackageWarning{correct box splitting failed^^M
+                                It seems you are using non-splittable
+                                contents\MessageBreak}%
+          \fi
+          \setbox\mdf at splitbox@one=\vbox{\break\unvcopy\mdf at splitbox@save}%
+          \splitmaxdepth\z@ \splittopskip\mdf at splittopskip@length%
+          \mdf at ignorevbadness%
+          \setbox\mdf at splitbox@two\vsplit\mdf at splitbox@one to \dimen at i\relax%
+          \setbox\mdf at splitbox@two\vbox{\unvbox\mdf at splitbox@two}%
+          \setbox\mdf at splitbox@one\vbox{\unvbox\mdf at splitbox@one}%
+        \repeat%
+       }{}%
+     \ifvoid\mdf at splitbox@one\relax%
+        \mdf at PackageWarning{You got a bad break because the split box is
+                            empty^^M
+                            You have to change the page settings^^M
+                            like enlargethispage or something else^^M
+                            the package now calls
+                            \enlargethispage{\baselineskip}\MessageBreak}%
+        \setbox\mdf at splitbox@one=\vbox{\unvcopy\mdf at splitbox@save}%
+        \enlargethispage{\baselineskip}%
+        \def\mdf at reserved@a{\mdf at put@frame at ii}%
+     \else
+        \begingroup\mdf@@setzref\mdf at putbox@middle\endgroup%
+          \hrule \@height\z@ \@width\hsize%
+          \vfill\eject%
+          \def\mdf at reserved@a{\mdf at put@frame at ii}%
+        \fi
+     }%End middle box case
+     {%start last box case
+      \ifvoid\mdf at splitbox@one
+           \mdf at PackageWarning{You got a bad break\MessageBreak
+                               because the last split box is empty\MessageBreak
+                               You have to change the settings}%%
+           \setbox\mdf at splitbox@one=\vbox%
+                  {%
+                   \unvbox\mdf at splitbox@one%
+                   \hrule \@height\z@ \@width\mdfboundingboxwidth
+                  }%
+      \fi%
+      \ifdimless{\ht\mdf at splitbox@one}{1sp}%
+         {%
+          \mdf at PackageWarning{You got a bad break\MessageBreak
+                              because the last split box is empty\MessageBreak
+                              You have to change the settings}%
+
+          \let\mdf at reserved@a\relax%
+          \setbox\mdf at splitbox@one=\vbox%
+                 {%
+                  \unvbox\mdf at splitbox@one%
+                  \hrule \@height\z@ \@width\mdfboundingboxwidth
+                 }%
+         }{}%
+      \begingroup\mdf@@setzref\mdf at putbox@second\endgroup%
+      \hrule \@height\z@ \@width\hsize%
+      \let\mdf at reserved@a\relax%
+     }%
+  \mdf at reserved@a%
+}
+
+%%%%    _____t_____
+%%%%   |           |
+%%%%   |           |
+%%%%   |           |
+%%%%  l|           |r
+%%%%   |           |
+%%%%   |           |
+%%%%   |___________|
+%%%%         b
+%%Query the combinations of frame lines:
+\newrobustcmd*\mdf at test@ltrb{%
+    \ifboolexpr{ (bool {mdf at topline}) and (bool {mdf at bottomline})
+                 and (bool {mdf at leftline}) and (bool {mdf at rightline})}}
+\newrobustcmd*\mdf at test@ltr{%
+    \ifboolexpr{ (bool {mdf at topline}) and not (bool {mdf at bottomline})
+                 and (bool {mdf at leftline}) and (bool {mdf at rightline})}}
+\newrobustcmd*\mdf at test@ltb{%
+    \ifboolexpr{ (bool {mdf at topline}) and (bool {mdf at bottomline})
+                 and (bool {mdf at leftline}) and not (bool {mdf at rightline})}}
+\newrobustcmd*\mdf at test@trb{%
+    \ifboolexpr{ (bool {mdf at topline}) and (bool {mdf at bottomline})
+                 and not (bool {mdf at leftline}) and (bool {mdf at rightline})}}
+\newrobustcmd*\mdf at test@lrb{%
+    \ifboolexpr{ not (bool {mdf at topline}) and (bool {mdf at bottomline})
+                 and (bool {mdf at leftline}) and (bool {mdf at rightline})}}
+\newrobustcmd*\mdf at test@lb{%
+    \ifboolexpr{ not (bool {mdf at topline}) and (bool {mdf at bottomline})
+                 and (bool {mdf at leftline}) and not (bool {mdf at rightline})}}
+\newrobustcmd*\mdf at test@rb{%
+    \ifboolexpr{ not (bool {mdf at topline}) and (bool {mdf at bottomline})
+                 and not (bool {mdf at leftline}) and (bool {mdf at rightline})}}
+\newrobustcmd*\mdf at test@tr{%
+    \ifboolexpr{ (bool {mdf at topline}) and not (bool {mdf at bottomline})
+                 and not (bool {mdf at leftline}) and (bool {mdf at rightline})}}
+\newrobustcmd*\mdf at test@lt{%
+    \ifboolexpr{ (bool {mdf at topline}) and not (bool {mdf at bottomline})
+                 and (bool {mdf at leftline}) and not (bool {mdf at rightline})}}
+\newrobustcmd*\mdf at test@lr{%
+    \ifboolexpr{ not (bool {mdf at topline}) and not (bool {mdf at bottomline})
+                 and (bool {mdf at leftline}) and (bool {mdf at rightline})}}
+\newrobustcmd*\mdf at test@tb{%
+    \ifboolexpr{ (bool {mdf at topline}) and (bool {mdf at bottomline})
+                 and not (bool {mdf at leftline}) and not (bool {mdf at rightline})}}
+\newrobustcmd*\mdf at test@l{%
+    \ifboolexpr{ not (bool {mdf at topline}) and not (bool {mdf at bottomline})
+                 and (bool {mdf at leftline}) and not (bool {mdf at rightline})}}
+\newrobustcmd*\mdf at test@r{%
+    \ifboolexpr{ not (bool {mdf at topline}) and not (bool {mdf at bottomline})
+                 and not (bool {mdf at leftline}) and (bool {mdf at rightline})}}
+\newrobustcmd*\mdf at test@t{%
+    \ifboolexpr{ (bool {mdf at topline}) and not (bool {mdf at bottomline})
+                 and not (bool {mdf at leftline}) and not (bool {mdf at rightline})}}
+\newrobustcmd*\mdf at test@b{%
+    \ifboolexpr{ not (bool {mdf at topline}) and (bool {mdf at bottomline})
+                 and not (bool {mdf at leftline}) and not (bool {mdf at rightline})}}
+\newrobustcmd*\mdf at test@noline{%
+    \ifboolexpr{ not (bool {mdf at topline}) and not (bool {mdf at bottomline})
+                 and not (bool {mdf at leftline}) and not (bool {mdf at rightline})}}
+\newrobustcmd*\mdf at test@single{%
+    \ifboolexpr{ not (test {\mdf at test@ltrb} or test {\mdf at test@ltr} or
+                 test {\mdf at test@ltb} or test {\mdf at test@trb} or
+                 test {\mdf at test@lrb}  or test {\mdf at test@lb} or
+                 test {\mdf at test@rb} or test {\mdf at test@tr} or
+                 test {\mdf at test@lt} ) }}
+\DisableKeyvalOption[action=warning,package=mdframed]{mdf}{framemethod}%
+\DisableKeyvalOption[action=warning,package=mdframed]{mdf}{xcolor}%
+
+ \endinput
+%% 
+%% ================================================================
+%% Copyright (C) 2012 by Marco Daniel
+%% 
+%% This work may be distributed and/or modified under the
+%% conditions of the LaTeX Project Public License (LPPL), either
+%% version 1.3c of this license or (at your option) any later
+%% version.  The latest version of this license is in the file:
+%% 
+%% http://www.latex-project.org/lppl.txt
+%% 
+%% This work is "maintained" (as per LPPL maintenance status) by
+%% Marco Daniel.
+%% 
+%% Have fun!
+%% 
+%% ================================================================
+%%
+%% End of file `mdframed.sty'.
diff --git a/doc/mixtureLM.tex b/doc/mixtureLM.tex
new file mode 100644
index 0000000..e69de29
diff --git a/doc/ngt.tex b/doc/ngt.tex
new file mode 100644
index 0000000..e987a83
--- /dev/null
+++ b/doc/ngt.tex
@@ -0,0 +1,33 @@
+{\tt ngt} is the command that handles $n$-gram counts.
+
+
+\begin{itemize}
+\item It extracts $n$-gram counts and stores them in an $n$-gram table.
+\item It prunes $n$-gram tables.
+\item It merges $n$-gram tables.
+\item It transforms $n$-gram table formats.
+\end{itemize}
+
+\noindent
+A new $n$-gram table for the limited dictionary can be computed with {\tt ngt} by specifying 
+the sub-dictionary with the option {\tt -sd}:
+\begin{verbatim}
+$> ngt -i=train.www -sd=top10k -n=3 -o=train.10k.www -b=yes
+\end{verbatim}
+The command replaces all words outside {\tt top10k} with the special
+out-of-vocabulary symbol {\tt \_unk\_}. {\tt dict} is the command that handles dictionaries.
+
+\noindent
+Another useful feature of {\tt ngt} is the merging of two $n$-gram tables. Assume that we have 
+split our training corpus into the files {\tt text-a} and {\tt text-b} and have computed $n$-gram 
+tables for both files; we can then merge them with the option {\tt -aug}:
+\begin{verbatim}
+$> ngt -i="gunzip -c text-a.gz" -n=3 -o=text-a.www -b=yes
+$> ngt -i="gunzip -c text-b.gz" -n=3 -o=text-b.www -b=yes
+$> ngt -i=text-a.www -aug=text-b.www -n=3 -o=text.www -b=yes
+\end{verbatim}
+
+\paragraph{Warning:} Even if the concatenation of {\tt text-a.gz} and {\tt text-b.gz} is equal to {\tt train.gz}, the resulting $n$-gram tables
+{\tt text.www} and {\tt train.www} can differ slightly. This happens because, during the construction of each single $n$-gram table, a few $n$-grams are automatically added to make it consistent for further computation.
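+
+\noindent
+Read in terms of counts, and setting aside the automatically added $n$-grams mentioned in the warning, merging can be thought of as summing
+the counts of the two tables. This is a conceptual sketch, not a description of the {\tt ngt} internals; $c_a$, $c_b$ and $c$ are notation introduced here for
+the counts stored in {\tt text-a.www}, {\tt text-b.www} and {\tt text.www}, respectively:
+\[ c(g) = c_a(g) + c_b(g) \quad \mbox{for every $n$-gram } g. \]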
+
+
diff --git a/doc/parallelComputation.tex b/doc/parallelComputation.tex
new file mode 100644
index 0000000..e4869c9
--- /dev/null
+++ b/doc/parallelComputation.tex
@@ -0,0 +1,18 @@
+This package provides facilities to build a gigantic LM in parallel in order to reduce computation time.
+The script implementing this feature is based on the {\tt SUN Grid Engine} software\footnote{http://www.sun.com/software/gridware}.
+
+\noindent
+To run the computation in parallel, use the following script (instead of {\tt build-lm.sh}):
+
+\begin{verbatim}
+$> build-lm-qsub.sh -i "gunzip -c train.gz" -n 3  -o train.ilm.gz -k 5
+\end{verbatim}
+Besides the options of {\tt build-lm.sh}, parameters for the SGE manager can be provided through the following option:
+
+\begin{verbatim}
+   -q      parameters for qsub, e.g. "-q <queue>", "-l <resources>"
+\end{verbatim}
+
+\noindent
+The script applies the same {\em split-and-merge} policy described in Section~\ref{sec:giganticLM}, but part of the computation is performed in parallel (instead of sequentially) by distributing the tasks over several machines.
+
diff --git a/doc/pruneLM.tex b/doc/pruneLM.tex
new file mode 100644
index 0000000..e69de29
diff --git a/doc/quantizeLM.tex b/doc/quantizeLM.tex
new file mode 100644
index 0000000..e69de29
diff --git a/doc/referenceMaterial.tex b/doc/referenceMaterial.tex
new file mode 100644
index 0000000..3603808
--- /dev/null
+++ b/doc/referenceMaterial.tex
@@ -0,0 +1,34 @@
+The following books contain basic introductions to statistical language modeling:
+\begin{itemize}
+\item {\em Spoken Dialogues with Computers}, by Renato DeMori, chapter 7.
+\item {\em Speech  and Language Processing},  by Dan  Jurafsky and  Jim Martin, chapter 6.
+\item {\em Foundations   of  Statistical   Natural  Language   Processing},  by C. Manning and H. Schuetze.
+\item {\em Statistical Methods for Speech Recognition}, by Frederick Jelinek.
+\item {\em Spoken Language Processing}, by Huang, Acero and Hon.
+\end{itemize}
+
+\noindent
+The following papers describe the IRST LM toolkit:
+\begin{itemize}
+
+\item Efficient data structures to handle huge language models:
+\begin{quote}
+Marcello Federico and Mauro Cettolo, {\em Efficient Handling of N-gram Language Models for Statistical Machine Translation}, In Proc. of the Second Workshop on Statistical Machine Translation, pp. 88--95, ACL, Prague, Czech Republic, 2007.
+\end{quote}
+
+\item Language Model quantization:
+\begin{quote}
+Marcello Federico and Nicola Bertoldi, {\em How Many Bits Are Needed To Store Probabilities for Phrase-Based Translation?}, In Proc. of the Workshop on Statistical Machine Translation, pp. 94--101, NAACL, New York City, NY, 2006.
+\end{quote}
+
+
+\item Language Model adaptation with mixtures:
+\begin{quote}
+Marcello Federico and Nicola Bertoldi, {\em Broadcast news LM adaptation over time}, Computer Speech and Language, 18(4), pp. 417--435, October 2004.
+\end{quote}
+\item Language Model adaptation with MDI:
+\begin{quote}
+Marcello Federico, {\em Efficient LM Adaptation through MDI Estimation}, In Proc. of Eurospeech, Budapest, Hungary, 1999.
+\end{quote}
+\end{itemize}
+
diff --git a/doc/regressionTests.tex b/doc/regressionTests.tex
new file mode 100644
index 0000000..e69de29
diff --git a/doc/releaseNotes.tex b/doc/releaseNotes.tex
new file mode 100644
index 0000000..5f83240
--- /dev/null
+++ b/doc/releaseNotes.tex
@@ -0,0 +1,224 @@
+
+\IMPORTANT{If present, the index in parentheses refers to the revision number in IRSTLM repository (until 5.60.02) or SourceForge repository (from 5.60.03).}
+\subsection{Version 3.2}
+\begin{itemize}
+\item Quantization of probabilities
+\item Efficient run-time data structure for LM querying 
+\item Dismissal of MT output format
+\end{itemize}
+
+\subsection{Version 4.2}
+\begin{itemize}
+\item Distinction between open source and internal IRSTLM tools
+\item More memory efficient versions of binarization and quantization commands
+\item Memory mapping of run-time LM
+\item Scripts and data structures for the estimation and handling of gigantic LMs 
+\item Integration of {\IRSTLM} into Moses Decoder
+\end{itemize}
+
+\subsection{Version 5.00}
+\begin{itemize}
+\item Fixed bug in the documentation 
+\item General script {\tt build-lm.sh} for the estimation of large LMs.
+\item Management of iARPA file format.
+\item Bug fixes
+\item Estimation of LM over a partial dictionary.
+\end{itemize}
+
+
+\subsection{Version 5.04}
+\begin{itemize}
+\item Extended documentation with ShiftBeta smoothing. 
+\item Smoothing parameter of ShiftBeta can be set manually.
+\item Robust handling for smoothing parameters of ModifiedShiftBeta.
+\item Fixed probability checks in TLM.
+\item Parallel estimation of gigantic LM through SGE
+\item Better management of sub dictionary with build-lm.sh   
+\item Minor bug fixes
+\end{itemize}
+
+\subsection{Version 5.05}
+\begin{itemize}
+\item (Optional) computation of OOV penalty in terms of single OOV word instead of OOV class
+\item Extended use of OOV penalty to the standard input LM scores of compile-lm. 
+\item Minor bug fixes
+\end{itemize}
+
+\subsection{Version 5.10}
+\begin{itemize}
+\item Extended ngt to compute statistics for approximated Kneser-Ney smoothing
+\item New implementation of approximated Kneser-Ney smoothing method
+\item Minor bug fixes
+\item More to be added here ....
+\end{itemize}
+
+\subsection{Version 5.20}
+\begin{itemize}
+\item Improved tracing of back-offs
+\item Added command prune-lm  (thanks to Fabio Brugnara)
+\item Extended lprob function to supply back-off weight/level information
+\item Improved back-off handling of OOV words with quantized LM
+\item Added more debug modalities to compile-lm
+\item Fixed minor bugs in regression tests
+\item Updated documentation
+\end{itemize}
+
+\subsection{Version 5.21}
+\begin{itemize}
+\item Addition of interpolate-lm 
+\item Added LM filtering to compile-lm
+\item Improved regression tests
+\item Integration of interpolated LMs in Moses
+\item Extended tests on compilers and platforms
+\item Improved documentation with website
+\end{itemize}
+
+\subsection{Version 5.22}
+\begin{itemize}
+\item Use of the AutoConf/AutoMake toolkit for compilation and installation
+\end{itemize}
+
+\subsection{Version 5.30}
+\begin{itemize}
+\item Support for safe management of LMs with a total number of $n$-grams larger than 250 million
+\item Use of a new parameter to specify a directory for temporary computation, because the default ({\tt /tmp}) could be too small
+\item Safer method for concatenating gzipped sub-LMs
+\item Improved management of log files
+\end{itemize}
+
+\subsection{Version 5.40}
+\begin{itemize}
+\item Merging of internal-only tlm code into the public version
+\item Updated documentation in the public version
+\item Included documentation in the public version
+\end{itemize}
+
+\subsection{Version 5.50}
+
+\begin{itemize}
+\item {\bf 5.50.01}
+\begin{itemize}
+\item Binary saving directly with tlm
+\item Speed improvement through caching of probability and states of $n$-grams in the LM interface
+\item Storing of $n$-grams in inverted order
+\end{itemize}
+
+\item {\bf 5.50.02}
+\begin{itemize}
+\item Optional creation of documentation
+\item Improved documentation
+\item Optional computation of the perplexity at sentence-level
+\end{itemize}
+
+\end{itemize}
+
+\subsection{Version 5.60}
+\begin{itemize}
+\item {\bf 5.60.01}
+\begin{itemize}
+\item Handling of class/chunk LMs with both compile-lm and interpolate-lm
+\item Improved pruning strategy to handle sentence-start symbols
+\item Improved documentation and examples
+\end{itemize}
+\item {\bf 5.60.02}
+\begin{itemize}
+\item Code cleanup 
+\end{itemize}
+\item {\bf 5.60.03 (r404)}
+\begin{itemize}
+\item Xcode project
+\item import from IRSTLM repository (revision 4263)
+\end{itemize}
+
+\end{itemize}
+
+\subsection{Version 5.70}
+\begin{itemize}
+\item {\bf 5.70.01 (r454)}
+\begin{itemize}
+\item Class-based LM
+\item Added improved Kneser-Ney smoothing for build-lm-qsub.sh
+\item Enabled different singleton pruning policy for each submodel of mixture LM
+\item Enabled the possibility to load an existing LM up to a specific level smaller than the actual LM order
+\item Code tracing
+\item Handling of error codes
+\item Handling of long filenames and parameters
+\item Improved parallel code compilation
+\item Improved documentation and examples
+\end{itemize}
+\item {\bf 5.70.02 (r469)}
+\begin{itemize}
+\item Code optimization
+\item Common interface for all LM types
+\end{itemize}
+\end{itemize}
+
+\subsection{Version 5.80}
+\begin{itemize}
+\item {\bf 5.80.01 (r501)}
+\begin{itemize}
+\item Facility to {\em beautify} source code
+\item Re-activation of filtering on a sub-dictionary
+\item Code optimization related to LM dumping
+\item Transformation of scripts into Bourne shell scripts 
+\end{itemize}
+\item {\bf 5.80.03 (r579)}
+\begin{itemize}
+\item Data selection tool
+\item Handling of precision upper- and lower-bounds by means of constants.
+\item Improved Xcode project
+\item Improved code compilation
+\item Improved documentation
+\item Improved handling of help
+\item Facility to check whether IRSTLM is compiled with or without caching
+\end{itemize}
+\item {\bf 5.80.05 (r642)}
+\begin{itemize}
+\item Introduction of {\em namespace irstlm}
+\item Code optimization
+\item Code compliant with OS X Mavericks
+\item Support for Redis output format to ngt
+\item Support CRC16 algorithm
+\item Improved plsa command and regression test
+\item Improved handling of tracing
+\item Improved handling of help
+\item Improved handling of error messages
+\end{itemize}
+\item {\bf 5.80.06 (r647)}
+\begin{itemize}
+\item Improved code compilation
+\end{itemize}
+\item {\bf 5.80.07}
+\begin{itemize}
+\item Changes to LM smoothing types; removed Good-Turing, added Approximated Modified ShiftBeta, renaming
+\item Added an additional pruning method, based on level-dependent pruning frequency
+\item Improved code compilation
+\item Improved documentation
+\item Code cleanup
+\item Added support for long names of parameters
+\item Improved output format 
+\end{itemize}
+\item {\bf 5.80.08}
+\begin{itemize}
+\item Added functionality to score n-grams in isolation
+\item Added level-based caches for storing prob, state, and statesize
+\item Improved management of tracing assert
+\item Improved management of tracing/assert/caching AutoConf compilation flags
+\item Improved output format
+\item Improved code compilation
+\item Code cleanup
+\end{itemize}
+\end{itemize}
+
+
+\COMMENT{
+\subsection{Version 5.xx}
+\begin{itemize}
+\item {\bf 5.xx.01}
+\begin{itemize}
+\item 
+\end{itemize}
+\end{itemize}
+}
+
diff --git a/doc/tlm.tex b/doc/tlm.tex
new file mode 100644
index 0000000..9dadc4b
--- /dev/null
+++ b/doc/tlm.tex
@@ -0,0 +1,88 @@
+Language models have to cope with out-of-vocabulary (OOV) words, which are internally represented
+by the word class {\tt \_unk\_}. In order to
+compare the perplexity of LMs with different vocabulary sizes, it is better
+to define a conventional dictionary size, or dictionary upper bound
+size, through the parameter {\tt -dub}. In the following example, we
+compare the perplexity of the full-vocabulary LM against the perplexity of the
+LM estimated over the 10K most frequent words. In our comparison, we assume a dictionary 
+upper bound of one million words.
+
+\begin{verbatim}
+$>tlm -tr=train.10k.www -n=3 -lm=wb -te=test -dub=1000000
+  n=49984 LP=342160.8721 PP=939.5565162 OVVRate=0.07666453265
+
+$>tlm -tr=train.www -n=3 -lm=wb -te=test -dub=1000000
+  n=49984 LP=336276.7842 PP=835.2144716 OVVRate=0.05007602433
+\end{verbatim}
+
+
+\noindent
+The  large  difference  in  perplexity  between the two LMs is   explained  by  the 
+significantly higher  OOV rate of the 10K-word LM.
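+
+\noindent
+The figures printed by {\tt tlm} are consistent with the usual definition of perplexity, assuming (this is an interpretation of the output, not something stated by the tool)
+that {\tt n} is the number of test words and {\tt LP} is the negative natural-log probability of the test set:
+\[ PP = \exp\left(\frac{LP}{n}\right). \]
+For the 10K-word LM, $342160.8721 / 49984 \approx 6.8454$ and $\exp(6.8454) \approx 939.6$; for the full-vocabulary LM,
+$336276.7842 / 49984 \approx 6.7277$ and $\exp(6.7277) \approx 835.2$. Likewise, the reported OOV rates correspond to about
+$0.07666 \times 49984 \approx 3832$ and $0.05008 \times 49984 \approx 2503$ OOV words in the test set.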
+
+\noindent
+N-gram LMs generally apply frequency smoothing techniques, and combine
+smoothed frequencies according to  two main schemes: interpolation and
+back-off. The toolkit uses interpolation
+by default. The back-off scheme is computationally more costly but
+often provides better performance. It  can be activated with the option
+{\tt -bo=yes}, e.g.:
+
+\begin{verbatim}
+$>tlm -tr=train.10k.www -n=3 -lm=wb -te=test -dub=1000000 -bo=yes
+  n=49984 LP=337278.3227 PP=852.1186066 OVVRate=0.07666453265
+\end{verbatim}
+
+
+\noindent
+This toolkit implements several frequency smoothing methods, which are
+specified  by  the  parameter  {\tt -lm}.  Three  methods  are  particularly
+recommended:
+\begin{itemize}
+\item [a)] {\bf Modified shift-beta}, also known as ``improved Kneser-Ney smoothing''.
+This smoothing scheme gives top performance when training data is not 
+very sparse, but it is more time- and memory-consuming during the estimation phase: 
+
+\begin{verbatim}
+$>tlm -tr=train.www -n=3 -lm=msb -te=test -dub=1000000 -bo=yes
+  n=49984 LP=321877.3411 PP=626.1609806 OVVRate=0.05007602433
+\end{verbatim}
+
+
+\item [b)] {\bf Witten-Bell smoothing}. This is an excellent smoothing
+   method which works well in every data condition and is much less time- and memory-consuming:
+
+\begin{verbatim}
+$> tlm -tr=train.www -n=3 -lm=wb -te=test -dub=1000000  -bo=yes
+  n=49984 LP=331577.2279 PP=760.2652095 OVVRate=0.05007602433
+\end{verbatim}
+
+\item [c)] {\bf Shift-beta smoothing}. This smoothing method is a simpler and cheaper version
+of the Modified shift-beta method and sometimes works better than the Witten-Bell method: 
+
+\begin{verbatim}
+$> tlm -tr=train.www -n=3 -lm=sb -te=test -dub=1000000  -bo=yes
+  n=49984 LP=334724.5032 PP=809.6750442 OVVRate=0.05007602433
+\end{verbatim}
+
+\noindent
+Moreover, the nonlinear smoothing parameter $\beta$ can be specified with the option {\tt -beta}:
+\begin{verbatim}
+$> tlm -tr=train.www -n=3 -lm=sb -beta=0.001 -te=test -dub=1000000  
+       -bo=yes
+  n=49984 LP=449339.8282 PP=8019.836058 OVVRate=0.05007602433
+\end{verbatim}
+This could be helpful in case we need to use language models with very limited frequency smoothing.
+
+\end{itemize}
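+
+\noindent
+As a reminder of what these options compute, the textbook forms of the methods can be written
+as follows. This is a sketch in generic notation and is not taken from the toolkit source: $c(\cdot)$ denotes an $n$-gram count, $h$ a history,
+$h'$ the history shortened by one word, and $n(h)$ the number of distinct words observed after $h$.
+Each method defines a discounted frequency $f^*(w \mid h)$ and a zero-frequency weight $\lambda(h)$:
+\begin{eqnarray*}
+\mbox{Witten-Bell:} & f^*(w \mid h) = \frac{c(hw)}{c(h)+n(h)}, & \lambda(h) = \frac{n(h)}{c(h)+n(h)} \\
+\mbox{Shift-beta:} & f^*(w \mid h) = \max\left\{\frac{c(hw)-\beta}{c(h)},0\right\}, & \lambda(h) = \beta\,\frac{n(h)}{c(h)}
+\end{eqnarray*}
+with $0 < \beta < 1$. Modified shift-beta follows the same pattern but uses separate discounts for $n$-grams seen once, twice, and three or more times.
+Under interpolation the two quantities are combined as
+\[ p(w \mid h) = f^*(w \mid h) + \lambda(h)\, p(w \mid h'), \]
+whereas under back-off the lower-order estimate is used only when $f^*(w \mid h) = 0$:
+\[ p(w \mid h) = \left\{ \begin{array}{ll} f^*(w \mid h) & \mbox{if } f^*(w \mid h) > 0 \\ \alpha(h)\,\lambda(h)\, p(w \mid h') & \mbox{otherwise} \end{array} \right. \]
+where $\alpha(h)$ is a normalization term.
+In the shift-beta case, a very small $\beta$ leaves almost no probability mass for unseen events, which is consistent with the high perplexity obtained above with {\tt -beta=0.001}.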
+\subsection*{Limited Vocabulary}
+\noindent
+Using an $n$-gram table with a fixed or limited dictionary will cause
+some performance degradation, because the LM smoothing statistics become
+slightly distorted. A valid alternative is to estimate the LM on the
+full dictionary of the training corpus and to use a limited dictionary
+only when saving the LM to a file. This can be achieved with the
+option {\tt -d} (or {\tt -dictionary}):
+\begin{verbatim}
+$> tlm -tr=train.www -n=3 -lm=msb -bo=y -te=test -o=train.lm -d=top10k
+\end{verbatim}
\ No newline at end of file

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/irstlm.git


