[vspline] 01/01: initial upload KFJ 2017-09-06

Kay F. Jahnke kfj-guest at moszumanska.debian.org
Wed Sep 6 16:44:29 UTC 2017


This is an automated email from the git hooks/post-receive script.

kfj-guest pushed a commit to branch master
in repository vspline.

commit b50eb4b8e6c42975d3f3217435f777d2b5c22e61
Author: Kay F. Jahnke <kfj-guest at moszumanska.debian.org>
Date:   Wed Sep 6 16:26:32 2017 +0000

    initial upload KFJ 2017-09-06
---
 LICENSE                      |   27 +
 README.rst                   |  153 +++
 basis.h                      |  247 +++++
 brace.h                      |  546 ++++++++++
 bspline.h                    |  850 ++++++++++++++++
 common.h                     |  317 ++++++
 debian/changelog             |    5 +
 debian/compat                |    1 +
 debian/control               |   38 +
 debian/copyright             |   28 +
 debian/debhelper-build-stamp |    1 +
 debian/files                 |    1 +
 debian/rules                 |   10 +
 debian/source/format         |    1 +
 debian/vspline-dev.examples  |   13 +
 debian/vspline-dev.install   |    1 +
 debian/vspline-dev.substvars |    2 +
 debian/watch                 |   13 +
 doxy.h                       |  324 ++++++
 eval.h                       | 1480 +++++++++++++++++++++++++++
 example/channels.cc          |  150 +++
 example/complex.cc           |   74 ++
 example/eval.cc              |  177 ++++
 example/gradient.cc          |  116 +++
 example/gsm.cc               |  138 +++
 example/gsm2.cc              |  197 ++++
 example/impulse_response.cc  |  134 +++
 example/roundtrip.cc         |  393 +++++++
 example/slice.cc             |  127 +++
 example/slice2.cc            |  192 ++++
 example/slice3.cc            |  147 +++
 example/splinus.cc           |   99 ++
 example/use_map.cc           |  113 +++
 filter.h                     | 1905 ++++++++++++++++++++++++++++++++++
 map.h                        |  528 ++++++++++
 multithread.h                |  671 ++++++++++++
 poles.h                      |  690 +++++++++++++
 prefilter.h                  |  207 ++++
 prefilter_poles.cc           |  176 ++++
 remap.h                      | 1339 ++++++++++++++++++++++++
 thread_pool.h                |  174 ++++
 unary_functor.h              |  421 ++++++++
 vspline.doxy                 | 2303 ++++++++++++++++++++++++++++++++++++++++++
 vspline.h                    |   46 +
 44 files changed, 14575 insertions(+)

diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..d4ebd66
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,27 @@
+vspline - generic C++ code for creation and evaluation
+          of uniform b-splines
+
+        Copyright 2015, 2016 by Kay F. Jahnke
+
+Permission is hereby granted, free of charge, to any person
+obtaining a copy of this software and associated documentation
+files (the "Software"), to deal in the Software without
+restriction, including without limitation the rights to use,
+copy, modify, merge, publish, distribute, sublicense, and/or
+sell copies of the Software, and to permit persons to whom the
+Software is furnished to do so, subject to the following
+conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the
+Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+OTHER DEALINGS IN THE SOFTWARE.
+
diff --git a/README.rst b/README.rst
new file mode 100644
index 0000000..1417fb4
--- /dev/null
+++ b/README.rst
@@ -0,0 +1,153 @@
+===================================================================
+vspline - generic C++ code to create and evaluate uniform B-splines
+===================================================================
+
+------------
+Introduction
+------------
+
+vspline aims to provide a free, comprehensive and fast library for uniform B-splines.
+
+Uniform B-splines are a method to provide a 'smooth' interpolation over a set of
+uniformly sampled data points. They are commonly used in signal processing as they
+have several 'nice' qualities - an in-depth treatment and comparison to other
+interpolation methods can be found in the paper 'Interpolation Revisited' [CIT2000]_
+by Philippe Thévenaz, Member, IEEE, Thierry Blu, Member, IEEE, and Michael Unser,
+Fellow, IEEE.
+
+While there are several freely available packages of B-spline code, I failed to find
+one which is comprehensive, efficient and generic at once. vspline attempts to be
+all that, making use of generic programming in C++11, and of common, but often underused
+hardware features in modern processors. Overall, there is an emphasis on speed, even
+if this makes the code more complex. I tried to eke as much performance out of the
+hardware at my disposal as possible, only compromising when the other design goals
+would have been compromised.
+
+While some of the code is quite low-level, there are reasonably high-level mechanisms
+to interface with vspline, allowing easy access to its functionality without requiring
+users to familiarize themselves with the internal workings. A high-level approach is
+provided via class 'bspline' defined in bspline.h, and via the remap functions
+defined in remap.h.
+
+While I made an attempt to write code which is portable, vspline is only tested with
+g++ and clang++ on Linux. It may work in other environments, but it's unlikely it will
+do so without modification. An installation of Vigra_ is needed to compile, installation
+of Vc_ is optional but recommended.
+
+vspline is relatively new; the current version might qualify as late beta.
+I have made efforts to cover 'reasonable' use cases, but I'm sure there are
+corner cases and unexpected scenarios where my code fails. The code is not
+well shielded against inappropriate parameters. The intended audience is
+developers rather than end users; if the code is used as the 'engine' in
+a well-defined way, parametrization can be tailored by the calling code.
+Parameter checking is avoided where it gets in the way of speedy operation.
+
+-----
+Scope
+-----
+
+There are (at least) two different approaches to tackle B-splines as a mathematical problem. The first one is to look at them as a linear algebra problem. Calculating the B-spline coefficients is done by solving a set of equations, which can be codified as banded diagonal matrices with slight disturbances at the top and bottom, resulting from boundary conditions. The mathematics are reasonably straightforward and can be efficiently coded (at least for lower-degree splines), but I found i [...]
+
+The second approach to B-splines comes from signal processing, and it's the one which I found most commonly used in the other implementations I studied. It generates the B-spline coefficients by applying a forward-backward recursive digital filter to the data and usually implements boundary conditions by picking appropriate initial causal and anticausal coefficients. Once I had understood the process, I found it elegant and beautiful - and perfectly general, lending itself to the impleme [...]
+
+I have made an attempt to generalize the code so that it can handle
+
+- arbitrary real data types and their aggregates [1]_
+- coming in strided memory
+- a reasonable selection of boundary conditions
+- used in either an implicit or an explicit scheme of extrapolation
+- arbitrary spline orders
+- arbitrary dimensions of the spline
+- in multithreaded code
+- using the CPU's vector units if possible
+
+On the evaluation side I provide
+
+- evaluation of the spline at point locations in the defined range
+- evaluation of the spline's derivatives
+- mapping of arbitrary coordinates into the defined range
+- evaluation of nD arrays of coordinates ('remap' function)
+- coordinate-fed remap function ('index_remap')
+- functor-based remap, aka 'transform' function
+- functor-based 'apply' function
+
+The code at the very core of my B-spline coefficient generation code evolved from the code by Philippe Thévenaz which he published here_, with some of the boundary condition treatment code derived from formulae which Philippe Thévenaz communicated to me. Next I needed code to handle multidimensional arrays in a generic fashion in C++. I chose to use Vigra_. Since my work has a focus on signal (and, more specifically image) processing, it's an excellent choice, as it provides a large body [...]
+
+I did all my programming on a Kubuntu_ system, running on an Intel(R) Core(TM) i5-4570 CPU, and used GNU gcc_ and clang_ to compile the code in C++11 dialect. While I am confident that the code runs on other CPUs, I have not tested it with other compilers or operating systems (yet).
+
+.. _here: http://bigwww.epfl.ch/thevenaz/interpolation/
+.. _Vigra: http://ukoethe.github.io/vigra/
+.. _Vc: https://compeng.uni-frankfurt.de/index.php?id=vc 
+.. _Kubuntu: http://kubuntu.org/
+.. _gcc: https://gcc.gnu.org/
+.. _clang: http://clang.llvm.org/
+
+.. [CIT2000] Interpolation Revisited by Philippe Thévenaz, Member, IEEE, Thierry Blu, Member, IEEE, and Michael Unser, Fellow, IEEE, in IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 19, NO. 7, JULY 2000, available `online here`_
+
+.. _online here: http://bigwww.epfl.ch/publications/thevenaz0002.pdf
+
+.. [1] I use 'aggregate' here to mean a collection of identical elements, in contrast to what C++ defines as an aggregate type. So aggregates would be pixels, vigra TinyVectors, and, also, complex types.
+
+-------------
+Documentation
+-------------
+
+There is reasonably comprehensive documentation for vspline, which can be generated
+by running doxygen in vspline's base folder::
+
+  doxygen vspline.doxy
+
+There is also online documentation here:
+
+https://kfj.bitbucket.io
+
+-----
+Speed
+-----
+
+While performance will vary widely from system to system and between different builds, I'll quote some measurements from my own system. I include benchmarking code (roundtrip.cc in the examples folder). Here are some measurements done with "roundtrip", working on a full HD (1920*1080) RGB image, using single precision floats internally - the figures are averages of several runs:
+
+::
+
+  testing bc code MIRROR spline degree 3
+  avg 32 x prefilter:........................ 13.093750 ms
+  avg 32 x remap from unsplit coordinates:... 59.218750 ms
+  avg 32 x remap with internal spline:....... 75.125000 ms
+  avg 32 x index_remap ...................... 57.781250 ms
+
+  testing bc code MIRROR spline degree 3 using Vc
+  avg 32 x prefilter:........................ 9.562500 ms
+  avg 32 x remap from unsplit coordinates:... 22.406250 ms
+  avg 32 x remap with internal spline:....... 35.687500 ms
+  avg 32 x index_remap ...................... 21.656250 ms
+
+As can be seen from these test results, using Vc on my system speeds evaluation up a good deal. When it comes to prefiltering, a lot of time is spent buffering data to make them available for fast vector processing. The time spent on actual calculations is much less. Therefore prefiltering for higher-degree splines doesn't take much more time (when using Vc):
+
+::
+
+  testing bc code MIRROR spline degree 5 using Vc
+  avg 32 x prefilter:........................ 10.687500 ms
+
+  testing bc code MIRROR spline degree 7 using Vc
+  avg 32 x prefilter:........................ 13.656250 ms
+
+Using double precision arithmetic, vectorization doesn't help as much, and prefiltering is actually slower on my system when using Vc. Running the complete roundtrip test on your system should give you an idea of which mode of operation best suits your needs.
+
+----------
+History
+----------
+
+Some years ago, I needed uniform B-splines for a project in python. I looked for code in C which I could easily wrap with cffi_, as I intended to use it with pypy_, and found K. P. Esler's libeinspline_. I proceeded to code the wrapper, which also contained a layer to process Numpy_ nD-arrays, but at the time I did not touch the C code in libeinspline. The result of my efforts is still available from the repository_ I set up at the time. I did not use the code much and occupied myself wi [...]
+
+.. _cffi: https://cffi.readthedocs.org/en/latest/
+.. _pypy: http://pypy.org/
+.. _libeinspline: http://einspline.sourceforge.net/
+.. _Numpy: http://www.numpy.org/
+.. _repository: https://bitbucket.org/kfj/python-bspline
+
+
+
+
+
diff --git a/basis.h b/basis.h
new file mode 100644
index 0000000..7573d45
--- /dev/null
+++ b/basis.h
@@ -0,0 +1,247 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015 - 2017 by Kay F. Jahnke                    */
+/*                                                                      */
+/*    The git repository for this software is at                        */
+/*                                                                      */
+/*    https://bitbucket.org/kfj/vspline                                 */
+/*                                                                      */
+/*    Please direct questions, bug reports, and contributions to        */
+/*                                                                      */
+/*    kfjahnke+vspline at gmail.com                                        */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/*! \file basis.h
+
+    \brief Code to calculate the value of the B-spline basis function
+    and its derivatives.
+
+    There are several variants in here. First, there is a perfectly general
+    routine, using the Cox-de Boor recursion. While this is 'nice to have',
+    vspline does not actually use it (except as a reference in unit testing).
+
+    vspline only needs evaluation of the B-spline basis function at multiples
+    of 0.5. With these values it can construct its evaluators which in turn
+    are capable of evaluating the spline at real coordinates.
+    
+    So next is a specialized routine using an adapted version of the recursion
+    to calculate the basis function's value for integral operands. This isn't
+    used in vspline either - instead vspline uses a third version which abbreviates
+    the recursion by relying on precomputed values for the basis function with
+    derivative 0, which the recursion reaches after as many levels as the
+    requested derivative, so seldom deeper than 2. That makes it very fast.
+
+    For comparison there is also a routine calculating an approximation of the
+    basis function's value (only derivative 0) by means of a Gaussian. This
+    routine isn't currently used in vspline.
+
+    For a discussion of the b-spline basis function, have a look at
+    http://www.cs.mtu.edu/~shene/COURSES/cs3621/NOTES/spline/B-spline/bspline-basis.html
+*/
+
+#ifndef VSPLINE_BASIS_H
+#define VSPLINE_BASIS_H
+
+// poles.h has precomputed basis function values sampled at n * 1/2
+
+#include <vspline/poles.h>
+
+namespace vspline {
+
+/// Implementation of the Cox-de Boor recursion formula to calculate
+/// the value of the bspline basis function. This code is taken from vigra
+/// but modified to take the spline degree as a parameter. This makes it
+/// easier to handle, since we don't need a vigra::BSpline object of a specific
+/// order to call it. This code is quite expensive for higher spline orders
+/// because the routine calls itself twice recursively, so the cost grows
+/// exponentially (2^N) with the spline's degree N. Luckily there are ways around using this routine
+/// at all - whenever we need the b-spline basis function value in vspline, it is
+/// at multiples of 1/2, and poles.h has precomputed values for all spline
+/// degrees covered by vspline. I leave the code in here for reference purposes.
+
+template < class real_type >
+real_type gen_bspline_basis ( real_type x , int degree , int derivative )
+{
+  if ( degree == 0 )
+  {
+    if ( derivative == 0 )
+        return ( x < real_type(0.5) && real_type(-0.5) <= x )
+               ? real_type(1.0)
+               : real_type(0.0) ;
+    else
+        return real_type(0.0);
+  }
+  if ( derivative == 0 )
+  {
+    real_type n12 = real_type((degree + 1.0) / 2.0);
+    return (     ( n12 + x )
+                * gen_bspline_basis<real_type> ( x + real_type(0.5) , degree - 1 , 0 )
+              +   ( n12 - x )
+                * gen_bspline_basis<real_type> ( x - real_type(0.5) , degree - 1 , 0 )
+            )
+            / degree;
+  }
+  else
+  {
+    --derivative;
+    return   gen_bspline_basis<real_type> ( x + real_type(0.5) , degree - 1 , derivative )
+           - gen_bspline_basis<real_type> ( x - real_type(0.5) , degree - 1 , derivative ) ;
+  }
+}
+
+/// this routine is a helper routine to cdb_bspline_basis, the
+/// modified Cox-de Boor recursion formula to calculate the b-spline basis function
+/// for integral operands, operating in int as long as possible. This is achieved by
+/// working with 'x2', the doubled x value. Since in the 'real' recursion, the next
+/// iteration is called with x +/- 1/2, we can call the 'doubled' version with x +/- 1.
+/// This routine recurses all the way down to degree 0, so the result is, disregarding
+/// arithmetic errors, the same as the result obtained with the general routine.
+
+template < class real_type >
+real_type cdb_bspline_basis_2 ( int x2 , int degree , int derivative )
+{
+  if ( degree == 0 )
+  {
+    if ( derivative == 0 )
+        return ( x2 < 1 && -1 <= x2 )
+               ? real_type(1.0)
+               : real_type(0.0) ;
+    else
+        return real_type(0.0);
+  }
+  if ( derivative == 0 )
+  {
+    int n122 = degree + 1 ;
+    return (     ( n122 + x2 )
+                * cdb_bspline_basis_2<real_type> ( x2 + 1 , degree - 1 , 0 )
+              +   ( n122 - x2 )
+                * cdb_bspline_basis_2<real_type> ( x2 - 1 , degree - 1 , 0 )
+            )
+            / ( 2 * degree ) ;
+  }
+  else
+  {
+    --derivative;
+    return   cdb_bspline_basis_2<real_type> ( x2 + 1 , degree - 1 , derivative )
+           - cdb_bspline_basis_2<real_type> ( x2 - 1 , degree - 1 , derivative ) ;
+  }
+}
+
+/// modified Cox-de Boor recursion formula to calculate the b-spline basis function
+/// for integral operands, delegates to the 'doubled' routine above
+
+template < class real_type >
+real_type cdb_bspline_basis ( int x , int degree , int derivative = 0 )
+{
+  return cdb_bspline_basis_2<real_type> ( x + x , degree , derivative ) ;
+}
+
+/// see bspline_basis() below!
+/// this helper routine works with the doubled value of x, so calls equivalent
+/// to basis ( x + .5 ) or basis ( x - .5 ) can be expressed as basis2 ( x + 1 )
+/// and basis2 ( x - 1 ). Having precalculated the basis function at .5 steps,
+/// we can therefore avoid using the general recursion formula. This is a big time-saver
+/// for high degrees. Note, though, that calculating the basis function for a
+/// spline's derivatives still needs recursion, with two branches per level.
+/// So calculating the basis function's value for high derivatives still consumes
+/// a fair amount of time.
+
+template < class real_type >
+real_type bspline_basis_2 ( int x2 , int degree , int derivative )
+{
+  if ( degree == 0 )
+  {
+    if ( derivative == 0 )
+        return ( x2 < 1 && -1 <= x2 )
+               ? real_type(1.0)
+               : real_type(0.0) ;
+    else
+        return real_type(0.0);
+  }
+  if ( derivative == 0 )
+  {
+    if ( abs ( x2 ) > degree )
+      return real_type ( 0 ) ;
+    // for derivative 0 we have precomputed values:
+    const long double * pk = vspline_constants::precomputed_basis_function_values [ degree ] ;
+    return pk [ abs ( x2 ) ] ;
+  }
+  else
+  {
+    --derivative;
+    return   bspline_basis_2<real_type> ( x2 + 1 , degree - 1 , derivative )
+           - bspline_basis_2<real_type> ( x2 - 1 , degree - 1 , derivative ) ;
+  }
+}
+
+/// bspline_basis produces the value of the b-spline basis function for
+/// integral operands, the given degree 'degree' and the desired derivative.
+/// It turns out that this is all we ever need inside vspline, the calculation
+/// of the basis function at arbitrary points is performed via the matrix
+/// multiplication in the weight generating functor, and this functor sets
+/// its internal matrix up with bspline basis function values at integral
+/// locations.
+///
+/// bspline_basis delegates to bspline_basis_2 above, which picks precomputed
+/// values as soon as derivative becomes 0. This abbreviates the recursion
+/// a lot, since usually the derivative requested is 0 or a small integer.
+/// All internal calculations in vspline accessing b-spline basis function
+/// values are currently using this routine, not the general routine.
+///
+/// Due to the precalculation with long double arithmetic, the precomputed
+/// values aren't precisely equal to the result of running the recursive
+/// routines above on the same arguments.
+
+template < class real_type >
+real_type bspline_basis ( int x , int degree , int derivative = 0 )
+{
+  return bspline_basis_2<real_type> ( x + x , degree , derivative ) ;
+}
+
+/// Gaussian approximation to B-spline basis function. This routine
+/// approximates the basis function of degree spline_degree for real x.
+/// I checked for all degrees up to 20. The partition of unity quality of the
+/// resulting reconstruction filter is okay for larger degrees, the cumulated
+/// error over the covered interval is quite low. Still, as the basis function
+/// is never actually evaluated in vspline (whenever it's needed, it is needed
+/// at n * 1/2 and we have precomputed values for that) there is not much point
+/// in having this function around. I leave the code in for now.
+
+template < typename real_type >
+real_type gaussian_bspline_basis_approximation ( real_type x , int degree )
+{
+  real_type sigma = ( degree + 1 ) / 12.0 ;
+  return   real_type(1.0)
+         / sqrt ( real_type(2.0 * M_PI) * sigma )
+         * exp ( - ( x * x ) / ( real_type(2.0) * sigma ) ) ;
+}
+
+} ; // end of namespace vspline
+
+#endif // #define VSPLINE_BASIS_H
diff --git a/brace.h b/brace.h
new file mode 100644
index 0000000..98d251a
--- /dev/null
+++ b/brace.h
@@ -0,0 +1,546 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015 - 2017 by Kay F. Jahnke                    */
+/*                                                                      */
+/*    The git repository for this software is at                        */
+/*                                                                      */
+/*    https://bitbucket.org/kfj/vspline                                 */
+/*                                                                      */
+/*    Please direct questions, bug reports, and contributions to        */
+/*                                                                      */
+/*    kfjahnke+vspline at gmail.com                                        */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/*! \file brace.h
+
+    \brief This file provides code for 'bracing' the spline coefficient array.
+
+    Inspired by libeinspline, I wrote code to 'brace' the spline coefficients. The concept is
+    this: while the IIR filter used to calculate the coefficients has infinite support (though
+    arithmetic precision limits this in real-world applications), the evaluation of the spline
+    at a specific location only looks at a small window of coefficients (compact, finite support).
+    This fact can be exploited by taking note of how large the support area is and providing
+    a few more coefficients in a frame around the 'core' coefficients to allow the evaluation
+    to proceed without having to check for boundary conditions. While the difference is not
+    excessive (the main computational cost is the actual evaluation itself), it's still
+    nice to be able to code the evaluation without boundary checking, which makes the code
+    very straightforward and legible.
+
+    There is another aspect to bracing: In my implementation of vectorized evaluation,
+    the window into the coefficient array used to pick out coefficients to evaluate at
+    a specific location is coded as a set of offsets from its 'low' corner. This way,
+    several such windows can be processed in parallel. This mechanism can only function
+    efficiently in a braced coefficient array, since it would otherwise have to give up
+    if any of the windows accessed by the vector of coordinates had members outside the
+    (unbraced) coefficient array and submit the coordinate vector to individual processing.
+    I consider the logic to code this and the loss in performance too much of a bother
+    to go down this path; all my evaluation code uses braced coefficient arrays. Of course
+    the user is free to omit bracing, but then they have to use their own evaluation
+    code.
+
+    What's in the brace? Of course this depends on the boundary conditions chosen.
+    In vspline, I offer code for several boundary conditions, but most have something
+    in common: the original, finite sequence is extrapolated into an infinite periodic
+    signal. With straight PERIODIC boundary conditions, the initial sequence is
+    immediately followed and preceded by copies of itself. The other boundary conditions
+    mirror the signal in some way and then repeat the mirrored signal periodically.
+    Using boundary conditions like these, both the extrapolated signal and the
+    coefficients share the same periodicity and mirroring.
+    
+    There are two ways of arriving at a braced coefficient array: We can start from the
+    extrapolated signal, pick a section large enough to make margin effects vanish
+    (due to limited arithmetic precision), prefilter it and pick out a subsection containing
+    the 'core' coefficients and their support. Alternatively, we can work only on the core
+    coefficients, calculate suitable initial causal and anticausal coefficients (where the
+    calculation considers the extrapolated signal, which remains implicit), apply the filter
+    and *then* surround the core coefficient array with more coefficients (the brace) following
+    the same extrapolation pattern as we imposed on the signal, but now on the coefficients
+    rather than on the initial knot point values.
+  
+    The bracing can be performed without any solver-related maths by simply copying
+    (possibly trivially modified) slices of the core coefficients to the margin area.
+
+    Following the 'implicit' scheme, my default modus operandi braces after the
+    prefiltering. Doing so, it is possible to calculate the initial causal and anticausal
+    coefficient for the prefilter exactly. But this exactness is still, eventually,
+    subject to discretization and can only be represented after quantization. If,
+    instead, we prefilter a suitably extrapolated signal, now with arbitrary boundary
+    conditions, the margin effects will vanish towards the center (due to the characteristics
+    of the filter), and the 'core' coefficients will end up the same as in the first
+    approach. So we might as well extrapolate the signal 'far enough', pick any boundary
+    conditions we like (even zero padding), prefilter, and discard the margin outside the
+    area which is unaffected by margin effects. The result is, within arithmetic precision,
+    the same. Both approaches have advantages and disadvantages:
+
+    Implicit extrapolation needs less memory: we only need to provide storage for the
+    core coefficients, which is just as much as we need for the original signal, so we can
+    operate in-place. The disadvantage of the implicit scheme is that we have to capture
+    the implicit extrapolation in code to calculate the initial causal/anticausal coefficients,
+    which is non-trivial and requires a separate routine for each boundary condition, as can
+    be seen in my prefiltering code. And if, after prefiltering, we want to brace the core
+    coefficients for efficient evaluation, we still need additional memory, and if that memory
+    hasn't been allocated around the core before prefiltering, we even have to copy the data
+    out into a larger memory area.
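To illustrate what capturing the implicit extrapolation in code involves, here is a hedged sketch for the simplest case: a cubic b-spline (single pole z = sqrt(3) - 2) with mirror boundary conditions. It uses the well-known initial-coefficient formulas of the classic recursive prefilter (Unser; Thévenaz et al.), not vspline's own prefilter code, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Exact initial causal coefficient for pole z. The implicitly mirrored
// signal is periodic with period 2N-2, so the infinite causal sum collapses
// to a finite sum divided by (1 - z^(2N-2)). Requires c.size() >= 2.
double initial_causal_mirror(const std::vector<double>& c, double z)
{
  const std::size_t n = c.size();
  double zn = z;                                 // z^k, forward term
  double z2n = std::pow(z, double(n - 1));       // z^(N-1)
  double sum = c[0] + z2n * c[n - 1];
  z2n *= z2n / z;                                // z^(2N-3), mirrored term
  for (std::size_t k = 1; k + 1 < n; ++k)
  {
    sum += (zn + z2n) * c[k];                    // z^k + z^(2N-2-k)
    zn *= z;
    z2n /= z;
  }
  return sum / (1.0 - zn * zn);                  // zn == z^(N-1) here
}

// In place: knot point values in, cubic b-spline coefficients out.
void prefilter_cubic_mirror(std::vector<double>& c)
{
  const std::size_t n = c.size();
  if (n < 2) return;                             // a single value is its own coefficient
  const double z = std::sqrt(3.0) - 2.0;         // pole of the cubic prefilter
  const double gain = (1.0 - z) * (1.0 - 1.0 / z);  // == 6 for the cubic spline
  for (double& v : c) v *= gain;
  c[0] = initial_causal_mirror(c, z);
  for (std::size_t k = 1; k < n; ++k)            // causal pass
    c[k] += z * c[k - 1];
  c[n - 1] = (z / (z * z - 1.0)) * (z * c[n - 2] + c[n - 1]);  // anticausal init
  for (std::size_t k = n - 1; k-- > 0;)          // anticausal pass, k = N-2 .. 0
    c[k] = z * (c[k + 1] - c[k]);
}
```

The result satisfies (c[k-1] + 4 c[k] + c[k+1]) / 6 == x[k] everywhere, including the ends, where the mirrored neighbours c[-1] = c[1] and c[N] = c[N-2] are used; only N values of storage are needed.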
+
+    Explicit extrapolation needs more memory. A typical scheme would be to anticipate the
+    space needed for the explicit extrapolation, allocate enough memory for the extrapolated
+    signal, place the signal in the center of the allocated memory, perform the extrapolation
+    and then prefilter. The advantage is that we can run the prefilter with arbitrary initial
+    causal/anticausal coefficients. No matter what the extrapolation looks like, we can always
+    use the same code. And we can extrapolate in any way we see fit, without having to produce
+    code to deal with our choice. If we pick the frame of extrapolated values large enough,
+    we can even pick out the 'braced' coefficient array from the result of the filter.
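Conversely, a sketch of the explicit route under the same assumptions (cubic spline, mirror extrapolation, hypothetical function names): the signal is mirrored out into a generous frame and prefiltered with the crudest possible initialization, zero history on both ends. Margin errors decay like |z|^k with |z| of about 0.268, so a frame of 40 values (an illustrative choice) pushes them far below double precision, and the coefficients next to the core come out already braced.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mirror-extend x (at least two values) by 'frame' values on either side.
std::vector<double> mirror_extend(const std::vector<double>& x, std::size_t frame)
{
  const long n = static_cast<long>(x.size());
  const long f = static_cast<long>(frame);
  const long period = 2 * n - 2;                  // period of the mirrored signal
  std::vector<double> out;
  for (long i = -f; i < n + f; ++i)
  {
    long j = ((i % period) + period) % period;    // fold into one period
    if (j >= n) j = period - j;                   // reflect back into the core
    out.push_back(x[j]);
  }
  return out;
}

// Cubic prefilter with arbitrary (here: zero-history) initialization.
void prefilter_cubic_zero_init(std::vector<double>& c)
{
  const double z = std::sqrt(3.0) - 2.0;
  const double gain = (1.0 - z) * (1.0 - 1.0 / z);  // == 6
  for (double& v : c) v *= gain;
  for (std::size_t k = 1; k < c.size(); ++k)      // causal pass, c[0] kept as is
    c[k] += z * c[k - 1];
  c.back() *= -z;                                 // anticausal, zero history
  for (std::size_t k = c.size() - 1; k-- > 0;)
    c[k] = z * (c[k + 1] - c[k]);
}
```

The simple code path needs no boundary-specific maths, but it processes and stores 2 * frame extra values per line.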
+
+    Obviously, there is no one 'right' way of doing this. Since several implicit
+    extrapolation schemes are offered, the user can choose between the implicit and explicit scheme.
+    The code in this file is useful for both choices: for the implicit scheme, bracing is
+    applied after prefiltering to enable evaluation with vspline. For the explicit scheme,
+    bracing may be used on the original data before prefiltering with arbitrary boundary
+    conditions, if the user's extrapolation scheme is covered by the code given here.
+
+    When using the higher-level access methods (via bspline objects), using the explicit or
+    implicit scheme becomes a matter of passing in the right flag, so at this level, a deep
+    understanding of the extrapolation mechanism isn't needed at all. I use the implicit scheme
+    as the default.
+
+    Since the bracing mainly requires copying data or trivial maths we can do the operations
+    on higher-dimensional objects, like slices of a volume. To efficiently code these operations
+    we make use of vigra's multi-math facility and its bindAt array method, which makes
+    these subarrays easily available.
+
+    TODO: while this is convenient, it's not too fast, as it's neither multithreaded nor
+    vectorized. Still, in most 'normal' scenarios the execution time is negligible...
+
+    TODO: there are 'pathological' cases where one brace is larger than the other brace
+    and the width of the core together. These cases can't be handled for all bracing modes
+    and will result in an exception.
+*/
+
+#ifndef VSPLINE_BRACE_H
+#define VSPLINE_BRACE_H
+
+#include <vigra/multi_array.hxx>
+#include <vigra/multi_iterator.hxx>
+#include "common.h"
+
+namespace vspline {
+
+/// class bracer encodes the entire bracing process. It also gives the metrics
+/// for the size of the braces expected by the evaluation code.
+
+template < class view_type >
+struct bracer
+{
+  typedef typename view_type::value_type value_type ;
+  typedef typename ExpandElementResult<value_type>::type ele_type ;
+  enum { dimension = view_type::actual_dimension } ;
+  typedef typename view_type::difference_type shape_type ;
+
+  /// calculates the size of the left brace for a given spline degree. In most cases,
+  /// this is the size of the support of the reconstruction filter, rounded up to the
+  /// next integer. So for an even spline, we get the same left brace size as for the
+  /// odd spline one degree higher.
+  ///
+  /// reflect boundary conditions work slightly differently: we somehow have to access
+  /// the area around the point of reflection - coordinates -1 .. 0 and M-1 .. M, which
+  /// are part of the extrapolated signal, so we need a wider brace - 0.5 at either end.
+  /// For an even spline, this results in the same value as for other boundary conditions,
+  /// since the even spline has smaller support (0.5 precisely). But for odd splines,
+  /// we need another coefficient on either end.
+
+  static int left_brace_size ( int spline_degree , bc_code bc )
+  {
+      return spline_degree / 2 ;
+  }
+
+/// The right-hand side bracing differs between periodic and mirrored splines, due to
+/// the amount of initial data: when specifying knot point data for a periodic spline,
+/// the first repetition is omitted (as it is the same value as at coordinate 0), but
+/// for evaluation, it has to be present, so the bracing code will produce it.
+///
+/// initially I was using the minimal bracing possible by coding:
+/// return left_brace_size ( spline_degree , bc ) + ( bc == PERIODIC ? 1 : 0 ) ;
+/// This has the disadvantage that for odd splines it requires checking if incoming
+/// coordinates are precisely at the right end of the defined range and splitting these
+/// coordinates to M-2, 1.0 instead of M-1, 0.0. Being more generous with the right brace
+/// and adding another layer makes this check unnecessary. Since this check is inner loop
+/// stuff where every clock cycle counts, I now use the more generous bracing.
+/// Note that for the periodic case I assume silently that incoming
+/// coordinates won't ever be M-1, as these can be mapped to 0.0 (due to the periodicity,
+/// this is equivalent) - so here I need an extended brace to capture the last unit spacing
+/// of the spline, but I don't need the additional extension to safeguard against v == M-1.
+/// If you use foreign evaluation routines you may want an additional coefficient here.
+  
+  static int right_brace_size ( int spline_degree , bc_code bc )
+  {
+    return   left_brace_size ( spline_degree , bc )
+           + ( ( ( spline_degree & 1 ) || ( bc == PERIODIC ) ) ? 1 : 0 ) ;
+  }
+
+/// this method gives the shape of the braced array, given the unbraced array's shape,
+/// the BC codes and the spline degree. For the shapes involved, this relation holds true:
+/// target_shape = left_corner + core_shape + right_corner
+/// So, for the implicit scheme, to evaluate from a braced spline, target_shape is the minimal
+/// coefficient array size needed by vspline's evaluation code. For the explicit scheme, this
+/// is the section of the coefficient array the evaluation code will look at.
+
+  static shape_type target_shape ( shape_type source_shape ,
+                            vigra::TinyVector < bc_code , dimension > bcv ,
+                            int spline_degree )
+  {
+    shape_type target_shape ;
+    for ( int d = 0 ; d < dimension ; d++ )
+      target_shape[d] =   source_shape[d]
+                        + left_brace_size ( spline_degree , bcv[d] )
+                        + right_brace_size ( spline_degree , bcv[d] ) ;
+    return target_shape ;
+  }
+
+// /// convenience variant of the previous routine using the same BC for all axes
+// 
+//   static shape_type target_shape ( shape_type source_shape ,
+//                             bc_code bc ,
+//                             int spline_degree )
+//   {
+//     vigra::TinyVector < bc_code , dimension > bcv ( bc ) ;
+//     return target_shape ( source_shape , bcv , spline_degree ) ;
+//   }
+  
+/// this method gives the left offset to the 'core' subarray (array minus bracing),
+/// given the BC codes and the spline degree
+
+  static shape_type left_corner ( vigra::TinyVector < bc_code , dimension > bcv ,
+                                  int spline_degree )
+  {
+    shape_type target_offset ;
+    for ( int d = 0 ; d < dimension ; d++ )
+      target_offset[d] = left_brace_size ( spline_degree , bcv[d] ) ;
+    return target_offset ;
+  }
+  
+/// this method gives the right offset to the 'core' subarray (array minus bracing),
+/// given the BC codes and the spline degree
+
+  static shape_type right_corner ( vigra::TinyVector < bc_code , dimension > bcv ,
+                                   int spline_degree )
+  {
+    shape_type target_offset ;
+    for ( int d = 0 ; d < dimension ; d++ )
+      target_offset[d] = right_brace_size ( spline_degree , bcv[d] ) ;
+    return target_offset ;
+  }
+  
+/// given a braced array, return the shape of its 'core', the array without applied bracing
+
+  static shape_type core_shape ( view_type& a ,
+                                 vigra::TinyVector < bc_code , dimension > bcv ,
+                                 int spline_degree )
+  {
+    return   a.shape()
+           - (   right_corner ( bcv , spline_degree )
+               + left_corner ( bcv , spline_degree ) ) ;
+  }
+
+/// produce a view to the core
+
+  static view_type core_view ( view_type& a ,
+                               vigra::TinyVector < bc_code , dimension > bc ,
+                               int spline_degree )
+  {
+    return a.subarray ( left_corner ( bc , spline_degree ) ,
+                        a.shape() - right_corner ( bc , spline_degree ) ) ;
+  }
+
+  /// for spherical images, we require special treatment for two-dimensional
+  /// input data, because we need to shift the values by 180 degrees, or half
+  /// the margin's width. But to compile, we also have to give a procedure
+  /// for the other cases (not 2D), so this is first:
+  
+  template < typename value_type >
+  static void shift_assign ( value_type target , value_type source )
+  {
+    // should not ever get used, really...
+  }
+
+  /// specialized routine for the 2D case (the slice itself is 1D)
+
+  template < typename value_type >
+  static void shift_assign ( MultiArrayView < 1 , value_type > target ,
+                             MultiArrayView < 1 , value_type > source )
+  {
+    // bit sloppy here, with pathological data (very small source.size()) this will
+    // be imprecise for odd sizes, for even sizes it's always fine. But then full
+    // sphericals always have size 2N * N, so odd sizes should not occur at all for dim 0
+    target = source ;
+    return ;
+    auto si = source.begin() + source.size() / 2 ;
+    auto se = source.end() ;
+    for ( auto& ti : target )
+    {
+      ti = *si ;
+      ++si ;
+      if ( si >= se )
+        si = source.begin() ;
+    }
+  }
+
+/// apply the bracing to the array, performing the required copy/arithmetic operations
+/// to the 'frame' around the core. This routine performs the operation along the given axis.
+/// This variant takes the sizes of the left and right brace without any reference to
+/// a spline's degree, so it can be fed arbitrary values. This is the most general bracing
+/// routine, which is used by the routines below which derive the brace's size from the
+/// spline's degree. It's also the routine to be used for explicitly extrapolating a signal:
+/// you place the data into the center of a larger array, and pass in the sizes of the 'empty'
+/// space which is to be filled with the extrapolated data.
+///
+/// The bracing is done one-left-one-right, to avoid corner cases as best as possible.
+/// Like shift_assign, this routine is static, so that the static all-axes variant of
+/// apply below can call it.
+
+  static void apply ( view_type & a , // containing array
+               bc_code bc ,    // boundary condition code
+               int lsz ,       // space to the left which needs to be filled
+               int rsz ,       // ditto, to the right
+               int axis )      // axis along which to apply bracing 
+  {
+    int w = a.shape ( axis ) ; // width of containing array along axis 'axis'
+    int m = w - ( lsz + rsz ) ;    // width of 'core' array
+
+    if ( m < 1 )                   // has to be at least 1
+      throw shape_mismatch ( "combined brace sizes must be at least one less than container size" ) ;
+
+    if (    ( lsz > m + rsz )
+         || ( rsz > m + lsz ) )
+    {
+      // not enough data to fill brace
+      if ( bc == PERIODIC || bc == NATURAL || bc == MIRROR || bc == REFLECT )
+        throw std::out_of_range ( "each brace must be smaller than the sum of its opposite brace and the core's width" ) ;
+    }
+
+    int l0 = lsz - 1 ; // index of innermost empty slice on the left; like begin()
+    int r0 = lsz + m ; // ditto, on the right
+
+    int lp = l0 + 1 ;  // index of leftmost occupied slice (p for pivot)
+    int rp = r0 - 1 ;  // index of rightmost occupied slice
+
+    int l1 = -1 ;     // index one before outermost empty slice to the left
+    int r1 = w ;      // index one after outermost empty slice on the right; like end()
+
+    int lt = l0 ;     // index to left target slice
+    int rt = r0 ;     // index to right target slice ;
+
+    int ls = 0 , rs = 0 ; // indices to left and right source slice, set below (unused for ZEROPAD)
+
+    int ds = 1 ;      // step for source index, +1 == forward, used for all mirroring modes
+                      // for periodic bracing, it's set to -1.
+
+    switch ( bc )
+    {
+      case PERIODIC :
+      {
+        ls = l0 + m ;
+        rs = r0 - m ;
+        ds = -1 ;      // step through source in reverse direction
+        break ;
+      }
+      case NATURAL :
+      case MIRROR :
+      {
+        ls = l0 + 2 ;
+        rs = r0 - 2 ;
+        break ;
+      }
+      case CONSTANT :
+      case SPHERICAL :
+      case REFLECT :
+      {
+        ls = l0 + 1 ;
+        rs = r0 - 1 ;
+        break ;
+      }
+      case ZEROPAD :
+      {
+        break ;
+      }
+      case IDENTITY :
+      {
+        // these modes perform no bracing, return prematurely
+        return ;
+      }
+      default:
+      {
+        cerr << "bracing for BC code " << bc_name[bc] << " is not supported" << endl ;
+        break ;
+      }
+    }
+
+    for ( int i = max ( lsz , rsz ) ; i > 0 ; --i )
+    {
+      if ( lt > l1 )
+      {
+        switch ( bc )
+        {
+          case PERIODIC :
+          case MIRROR :
+          case REFLECT :
+          {
+            // with these three bracing modes, we simply copy from source to target
+            a.bindAt ( axis , lt ) = a.bindAt ( axis , ls ) ;
+            break ;
+          }
+          case NATURAL :
+          {
+            // here, we subtract the source slice from twice the 'pivot'
+            // easiest would be:
+            // a.bindAt ( axis , lt ) = a.bindAt ( axis , lp ) * value_type(2) - a.bindAt ( axis , ls ) ;
+            // but this fails in 1D TODO: why?
+            auto target = a.bindAt ( axis , lt ) ; // get a view to left target slice
+            target = a.bindAt ( axis , lp ) ;      // assign value of left pivot slice
+            target *= value_type(2) ;                // double that
+            target -= a.bindAt ( axis , ls ) ;     // subtract left source slice
+            break ;
+          }
+          case CONSTANT :
+          {
+            // here, we repeat the 'pivot' slice
+            a.bindAt ( axis , lt ) = a.bindAt ( axis , lp ) ;
+            break ;
+          }
+          case ZEROPAD :
+          {
+            // fill with 0
+            a.bindAt ( axis , lt ) = value_type() ;
+            break ;
+          }
+          case SPHERICAL : // needs special treatment
+          {
+            shift_assign ( a.bindAt ( axis , lt ) , a.bindAt ( axis , ls ) ) ;
+            break ;
+          }
+          default :
+            // default: leave untouched
+            break ;
+        }
+        --lt ;
+        ls += ds ;
+      }
+      if ( rt < r1 )
+      {
+        // essentially the same, but with rs instead of ls, etc.
+        switch ( bc )
+        {
+          case PERIODIC :
+          case MIRROR :
+          case REFLECT :
+          {
+            // with these three bracing modes, we simply copy from source to target
+            a.bindAt ( axis , rt ) = a.bindAt ( axis , rs ) ;
+            break ;
+          }
+          case NATURAL :
+          {
+            // here, we subtract the source slice from twice the 'pivot'
+            // the easiest would be:
+            // a.bindAt ( axis , rt ) = a.bindAt ( axis , rp ) * value_type(2) - a.bindAt ( axis , rs ) ;
+            // but this fails in 1D TODO: why?
+            auto target = a.bindAt ( axis , rt ) ; // get a view to right target slice
+            target = a.bindAt ( axis , rp ) ;      // assign value of pivot slice
+            target *= value_type(2) ;                // double that
+            target -= a.bindAt ( axis , rs ) ;     // subtract source slice
+            break ;
+          }
+          case CONSTANT :
+          {
+            // here, we repeat the 'pivot' slice
+            a.bindAt ( axis , rt ) = a.bindAt ( axis , rp ) ;
+            break ;
+          }
+          case ZEROPAD :
+          {
+            // fill with 0
+            a.bindAt ( axis , rt ) = value_type() ;
+            break ;
+          }
+          case SPHERICAL : // needs special treatment
+          {
+            shift_assign ( a.bindAt ( axis , rt ) , a.bindAt ( axis , rs ) ) ;
+            break ;
+          }
+          default :
+            // default: leave untouched
+            break ;
+        }
+        ++rt ;
+        rs -= ds ;
+      }
+    }
+  }
+  
+/// This variant of apply braces along all axes in one go.
+
+  static void apply ( view_type& a ,          ///< target array, containing the core and (empty) frame
+               vigra::TinyVector < bc_code , dimension > bcv ,  ///< boundary condition codes
+               vigra::TinyVector < int , dimension > left_corner ,  ///< sizes of left braces
+               vigra::TinyVector < int , dimension > right_corner ) ///< sizes of right braces
+  {
+    for ( int dim = 0 ; dim < dimension ; dim++ )
+      apply ( a , bcv[dim] , left_corner[dim] , right_corner[dim] , dim ) ;
+  }
+
+/// apply the bracing to the array, performing the required copy/arithmetic operations
+/// to the 'frame' around the core. This routine performs the operation along axis dim.
+/// Here, the size of the brace is derived from the spline degree. This is a convenience
+/// variant which saves you the explicit calls to left_brace_size and right_brace_size.
+
+  void operator() ( view_type& a ,           ///< target array, containing the core and (empty) frame
+                    bc_code bc ,         ///< boundary condition code
+                    int spline_degree ,  ///< degree of the spline
+                    int dim )            ///< axis along which to brace
+  {
+    // calculate brace sizes
+    int lsz = left_brace_size ( spline_degree , bc ) ;
+    int rsz = right_brace_size ( spline_degree , bc ) ;
+
+    // delegate to apply()
+    apply ( a , bc , lsz , rsz , dim ) ;
+  }
+  
+/// This variant braces along all axes, deriving brace sizes from the spline's degree
+
+  void operator() ( view_type& a ,          ///< target array, containing the core and (empty) frame
+                    bc_code bc ,        ///< boundary condition code, used for all axes
+                    int spline_degree ) ///< degree of the spline
+  {
+    for ( int dim = 0 ; dim < dimension ; dim++ )
+      (*this) ( a , bc , spline_degree , dim ) ;
+  }
+} ;
+
+
+} ; // end of namespace vspline
+
+#endif // VSPLINE_BRACE_H
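The brace metrics of struct bracer are simple integer arithmetic. This standalone sketch restates left_brace_size, right_brace_size and target_shape for a plain std::array shape; BC is a hypothetical stand-in for vspline's bc_code, and the function names are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// hypothetical stand-in for bc_code; for the brace sizes only the
// distinction PERIODIC / not PERIODIC matters
enum class BC { MIRROR, PERIODIC, NATURAL, REFLECT };

// same arithmetic as bracer::left_brace_size / right_brace_size
int left_brace(int degree, BC) { return degree / 2; }

int right_brace(int degree, BC bc)
{
  return left_brace(degree, bc) + (((degree & 1) || bc == BC::PERIODIC) ? 1 : 0);
}

// target_shape = left_corner + core_shape + right_corner, per axis
template <std::size_t D>
std::array<int, D> braced_shape(std::array<int, D> core,
                                std::array<BC, D> bcv, int degree)
{
  std::array<int, D> out{};
  for (std::size_t d = 0; d < D; ++d)
    out[d] = left_brace(degree, bcv[d]) + core[d] + right_brace(degree, bcv[d]);
  return out;
}
```

For a cubic spline the braces are one coefficient on the left and two on the right, so a 100 x 50 core needs a 103 x 53 braced array, whatever the boundary conditions.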
diff --git a/bspline.h b/bspline.h
new file mode 100644
index 0000000..bf29b80
--- /dev/null
+++ b/bspline.h
@@ -0,0 +1,850 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015 - 2017 by Kay F. Jahnke                    */
+/*                                                                      */
+/*    The git repository for this software is at                        */
+/*                                                                      */
+/*    https://bitbucket.org/kfj/vspline                                 */
+/*                                                                      */
+/*    Please direct questions, bug reports, and contributions to        */
+/*                                                                      */
+/*    kfjahnke+vspline at gmail.com                                        */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/*! \file bspline.h
+
+    \brief defines class bspline
+
+  class bspline is the most convenient access to vspline's functionality.
+  It attempts to do 'the right thing' by automatically creating suitable helper
+  objects and parametrization so that the spline does what it's supposed to do.
+  Most users will not need anything else, and using class bspline is quite
+  straightforward. It's quite possible to have a b-spline up and running with
+  a few lines of code without even having to make choices concerning its
+  parametrization, since there are sensible defaults for everything. At the same
+  time, pretty much everything *can* be parametrized even at this level.
+  bspline objects can be used without any knowledge of their internals,
+  e.g. as parameters to the remap functions.
+  
+  While using 'raw' coefficient arrays with an evaluation scheme which applies
+  boundary conditions is feasible and most memory-efficient, it's not so well
+  suited for very fast evaluation, since the boundary treatment needs conditionals,
+  and 'breaks' uniform access, which is especially detrimental when using
+  vectorization. So vspline uses coefficient arrays with a few extra coefficients
+  'framing' or 'bracing' the 'core' coefficients. Since evaluation of the spline
+  looks at a certain small section of coefficients (the evaluator's 'support'),
+  the bracing is chosen so that this lookup will always succeed without having to
+  consider boundary conditions: the brace is set up to make the boundary conditions
+  explicit, and the evaluation can proceed blindly without bounds checking. With
+  large coefficient arrays, the little extra space needed for the brace becomes
+  negligible, but the code for evaluation becomes much faster and simpler.
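As a hedged illustration of evaluating 'blindly', this sketch evaluates a cubic b-spline from a braced 1D coefficient vector, assuming the brace sizes bracer computes for degree 3 (one coefficient on the left, two on the right). eval_cubic_braced is a hypothetical name, not the API of eval.h; the weights are the standard uniform cubic b-spline weights.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// x must lie in [0, M-1], where M is the core size; braced.size() == M + 3.
double eval_cubic_braced(const std::vector<double>& braced, double x)
{
  const long i = static_cast<long>(std::floor(x));   // integral part
  const double t = x - i;                            // fractional part in [0,1)
  // cubic b-spline weights for core coefficients i-1 .. i+2
  const double w0 = (1 - t) * (1 - t) * (1 - t) / 6.0;
  const double w1 = (3 * t * t * t - 6 * t * t + 4) / 6.0;
  const double w2 = (-3 * t * t * t + 3 * t * t + 3 * t + 1) / 6.0;
  const double w3 = t * t * t / 6.0;
  // core index k sits at braced index k + 1, so core i-1 is braced i:
  // no bounds checks needed, the brace covers the whole support
  const double* p = braced.data() + i;
  return w0 * p[0] + w1 * p[1] + w2 * p[2] + w3 * p[3];
}
```

At x == M - 1 the code still reads the fourth coefficient, with weight zero. The second coefficient of the right brace is what makes this read safe without a test, which is the reason for the 'generous' right brace described in brace.h.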
+
+  So class bspline handles several views to the coefficients it operates on, these
+  are realized as vigra::MultiArrayViews, and they all share the same storage:
+
+  - the 'core', which is a view to an array of data precisely the same shape as
+    the knot point data over which the spline is calculated.
+
+  - 'coeffs', which is a view to the core, plus 'bracing' needed to evaluate
+    the spline with vspline's evaluation code. 'coeffs' contains 'core'.
+    I refer to this view as the 'braced coefficients' as well.
+
+  - 'container', which contains the two views above plus an additional frame
+    of coefficients used for the 'explicit' scheme of extrapolation before
+    prefiltering, or as extra 'headroom' if 'shifting' the spline is intended. 
+
+  Using class bspline, there is a choice of 'strategy'. The simplest strategy is
+  'UNBRACED'. With this strategy, after putting the knot point data into the bspline's
+  'core' area and calling prefilter(), the core area will contain the b-spline
+  coefficients. The resulting b-spline object can't be evaluated with the code in eval.h.
+  This mode of operation is intended for users who want to do their own processing of the
+  coefficients and don't need the code in eval.h. Prefiltering is done using an implicit
+  scheme as far as the boundary conditions are concerned.
+  
+  The 'standard' strategy is 'BRACED'. Here, after prefiltering, the view 'coeffs'
+  in the bspline object will contain the b-spline coefficients, surrounded by a 'brace'
+  of coefficients which allows code in eval.h to process them without special treatment
+  for values near the border (the brace covers what support is needed by marginal
+  coefficients). Again, an implicit scheme is used.
+  
+  The third strategy, 'EXPLICIT', extrapolates the knot point data in the 'core' area
+  sufficiently far to suppress margin effects when the prefiltering is performed without
+  initial coefficient calculation. If the 'frame' of extrapolated data is large enough,
+  the result is just the same. The inner part of the frame is taken as the brace, so no
+  bracing needs to be performed explicitly. The resulting b-spline object will work with
+  vspline's evaluation code. Note that the explicit scheme uses 'GUESS' boundary conditions
+  on the (framed) array, which tries to minimize margin effects further.
+
+  Also note that the additional memory needed for the 'frame' will be held throughout the bspline
+  object's life, the only way to 'shrink' the coefficient array to the size of the braced or core
+  coefficients is by copying them out to a smaller array.
+
+  The fourth strategy, 'MANUAL', is identical to 'EXPLICIT', except that automatic extrapolation
+  of the core data to the frame is not performed. Instead, this strategy relies on the user to
+  fill the frame with extrapolated data. This is to allow for the user to apply custom
+  extrapolation schemes. The procedure would be to create the bspline object, fill the core,
+  apply the extrapolation, then call prefilter.
+
+  Probably the most common scenario is that the source data for the spline are available from
+  someplace like a file. Instead of reading the file's contents into memory first and passing
+  the memory to class bspline, there is a more efficient way: a bspline object is set up
+  first, with the specification of the size of the incoming data and the intended mode of
+  operation. The bspline object allocates the memory it will need for the purpose, but
+  doesn't do anything else. The 'empty' bspline object is then 'filled' by the user
+  by putting data into its 'core' area. Subsequently, prefilter() is called, which converts
+  the data to b-spline coefficients. This way, only one block of memory is used throughout,
+  the initial data are overwritten by the coefficients, operation is in-place and most efficient.
+
+  If this pattern can't be followed, there are alternatives:
+
+  - if a view to an array at least the size of the container array is passed into bspline's
+    constructor, this view is 'adopted' and all operations will use the data it refers to.
+    The caller is responsible for keeping these data alive while the bspline object exists,
+    and relinquishes control over the data, which may be changed by the bspline object.
+    Note that there is a convenience method, 'container_size', which can calculate the
+    shape of a container suitable for the purpose.
+
+  - if data are passed to prefilter(), they will be taken as containing the knot point data,
+    rather than expecting the knot point data to be in the bspline object's memory already.
+    This can also be used to reuse a bspline object with new data. The data passed in will
+    not be modified. This is most efficient when using an implicit scheme; when used together
+    with EXPLICIT, the data are (automatically) copied into the core area before prefiltering,
+    which is unnecessary with the implicit schemes - they can 'pull in' data in the course
+    of their operation.
+
+  While there is no explicit code to create a 'smoothing spline' - a b-spline evaluating
+  the source data without prefiltering them - this can be achieved simply by creating a b-spline
+  object with spline degree 0 and 'shifting' it to the desired degree for evaluation. Note that
+  you'll need either the EXPLICIT strategy or explicitly specified extra 'headroom', because
+  otherwise the brace is too narrow for the wider support of the shifted spline.
+  
+  If stronger smoothing is needed, this can be achieved with the code in filter.h, passing in
+  appropriate pole values. A single-pole filter with a positive pole in ] 0 , 1 [ will do the
+  trick - the larger the pole, the stronger the smoothing. Note that smoothing with large pole
+  values will need a large 'horizon' as well to handle the margins properly.
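A minimal sketch of such a smoothing filter, assuming one causal and one anticausal pass with the same positive pole z, normalized so that constant signals pass unchanged. This is not the interface of filter.h: smooth_single_pole is a hypothetical name, and the margins are treated crudely by assuming each pass's input continues with its edge value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

void smooth_single_pole(std::vector<double>& v, double z)   // 0 < z < 1
{
  if (v.empty()) return;
  const double norm = (1.0 - z) * (1.0 - z);   // DC gain of both passes is 1/norm
  v[0] /= (1.0 - z);                           // steady state for a constant history
  for (std::size_t k = 1; k < v.size(); ++k)   // causal pass
    v[k] += z * v[k - 1];
  v.back() /= (1.0 - z);                       // same assumption on the right edge
  for (std::size_t k = v.size() - 1; k-- > 0;) // anticausal pass
    v[k] += z * v[k + 1];
  for (double& e : v) e *= norm;
}
```

Far from the margins the response to a unit impulse is (1 - z)^2 z^|k| / (1 - z^2), so with z = 0.5 the peak drops to 1/3; raising z widens the response, which is the stronger smoothing (and the longer 'horizon') mentioned above.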
+
+  With shifting, you can also create a 'poor man's pyramid'. While using no additional storage,
+  you can extract smoothed data from the spline by shifting it up. This only goes so far, though,
+  because even a degree-20 b-spline reconstruction kernel's equivalent gaussian doesn't have a very
+  large standard deviation, and evaluation times become very long. From the gaussian approximation
+  of the b-spline basis function, it can be seen that the equivalent gaussian's variance is
+  ( degree + 1 ) / 12.0, so a quintic spline has a variance of only 0.5, which corresponds to
+  a standard deviation of about 0.71.
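The second moment of the basis function can be checked numerically. This sketch (hypothetical function names) evaluates the centered b-spline basis of degree n with the standard recurrence and integrates x^2 times the basis with the midpoint rule; the result matches (n + 1) / 12 to within the integration error, and the standard deviation is the square root of that value.

```cpp
#include <cassert>
#include <cmath>

// centered b-spline basis of degree n, support [-(n+1)/2, (n+1)/2]
double bspline_basis(int n, double x)
{
  if (n == 0)
    return (x >= -0.5 && x < 0.5) ? 1.0 : 0.0;
  const double h = (n + 1) / 2.0;
  return ((h + x) * bspline_basis(n - 1, x + 0.5)
        + (h - x) * bspline_basis(n - 1, x - 0.5)) / n;
}

// second moment (= variance, since the basis is symmetric with unit area)
double bspline_variance(int n, double step = 1e-3)
{
  const double h = (n + 1) / 2.0;
  double sum = 0.0;
  for (double x = -h + step / 2; x < h; x += step)   // midpoint rule
    sum += x * x * bspline_basis(n, x) * step;
  return sum;
}
```

Since each degree adds one convolution with the unit box, whose variance is 1/12, the variance grows only linearly with the degree and the standard deviation only with its square root.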
+*/
+
+#ifndef VSPLINE_BSPLINE_H
+#define VSPLINE_BSPLINE_H
+
+#include "prefilter.h"
+#include "brace.h"
+
+namespace vspline {
+
+/// struct bspline is a convenience class which bundles a coefficient array (and its creation)
+/// with a set of metadata describing the parameters used to create the coefficients and the
+/// resulting data. I have chosen to implement class bspline so that there is only a minimal
+/// set of template arguments, namely the spline's data type (like pixels etc.) and its dimension.
+/// All other parameters relevant to the spline's creation are passed in at construction time.
+/// This way, if explicit specialization becomes necessary (like, to interface to code which
+/// can't use templates) the number of specializations remains manageable. This design decision
+/// pertains specifically to the spline's degree, which can also be implemented as a template
+/// argument, allowing for some optimization by making some members static. Yet going down this
+/// path requires explicit specialization for every spline degree used and the performance gain
+/// I found doing so was hardly measurable, while automatic testing became difficult and compilation
+/// times grew.
+///
+/// I chose making bspline a struct for now, but messing with the data inside is probably
+/// not a good idea...
+
+template < class value_type , int _dimension >
+struct bspline
+{
+  /// pull the template arg into an enum
+  enum { dimension = _dimension } ;
+  /// if the coefficients are owned, this array holds the data
+  typedef vigra::MultiArray < dimension , value_type > array_type ;
+  /// data are read and written to vigra MultiArrayViews
+  typedef vigra::MultiArrayView < dimension , value_type > view_type ;
+  /// multidimensional index type
+  typedef typename view_type::difference_type shape_type ;
+  /// nD type for one boundary condition per axis
+  typedef typename vigra::TinyVector < bc_code , dimension > bcv_type ;
+
+  /// elementary type of value_type, like float or double
+  typedef typename ExpandElementResult < value_type >::type real_type ;
+  enum { channels = ExpandElementResult < value_type >::size } ;
+
+  typedef bspline < real_type , dimension > channel_view_type ;
+  
+  // for internal calculations in the filter, we use the elementary type of value_type.
+  // Note how in class bspline we make very specific choices about the
+  // source data type, the target data type and the type used for arithmetics: we use
+  // the same value_type for source and target array and perform the arithmetics with
+  // this value_type's elementary type. The filter itself is much more flexible; all of
+  // the three types can be different, the only requirements are that the value_types
+  // must be vigra element-expandable types with an elementary type that can be cast
+  // to and from math_type, and math_type must be a real data type, with the additional
+  // requirement that it can be vectorized by Vc if Vc is used.
+
+  typedef real_type math_type ; // arbitrary, can use float or double here.
+
+private:
+
+  array_type _coeffs ;
+  prefilter_strategy strategy ;
+
+public:
+
+  const bcv_type bcv ;      ///< boundary conditions, see common.h
+  
+  view_type container ;     ///< view to container array
+  view_type coeffs ;        ///< view to the braced coefficient array
+  view_type core ;          ///< view to the core part of the coefficient array
+  int spline_degree ;       ///< degree of the spline (3 == cubic spline)
+  double tolerance ;        ///< acceptable error
+  double smoothing ;        ///< in [ 0 : 1 [; smoothing applied to data before prefiltering (0: none)
+  bool braced ;             ///< whether coefficient array is 'braced' or not
+  int horizon ;             ///< additional frame size for explicit scheme
+  shape_type left_brace ;   ///< width(s) of the left handside bracing
+  shape_type right_brace ;  ///< width(s) of the right handside bracing
+  shape_type left_frame ;   ///< width(s) of the left handside frame
+  shape_type right_frame ;  ///< width(s) of the right handside frame
+  shape_type container_shape ;   ///< shape of the container array
+  shape_type core_shape ;   ///< shape of the core coefficient array
+  shape_type braced_shape ; ///< shape of the coefficient array + bracing
+
+  /// lower_limit returns the lower bound of the spline's defined range.
+  /// This is usually 0.0, but with REFLECT boundary condition it's -0.5,
+  /// the lower point of reflection. The lowest coordinate at which the
+  /// spline can be accessed may be lower: even splines have wider support,
+  /// and splines with extra headroom add even more room to manoeuvre.
+
+  double lower_limit ( const int & axis )
+  {
+    double limit = 0.0 ;
+    
+    if ( bcv [ axis ] == vspline::REFLECT )
+      limit = -0.5 ;
+    
+    return limit ;
+  }
+  
+  /// upper_limit returns the upper bound of the spline's defined range.
+  /// This is normally M - 1 if the shape for this axis is M. Splines with
+  /// REFLECT boundary condition use M - 0.5, the upper point of reflection,
+  /// and periodic splines use M. The highest coordinate at which the spline
+  /// may be accessed safely may be higher.
+  
+  double upper_limit ( const int & axis )
+  {
+    double limit = core.shape [ axis ] - 1 ;
+    
+    if ( bcv [ axis ] == vspline::REFLECT )
+      limit += 0.5 ;
+    else if ( bcv [ axis ] == vspline::PERIODIC )
+      limit += 1.0 ;
+    
+    return limit ;
+  }
+  
+  /// setup_metrics determines the sizes of the three views and any braces/frames
+  /// needed with the given parameters
+
+  void setup_metrics ( int headroom = 0 )
+  {
+    switch ( strategy )
+    {
+      case UNBRACED:
+        // UNBRACED is simple: all internal views are the same. prefiltering will
+        // be done using an implicit scheme.
+        container_shape = braced_shape = core_shape ;
+        left_brace = right_brace = left_frame = right_frame = shape_type() ;
+        braced = false ;
+        return ;
+      case BRACED:
+        // again an implicit prefiltering scheme will be used, but here we add
+        // a 'brace' to the core data, which makes the resulting bspline object
+        // suitable to work with vspline's evaluation code. The container array's
+        // size is the same as the braced core's size, unless additional headroom
+        // was requested.
+        braced_shape = bracer<view_type>::target_shape ( core_shape , bcv , spline_degree ) ;
+        left_brace = bracer<view_type>::left_corner ( bcv , spline_degree ) ;
+        right_brace = bracer<view_type>::right_corner ( bcv , spline_degree ) ;
+        left_frame = left_brace + shape_type(headroom) ;
+        right_frame = right_brace + shape_type(headroom) ;
+        break ;
+      case EXPLICIT:
+        // here we prepare for an explicit extrapolation. This requires additional
+        // space, namely the 'frame', around the core data, into which the extrapolated
+        // data are put before prefiltering the lot. This frame is applied in excess of
+        // the bracing, to make sure all coefficients inside the brace meet the precision
+        // requirements expressed by the choice of 'horizon'. If additional headroom
+        // is requested, this comes yet on top.
+        braced_shape = bracer<view_type>::target_shape ( core_shape , bcv , spline_degree ) ;
+        left_brace = bracer<view_type>::left_corner ( bcv , spline_degree ) ;
+        right_brace = bracer<view_type>::right_corner ( bcv , spline_degree ) ;
+        left_frame = left_brace + shape_type(horizon + headroom) ;
+        right_frame = right_brace + shape_type(horizon + headroom) ;
+        break ;
+      case MANUAL:
+        // nothing to do here, it's up to the user
+        break ;
+    }
+    braced = true ;
+    
+    // for odd splines with REFLECT boundary conditions we increase the size of
+    // the left frame, so that access to coordinates in [-0.5:0] will not result
+    // in accessing memory outside the coefficient array. Access to [M-1:M-0.5]
+    // is safe with the right brace size the bracer has returned.
+
+    if ( spline_degree & 1 )
+    {
+      for ( int d = 0 ; d < dimension ; d++ )
+      {
+        if ( bcv[d] == REFLECT || bcv[d] == SPHERICAL )
+          left_frame[d]++ ;
+      }
+    }
+
+    container_shape = core_shape + left_frame + right_frame ;
+  }
+
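+  /// this method calculates the size of container needed by a bspline object with
+  /// the given parameters. This is a helper routine for use cases where the memory for
+  /// the bspline object is allocated externally and passed into the bspline object.
+  /// Note that there is no 'headroom' parameter here, and that the default horizon
+  /// differs from the constructor's tolerance-based default, so with the EXPLICIT
+  /// strategy it's safest to pass the same explicit horizon to both.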
+
+  static shape_type container_size ( shape_type core_shape ,  ///< shape of knot point data
+            int spline_degree = 3 ,                ///< spline degree with reasonable default
+            bcv_type bcv = bcv_type ( MIRROR ) ,   ///< boundary conditions and common default
+            prefilter_strategy strategy = BRACED , ///< default strategy is the 'implicit' scheme
+            int horizon = sizeof(real_type) * 3 )  ///< width of frame for explicit scheme (heuristic)
+  {
+    switch ( strategy )
+    {
+      case UNBRACED:
+      {
+        return core_shape ;
+        break ;
+      }
+      case BRACED:
+      {
+        auto ts = bracer<view_type>::target_shape
+          ( core_shape , bcv , spline_degree ) ;
+
+        if ( spline_degree & 1 )
+        {
+          for ( int d = 0 ; d < dimension ; d++ )
+          {
+            if ( bcv[d] == REFLECT || bcv[d] == SPHERICAL )
+              ts[d]++ ;
+          }
+        }
+        return ts ;
+        break ;
+      }
+      case EXPLICIT:
+      {
+        auto ts = bracer<view_type>::target_shape
+          ( core_shape , bcv , spline_degree ) ;
+
+        if ( spline_degree & 1 )
+        {
+          for ( int d = 0 ; d < dimension ; d++ )
+          {
+            if ( bcv[d] == REFLECT || bcv[d] == SPHERICAL )
+              ts[d]++ ;
+          }
+        }
+        ts += 2 * horizon ;
+        return ts ;
+        break ;
+      }
+      case MANUAL:
+        // nothing to do here, it's up to the user
+        break ;
+    }
+    return core_shape ;
+  }
+
+  /// construct a bspline object with appropriate storage space to contain and process an array
+  /// of knot point data with shape core_shape. Depending on the strategy chosen and the other
+  /// parameters passed, more space than core_shape may be allocated. Once the bspline object
+  /// is ready, usually it is filled with the knot point data and then the prefiltering needs
+  /// to be done. This sequence ensures that the knot point data are present in memory only once:
+  /// the prefiltering is done in-place. So the user can create the bspline, fill in data (like,
+  /// from a file), prefilter, and then evaluate.
+  ///
+  /// alternatively, if the knot point data are already manifest elsewhere, they can be passed
+  /// to prefilter(). With this mode of operation, they are 'pulled in' during prefiltering.
+  ///
+  /// It's possible to pass in a view to an array providing space for the coefficients,
+  /// or even the coefficients themselves. This is done via the parameter _space. This has
+  /// to be an array of the same or larger shape than the container array would end up having
+  /// given all the other parameters. This view is then 'adopted' and subsequent processing
+  /// will operate on its data. container_size can be used to get the precise shape of the memory
+  /// needed with the given parameters.
+  ///
+  /// with the EXPLICIT scheme, the horizon is set by default to a value which is
+  /// deemed to be 'sufficiently large' to keep the error 'low enough'. the expression
+  /// used here produces a frame which is roughly the size needed to make any margin
+  /// effects vanish by the time the prefilter hits the core, but it's a bit 'rule of thumb'.
+  ///
+  /// The additional parameter 'headroom' is used to make the 'frame' even wider. This is
+  /// needed if the spline is to be 'shifted' up (evaluated as if it had been prefiltered
+  /// with a higher-degree prefilter) - see shift().
+  ///
+  /// While bspline objects allow very specific parametrization, most use cases won't use
+  /// parameters beyond the first few. The only mandatory parameter is, obviously, the
+  /// shape of the knot point data, the original data which the spline is built over.
+  /// This shape becomes the bspline object's 'core' shape. If this is the only
+  /// parameter passed to the constructor, the resulting bspline object will be a
+  /// cubic b-spline with mirror boundary conditions, generated with an implicit
+  /// extrapolation scheme to a 'good' quality, no smoothing, and allocating its own
+  /// storage for the coefficients, and the resulting bspline object will be suitable for
+  /// use with vspline's evaluation code.
+  
+  // TODO: when bracing/framing is applied, we might widen the array size to a
+  // multiple of the Vc::Vector's Size for the given data type to have better-aligned
+  // access. This may or may not help, has to be tested. We might also want to position
+  // the origin of the brace to an aligned position, since evaluation is based there.
+  
+  // TODO: while the choice to keep the value_types and math_type closely related makes
+  // for simple code, with the more flexible formulation of the prefiltering code we might
+  // widen class bspline's scope to accept input of other types and/or use a different
+  // math_type.
+
+  bspline ( shape_type _core_shape ,                ///< shape of knot point data
+            int _spline_degree = 3 ,                ///< spline degree with reasonable default
+            bcv_type _bcv = bcv_type ( MIRROR ) ,   ///< boundary conditions and common default
+            prefilter_strategy _strategy = BRACED , ///< default strategy is the 'implicit' scheme
+            int _horizon = -1 ,                     ///< width of frame for explicit scheme
+            double _tolerance = -1.0 ,              ///< acceptable error (relative to unit pulse)
+            double _smoothing = 0.0 ,               ///< apply smoothing to data before prefiltering
+            int headroom = 0 ,                      ///< additional headroom, for 'shifting'
+            view_type _space = view_type()          ///< coefficient storage to 'adopt'
+          )
+  : core_shape ( _core_shape ) ,
+    spline_degree ( _spline_degree ) ,
+    bcv ( _bcv ) ,
+    smoothing ( _smoothing ) ,
+    strategy ( _strategy )
+  {
+    if ( _tolerance < 0.0 )
+    {
+      // heuristic: 'reasonable' defaults
+      if ( std::is_same < real_type , float > :: value )
+        tolerance = .000001 ;
+      else if ( std::is_same < real_type , double > :: value )
+        tolerance = .0000000000001 ;
+      else
+        tolerance = 0.0000000000000000001 ;
+    }
+    else
+    {
+      // use the tolerance passed in by the caller
+      tolerance = _tolerance ;
+    }
+
+    // heuristic: horizon for reasonable precision - we assume that no one in their right
+    // minds would want a negative horizon ;)
+
+    real_type max_pole = .00000000000000000001 ;
+    if ( spline_degree > 1 )
+      max_pole = fabs ( vspline_constants::precomputed_poles [ spline_degree ] [ 0 ] ) ;
+    if ( smoothing > max_pole )
+      max_pole = smoothing ;
+
+    if ( _horizon < 0 )
+      horizon = ceil ( log ( tolerance ) / log ( max_pole ) ) ; // TODO what if tolerance == 0.0?
+    else
+      horizon = _horizon ; // whatever the user specifies
+
+    // first, calculate all the various shapes and sizes used internally
+    setup_metrics ( headroom ) ;
+//     std::cout << "container shape: " << container_shape << std::endl ;
+
+    // now either adopt external memory or allocate memory for the coefficients
+    if ( _space.hasData() )
+    {
+      // caller has provided space for the coefficient array. This space has to
+      // be at least as large as the container_shape we have determined
+      // to make sure it's compatible with the other parameters
+      if ( ! ( allGreaterEqual ( _space.shape() , container_shape ) ) )
+        throw shape_mismatch ( "the intended container shape does not fit into the shape of the storage space passed in" ) ;
+      // if the shape matches, we adopt the data in _space;
+      // since 'container' was default-constructed, assignment results in a view
+      // to the data in _space, not in copying the data. This means that if the data
+      // _space refers to change or are deallocated, the bspline will become invalid
+      // as well. We take a view to the container_shape-sized subarray only.
+      // _coeffs remains uninitialized.
+      container = _space.subarray ( shape_type() , container_shape ) ;
+    }
+    else
+    {
+      // _space was default-constructed and has no data.
+      // in this case we allocate a container array
+      array_type help ( container_shape ) ;
+      // and swap with the empty default-constructed array _coeffs
+      // so that the memory is automatically deallocated when the bspline
+      // object is destroyed
+      _coeffs.swap ( help ) ;
+      container = _coeffs ;
+    }
+    // finally we set the views to the braced core area and the core area
+    coeffs = container.subarray ( left_frame - left_brace ,
+                                  left_frame + core_shape + right_brace ) ;
+    core = coeffs.subarray ( left_brace , left_brace + core_shape ) ;
+  } ;
+
+  /// copy constructor. This will result in a view to the same data and is therefore
+  /// lightweight. But the data viewed by this spline must remain accessible.
+  
+  // TODO this code should handle '_coeffs' via a shared pointer or such,
+  // so that the 'last one out' destroys the MultiArray, if the bspline object
+  // initially owns the data.
+
+  bspline ( const bspline& other )
+  : strategy ( other.strategy ) ,
+    container ( other.container ) ,
+    coeffs ( other.coeffs ) ,
+    core ( other.core ) ,
+    spline_degree ( other.spline_degree ) ,
+    bcv ( other.bcv ) ,
+    tolerance ( other.tolerance ) ,
+    smoothing ( other.smoothing ) ,
+    braced ( other.braced ) ,
+    horizon ( other.horizon ) ,
+    left_brace ( other.left_brace ) ,
+    right_brace ( other.right_brace ) ,
+    left_frame ( other.left_frame ) ,
+    right_frame ( other.right_frame ) ,
+    container_shape ( other.container_shape ) ,
+    core_shape ( other.core_shape ) ,
+    braced_shape ( other.braced_shape )
+  { } ;
+  
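+  // NOTE: as written, this operator leaves *this unchanged and returns a copy
+  // of it; assigning from 'other' is not possible as-is because 'bcv' is const.
+  bspline operator= ( const bspline& other )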
+  {
+    return bspline ( *this ) ;
+  }
+  
+  /// get a bspline object for a single channel of the data. This is also lightweight
+  /// and requires the viewed data to remain present as long as the channel view is used.
+  /// the channel view inherits all metrics from its parent, only the MultiArrayViews
+  /// to the data are different.
+  
+  channel_view_type get_channel_view ( const int & channel )
+  {
+    assert ( channel < channels ) ;
+    
+    real_type * base = (real_type*) ( container.data() ) ;
+    base += channel ;
+    auto stride = container.stride() ;
+    stride *= channels ;
+    
+    MultiArrayView < dimension , real_type >
+      channel_container ( container.shape() , stride , base ) ;
+
+    return channel_view_type ( core_shape , 
+                               spline_degree ,
+                               bcv ,
+                               strategy ,
+                               horizon ,
+                               tolerance ,
+                               smoothing ,
+                               0 ,
+                               channel_container // coefficient storage to 'adopt'
+                             ) ;
+  } ;
+
+  /// prefilter converts the knot point data in the 'core' area into b-spline
+  /// coefficients. Depending on the strategy chosen in the b-spline object's
+  /// constructor, bracing/framing may be applied. Even if the degree of the
+  /// spline is zero or one, prefilter() should be called because it also
+  /// performs the bracing, if any, which may still be needed if the spline
+  /// is 'shifted' - unless the strategy is UNBRACED, of course.
+  ///
+  /// If data are passed in, they have to have precisely the shape
+  /// we have set up in core (_core_shape passed into the constructor).
+  /// These data will then be used in place of any data present in the
+  /// bspline object to calculate the coefficients. They won't be looked at
+  /// after prefilter() terminates, so it's safe to pass in some MultiArrayView
+  /// which is destroyed after the call to prefilter().
+
+  void prefilter ( view_type data = view_type() ) ///< view to knot point data to use instead of 'core'
+  {
+    if ( data.hasData() )
+    {
+      // if the user has passed in data, they have to have precisely the shape
+      // we have set up in core (_core_shape passed into the constructor).
+      // This can have surprising effects if the container array isn't owned by the
+      // spline but constitutes a view to data kept elsewhere (by passing _space to the
+      // constructor).
+      if ( data.shape() != core_shape )
+        throw shape_mismatch
+         ( "when passing data to prefilter, they have to have precisely the core's shape" ) ;
+      if ( strategy == EXPLICIT )
+      {
+        // the explicit scheme requires the data and frame to be together in the
+        // containing array, so we have to copy the data into the core.
+        core = data ;
+      }
+      // the other strategies can move the data from 'data' into the spline's memory
+      // during coefficient generation, so we needn't copy them in first.
+    }
+    else
+    {
+      // otherwise, we assume data are already in 'core' and we operate in-place
+      // note, again, the semantics of the assignment here: since 'data' has no data,
+      // the assignment results in 'adopting' the data in core rather than copying them
+      data = core ;
+    }
+
+    // per default the output will be braced. This does require the output
+    // array to be sufficiently larger than the input; class bracer has code
+    // to provide the right sizes
+
+    bracer<view_type> br ;
+
+    // for the explicit scheme, we use boundary condition 'guess' which tries to
+    // provide a good guess for the initial coefficients with a small computational
+    // cost. using zero-padding instead introduces a sharp discontinuity at the
+    // margins, which we want to avoid.
+
+    bcv_type explicit_bcv ( GUESS ) ;
+
+    switch ( strategy )
+    {
+      case UNBRACED:
+        // only call the solver, don't do any bracing. If necessary, bracing can be
+        // applied later by a call to brace() - provided the bspline object has space
+        // for the brace.
+        solve < view_type , view_type , math_type >
+              ( data ,
+                core ,
+                bcv ,
+                spline_degree ,
+                tolerance ,
+                smoothing
+              ) ;
+        break ;
+      case BRACED:
+        // solve first, passing in BC codes to pick out the appropriate functions to
+        // calculate the initial causal and anticausal coefficient, then brace result.
+        // note how, just as in brace(), the whole frame is filled, which may be more
+        // than is strictly needed by the evaluator.
+        solve < view_type , view_type , math_type >
+              ( data ,
+                core ,
+                bcv ,
+                spline_degree ,
+                tolerance ,
+                smoothing
+              ) ;
+        // using the more general code here now, since the frame may be larger
+        // than strictly necessary for the given spline degree due to a request
+        // for additional headroom
+        for ( int d = 0 ; d < dimension ; d++ )
+          br.apply ( container , bcv[d] , left_frame[d] , right_frame[d] , d ) ;
+        break ;
+      case EXPLICIT:
+        // first fill frame using BC codes passed in, then solve with BC code GUESS
+        // this automatically fills the brace as well, since it's part of the frame.
+        // TODO: the values in the frame will not come out precisely the same as they
+        // would by filling the brace after the coefficients have been calculated.
+        // The difference will be larger towards the margin of the frame, and we assume
+        // that due to the small support of the evaluation the differences near the
+        // margin of the core data will be negligible, having picked a sufficiently
+        // large frame size. This is debatable. If it's a problem, a call to brace()
+        // after prefilter() will brace again, now with coefficients from the core.
+        for ( int d = 0 ; d < dimension ; d++ )
+          br.apply ( container , bcv[d] , left_frame[d] , right_frame[d] , d ) ;
+        solve < view_type , view_type , math_type >
+              ( container ,
+                container ,
+                explicit_bcv ,
+                spline_degree ,
+                tolerance ,
+                smoothing
+              ) ;
+        break ;
+      case MANUAL:
+        // like EXPLICIT, but don't apply a frame, assume a frame was applied
+        // by external code. process whole container with GUESS BC. For cases
+        // where the frame can't be constructed by applying any of the stock bracing
+        // modes. Note that if any data were passed into this routine, in this case
+        // they will be silently ignored (makes no sense overwriting the core after
+        // having manually framed it in some way)
+        solve < view_type , view_type , math_type >
+              ( container ,
+                container ,
+                explicit_bcv ,
+                spline_degree ,
+                tolerance ,
+                smoothing
+              ) ;
+        break ;
+    }
+  }
+
+  /// if the spline coefficients are already known, they obviously don't need to be
+  /// prefiltered. But in order to be used by vspline's evaluation code, they need to
+  /// be 'braced' - the 'core' coefficients have to be surrounded by more coefficients
+  /// covering the support the evaluator needs to operate without bounds checking
+  /// inside the spline's defined range. brace() performs this operation. brace()
+  /// assumes the bspline object has been set up with the desired initial parameters,
+  /// so that the boundary conditions and metrics are already known and storage is
+  /// available. If brace() is called with an empty view (or without parameters),
+  /// it assumes the coefficients are in the spline's core already and simply
+  /// fills in the 'empty' space around them. If data are passed to brace(), they
+  /// have to be the same size as the spline's core and are copied into the core
+  /// before the bracing is applied.
+
+  void brace ( view_type data = view_type() ) ///< view to knot point data to use instead of 'core'
+  {
+    if ( data.hasData() )
+    {
+      // if the user has passed in data, they have to have precisely the shape
+      // we have set up in core
+
+      if ( data.shape() != core_shape )
+        throw shape_mismatch
+         ( "when passing data to brace, they have to have precisely the core's shape" ) ;
+      
+      // we copy the data into the core
+      core = data ;
+    }
+
+    // we use class bracer to do the work, creating the brace for all axes in turn
+
+    bracer<view_type> br ;
+
+    for ( int d = 0 ; d < dimension ; d++ )
+      br.apply ( container , bcv[d] , left_frame[d] , right_frame[d] , d ) ;
+  }
+
+  /// overloaded constructor for 1D splines. This is useful because if we don't
+  /// provide it, the caller would have to pass TinyVector < T , 1 > instead of T
+  /// for the shape and the boundary condition.
+  
+  bspline ( long _core_shape ,                      ///< shape of knot point data
+            int _spline_degree = 3 ,                ///< spline degree with reasonable default
+            bc_code _bc = MIRROR ,                  ///< boundary conditions and common default
+            prefilter_strategy _strategy = BRACED , ///< default strategy is the 'implicit' scheme
+            int _horizon = -1 ,                     ///< width of frame for explicit scheme
+            double _tolerance = -1.0 ,              ///< acceptable error (relative to unit pulse)
+            double _smoothing = 0.0 ,               ///< apply smoothing to data before prefiltering
+            int headroom = 0 ,                      ///< additional headroom, for 'shifting'
+            view_type _space = view_type()          ///< coefficient storage to 'adopt'
+          )
+  :bspline ( TinyVector < long , 1 > ( _core_shape ) ,
+             _spline_degree ,
+             bcv_type ( _bc ) ,
+             _strategy ,
+             _horizon ,
+             _tolerance ,
+             _smoothing ,
+             headroom ,
+             _space
+           )
+  {
+    static_assert ( _dimension == 1 , "bspline: 1D constructor only usable for 1D splines" ) ;
+  } ;
+
+  /// shift will change the interpretation of the data in a bspline object.
+  /// d is taken as a difference to add to the current spline degree. The coefficients
+  /// remain the same, but creating an evaluator from the shifted spline will make
+  /// the evaluator produce data *as if* the coefficients were those of a spline
+  /// of the changed order. Shifting with positive d will effectively blur the
+  /// interpolated signal, shifting with negative d will sharpen it.
+  /// For shifting to work, the spline has to have enough 'headroom', meaning that
+  /// spline_degree + d, the new spline degree, has to be greater than or equal to 0
+  /// and must not exceed 24, the highest degree shift() accepts - and, additionally,
+  /// there has to be a wide-enough frame to allow evaluation
+  /// with the wider kernel of the higher-degree spline's reconstruction filter.
+  /// So if a spline is set up with degree 0 and shifted to degree 5, it has to be
+  /// constructed with an additional headroom of 3 (see the constructor).
+  /// shift() returns d if the shift was performed and 0 if it was not.
+  
+  // TODO consider moving the concept of shifting to class evaluator
+
+  int shift ( int d )
+  {
+    int new_degree = spline_degree + d ;
+    if ( new_degree < 0 || new_degree > 24 )
+      return 0 ;
+
+    bracer<view_type> br ;
+    shape_type new_left_brace = br.left_corner ( bcv , new_degree ) ;
+    shape_type new_right_brace = br.right_corner ( bcv , new_degree ) ;
+    if (    allLessEqual ( new_left_brace , left_frame )
+             && allLessEqual ( new_right_brace , right_frame ) )
+    {
+      // perform the shift
+      spline_degree = new_degree ;
+      left_brace = new_left_brace ;
+      right_brace = new_right_brace ;
+      braced_shape = core_shape + left_brace + right_brace ;
+
+      shape_type coefs_offset = left_frame - new_left_brace ;
+      coeffs.reset() ;
+      coeffs = container.subarray ( coefs_offset , coefs_offset + braced_shape ) ;
+    }
+    else
+    {
+      // can't shift
+      std::cout << "can't shift" << std::endl ;
+      d = 0 ;
+    }
+
+    return d ;
+  }
+
+  /// helper function to << a bspline object to an ostream
+
+  friend ostream& operator<< ( ostream& osr , const bspline& bsp )
+  {
+    osr << "dimension:................... " << bsp.dimension << endl ;
+    osr << "degree:...................... " << bsp.spline_degree << endl ;
+    osr << "boundary conditions:......... " ;
+    for ( auto bc : bsp.bcv )
+      osr << " " << bc_name [ bc ] ;
+    osr << endl ;
+    osr << endl ;
+    osr << "shape of container array:.... " << bsp.container.shape() << endl ;
+    osr << "shape of braced coefficients: " << bsp.coeffs.shape() << endl ;
+    osr << "shape of core:............... " << bsp.core.shape() << endl ;
+    osr << "braced:...................... " << ( bsp.braced ? std::string ( "yes" ) : std::string ( "no" ) ) << endl ;
+    osr << "left brace:.................. " << bsp.left_brace << endl ;
+    osr << "right brace:................. " << bsp.right_brace << endl ;
+    osr << "left frame:.................. " << bsp.left_frame << endl ;
+    osr << "right frame:................. " << bsp.right_frame << endl ;
+    osr << ( bsp._coeffs.hasData() ? "bspline object owns data" : "data are owned externally" ) << endl ;
+    osr << "container base address:...... " << bsp.container.data() << endl ;
+    return osr ;
+  }
+
+} ;
+
+} ; // end of namespace vspline
+
+#endif // VSPLINE_BSPLINE_H
diff --git a/common.h b/common.h
new file mode 100644
index 0000000..0f8b44c
--- /dev/null
+++ b/common.h
@@ -0,0 +1,317 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015 - 2017 by Kay F. Jahnke                    */
+/*                                                                      */
+/*    The git repository for this software is at                        */
+/*                                                                      */
+/*    https://bitbucket.org/kfj/vspline                                 */
+/*                                                                      */
+/*    Please direct questions, bug reports, and contributions to        */
+/*                                                                      */
+/*    kfjahnke+vspline at gmail.com                                        */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/*! \file common.h
+
+    \brief definitions common to all files in this project, utility code
+    
+    This file contains
+    
+    - a traits class fixing the simdized types used for vectorized code
+    
+    - exceptions used throughout vspline
+    
+    - constants and enums used throughout vspline
+  
+*/
+
+#ifndef VSPLINE_COMMON
+#define VSPLINE_COMMON
+
+#include <vigra/multi_array.hxx>
+
+#ifdef USE_VC
+
+#include <Vc/Vc>
+
+#endif
+
+namespace vspline {
+
+#ifdef USE_VC
+
+/// using definition for the 'elementary type' of a type via vigra's
+/// ExpandElementResult mechanism.
+
+template < class T >
+using ET = typename vigra::ExpandElementResult < T > :: type ;
+
+/// struct vector_traits is used throughout vspline to determine simdized
+/// data types. While inside the vspline code base, vsize, the number of
+/// elementary type members in a simdized type, is kept variable in the form of
+/// the template argument _vsize, in code using vspline you'd expect to find
+/// a program-wide definition deriving the number of elements from the
+/// type commonly used for arithmetics, like
+///
+/// #define VSIZE (vspline::vector_traits<float>::size)
+///
+/// If other elementary types are used, their simdized types are fixed to use
+/// the same number of elements by using the program-wide number of elements:
+///
+/// typedef vspline::vector_traits<double,VSIZE>::ele_v double_v_type ;
+///
+/// typedef vigra::TinyVector < int , 3 > triplet_type ;
+///
+/// typedef vspline::vector_traits<triplet_type,VSIZE>::type triplet_v_type ;
+///
+/// using a default _vsize of twice the elementary type's Vc::Vector's size
+/// works best on my system, but may not be optimal elsewhere.
+
+template < class T ,
+           int _vsize = 2 * Vc::Vector < ET<T> > :: Size >
+struct vector_traits
+{
+  enum { size = _vsize } ;
+
+  enum { dimension = vigra::ExpandElementResult < T > :: size } ;
+
+  typedef ET<T> ele_type ;
+
+  typedef vigra::TinyVector < ele_type , dimension > nd_ele_type ;
+  
+  typedef Vc::SimdArray < ele_type , size > ele_v ;
+  
+  typedef vigra::TinyVector < ele_v , dimension > nd_ele_v ;
+  
+  typedef vigra::TinyVector < ele_v , dimension > type ;
+  
+//   typedef typename std::conditional
+//           < dimension == 1 ,
+//             ele_v ,          
+//             vigra::TinyVector < ele_v , dimension >
+//           > :: type type ;
+} ;
+
+#endif
+
+// TODO The exceptions need some work. My use of exceptions is a bit sketchy...
+
+/// for interfaces which need specific implementations we use:
+
+struct not_implemented
+: std::invalid_argument
+{
+  not_implemented ( const char * msg )
+  : std::invalid_argument ( msg ) { }  ;
+} ;
+
+/// dimension_mismatch is thrown if two arrays which should have the same
+/// dimensions don't.
+
+struct dimension_mismatch
+: std::invalid_argument
+{
+  dimension_mismatch ( const char * msg )
+  : std::invalid_argument ( msg ) { }  ;
+} ;
+
+/// shape mismatch is the exception which is thrown if the shapes of
+/// an input array and an output array do not match.
+
+struct shape_mismatch
+: std::invalid_argument
+{
+  shape_mismatch  ( const char * msg )
+  : std::invalid_argument ( msg ) { }  ;
+} ;
+
+/// exception which is thrown if an operation is requested which vspline
+/// does not support
+
+struct not_supported
+: std::invalid_argument
+{
+  not_supported  ( const char * msg )
+  : std::invalid_argument ( msg ) { }  ;
+} ;
+
+/// out_of_bounds is thrown by mapping mode REJECT for out-of-bounds coordinates
+/// this exception is left without a message, it only has a very specific application,
+/// and there it may be thrown often, so we don't want anything slowing it down.
+
+struct out_of_bounds
+{
+} ;
+
+/// exception which is thrown when assigning an rvalue which is larger than
+/// what the lvalue can hold
+
+struct numeric_overflow
+: std::invalid_argument
+{
+  numeric_overflow  ( const char * msg )
+  : std::invalid_argument ( msg ) { }  ;
+} ;
+
+/// This enumeration is used for codes connected to boundary conditions. There are
+/// two aspects to boundary conditions: During prefiltering, if the implicit scheme is used,
+/// the initial causal and anticausal coefficients have to be calculated in a way specific to
+/// the chosen boundary conditions. Bracing, both before prefiltering when using the explicit
+/// scheme, and after prefiltering when using the implicit scheme, also needs these codes to
+/// pick the appropriate extrapolation code to extend the knot point data/coefficients beyond
+/// the core array.
+
+typedef enum { 
+  MIRROR ,    ///< mirror on the bounds, so that f(-x) == f(x)
+  PERIODIC,   ///< periodic boundary conditions
+  REFLECT ,   ///< reflect, so that f(-1) == f(0) (mirror between bounds)
+  NATURAL,    ///< natural boundary conditions, f(-x) + f(x) == 2 * f(0)
+  CONSTANT ,  ///< clamp. used for framing, with explicit prefilter scheme
+  ZEROPAD ,   ///< used for boundary condition, bracing
+  IDENTITY ,  ///< used as solver argument, mostly internal use
+  GUESS ,     ///< used with EXPLICIT scheme to keep margin errors low
+  SPHERICAL , ///< use for spherical panoramas, y axis
+} bc_code;
+
+/// This enumeration is used by the convenience class 'bspline' to determine the prefiltering
+/// scheme to be used.
+
+typedef enum { UNBRACED , ///< implicit scheme, no bracing applied
+               BRACED ,   ///< implicit scheme, bracing will be applied
+               EXPLICIT , ///< explicit scheme, frame with extrapolated signal, brace
+               MANUAL     ///< like explicit, but don't frame before filtering
+} prefilter_strategy  ;
+
+/// bc_name is for diagnostic output of bc codes
+
+const std::string bc_name[] =
+{
+  "MIRROR" ,
+  "PERIODIC",
+  "REFLECT" ,
+  "NATURAL",
+  "CONSTANT" ,
+  "ZEROPAD" ,
+  "IDENTITY" ,
+  "GUESS" ,
+  "SPHERICAL" ,
+} ;
+
+} ; // end of namespace vspline
+
+#ifdef USE_VC
+
+// by defining the relevant traits for Vc::Vectors and Vc::Simdarrays,
+// vspline is able to use vigra arithmetics with these types.
+// This is intended mainly to convert legacy code iterating over
+// TinyVectors of Vc SIMD types by code which directly applies
+// arithmetic operations to the TinyVectors themselves.
+// This is great. Consider:
+//
+//  typedef Vc::SimdArray < float , 6 > simd_type ;
+//  typedef vigra::TinyVector < simd_type , 3 > nd_simd_type ;
+//  simd_type a { -1 , 1 , 2 , 3 , 4 , 5 } ;
+//  simd_type b { 2 , 3 , 4 , 5 , 1 , 2 } ;
+//  nd_simd_type aaa { a , a , a } ;
+//  nd_simd_type bbb = { b , b , b } ;
+//  auto ccc = aaa + sqrt ( bbb ) ;
+
+namespace vigra
+{
+  template < typename real_type , int N >
+  struct NumericTraits < Vc::SimdArray < real_type , N > >
+  {
+      typedef Vc::SimdArray < real_type , N > Type;
+      typedef Type Promote;
+      typedef Type UnsignedPromote;
+      typedef Type RealPromote;
+      typedef std::complex<RealPromote> ComplexPromote;
+      typedef Type ValueType;
+      
+      typedef VigraFalseType isIntegral;
+      typedef VigraFalseType isScalar;
+      typedef VigraFalseType isSigned;
+      typedef VigraFalseType isOrdered;
+      typedef VigraFalseType isComplex;
+      
+      static Type zero() { return Type::Zero() ; }
+      static Type one() { return Type::One() ; }
+      static Type nonZero() { return Type::One() ; }
+      
+      static Promote toPromote(Type v) { return v; }
+      static RealPromote toRealPromote(Type v) { return v; }
+      static Type fromPromote(Promote v) { return v; }
+      static Type fromRealPromote(RealPromote v) { return v; }
+  };
+  
+  template < typename real_type >
+  struct NumericTraits < Vc::Vector < real_type > >
+  {
+      typedef Vc::Vector < real_type > Type;
+      typedef Type Promote;
+      typedef Type UnsignedPromote;
+      typedef Type RealPromote;
+      typedef std::complex<RealPromote> ComplexPromote;
+      typedef Type ValueType;
+      
+      typedef VigraFalseType isIntegral;
+      typedef VigraFalseType isScalar;
+      typedef VigraFalseType isSigned;
+      typedef VigraFalseType isOrdered;
+      typedef VigraFalseType isComplex;
+      
+      static Type zero() { return Type::Zero() ; }
+      static Type one() { return Type::One() ; }
+      static Type nonZero() { return Type::One() ; }
+      
+      static Promote toPromote(Type v) { return v; }
+      static RealPromote toRealPromote(Type v) { return v; }
+      static Type fromPromote(Promote v) { return v; }
+      static Type fromRealPromote(RealPromote v) { return v; }
+  };
+  
+  // note that for now, we limit definition of PromoteTraits
+  // to homogeneous operations. I tried to define the promotion
+  // traits for float and double but failed, because I could not
+  // figure out what vigra means by typeToSize() used in
+  // metaprogramming.h
+
+  template < typename real_type , int N >
+  struct PromoteTraits < Vc::SimdArray < real_type , N > ,
+                         Vc::SimdArray < real_type , N > >
+  {
+      typedef typename Vc::SimdArray < real_type , N > Promote;
+      typedef typename Vc::SimdArray < real_type , N > toPromote;
+  };
+
+} ;
+
+#endif // USE_VC
+
+#endif // VSPLINE_COMMON
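The bc_code enumeration in common.h distinguishes MIRROR (mirror on the bounds, f(-x) == f(x)) from REFLECT (mirror between the bounds, f(-1) == f(0)). For discrete sample indices, the difference shows up as different periods: 2(n-1) for MIRROR, 2n for REFLECT and n for PERIODIC. The block below is a hypothetical helper illustrating just that, on integer indices only. The enum bc_sketch and the functions wrap and map_index are made up for this sketch. vspline's own mapping operates on (typically real-valued) coordinates and is not reproduced here.

```cpp
#include <cassert>

// Hypothetical stand-in for three of vspline's bc_code values; the real
// enum lives in namespace vspline and has more members.
enum bc_sketch { MIRROR , PERIODIC , REFLECT } ;

// Non-negative remainder: result is in [0, m) for m > 0, even for negative a.
inline int wrap ( int a , int m )
{
  int r = a % m ;
  return r < 0 ? r + m : r ;
}

// Hypothetical helper (not vspline code): fold an arbitrary integer index i
// into [0, n) for a 1D signal of n samples, following boundary rule bc.
inline int map_index ( int i , int n , bc_sketch bc )
{
  assert ( n > 0 ) ;
  switch ( bc )
  {
    case PERIODIC:            // the signal repeats with period n
      return wrap ( i , n ) ;
    case MIRROR:              // mirror *on* samples 0 and n-1: period 2(n-1)
    {
      if ( n == 1 )
        return 0 ;
      const int p = 2 * ( n - 1 ) ;
      const int j = wrap ( i , p ) ;
      return j < n ? j : p - j ;
    }
    case REFLECT:             // mirror *between* samples: period 2n
    {
      const int p = 2 * n ;
      const int j = wrap ( i , p ) ;
      return j < n ? j : p - 1 - j ;
    }
  }
  return -1 ;                 // not reached for valid bc values
}
```

For n = 5, index -1 maps to 1 under MIRROR, to 0 under REFLECT and to 4 under PERIODIC. This matches the f(-x) == f(x) and f(-1) == f(0) descriptions in the enum's comments.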
diff --git a/debian/changelog b/debian/changelog
new file mode 100644
index 0000000..c78bb37
--- /dev/null
+++ b/debian/changelog
@@ -0,0 +1,5 @@
+vspline (0.1.2-1) UNRELEASED; urgency=low
+
+  * intended initial upload to alioth
+  
+ -- Kay F. Jahnke <kfjahnke at gmail.com>  Wed, 28 Jun 2017 12:00:00 +0200
diff --git a/debian/compat b/debian/compat
new file mode 100644
index 0000000..f599e28
--- /dev/null
+++ b/debian/compat
@@ -0,0 +1 @@
+10
diff --git a/debian/control b/debian/control
new file mode 100644
index 0000000..54fe8af
--- /dev/null
+++ b/debian/control
@@ -0,0 +1,38 @@
+Source: vspline
+Maintainer: Debian Science Maintainers <debian-science-maintainers at lists.alioth.debian.org>
+Uploaders: Kay F. Jahnke <kfjahnke at gmail.com>
+Section: math
+Priority: extra
+Build-Depends: debhelper (>= 10)
+Build-Depends-Indep: libvigraimpex-dev
+Standards-Version: 3.9.8
+Vcs-Browser: https://anonscm.debian.org/git/debian-science/packages/vspline.git
+Vcs-Git: https://anonscm.debian.org/git/debian-science/packages/vspline.git
+Homepage: https://bitbucket.org/kfj/vspline
+
+Package: vspline-dev
+Architecture: all
+Depends: libvigraimpex-dev
+Suggests: clang,
+          vc-dev
+Description: header-only C++ template library for uniform b-spline processing
+ vspline can create b-splines of:
+ .
+  -  arbitrary real data types and their aggregates
+  -  coming in strided memory
+  -  with a reasonable selection of boundary conditions
+  -  used in either an implicit or an explicit scheme of extrapolation
+  -  arbitrary spline orders
+  -  arbitrary dimensions of the spline
+  -  in multithreaded code
+  -  using the CPU's vector units if possible
+ .
+ on the evaluation side it provides:
+ .
+  -  evaluation of the spline at point locations in the defined range
+  -  evaluation of the spline's derivatives
+  -  mapping of arbitrary coordinates into the defined range
+  -  evaluation of nD arrays of coordinates ('remap' function)
+  -  coordinate-fed remap function ('index_remap')
+  -  functor-based remap, aka 'transform' function
+  -  functor-based 'apply' function
diff --git a/debian/copyright b/debian/copyright
new file mode 100644
index 0000000..6c80ba7
--- /dev/null
+++ b/debian/copyright
@@ -0,0 +1,28 @@
+Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
+Upstream-Name: vspline
+Source: <https://bitbucket.org/kfj/vspline>
+
+Files: *
+Copyright: 2015-2017 Kay F. Jahnke <kfjahnke at gmail.com>
+License: Expat
+ Permission is hereby granted, free of charge, to any person
+ obtaining a copy of this software and associated documentation
+ files (the "Software"), to deal in the Software without
+ restriction, including without limitation the rights to use,
+ copy, modify, merge, publish, distribute, sublicense, and/or
+ sell copies of the Software, and to permit persons to whom the
+ Software is furnished to do so, subject to the following
+ conditions:
+ .
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the
+ Software.
+ .
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ OTHER DEALINGS IN THE SOFTWARE.
diff --git a/debian/debhelper-build-stamp b/debian/debhelper-build-stamp
new file mode 100644
index 0000000..9508a41
--- /dev/null
+++ b/debian/debhelper-build-stamp
@@ -0,0 +1 @@
+vspline-dev
diff --git a/debian/files b/debian/files
new file mode 100644
index 0000000..0aa7637
--- /dev/null
+++ b/debian/files
@@ -0,0 +1 @@
+vspline-dev_0.1.2-1_all.deb math extra
diff --git a/debian/rules b/debian/rules
new file mode 100755
index 0000000..57e0691
--- /dev/null
+++ b/debian/rules
@@ -0,0 +1,10 @@
+#!/usr/bin/make -f
+
+# for now we rely entirely upon dh.
+
+# There are no build instructions for the vspline-example package,
+# since its content is meant to be instructive - if the sources are
+# compiled, the resulting binaries aren't meant to be installed.
+
+%:
+	dh $@
diff --git a/debian/source/format b/debian/source/format
new file mode 100644
index 0000000..163aaf8
--- /dev/null
+++ b/debian/source/format
@@ -0,0 +1 @@
+3.0 (quilt)
diff --git a/debian/vspline-dev.examples b/debian/vspline-dev.examples
new file mode 100644
index 0000000..d3ba779
--- /dev/null
+++ b/debian/vspline-dev.examples
@@ -0,0 +1,13 @@
+example/channels.cc
+example/complex.cc
+example/eval.cc
+example/gradient.cc
+example/gsm2.cc
+example/gsm.cc
+example/impulse_response.cc
+example/pano_extract.cc
+example/roundtrip.cc
+example/slice2.cc
+example/slice3.cc
+example/slice.cc
+example/splinus.cc
diff --git a/debian/vspline-dev.install b/debian/vspline-dev.install
new file mode 100644
index 0000000..ad4f949
--- /dev/null
+++ b/debian/vspline-dev.install
@@ -0,0 +1 @@
+*.h  usr/include/vspline
diff --git a/debian/vspline-dev.substvars b/debian/vspline-dev.substvars
new file mode 100644
index 0000000..978fc8b
--- /dev/null
+++ b/debian/vspline-dev.substvars
@@ -0,0 +1,2 @@
+misc:Depends=
+misc:Pre-Depends=
diff --git a/debian/watch b/debian/watch
new file mode 100644
index 0000000..ee7b826
--- /dev/null
+++ b/debian/watch
@@ -0,0 +1,13 @@
+# upstream is kept in a bitbucket repository, which offers tar.gz files for download
+# for all tags which are present. this is a very convenient feature, since it produces
+# what's needed as 'upstream tar ball' without any further ado and the process of stepping
+# up to a new version can be automated. hence this watch file.
+
+version=4
+
+# Initially I used version tags derived from the date, like YYYYMMDD, and I only
+# switched to using 'debian-friendly' tags like xx.yy.zz later. So in order to not
+# pick up these old-style tags I have to be more explicit about the tags uscan should
+# match, which is why I use the more specific RE below.
+
+https://bitbucket.org/kfj/vspline/downloads?tab=tags .*/([0-9]+\.[0-9]+\.[0-9]+)\.tar\.gz
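The watch file's regular expression only accepts tarballs named with three dot-separated numbers. This is what keeps uscan from picking up the old date-style tags. The small sketch below uses std::regex, whose default ECMAScript grammar treats this particular pattern the same way as the Perl regexes uscan works with, to show which link names match. It requires the whole link to match (std::regex_match). The helper upstream_version and the download URLs used with it are hypothetical examples, not actual bitbucket links.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The matching pattern from debian/watch, as a raw string literal.
const std::regex tag_pattern ( R"(.*/([0-9]+\.[0-9]+\.[0-9]+)\.tar\.gz)" ) ;

// Hypothetical helper: return the version captured from a download link
// if the whole link matches the pattern, otherwise an empty string.
std::string upstream_version ( const std::string & link )
{
  std::smatch m ;
  if ( std::regex_match ( link , m , tag_pattern ) )
    return m[1].str() ;
  return std::string() ;
}
```

A link ending in /0.1.2.tar.gz yields "0.1.2". A date-style name like /20170906.tar.gz has no dots between digit groups and is rejected.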
diff --git a/doxy.h b/doxy.h
new file mode 100644
index 0000000..59dd8bb
--- /dev/null
+++ b/doxy.h
@@ -0,0 +1,324 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015 - 2017 by Kay F. Jahnke                    */
+/*                                                                      */
+/*    The git repository for this software is at                        */
+/*                                                                      */
+/*    https://bitbucket.org/kfj/vspline                                 */
+/*                                                                      */
+/*    Please direct questions, bug reports, and contributions to        */
+/*                                                                      */
+/*    kfjahnke+vspline at gmail.com                                        */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+// This header doesn't contain any code, only the text for the main page of the documentation.
+
+/*! \mainpage
+
+ \section intro_sec Introduction
+
+ vspline is a header-only generic C++ library for the creation and processing of uniform B-splines. It aims to be as comprehensive as feasible, yet at the same time to produce code which performs well, so that it can be used in production.
+ 
+ vspline was developed on a Linux system using clang++ and g++. It has not been tested with other systems or compilers, and as of this writing I am aware that the code probably isn't portable. My code uses elements from the C++11 standard (mainly the auto keyword and range-based for loops).
+ 
+ vspline relies heavily on two other libraries:
+ 
+ - <a href="http://ukoethe.github.io/vigra/">VIGRA</a>, mainly for handling of multidimensional arrays and general signal processing
+ 
+ - <a href="https://compeng.uni-frankfurt.de/index.php?id=vc">Vc</a>, for the use of the CPU's vector units
+ 
+ I find VIGRA indispensable, omitting it from vspline is not really an option. Use of Vc is optional, though, and has to be activated by defining 'USE_VC'. This should be done by passing -DUSE_VC to the compiler; defining USE_VC only for parts of a project may or may not work.
+ 
+ I have made an attempt to generalize the code so that it can handle
+
+ - arbitrary real data types and their aggregates
+ 
+ - a reasonable selection of boundary conditions
+ 
+ - prefiltering with implicit and explicit extrapolation schemes
+ 
+ - arbitrary spline orders
+ 
+ - arbitrary dimensions of the spline
+ 
+ - in multithreaded code
+ 
+ - using the CPU's vector units if possible
+
+On the evaluation side I provide
+
+ - evaluation of the spline at point locations in the defined range
+ 
+ - evaluation of the spline's derivatives
+
+ - mapping of arbitrary coordinates into the defined range
+ 
+ - evaluation of nD arrays of coordinates (generalized remap function)
+ 
+ - functor based remap (aka 'transform') and 'apply' functions
+ 
+ \section install_sec Installation
+ 
+ vspline is header-only, so it's sufficient to place the headers where your code can access them. VIGRA and Vc are supposed to be installed in a location where they can be found so that includes along the lines of #include <vigra/...> succeed.
+
+ \section compile_sec Compilation
+ 
+ While your distro's packages may be sufficient to get vspline up and running, you may need newer versions of VIGRA and Vc. At the time of this writing the latest versions commonly available were Vc 1.2.0 and VIGRA 1.11.0; I compiled Vc and VIGRA from source, using up-to-date pulls from their respective repositories.
+ 
+ Update: Ubuntu 17.04 has VIGRA and Vc packages which are sufficiently up-to-date.
+ 
+ To compile software using vspline, I use clang++:
+ 
+~~~~~~~~~~~~~~
+ clang++ -D USE_VC -pthread -O3 -march=native --std=c++11 your_code.cc -lVc -lvigraimpex
+~~~~~~~~~~~~~~
+ 
+ where -lvigraimpex can be omitted if vigraimpex (VIGRA's image import/export library) is not used. Linking libVc.a statically is a good option; on my system the resulting code is faster.
+ 
+ On my previous system I had to add -fabi-version=6 to avoid certain issues with Vc.
+ 
+ Please note that an executable using Vc produced on your system may well not work on a machine with another CPU. It's best to compile on the intended target. Alternatively, the target architecture can be passed explicitly to the compiler (-march...). 'Not work' in this context means that it may crash due to an illegal instruction or wrong alignment.
+ 
+ If you can't use Vc, the code can be made to compile without Vc by omitting -D USE_VC and other flags relevant for Vc:
+ 
+~~~~~~~~~~~~~~
+ clang++ -pthread -O3 --std=c++11 your_code.cc -lvigraimpex
+~~~~~~~~~~~~~~
+ 
+ If you don't want to use clang++, g++ will also work.
+ 
+ All access to Vc in the code is inside #ifdef USE_VC ... #endif statements, so not defining USE_VC will effectively prevent its use.
+ 
+ \section license_sec License
+
+ vspline is free software, licensed under this license:
+ 
+~~~~~~~~~~~~
+    vspline - a set of generic tools for creation and evaluation
+              of uniform b-splines
+
+            Copyright 2015 - 2017 by Kay F. Jahnke
+
+    Permission is hereby granted, free of charge, to any person
+    obtaining a copy of this software and associated documentation
+    files (the "Software"), to deal in the Software without
+    restriction, including without limitation the rights to use,
+    copy, modify, merge, publish, distribute, sublicense, and/or
+    sell copies of the Software, and to permit persons to whom the
+    Software is furnished to do so, subject to the following
+    conditions:
+
+    The above copyright notice and this permission notice shall be
+    included in all copies or substantial portions of the
+    Software.
+
+    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND
+    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+    OTHER DEALINGS IN THE SOFTWARE.
+~~~~~~~~~~~~
+
+ \section quickstart_sec Quickstart
+ 
+ If you stick with the high-level code, using class bspline or the remap function, most of the parametrization is easy. Here are a few examples of what you can do.
+ 
+ Let's suppose you have data in a 2D vigra MultiArray 'a'. vspline can handle real data like float and double, and also their 'aggregates', meaning data types like pixels or vigra's TinyVector. But for now, let's assume you have plain float data. Creating the bspline object is easy:
+ 
+~~~~~~~~~~~~~~
+#include <vspline/vspline.h>
+
+...
+
+typedef vspline::bspline < float , 2 > spline_type ; // fix the type of the spline
+ 
+spline_type bspl ( a.shape() ) ; // create bspline object 'bspl' suitable for your data
+ 
+bspl.core = a ;         // copy the source data into the bspline object's 'core' area
+ 
+bspl.prefilter() ; // run prefilter() to convert original data to b-spline coefficients
+~~~~~~~~~~~~~~
+ 
+ The memory needed to hold the coefficients is allocated when the bspline object is constructed.
+ 
+ Obviously many things have been done by default here: The default spline degree was used - it's 3, for a cubic spline. Also, boundary treatment mode 'MIRROR' was used per default. Further default parameters cause the spline to be 'braced' so that it can be evaluated with vspline's evaluation routines, Vc (if compiled in) was used for prefiltering, and the process is automatically partitioned and run in parallel by a thread pool. The only mandatory template arguments are the value type a [...]
+ 
+ While the sequence of operations indicated here looks a bit verbose (why not create the bspline object by a call like bspl(a) ?), in 'real' code you would use bspl.core straight away as the space to contain your data. You might get the data from a file or by some other process, or do something like this, where the bspline object provides the array and you interface it via a view to its 'core':
+   
+~~~~~~~~~~~~~~
+vspline::bspline < double , 1 > bsp ( 10001 , degree , vspline::MIRROR ) ;
+ 
+auto v1 = bsp.core ; // get a view to the bspline's 'core'
+ 
+for ( auto & r : v1 ) r = ... ; // assign some values
+ 
+bsp.prefilter() ; // perform the prefiltering
+~~~~~~~~~~~~~~
+ 
+ This is a common idiom, because it reflects a common mode of operation where you don't need the original, unfiltered data any more after creating the spline, so the prefiltering is done in-place overwriting the original data. If you do need the original data later, you can also use a third idiom:
+ 
+~~~~~~~~~~~~~~
+vigra::MultiArray < 3 , double > my_data ( vigra::Shape3 ( 5 , 6 , 7 ) ) ;
+ 
+vspline::bspline < double , 3 > bsp ( my_data.shape() ) ;
+ 
+bsp.prefilter ( my_data ) ;
+~~~~~~~~~~~~~~
+ 
+ Here, the bspline object is first created with the appropriate 'core' size, and prefilter() is called with an array matching the bspline's core. This results in my_data being read into the bspline object during the first pass of the prefiltering process.
+ 
+ There are more ways of setting up a bspline object; please refer to class bspline's constructor. Of course you are also free to use vspline's lower-level routines directly to create a set of coefficients. The lowest-level filtering routine is simply a forward-backward recursive filter with a set of arbitrary poles. This code is in filter.h.
+ 
+ Next you may want to evaluate the spline from the first example at some pair of coordinates x, y. Evaluation of the spline can be done using vspline's 'evaluator' objects. Using the highest level of access, these objects are set up with a bspline object and, after being set up, provide methods to evaluate the spline at given coordinates. Technically, evaluator objects are functors which don't have mutable state (all state is created at creation time and constant afterwards), so they are  [...]
+
+~~~~~~~~~~~~~~
+// for a 2D spline, we want 2D coordinates
+ 
+typedef vigra::TinyVector < float ,2 > coordinate_type ;
+ 
+// get the appropriate evaluator type
+ 
+typedef evaluator_type < coordinate_type , double > eval_type ;
+ 
+// create the evaluator
+ 
+eval_type ev ( bspl ) ;
+ 
+// Now, assuming you have float x and y: 
+ 
+double result = ev ( coordinate_type ( x , y ) ) ; // evaluate at (x,y)
+~~~~~~~~~~~~~~
+
+ Again, some things have happened by default. The evaluator was constructed with a bspline object, making sure that the evaluator is compatible. vspline can also calculate the spline's derivatives. The default is plain evaluation, but you can pass a request for production of derivatives to the evaluator's constructor. Let's assume you want the first derivative along axis 0 (the x axis):
+
+~~~~~~~~~~~~~
+eval_type eval_dx ( bspl , { 1 , 0 } ) ; // ask for an evaluator producing dx
+ 
+double dx = eval_dx ( { x , y } ) ;      // use the evaluator
+~~~~~~~~~~~~~
+
+ For every constellation of derivatives you'll have to create a distinct evaluator.
+ This is not an expensive operation (unless you use very high spline degrees) - the same coefficients are used in all cases, only the weight functors used internally differ. Calculating the spline's derivatives is even slightly faster than plain evaluation, since there are fewer multiplications to perform.
+ 
+ What about the remap functions? The little introduction demonstrated how you can evaluate the spline at a single location. Most of the time, though, you'll require evaluation at many coordinates. This is what remap functions do. Instead of a single coordinate, you pass a whole vigra::MultiArrayView full of coordinates to it - and another MultiArrayView of the same dimension and shape to accept the results of evaluating the spline at every coordinate in the first array. Here's a simple e [...]
+
+~~~~~~~~~~~~
+// create a 1D array containing (2D) coordinates into 'a'
+ 
+vigra::MultiArray < 1 , coordinate_type > coordinate_array ( 3 ) ;
+ 
+... // fill in the coordinates
+
+// create an array to accommodate the result of the remap operation
+ 
+vigra::MultiArray < 1 , float > target_array ( 3 ) ;
+ 
+// perform the remap
+ 
+vspline::remap < coordinate_type , float > ( a , coordinate_array , target_array ) ;
+~~~~~~~~~~~~
+
+ Now the three resulting values are in the target array.
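Without multithreading and vectorization, the core of a remap is a single loop. The following sketch shows the idea. It assumes one-dimensional data held in std::vector rather than vigra arrays. `linear_eval` is a hypothetical stand-in for an evaluator: it does linear interpolation and assumes coordinates >= 0. `simple_remap` is an illustrative name, not vspline's remap.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an evaluator: linear interpolation over a
// 1D array of samples. Coordinates are assumed to be >= 0; anything at
// or beyond the last sample yields the last sample.
struct linear_eval
{
  std::vector < float > data ;

  float operator() ( float x ) const
  {
    std::ptrdiff_t last = std::ptrdiff_t ( data.size() ) - 1 ;
    std::ptrdiff_t i = std::ptrdiff_t ( std::floor ( x ) ) ;
    if ( i >= last )
      return data [ last ] ;
    float f = x - float ( i ) ;
    return ( 1.0f - f ) * data [ i ] + f * data [ i + 1 ] ;
  }
} ;

// The principle of a remap: evaluate the functor at every coordinate and
// store the result at the same position in the target. vspline's remap
// additionally spreads this work over several threads and vectorizes it.
template < class functor_t , class coord_t , class value_t >
void simple_remap ( const functor_t & f ,
                    const std::vector < coord_t > & coordinates ,
                    std::vector < value_t > & target )
{
  assert ( coordinates.size() == target.size() ) ;
  for ( std::size_t k = 0 ; k < coordinates.size() ; ++k )
    target [ k ] = f ( coordinates [ k ] ) ;
}
```

In vspline the coordinates would typically be nD (for example vigra::TinyVector < float , 2 >), and the containers would be MultiArrayViews. The per-element logic is the same.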
+ 
+ This is an 'ad-hoc' remap, passing source data as an array. You can also set up a bspline object and perform a remap using an evaluator for this bspline object:
+ 
+~~~~~~~~~~~~
+vspline::remap < ev_type , 2 > ( ev , coordinate_array , target_array ) ;
+~~~~~~~~~~~~
+ 
+ While this routine is also called remap, it has wider scope: 'ev_type' can be any functor with a suitable interface to produce a value of the type held in 'target_array' for a value held in 'coordinate_array'. Here, you'd typically use an object derived from class vspline::unary_functor, and a vspline::evaluator is in fact derived from this base class. A unary_functor's input and output can be any data type suitable for processing with vspline (elementary real types and their un [...]
+ 
+ This form of remap might be named 'transform' and is similar to vigra's point operator code, but uses vspline's automatic multithreading and vectorization to make it very efficient. There's a variation of it where the 'coordinate array' and the 'target array' are the same, effectively performing an in-place transformation, which is useful for things like coordinate transformations or colour space manipulations. This variation is called 'apply'.
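An in-place 'apply' can be sketched in the same way. This is a hypothetical single-threaded stand-in, and `simple_apply` and `to_grey` are illustrative names, not vspline functions. The example functor is a colour space manipulation that converts RGB to grey using the Rec. 601 luma weights.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using rgb = std::array < float , 3 > ;

// In-place 'apply' in principle: each element is replaced by the
// functor's result for that element. Single-threaded stand-in.
template < class functor_t , class value_t >
void simple_apply ( const functor_t & f , std::vector < value_t > & data )
{
  for ( auto & v : data )
    v = f ( v ) ;
}

// example colour space manipulation: RGB to grey, Rec. 601 luma weights
rgb to_grey ( const rgb & c )
{
  float y = 0.299f * c [ 0 ] + 0.587f * c [ 1 ] + 0.114f * c [ 2 ] ;
  return { y , y , y } ;
}
```

Source and target are the same container, so no second array is needed. This is what makes the in-place variant attractive for colour or coordinate transformations.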
+ 
+ There is a variation on remap called 'index_remap'. This one doesn't take a 'coordinate array', but instead feeds the unary_functor with discrete coordinates of the target location that is being filled in. This variant is helpful when a remap uses a coordinate transformation before evaluating; here, the functor starts out receiving the discrete target coordinates, performs the coordinate transform and then feeds the transformed coordinate to some evaluation routine providing the final r [...]
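The index_remap idea can be sketched like this, again as a hypothetical one-dimensional stand-in. `scaled_lookup` receives the discrete target index and applies a coordinate transformation (a simple scaling, assumed here for illustration). It then evaluates by linear interpolation. `simple_index_remap` does nothing but feed it indices.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical pipeline functor for an index-based remap: it receives the
// discrete index of the target element, transforms it into a source
// coordinate (here a simple scaling) and evaluates there by linear
// interpolation.
struct scaled_lookup
{
  std::vector < float > source ;
  float step ;  // source distance between neighbouring target elements

  float operator() ( std::size_t target_index ) const
  {
    float x = step * float ( target_index ) ;            // coordinate transform
    std::size_t i = static_cast < std::size_t > ( x ) ;  // x >= 0, so this floors
    if ( i + 1 >= source.size() )                        // at or past last sample
      return source.back() ;
    float f = x - float ( i ) ;
    return ( 1.0f - f ) * source [ i ] + f * source [ i + 1 ] ;
  }
} ;

// index_remap in principle: no coordinate array; the functor is fed the
// index of the target element it is asked to produce.
template < class functor_t , class value_t >
void simple_index_remap ( const functor_t & f , std::vector < value_t > & target )
{
  for ( std::size_t k = 0 ; k < target.size() ; ++k )
    target [ k ] = f ( k ) ;
}
```

With a step of 0.5, a three-sample source is upsampled to five target samples, and no coordinate array has to be stored.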
+ 
+ Class vspline::unary_functor is coded to make it easy to implement functors for image processing pipelines. For more complex operations, you'd code a functor representing your processing pipeline - often by delegating to 'inner' objects also derived from vspline::unary_functor - and finally use remap or index_remap to bulk-process your data with this functor. This is about as efficient as it gets, since the data are only accessed once, and vspline's remapping code does the tedious work o [...]
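A pipeline built by delegating to inner functors can be sketched with a small combinator. `chained` and `chain` are hypothetical helpers, not vspline's unary_functor machinery. They accept any callables and need C++14 for the deduced return type. Each value passes through the whole pipeline in a single call, so the data only have to be touched once.

```cpp
#include <cassert>
#include <utility>

// Hypothetical combinator: chained < F , G > computes outer ( inner ( x ) ).
// Nesting chains builds a whole processing pipeline out of small functors.
template < class F , class G >
struct chained
{
  F inner ;
  G outer ;

  template < class T >
  auto operator() ( const T & x ) const   // C++14 return type deduction
  {
    return outer ( inner ( x ) ) ;
  }
} ;

template < class F , class G >
chained < F , G > chain ( F inner , G outer )
{
  return chained < F , G > { std::move ( inner ) , std::move ( outer ) } ;
}
```

A chain of, say, a coordinate transformation, an evaluator and a colour manipulation would then be handed to a bulk-processing routine as one functor.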
+ 
+ And that's about it - vspline aims to provide all possible variants of b-splines, code to create and evaluate them and to do so for arrays of coordinates. So if you dig deeper into the code base, you'll find that you can stray off the default path, but there should rarely be any need not to use the high-level object 'bspline' or the remap functions.
+ 
+ While one might argue that the remap routines I present shouldn't be lumped together with the 'proper' b-spline code, I feel that only by tightly coupling them with the b-spline code can I make them really fast. And only by processing several values at once (by multithreading and vectorization) can the hardware be exploited fully. But you're free to omit the remap code: the headers build on top of each other, and remap.h is pretty much at the top.
+ 
+\section speed_sec Speed
+
+ While performance will vary from system to system and between different compiles, I'll quote some measurements from my own system. I include benchmarking code (roundtrip.cc in the examples folder). Here are some measurements done with "roundtrip", working on a full HD (1920*1080) RGB image, using single precision floats internally - the figures are averages of 32 runs:
+
+~~~~~~~~~~~~~~~~~~~~~
+testing bc code MIRROR spline degree 3
+avg 32 x prefilter:........................ 13.093750 ms
+avg 32 x remap from unsplit coordinates:... 59.218750 ms
+avg 32 x remap with internal spline:....... 75.125000 ms
+avg 32 x index_remap ...................... 57.781250 ms
+
+testing bc code MIRROR spline degree 3 using Vc
+avg 32 x prefilter:........................ 9.562500 ms
+avg 32 x remap from unsplit coordinates:... 22.406250 ms
+avg 32 x remap with internal spline:....... 35.687500 ms
+avg 32 x index_remap ...................... 21.656250 ms
+~~~~~~~~~~~~~~~~~~~~~
+
+As can be seen from these test results, using Vc on my system speeds up evaluation considerably: the remap times drop by a factor of roughly 2 to 2.7. When it comes to prefiltering, a lot of time is spent buffering data to make them available for fast vector processing. The time spent on actual calculations is much less. Therefore prefiltering for higher-degree splines doesn't take much more time (when using Vc):
+
+~~~~~~~~~~~~~~~~~~~~~
+testing bc code MIRROR spline degree 5 using Vc
+avg 32 x prefilter:........................ 10.687500 ms
+
+testing bc code MIRROR spline degree 7 using Vc
+avg 32 x prefilter:........................ 13.656250 ms
+~~~~~~~~~~~~~~~~~~~~~
+
+Using double precision arithmetic, vectorization helps less, and prefiltering is actually slower on my system when using Vc. Running roundtrip on your own system should give you an idea of which mode of operation best suits your needs.
+
+\section design_sec Design
+ 
+ You can probably do everything vspline does with other software - there are several freely available implementations of b-spline interpolation and remap routines. What I wanted to create was an implementation which was as general as possible and at the same time as fast as possible, and, on top of that, comprehensive.
+
+ These demands are not easy to satisfy at the same time, but I feel that my design comes close. While generality is achieved by generic programming, speed requires exploiting hardware features, and merely relying on the compiler is not enough. The largest speedup I saw was from multithreading the code. This may seem like a trivial observation, but my design is influenced by it: in order to efficiently multithread, the problem has to be partitioned so that it can be processed by indepe [...]
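The kind of partitioning meant here can be sketched with std::thread. The index range of a bulk operation is split into contiguous chunks that share no data, so each chunk can be processed by its own thread without synchronization. `process_in_chunks` is a hypothetical helper, not vspline's multithreading code (see multithread.h for that).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Sketch: partition a bulk operation over 'size' elements into at most
// 'n_threads' contiguous, non-overlapping chunks and process each chunk
// in its own std::thread. 'job' is called as job ( begin , end ).
template < class job_t >
void process_in_chunks ( std::size_t size , unsigned n_threads , job_t job )
{
  if ( n_threads == 0 )
    n_threads = 1 ;
  std::size_t chunk = ( size + n_threads - 1 ) / n_threads ; // round up
  std::vector < std::thread > workers ;
  for ( std::size_t begin = 0 ; begin < size ; begin += chunk )
  {
    std::size_t end = std::min ( size , begin + chunk ) ;
    workers.emplace_back ( job , begin , end ) ;
  }
  for ( auto & w : workers )
    w.join() ;
}
```

Because no two chunks touch the same element, the job needs no locks. On older toolchains, linking may require -pthread.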
+ 
+ Another speedup method is data-parallel processing. This is often thought to be the domain of GPUs, but modern CPUs also offer it in the form of vector units. I chose to implement data-parallel processing on the CPU, as it offers tight integration with unvectorized CPU code. It's almost familiar terrain, and the step from writing conventional CPU code to vector unit code is not too large when using tools like Vc, which abstract the hardware away. Using horizontal vectorization does requir [...]
+ 
+ To use vectorized evaluation efficiently, incoming data have to be presented to the evaluation code in vectorized form, but usually they will come from interleaved memory. After the evaluation is complete, they have to be stored again to interleaved memory. The deinterleaving and interleaving operations take time, so the best strategy is to load once from interleaved memory, perform all necessary operations on vector data and finally store once. The sequence of operations performed on [...]
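The load once, process, store once strategy can be sketched in plain C++. The sketch assumes RGB float data and a hypothetical vector width N of 4. `deinterleave`, `process` and `interleave` are illustrative names. With Vc, each planar row would be a SIMD vector rather than a std::array, and `process` stands in for the whole chain of evaluation and postprocessing.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: a batch of N interleaved RGB pixels (r0 g0 b0 r1 g1 b1 ...)
// is loaded into three planar rows, one per channel. Each row is what a
// SIMD register would hold; all processing happens on these rows, and
// the result is interleaved again only once, at the end.
constexpr std::size_t N = 4 ;                   // assumed vector width
using planar = std::array < std::array < float , N > , 3 > ;

planar deinterleave ( const float * rgb )
{
  planar p { } ;
  for ( std::size_t k = 0 ; k < N ; ++k )
    for ( std::size_t ch = 0 ; ch < 3 ; ++ch )
      p [ ch ] [ k ] = rgb [ 3 * k + ch ] ;
  return p ;
}

void interleave ( const planar & p , float * rgb )
{
  for ( std::size_t k = 0 ; k < N ; ++k )
    for ( std::size_t ch = 0 ; ch < 3 ; ++ch )
      rgb [ 3 * k + ch ] = p [ ch ] [ k ] ;
}

// stand-in for the whole processing chain on planar data:
// here just a per-channel gain
void process ( planar & p , const std::array < float , 3 > & gain )
{
  for ( std::size_t ch = 0 ; ch < 3 ; ++ch )
    for ( std::size_t k = 0 ; k < N ; ++k )
      p [ ch ] [ k ] *= gain [ ch ] ;
}
```

However long the processing chain is, each batch is deinterleaved and interleaved exactly once.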
+
+ Using all these techniques together makes vspline fast. The target I was roughly aiming at was to achieve frame rates of ca. 50 fps in RGB and full HD, producing the images via remap from a precalculated warp array. On my system, I have almost reached that goal - my remap times are around 25 msec (for a cubic spline), and with memory access etc. I reach frame rates of over half of what I was aiming at. My main testing ground is pv, my panorama viewer. Here I can often take the spline d [...]
+ 
+ On the other hand, even without using vectorization, the code is certainly fast enough for casual use and may suffice for some production scenarios. Built without Vc, vspline's only dependency is vigra, and the same binary will work on a wide range of hardware.
+ 
+ \section lit_sec Literature
+ 
+ There is a large amount of literature on b-splines available online. Here's a selection:
+ 
+ http://bigwww.epfl.ch/thevenaz/interpolation/
+ 
+ http://soliton.ae.gatech.edu/people/jcraig/classes/ae4375/notes/b-splines-04.pdf
+ 
+ http://www.cs.mtu.edu/~shene/COURSES/cs3621/NOTES/spline/B-spline/bspline-basis.html
+ 
+ http://www.cs.mtu.edu/~shene/COURSES/cs3621/NOTES/spline/B-spline/bspline-ex-1.html
+*/
diff --git a/eval.h b/eval.h
new file mode 100644
index 0000000..9b9a57a
--- /dev/null
+++ b/eval.h
@@ -0,0 +1,1480 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015 - 2017 by Kay F. Jahnke                    */
+/*                                                                      */
+/*    The git repository for this software is at                        */
+/*                                                                      */
+/*    https://bitbucket.org/kfj/vspline                                 */
+/*                                                                      */
+/*    Please direct questions, bug reports, and contributions to        */
+/*                                                                      */
+/*    kfjahnke+vspline at gmail.com                                        */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/*! \file eval.h
+
+    \brief code to evaluate uniform b-splines
+
+    This body of code contains class evaluator and auxiliary classes which are
+    needed for its smooth operation.
+
+    The evaluation is a reasonably straightforward process: A subset of the coefficient
+    array, containing coefficients 'near' the point of interest, is picked out, and
+    a weighted summation over this subset produces the result of the evaluation.
+    The complex bit is to have the right coefficients in the first place
+    (this is what prefiltering does), and to use the appropriate weights on
+    the coefficient window. For b-splines, there is an efficient method to
+    calculate the weights by means of a matrix multiplication, which is easily
+    extended to handle b-spline derivatives as well. Since this code lends itself
+    to a generic implementation, and it can be parametrized by the spline's order,
+    and since the method performs well, I use it here in preference to the code
+    which Thevenaz uses (which is, for the orders of splines it encompasses, the matrix
+    multiplication written out with a few optimizations, like omitting multiplications
+    with zero, and slightly more concise calculation of powers).
+    
+    Evaluation of a b-spline is, compared to prefiltering, more computationally intensive
+    and less memory-bound, so the profit from vectorization, especially for float data,
+    is more pronounced here than for the prefiltering. On my system (AVX2), I found vectorized
+    single-precision operation to be about three to four times as fast as unvectorized code.
+    
+    The central class of this file is class evaluator. evaluator objects are set up to
+    provide evaluation of a specific b-spline. Once they are set up they don't change and
+    effectively become pure functors with several overloaded evaluation methods for different
+    constellations of parameters. The evaluation methods typically take their arguments per
+    reference. The details of the evaluation variants, together with explanations of
+    specializations used for extra speed, can be found with the individual evaluation
+    routines.
+    
+    What do I mean by the term 'pure' functor? It's a concept from functional programming.
+    It means that calling the functor will not have any effect on the functor itself - it
+    can't change once it has been constructed. This has several nice effects: it can
+    potentially be optimized very well, it is thread-safe, and it will play well with
+    functional programming concepts - and it's conceptually appealing.
+    
+    Code using class evaluator will probably use it at some core place where it is
+    part of some processing pipeline. An example would be an image processing program:
+    one might have some outer loop generating arguments (typically SIMDIZED types)
+    which are processed one after the other to yield a result. The processing will
+    typically have several stages, like coordinate generation and transformations,
+    then use class evaluator to pick an interpolated intermediate result, which is
+    further processed by, say, colour or data type manipulations before finally
+    being stored in some target container. The whole processing pipeline can be
+    coded to become a single functor, with one of class evaluator's eval-type
+    routines embedded somewhere in the middle, and all that's left is code to
+    efficiently handle the source and destination to provide arguments to the
+    pipeline - like the code in remap.h. And since the code in remap.h is made to
+    provide the data feeding and storing, the only coding needed is for the pipeline,
+    which is where the 'interesting' stuff happens.
+*/
+
+#ifndef VSPLINE_EVAL_H
+#define VSPLINE_EVAL_H
+
+#include "bspline.h"
+#include "unary_functor.h"
+
+namespace vspline {
+
+using namespace std ;
+using namespace vigra ;
+
+// is_singular tests if a type is either a plain fundamental or a Vc SIMD type.
+// The second possibility is only considered if Vc is used at all.
+// This test serves to differentiate between nD values like vigra TinyVectors
+// which fail the test and singular values, which pass. Note that this test
+// fails vigra::TinyVector < T , 1 > even though one might consider it 'singular'.
+
+template < class T >
+using is_singular = typename
+  std::conditional
+  <    std::is_fundamental < T > :: value
+#ifdef USE_VC
+    || Vc::is_simd_vector < T > :: value
+#endif
+    ,
+    std::true_type ,
+    std::false_type
+  > :: type ;
+
+// next we have coordinate splitting functions. For odd splines, coordinates
+// are split into an integral part and a remainder in the range [0,1), which is
+// used for weight generation.
+// we have two variants for odd_split and the dispatch below
+
+template < typename ic_t , typename rc_t , int vsize = 1 >
+void odd_split ( rc_t v , ic_t& iv , rc_t& fv , std::true_type )
+{
+  rc_t fl_i = std::floor ( v ) ;
+  fv = v - fl_i ;
+  iv = ic_t ( fl_i )  ;
+}
+
+template < typename ic_t , typename rc_t , int vsize = 1 >
+void odd_split ( rc_t v , ic_t& iv , rc_t& fv , std::false_type )
+{
+  for ( int d = 0 ; d < vigra::ExpandElementResult < rc_t > :: size ; d++ )
+    odd_split ( v[d] , iv[d] , fv[d] , std::true_type() ) ;
+}
+
+template < typename ic_t , typename rc_t , int vsize = 1 >
+void odd_split ( rc_t v , ic_t& iv , rc_t& fv )
+{
+  odd_split ( v , iv , fv , is_singular<rc_t>() ) ;
+}
+
+// for even splines, the integral part is obtained by rounding. when the
+// result of rounding is subtracted from the original coordinate, a value
+// between -0.5 and 0.5 is obtained which is used for weight generation.
+// we have two variants for even_split and the dispatch below
+
+template < typename ic_t , typename rc_t , int vsize = 1 >
+void even_split ( rc_t v , ic_t& iv , rc_t& fv , std::true_type )
+{
+  rc_t fl_i = std::round ( v ) ;
+  fv = v - fl_i ;
+  iv = ic_t ( fl_i ) ;
+}
+
+template < typename ic_t , typename rc_t , int vsize = 1 >
+void even_split ( rc_t v , ic_t& iv , rc_t& fv , std::false_type )
+{
+  for ( int d = 0 ; d < vigra::ExpandElementResult < rc_t > :: size ; d++ )
+    even_split ( v[d] , iv[d] , fv[d] , std::true_type() ) ;
+}
+
+template < typename ic_t , typename rc_t , int vsize = 1 >
+void even_split ( rc_t v , ic_t& iv , rc_t& fv )
+{
+  even_split ( v , iv , fv , is_singular<rc_t>() ) ;
+}
+
+// TODO describe the use of the 'weight matrix' rather than technical details
+
+/// The routine 'calculate_weight_matrix' originates from vigra. I took the original
+/// routine BSplineBase<spline_order, T>::calculateWeightMatrix() from vigra and changed it
+/// in several ways:
+///
+/// - the spline degree is now a runtime parameter, not a template argument
+/// - the derivative degree is passed in as an additional parameter, directly
+///   yielding the appropriate weight matrix needed to calculate a b-spline's derivative
+///   with only a slight modification to the original code
+/// - the code uses my modified bspline basis function which takes the degree as a
+///   run time parameter instead of a template argument and works with integral
+///   operands and precalculated values, which makes it very fast, even for high
+///   spline degrees. bspline_basis() is in basis.h.
+
+template < class target_type >
+MultiArray < 2 , target_type > calculate_weight_matrix ( int degree , int derivative )
+{
+  const int order = degree + 1 ;
+  
+  if ( derivative >= order ) // guard against impossible parameters
+    return MultiArray < 2 , target_type >() ;
+
+  // allocate space for the weight matrix
+  MultiArray < 2 , target_type > res = MultiArray < 2 , target_type > ( order , order - derivative ) ;
+  
+  long double faculty = 1.0 ;
+  
+  for ( int row = 0 ; row < order - derivative ; row++ )
+  {
+    if ( row > 1 )
+      faculty *= row ;
+
+    int x = degree / 2 ; // (note: integer division)
+
+    // we store to a vigra MultiArray, where the first index varies fastest, so
+    // storing to res ( column , row ) as we do places the results in memory in
+    // the precise order in which we want to use them in the weight calculation.
+    // note how we pass x to bspline_basis() as an integer. This way, we pick
+    // a very efficient version of the basis function which only evaluates at
+    // whole numbers. This basis function version does hardly any calculations
+    // but instead relies on precalculated values. see bspline_basis in basis.h
+    // note: with large degrees (20+), this still takes a fair amount of time, but
+    // seconds rather than minutes with the standard routine.
+    
+    for ( int column = 0 ; column < order ; ++column , --x )
+      res ( column , row ) = bspline_basis<long double> ( x , degree , row + derivative ) / faculty;
+  }
+
+  return res;
+}
+
+/// while we deal with B-splines here, there is no need to limit the evaluator
+/// code only to B-spline basis functions. The logic is the same for any type of evaluation
+/// which functions like a separable convolution of an equilateral subarray of the coefficient
+/// array, and the only thing specific to b-splines is the weight generation.
+///
+/// So I coded the general case, which can use any weight generation function. Coding this
+/// introduces a certain degree of complexity, which I feel is justified for the flexibility
+/// gained. The complexity is mainly due to the fact that, while
+/// we can write a simple (templated) function to generate weights (as above), we can't pass
+/// such a template as an object to a function. Instead we use an abstract base class for
+/// the weight functor and inherit from it for specific weight generation methods.
+///
+/// I made some investigations towards coding evaluation of splines with different orders
+/// along the different axes, but this introduced too much extra complexity for my taste and
+/// took me too far away from simply providing efficient code for b-splines, so I abandoned
+/// the attempts. Therefore the weight functors for a specific spline all have to have a common
+/// spline_order and generate spline_order weights. The only way to force lower-order weight
+/// functors into this scheme is to set some of the weights to zero. Weight functors
+/// of higher spline_order than the spline can't be accommodated; if that should be necessary,
+/// the spline_order of the entire spline has to be raised.
+///
+/// Note the use of 'delta' in the functions below. this is due to the fact that these functors
+/// are called with the fractional part of a real valued coordinate.
+///
+/// first we define a base class for (multi-)functors calculating weights.
+/// this base class can accommodate weight calculation with any weight generation
+/// function using the same signature. It is not specific to b-splines.
+/// We access the weight functors via a pointer to this base class in the code below.
+
+template < typename ele_type , // elementary type of value_type
+           typename rc_type ,  // type of real-valued coordinate
+           int vsize = 1 >
+struct weight_functor_base
+{
+  // we define two pure virtual overloads for operator(), one for unvectorized
+  // and one for vectorized operation. In case the scope of evaluation is extended
+  // to other types of values, we'll have to add the corresponding signatures here.
+  
+  virtual void operator() ( ele_type* result , const rc_type& delta ) = 0 ;
+  
+#ifdef USE_VC
+
+  typedef typename vector_traits < ele_type , vsize > :: ele_v ele_v ;
+  typedef typename vector_traits < rc_type , vsize > :: ele_v rc_v ;
+
+  virtual void operator() ( ele_v* result , const rc_v& delta ) = 0 ;
+
+#endif
+} ;
+
+/// this functor calculates weights for a b-spline or its derivatives.
+/// with d == 0, the weights are calculated for plain evaluation.
+/// Initially I implemented weight_matrix as a static member, hoping the code
+/// would perform better, but I could not detect significant benefits. Letting
+/// the constructor choose the derivative gives more flexibility and less type
+/// proliferation.
+
+template < typename target_type ,   // type for weights (may be a simdized type)
+           typename ele_type ,      // elementary type of value_type
+           typename delta_type >    // type for deltas (may be a simdized type)
+
+struct bspline_derivative_weights
+{
+  typedef typename MultiArray < 2 , ele_type > :: iterator wm_iter ;
+
+  // TODO I would make this const, but in vigra, the iterator obtained by calling begin()
+  // on a const array is protected (multi_iterator.hxx, 438) why is this so?
+
+  MultiArray < 2 , ele_type > weight_matrix ;
+  const int degree ;
+  const int derivative ;
+  const int columns ;
+  wm_iter wm_begin ;
+  wm_iter wm_end ;
+
+  bspline_derivative_weights ( int _degree , int _derivative = 0 )
+  : weight_matrix ( calculate_weight_matrix < ele_type > ( _degree , _derivative ) ) ,
+    degree ( _degree ) ,
+    derivative ( _derivative ) ,
+    columns ( _degree + 1 )
+  { 
+    wm_begin = weight_matrix.begin() ;
+    wm_end = weight_matrix.end() ;
+  } ;
+  
+  void operator() ( target_type* result , const delta_type & delta )
+  {
+    target_type power ( delta ) ;
+    wm_iter factor_it = wm_begin ;
+    const wm_iter factor_end = wm_end ;
+
+    // the result is initialized with the first row of the 'weight matrix'.
+    // We save ourselves multiplying it with delta^0.
+ 
+    for ( int c = 0 ; c < columns ; c++ )
+    {
+      result[c] = *factor_it ;
+      ++factor_it ;
+    }
+    
+    if ( factor_it != factor_end ) // more rows left? (not so if derivative == degree)
+    {
+      for ( ; ; )
+      {
+        for ( int c = 0 ; c < columns ; c++ )
+        {
+          result[c] += power * *factor_it ;
+          ++factor_it ;
+        }
+        if ( factor_it == factor_end ) // avoid multiplication if exhausted, break now
+          break ;
+        power *= target_type ( delta ) ; // otherwise produce next power(s) of delta(s)
+      }
+    }
+  }
+} ;
+
+/// we derive from the weight functor base class to obtain a (multi-) functor
+/// specifically for (derivatives of) a b-spline :
+
+template < typename ele_type , typename rc_type , int vsize = 1 >
+struct bspline_derivative_weight_functor
+: public weight_functor_base < ele_type , rc_type , vsize >
+{
+  typedef weight_functor_base < ele_type , rc_type , vsize > base_class ;
+
+  // set up the fully specialized functors to which operator() delegates:
+
+  bspline_derivative_weights < ele_type , ele_type , rc_type >  weights ;
+
+#ifdef USE_VC
+  using typename base_class::ele_v ;
+  using typename base_class::rc_v ;
+  
+  bspline_derivative_weights < ele_v , ele_type , rc_v >  weights_v ; 
+#endif
+
+  bspline_derivative_weight_functor ( int degree , int d = 0 )
+  : weights ( degree , d )
+#ifdef USE_VC
+  , weights_v ( degree , d )
+#endif
+  {
+  }
+  
+  // handle the weight calculation by delegation to the functors set up at construction
+  
+  virtual void operator() ( ele_type* result , const rc_type& delta )
+  {
+    weights ( result , delta ) ;
+  }
+
+#ifdef USE_VC
+  virtual void operator() ( ele_v* result , const rc_v& delta )
+  {
+    weights_v ( result , delta ) ;
+  }
+#endif
+} ;
+
+// not a very useful weight function, but it serves to prove that the concept of plug-in
+// weight functions works as intended. Instead of calculating the weights, as in the functor
+// above, this one simply returns equal weights. The result is that, no matter what delta
+// is passed in, the weights are the same and average over the coefficients to which they
+// are applied.
+// Note here that one important property of the weights is that they constitute
+// a partition of unity. Both the (derivative) b-spline weights and this simple
+// weight functor share this property.
+// currently unused, but left in for demonstration purposes
+
+/*
+
+template < typename rc_type >
+struct average_weight_functor
+: public weight_functor_base < rc_type >
+{
+  typedef weight_functor_base < rc_type > base_class ;
+  using typename base_class::rc_type_v ;
+
+  const rc_type weight ;
+  const int order ;
+  
+  average_weight_functor ( int degree )
+  : weight ( rc_type ( 1.0 ) / rc_type ( degree + 1 ) ) ,
+    order ( degree + 1 )
+  { } ;
+  
+  virtual void operator() ( rc_type* result , const rc_type& delta )
+  {
+    for ( int e = 0 ; e < order ; e++ )
+      result[e] = weight ;
+  }
+  
+  virtual void operator() ( rc_type_v* result , const rc_type_v& delta )
+  {
+    for ( int e = 0 ; e < order ; e++ )
+      result[e] = weight ;
+  }  
+} ;
+
+*/
+
+/// class evaluator encodes evaluation of a B-spline. The evaluation relies on 'braced'
+/// coefficients, as they are normally provided by a vspline::bspline object (the exception
+/// being bspline objects created with UNBRACED or MANUAL strategy). While the most
+/// general constructor will accept a MultiArrayView to coefficients (including the necessary
+/// 'brace'), this will rarely be used, and an evaluator will be constructed from a bspline
+/// object. In the most trivial case there are only two things which need to be done:
+///
+/// The specific type of evaluator has to be established by providing the relevant template
+/// arguments. Here, we need two types: the 'coordinate type' and the 'value type'.
+///
+/// - The coordinate type is encoded as a vigra::TinyVector of some real data type - if you're
+/// doing image processing, the typical type would be a vigra::TinyVector < float , 2 >.
+///
+/// - The value type has to be either an elementary real data type such as 'float' or 'double',
+/// or a vigra::TinyVector of such an elementary type. Other data types which can be handled
+/// by vigra's ExpandElementResult mechanism should also work. When processing colour images,
+/// your value type would typically be a vigra::TinyVector < float , 3 >.
+///
+/// Additionally, the bool template argument 'even_spline_order' has to be set. This is due to
+/// the fact that even splines (degree 0, 2, 4...) and odd splines (1, 3, 5...) are computed
+/// differently, and making the distinction at run-time would be possible, but less efficient.
+///
+/// With the evaluator's type established, an evaluator of this type can be constructed by
+/// passing a vspline::bspline object to the constructor. Naturally, the bspline object has
+/// to contain data of the same value type, and the spline has to have the same number of
+/// dimensions as the coordinate type.
+/// 
+/// I have already hinted at the evaluation process used, but here it is again in a nutshell:
+/// The coordinate at which the spline is to be evaluated is split into its integral part
+/// and a remaining fraction. The integral part defines the location where a window from the
+/// coefficient array is taken, and the fractional part defines the weights to use in calculating
+/// a weighted sum over this window. This weighted sum represents the result of the evaluation.
+/// Coordinate splitting is done with the method split(), which picks the appropriate variant
+/// (different code is needed for odd and even splines).
+///
+/// The generation of the weights to be applied to the window
+/// of coefficients is performed by employing the weight functors above. What's left to do is
+/// to bring all the components together, which happens in class evaluator. The workhorse
+/// code in the subclasses _eval and _v_eval takes care of performing the necessary operations
+/// recursively over the dimensions of the spline.
+///
+/// The code in class evaluator begins with a sizeable constructor which sets up as much
+/// as possible to help the evaluation code do its job as fast as possible. Next follows
+/// the unvectorized code, finally the vectorized code.
+///
+/// There is a variety of overloads of class evaluator's eval() method available, because
+/// class evaluator inherits from vspline::unary_functor.
+///
+/// The evaluation strategy is to have all dependencies of the evaluation except for the actual
+/// coordinates taken care of by the constructor - and immutable for the evaluator's lifetime.
+/// The resulting object has no state which is modified after construction, making it thread-safe.
+/// It also constitutes a 'pure' function in a functional-programming sense, because it has
+/// no mutable state and no side-effects, as can be seen by the fact that the eval methods
+/// are all marked const.
+///
+/// The eval() overloads form a hierarchy, as evaluation progresses from accepting unsplit real
+/// coordinates to split coordinates and finally offsets and weights. This allows calling code to
+/// handle parts of the delegation hierarchy itself, only using class evaluator at a specific level.
+///
+/// By providing the evaluation in this way, it becomes easy for calling code to integrate
+/// the evaluation into more complex functors. Consider, for example, code
+/// which generates coordinates with a functor, then evaluates a b-spline at these coordinates,
+/// and finally subjects the resultant values to some postprocessing. All these processing
+/// steps can be bound into a single functor, and the calling code can be reduced to polling
+/// this functor until it has obtained the desired number of output values.
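+///
+/// A minimal sketch of such a chain, using plain lambdas instead of vspline's own
+/// functor classes. All names here ('gen', 'eval_1d', 'post', 'chain') are hypothetical,
+/// and 'eval_1d' is merely a stand-in for a b-spline evaluator:
+///
+/// \code
+/// #include <iostream>
+///
+/// int main()
+/// {
+///   auto gen = [] ( int i ) { return 0.5 * i ; } ;        // generates coordinates
+///   auto eval_1d = [] ( double x ) { return x * x ; } ;   // stand-in for evaluation
+///   auto post = [] ( double v ) { return v + 1.0 ; } ;    // postprocessing
+///
+///   // bind all three steps into a single functor
+///   auto chain = [=] ( int i ) { return post ( eval_1d ( gen ( i ) ) ) ; } ;
+///
+///   // the calling code just polls the combined functor
+///   for ( int i = 0 ; i < 4 ; i++ )
+///     std::cout << chain ( i ) << std::endl ;   // prints 1, 1.25, 2 and 3.25
+/// }
+/// \endcode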
+///
+/// While the 'unspecialized' evaluator will try to do 'the right thing' by using general
+/// purpose code fit for all eventualities, for time-critical operation there is a
+/// specialization which can be used to make the code faster:
+///
+/// - template argument 'specialize' (a std::integral_constant<int,N>) can be given N = 0
+/// to forcibly use (more efficient) nearest neighbour interpolation, which has the same
+/// effect as simply running with degree 0 but avoids code which isn't needed for nearest
+/// neighbour interpolation (like the application of weights, which is futile under the
+/// circumstances, the weight always being 1.0).
+/// N = 1 selects explicit n-linear interpolation. Any other value will
+/// result in the general-purpose code being used.
+///
+/// Note that, contrary to my initial implementation, all forms of coordinate mapping were
+/// removed from class evaluator. The 'mapping' which is left is, more aptly, called
+/// 'splitting', since the numeric value of the incoming coordinate is never modified.
+/// Folding arbitrary coordinates into the spline's defined range now has to be done
+/// externally, typically by wrapping class evaluator together with some coordinate
+/// modification code into a combined vspline::unary_functor. map.h provides code for
+/// common mappings, see there.
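+///
+/// As an illustration only, a wrapper folding coordinates periodically into a range
+/// before handing them on might look like the sketch below. 'periodic_fold' is a
+/// hypothetical name, it is not the interface of map.h, and it assumes a 1D evaluator
+/// which can be called with a double:
+///
+/// \code
+/// #include <cmath>
+///
+/// // hypothetical wrapper: fold x into [0,period), then call the wrapped evaluator
+/// template < class inner_type >
+/// struct periodic_fold
+/// {
+///   inner_type inner ;   // e.g. a 1D evaluator callable with a double
+///   double period ;
+///
+///   double operator() ( double x ) const
+///   {
+///     double f = std::fmod ( x , period ) ;   // has the sign of x
+///     if ( f < 0.0 )
+///       f += period ;
+///     if ( f >= period )                      // f + period may round up to period
+///       f = 0.0 ;
+///     return inner ( f ) ;
+///   }
+/// } ;
+/// \endcode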
+///
+/// class evaluator inherits from class unary_functor, which implements the broader concept.
+/// class evaluator has some methods to help with special cases, but apart from that it is
+/// a standard vspline::unary_functor. This inheritance gives us the full set of class
+/// unary_functor's methods, among them the convenient overloads of operator() which
+/// allow us to invoke class evaluator's evaluation with function call syntax.
+///
+/// Note how the number of vector elements is fixed here by picking the number of ele_type
+/// which vspline::vector_traits considers appropriate. There should rarely be a need to
+/// choose a different number of vector elements: evaluation will often be the most
+/// computationally intensive part of a processing chain, and therefore this choice is
+/// sensible. But it's not mandatory.
+
+// TODO: we inherit from uf_types to use the standard evaluation
+// type system, but I'd like to incorporate that into unary_functor.
+// Can I have these types in some other way? I'd like to get rid
+// of uf_types altogether, and this is the only other place where
+// it's used.
+
+template < typename _coordinate_type , // nD real coordinate
+           typename _value_type ,      // type of coefficient/result
+#ifdef USE_VC
+           // nr. of vector elements
+           int _vsize = vspline::vector_traits < _value_type > :: size ,
+#else
+           int _vsize = 1 ,
+#endif
+           // specialize for degree 0 or 1 spline
+           typename specialize = std::integral_constant<int,-1>
+         >
+class evaluator_policy
+: public uf_types < _coordinate_type , _value_type , _vsize >
+{
+public:
+
+  typedef _value_type value_type ; // == base_type::out_type
+
+  // we want to access facilities of the base class (vspline::uf_types<...>),
+  // so we use a typedef for the base class.
+
+  typedef uf_types < _coordinate_type , _value_type , _vsize > base_type ;
+
+  // pull in standard evaluation type system with this macro:
+
+  using_unary_functor_types ( base_type ) ;
+  
+  // relying on base type to provide these:
+  
+  typedef in_ele_type rc_type ;    // elementary type of a coordinate
+  typedef out_ele_type ele_type ;  // elementary type of value_type
+
+  // Initially I was using a template argument for this flag, but it turned out
+  // that using a const bool set at construction time performed just as well.
+  // Since this makes using class evaluator easier, I have chosen to go this way.
+
+  const bool even_spline_order ;   // flag containing the 'evenness' of the spline
+
+  enum { dimension = dim_in }  ;
+  enum { level = dim_in - 1 }  ;
+  enum { channels = dim_out } ;
+
+  // types for nD integral indices, these are not in base_type
+  
+  typedef int ic_type ;            // we're only using int for indices
+  
+  typedef vigra::TinyVector < ic_type , dimension > nd_ic_type ;
+  
+  // we don't use:  typedef in_type nd_rc_type ;
+  // because if the spline is 1D, in_type comes out as, say, a plain float,
+  // while the code in class evaluator expects TinyVector<float,1>. This produces
+  // no inconvenience, since there are specializations for 1D splines.
+
+  typedef vigra::TinyVector < rc_type , dimension > nd_rc_type ;
+
+  /// view_type is used for a view to the coefficient array
+
+  typedef MultiArrayView < dimension , value_type >                view_type ;
+
+  /// type used for nD array coordinates, array shapes
+
+  typedef typename view_type::difference_type                      shape_type ;
+
+  typedef vigra::TinyVector < int , dimension >                    derivative_spec_type ;
+
+  typedef typename MultiArrayView < 1 , ic_type > :: const_iterator offset_iterator ;
+  
+  typedef vigra::MultiCoordinateIterator<dimension>                nd_offset_iterator ;
+  
+  typedef MultiArrayView < dimension + 1 , ele_type >              component_view_type ;
+  
+  typedef typename component_view_type::difference_type            expanded_shape_type ;
+  
+  typedef vigra::TinyVector < ele_type* , channels >               component_base_type ;
+
+  typedef weight_functor_base < ele_type , rc_type , vsize > weight_functor_base_type ;
+
+  /// in the context of b-spline calculation, this is the weight generating
+  /// functor which will be used:
+
+  typedef bspline_derivative_weight_functor < ele_type , rc_type , vsize >
+    bspline_weight_functor_type ;
+  
+  // to try out gaussian weights, one might instead use
+  // typedef gaussian_weight_functor < ele_type > bspline_weight_functor_type ;
+  
+  /// we need one functor per dimension:
+    
+  typedef vigra::TinyVector < weight_functor_base_type* , dimension > nd_weight_functor ;
+  
+  // while in the context of B-splines the weight functors are, of course, functors which
+  // calculate the weights via the B-spline basis functions, the formulation we use here
+  // allows us to use any set of functors derived from weight_functor_base_type. With this
+  // flexible approach, trying out other basis functions becomes simple: just write the
+  // functor and pass it in; it doesn't have to have anything to do with B-splines at all.
+  // Another way to employ a different weight generation method is to pass different
+  // weights into the eval() overloads below which accept a set of weights.
+  // By default class evaluator will use b-spline weights.
+  
+private:
+  
+  nd_weight_functor fweight ;       ///< set of pointers to weight functors, one per dimension
+  const view_type & coefficients ;   ///< b-spline coefficient array
+  const shape_type expanded_stride ;                 ///< strides in terms of expanded value_type
+  MultiArray < 1 , ic_type > offsets ;               ///< offsets in terms of value_type
+  MultiArray < 1 , ic_type > component_offsets ;     ///< offsets in terms of ele_type, for vc op
+  component_base_type component_base ;
+  component_view_type component_view ;
+  bspline_weight_functor_type wfd0 ; ///< default weight functor: underived bspline
+  const int spline_degree ;
+  const int spline_order ;
+  const int window_size ;
+
+public:
+
+  const int & get_order() const
+  {
+    return spline_order ;
+  }
+
+  const int & get_degree() const
+  {
+    return spline_degree ;
+  }
+
+  const shape_type & get_stride() const
+  {
+    return coefficients.stride() ;
+  }
+
+  /// this constructor is the most flexible variant and will ultimately be called by all other
+  /// constructor overloads. It will not usually be called directly; rather, use the
+  /// overloads taking vspline::bspline objects. The constructor takes three arguments:
+  /// - a vigra::MultiArrayView to (braced) coefficients
+  /// - the degree of the spline (3 == cubic spline)
+  /// - specification of the desired derivative of the spline for each axis. This constructor
+  ///   has no default for it; pass derivative_spec_type ( 0 ) for plain evaluation (the
+  ///   bspline-based constructor below uses that as its default).
+
+  evaluator_policy ( const view_type & _coefficients ,
+                     int _spline_degree ,
+                     derivative_spec_type _derivative )
+  // initializers are listed in declaration order, which is the order
+  // the compiler uses anyway
+  : even_spline_order ( ! ( _spline_degree & 1 ) ) ,
+    coefficients ( _coefficients ) ,
+    expanded_stride ( channels * _coefficients.stride() ) ,
+    component_view ( _coefficients.expandElements ( 0 ) ) ,
+    wfd0 ( _spline_degree , 0 ) ,
+    spline_degree ( _spline_degree ) ,
+    spline_order ( _spline_degree + 1 ) ,
+    // std::pow yields a double; round it before converting to int
+    window_size ( int ( std::round ( std::pow ( _spline_degree + 1 , int(dimension) ) ) ) )
+  {
+    // initialize the weight functors. In this constructor, we use only bspline weight
+    // functors, even though the evaluator can operate with all weight functors
+    // filling in the right number of basis values given a delta. To make use of this
+    // flexibility, one would derive from this class or write another constructor.
+    // Note how we code so that the default case (plain evaluation with no derivatives)
+    // results in use of only one single weight functor.
+
+    for ( int d = 0 ; d < dimension ; d++ )
+    {
+      if ( _derivative[d] )
+      {
+        fweight[d] = new bspline_weight_functor_type ( _spline_degree , _derivative[d] ) ;
+      }
+      else
+      {
+        fweight[d] = &wfd0 ; // pick the default if derivative is 0
+      }
+    }
+    
+    // calculate the number of offsets needed and create the array to hold them
+    // The evaluation forms a weighted sum of a window into the coefficient array.
+    // The sequence of offsets we calculate here is the set of pointer differences
+    // from the first element in that window to all elements in the window. It's
+    // another way of coding this window, where all index calculations have already
+    // been done beforehand rather than performing them during the traversal of the
+    // window by means of stride/shape arithmetic. Coding the window in this fashion
+    // also makes it easy to vectorize the code.
+    
+    offsets = MultiArray < 1 , ic_type > ( window_size ) ;
+    component_offsets = MultiArray < 1 , ic_type > ( window_size ) ;
+    
+    // we fill the offset array in a simple fashion: we do a traversal of the window
+    // and calculate the pointer difference for every element reached. We use the
+    // same loop to code the corresponding offsets to elementary values (ele_type)
+  
+    auto sample = coefficients.subarray ( shape_type() , shape_type(spline_order) ) ;
+    auto base = sample.data() ;
+    auto target = offsets.begin() ;
+    auto component_target = component_offsets.begin() ;
+
+    for ( auto &e : sample )
+    {
+      *target = &e - base ;
+      *component_target = channels * *target ;
+      ++target ;
+      ++component_target ;
+    }
+
+    // set up a set of base addresses for the component channels. This is needed
+    // for the vectorization, as the vector units can only work on elementary types
+    // (ele_type) and not on aggregates, like pixels.
+    
+    expanded_shape_type eshp ;
+    for ( int i = 0 ; i < channels ; i++ )
+    {
+      eshp[0] = i ;
+      component_base[i] = &(component_view[eshp]) ;
+    }
+  } ;
+
+  /// simplified constructor from a bspline object.
+  /// when using the higher-level interface to vspline's facilities - via class bspline -
+  /// the bspline object provides the coefficients. The derivative specification is passed
+  /// in just as for the general constructor. Note that the derivative specification can be
+  /// individually chosen for each axis.
+  
+  evaluator_policy ( const bspline < value_type , dimension > & bspl ,
+              derivative_spec_type _derivative = derivative_spec_type ( 0 )
+            )
+  : evaluator_policy ( bspl.coeffs ,
+                bspl.spline_degree ,
+                _derivative )
+  {
+    if ( bspl.spline_degree > 1 && ! bspl.braced )
+      throw not_supported ( "for spline degree > 1: evaluation needs braced coefficients" ) ;
+  } ;
+
+  /// obtain_weights calculates the weights to be applied to a section of the coefficients from
+  /// the fractional parts of the split coordinates. What is calculated here is the evaluation
+  /// of the spline's basis function at dx, dx+1 , dx+2..., but doing it naively is computationally
+  /// expensive, as the evaluation of the spline's basis function at arbitrary values has to
+  /// look at the value, find out the right interval, and then calculate the value with the
+  /// appropriate function. But we always have to calculate the basis function for *all*
+  /// intervals anyway, and the method used here performs this task efficiently using a
+  /// vector/matrix multiplication.
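+  ///
+  /// A standalone sketch of the vector/matrix idea for the uniform cubic b-spline; this is
+  /// not vspline's own weight functor. The textbook basis matrix is multiplied with the
+  /// powers of the fractional part t. It assumes the four weights belong to the
+  /// coefficients at floor(x)-1 ... floor(x)+2 with t = x - floor(x); vspline's weight
+  /// functors may differ in ordering and offset. 'cubic_weights' is a hypothetical name.
+  ///
+  /// \code
+  /// // w[k] = ( sum over p of t^p * M[p][k] ) / 6
+  /// void cubic_weights ( double t , double w[4] )
+  /// {
+  ///   static const double M[4][4] = { {  1 ,  4 ,  1 , 0 } ,
+  ///                                   { -3 ,  0 ,  3 , 0 } ,
+  ///                                   {  3 , -6 ,  3 , 0 } ,
+  ///                                   { -1 ,  3 , -3 , 1 } } ;
+  ///   const double power[4] = { 1.0 , t , t * t , t * t * t } ;
+  ///   for ( int k = 0 ; k < 4 ; k++ )
+  ///   {
+  ///     w[k] = 0.0 ;
+  ///     for ( int p = 0 ; p < 4 ; p++ )
+  ///       w[k] += power[p] * M[p][k] ;
+  ///     w[k] /= 6.0 ;
+  ///   }
+  /// }
+  /// // cubic_weights ( 0.0 , w ) yields 1/6, 4/6, 1/6, 0 (the weights always sum to 1)
+  /// \endcode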
+  ///
+  /// If the spline is more than 1-dimensional, we need a set of weights for every dimension.
+  /// The weights are accessed with a 2D vigra MultiArrayView.
+  ///
+  /// Contrary to my initial implementation, I fill the 'workspace' in ascending axis
+  /// order, so now the weights for axis 0 come first, etc. This results in a slightly
+  /// funny-looking initial call to _eval, but the confusion from reverse-filling the
+  /// workspace was probably worse.
+
+  // TODO: change (*(fweight[axis]))() to work on 'weight' rather than on a pointer?
+  
+  template < typename nd_rc_type , typename weight_type >
+  void obtain_weights ( const MultiArrayView < 2 , weight_type > & weight ,
+                        const nd_rc_type& c ) const
+  {
+    auto ci = c.cbegin() ;
+    for ( int axis = 0 ; axis < dimension ; ++ci , ++axis )
+      (*(fweight[axis])) ( weight.data() + axis * spline_order , *ci ) ;
+  }
+
+  /// obtain weight for a single axis
+
+  template < typename rc_type , typename weight_type >
+  void obtain_weights ( weight_type * p_weight ,
+                        const int & axis ,
+                        const rc_type& c ) const
+  {
+    (*(fweight[axis])) ( p_weight , c ) ;
+  }
+
+  /// split function. This function is used to split incoming real coordinates
+  /// into an integral and a remainder part, which are used at the core of the
+  /// evaluation. Selection of even or odd splitting is done via the const bool
+  /// flag 'even_spline_order'. My initial implementation had this flag as a
+  /// template argument, but this way it's more flexible and there seems to
+  /// be no runtime penalty. This method delegates to the free function templates
+  /// even_split and odd_split, respectively, which are defined above class evaluator.
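+  ///
+  /// A minimal scalar sketch of the two variants, assuming that splines of even order
+  /// split by flooring and splines of odd order by rounding to the nearest integer;
+  /// the actual even_split and odd_split may differ in detail. The function names
+  /// are hypothetical:
+  ///
+  /// \code
+  /// #include <cmath>
+  ///
+  /// // even spline order (odd degree, e.g. linear or cubic): tune in [0,1)
+  /// void even_split_sketch ( double x , int & select , double & tune )
+  /// {
+  ///   double fl = std::floor ( x ) ;
+  ///   select = int ( fl ) ;
+  ///   tune = x - fl ;
+  /// }
+  ///
+  /// // odd spline order (even degree, e.g. degree 0 or quadratic): tune in [-0.5,0.5]
+  /// void odd_split_sketch ( double x , int & select , double & tune )
+  /// {
+  ///   double rd = std::round ( x ) ;
+  ///   select = int ( rd ) ;
+  ///   tune = x - rd ;
+  /// }
+  /// // for x = 2.75: even split gives ( 2 , 0.75 ), odd split gives ( 3 , -0.25 )
+  /// \endcode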
+
+  template < class IT , class RT >
+  void split ( const RT& input , IT& select , RT& tune ) const
+  {
+    if ( even_spline_order )
+      even_split < IT , RT , vsize > ( input , select , tune ) ;
+    else
+      odd_split < IT , RT , vsize > ( input , select , tune ) ;
+  }
+
+  /// _eval is the workhorse routine and implements the recursive arithmetic needed to
+  /// evaluate the spline. First the weights for the current dimension are obtained
+  /// from the weights object passed in. Once the weights are known, they are successively
+  /// multiplied with the results of recursively calling _eval for the next
+  /// lower dimension and the products are summed up to produce the return value.
+  /// The scheme of using a recursive evaluation has several benefits:
+  /// - it needs no explicit intermediate storage of partial sums (uses stack instead)
+  /// - it makes the process dimension-agnostic in an elegant way
+  /// - therefore, the code is also thread-safe
+  /// _eval works with a base pointer and an iterator over offsets, just like the vectorized version.
+  /// note that this routine is used for operation on braced splines, with the sequence of offsets to be
+  /// visited fixed at the evaluator's construction. But in this non-vectorized routine, passing in a
+  /// different sequence of offsets for evaluation in an area where boundary conditions apply would be
+  /// feasible (akin to Thevenaz' indexing), even though it would require a lot of new logic, since
+  /// currently the bracing takes care of the boundary conditions.
+  ///  
+  /// I'd write a plain function template and partially specialize it for 'level', but that's
+  /// not allowed, so I use a functor instead.
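+  ///
+  /// The recursion can be sketched outside the class like this (hypothetical names and
+  /// plain double data; per-axis weights are stored as w [ axis * order + i ], the window
+  /// is stored contiguously with axis 0 varying fastest, and the data pointer is advanced
+  /// through a reference, much as ofs is advanced in the code below):
+  ///
+  /// \code
+  /// template < int level >
+  /// struct rec_eval
+  /// {
+  ///   double operator() ( const double * & p , const double * w , int order ) const
+  ///   {
+  ///     double result = 0.0 ;
+  ///     for ( int i = 0 ; i < order ; i++ )
+  ///       result += w [ level * order + i ] * rec_eval < level - 1 >() ( p , w , order ) ;
+  ///     return result ;
+  ///   }
+  /// } ;
+  ///
+  /// template <>
+  /// struct rec_eval < 0 >   // terminates the recursion, consuming 'order' values
+  /// {
+  ///   double operator() ( const double * & p , const double * w , int order ) const
+  ///   {
+  ///     double result = 0.0 ;
+  ///     for ( int i = 0 ; i < order ; i++ )
+  ///       result += w [ i ] * *p++ ;
+  ///     return result ;
+  ///   }
+  /// } ;
+  ///
+  /// double demo()
+  /// {
+  ///   const double window [ 4 ] = { 1 , 2 , 3 , 4 } ;      // 2x2 window
+  ///   const double w [ 4 ] = { 0.5 , 0.5 , 0.5 , 0.5 } ;   // weights for axes 0 and 1
+  ///   const double * p = window ;
+  ///   return rec_eval < 1 >() ( p , w , 2 ) ;              // returns 2.5
+  /// }
+  /// \endcode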
+  
+  template < int level , class dtype >
+  struct _eval
+  {
+    dtype operator() ( const dtype* & pdata ,
+                       offset_iterator & ofs ,
+                       const MultiArrayView < 2 , ele_type > & weight
+                     ) const
+    {
+      dtype result = dtype() ;
+      for ( int i = 0 ; i < weight.shape(0) ; i++ )
+      {
+        result +=   weight [ Shape2 ( i , level ) ]
+                  * _eval < level - 1 , dtype >() ( pdata , ofs , weight ) ;
+      }
+      return result ;
+    }
+  } ;
+
+  /// at level 0 the recursion ends, now we finally apply the weights for axis 0
+  /// to the window of coefficients. Note how ofs is passed in by reference. This looks
+  /// wrong, but it's necessary: when, in the course of the recursion, the level 0
+  /// routine is called again, it needs to access the next bunch of spline_order coefficients.
+  /// Incrementing the iterator through the reference saves us from incrementing it higher up.
+  
+  template < class dtype >
+  struct _eval < 0 , dtype >
+  {
+    dtype operator() ( const dtype* & pdata ,
+                       offset_iterator & ofs ,
+                       const MultiArrayView < 2 , ele_type > & weight
+                     ) const
+    {
+      dtype result = dtype() ;
+      for ( int i = 0 ; i < weight.shape(0) ; i++ )
+      {
+        result += pdata [ *ofs ] * weight [ Shape2 ( i , 0 ) ] ;
+        ++ofs ;
+      }
+      return result ;
+    }
+  } ;
+
+  /// _eval_linear implements the specialization for n-linear (degree 1) b-splines.
+  /// here, there is no gain to be had from working with precomputed per-axis weights,
+  /// the weight generation is trivial. So the specialization given here is faster:
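+  ///
+  /// For two dimensions, this recursion boils down to the familiar bilinear formula.
+  /// A standalone sketch with a hypothetical function name, where c00 and c10 are
+  /// neighbours along axis 0, c01 and c11 the next pair along axis 1, and tx, ty are
+  /// the fractional parts ('tune') for axes 0 and 1:
+  ///
+  /// \code
+  /// double bilinear ( double c00 , double c10 , double c01 , double c11 ,
+  ///                   double tx , double ty )
+  /// {
+  ///   double r0 = ( 1.0 - tx ) * c00 + tx * c10 ;   // level 0, first pair
+  ///   double r1 = ( 1.0 - tx ) * c01 + tx * c11 ;   // level 0, second pair
+  ///   return ( 1.0 - ty ) * r0 + ty * r1 ;          // level 1 combines the two
+  /// }
+  /// // bilinear ( 0 , 1 , 2 , 3 , 0.5 , 0.5 ) returns 1.5
+  /// \endcode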
+  
+  template < int level , class dtype >
+  struct _eval_linear
+  {
+    dtype operator() ( const dtype* & pdata ,
+                       offset_iterator & ofs ,
+                       const nd_rc_type& tune
+                     ) const
+    {
+      dtype result = dtype ( 1.0 - tune[level] )
+                     * _eval_linear < level - 1 , dtype >() ( pdata , ofs , tune ) ;
+      result += tune[level]
+                * _eval_linear < level - 1 , dtype >() ( pdata , ofs , tune ) ;
+      return result ;
+    }
+  } ;
+
+  /// again, level 0 terminates the recursion
+  
+  template < class dtype >
+  struct _eval_linear < 0 , dtype >
+  {
+    dtype operator() ( const dtype* & pdata ,
+                       offset_iterator & ofs ,
+                       const nd_rc_type& tune
+                     ) const
+    {
+      dtype result = dtype ( 1.0 - tune[0] ) * pdata [ *ofs ] ;
+      ++ofs ;
+      result += tune[0] * pdata [ *ofs ] ;
+      ++ofs ;
+      return result ;
+    }
+  } ;
+    
+  // next are evaluation routines. there are quite a few, since I've coded for operation
+  // from different starting points and both for vectorized and nonvectorized operation.
+  
+  /// evaluation variant which takes an offset and a set of weights. This is the final delegate,
+  /// calling the recursive _eval method. Having the weights passed in via a const MultiArrayView &
+  /// allows calling code to provide their own weights, together with their shape, in one handy
+  /// packet. And in the normal sequence of delegation inside class evaluator, the next routine 'up'
+  /// can also package the weights nicely in a MultiArrayView. Note that select is now an ic_type,
+  /// a single integral value representing an offset into the coefficient array.
+  
+  void eval ( const ic_type & select ,
+              const MultiArrayView < 2 , ele_type > & weight ,
+              value_type & result ) const
+  {
+    const value_type * base = coefficients.data() + select ;
+    // offsets reflects the positions inside the subarray:
+    offset_iterator ofs = offsets.begin() ;
+    result = _eval<level,value_type>() ( base , ofs , weight ) ;
+  }
+
+  /// 'penultimate' delegate, taking a multidimensional index 'select' to the beginning of
+  /// the coefficient window to process. Here the multidimensional index is translated
+  /// into an offset from the coefficient array's base address. Carrying the information as
+  /// a reference to a multidimensional index all the way to this point does no harm, it's
+  /// only a reference after all.
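+  ///
+  /// A standalone sketch of this translation and of the window offsets precomputed in
+  /// the constructor, for a hypothetical 2D array with strides ( 1 , width ) and a
+  /// square window of 'order' x 'order' coefficients (function names are hypothetical):
+  ///
+  /// \code
+  /// #include <vector>
+  ///
+  /// // offset of the window origin ( x , y ) from the array's base address,
+  /// // i.e. sum ( select * stride )
+  /// int origin_offset ( int x , int y , int width )
+  /// {
+  ///   return x * 1 + y * width ;
+  /// }
+  ///
+  /// // offsets of all window elements relative to the window origin,
+  /// // with axis 0 varying fastest
+  /// std::vector < int > window_offsets ( int order , int width )
+  /// {
+  ///   std::vector < int > ofs ;
+  ///   for ( int y = 0 ; y < order ; y++ )
+  ///     for ( int x = 0 ; x < order ; x++ )
+  ///       ofs.push_back ( x + y * width ) ;
+  ///   return ofs ;
+  /// }
+  /// // window_offsets ( 2 , 10 ) yields { 0 , 1 , 10 , 11 }
+  /// \endcode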
+
+  void eval ( const nd_ic_type & select ,
+              const MultiArrayView < 2 , ele_type > & weight ,
+              value_type & result ) const
+  {
+    eval ( sum ( select * coefficients.stride() ) , weight , result ) ;
+  }
+
+  /// evaluation variant taking the components of a split coordinate, first the shape_type
+  /// representing the origin of the coefficient window to process, second the fractional parts
+  /// of the coordinate which are needed to calculate the weights to apply to the coefficient
+  /// window. Note that the weights aren't as many values as the window has coefficients, but
+  /// only spline_order weights per dimension; the 'final' weights would be the outer product of the
+  /// sets of weights for each dimension, which we don't explicitly use here. Also note that
+  /// we limit the code here to have the same number of weights for each axis.
+  /// Here we have the code for the specializations affected by the template argument
+  /// 'specialize' (TODO silly name) which provide specialized code for degree 0 and 1
+  /// b-splines (aka nearest-neighbour and n-linear interpolation)
+  
+  void eval ( const nd_ic_type& select ,
+              const nd_rc_type& tune ,
+              value_type & result ) const
+  {
+    if ( specialize::value == 0 )
+    {
+      // nearest neighbour. simply pick the coefficient
+      result = coefficients [ select ] ;
+    }
+    else if ( specialize::value == 1 )
+    {
+      // linear interpolation. use specialized _eval_linear object
+      const value_type * base =   coefficients.data()
+                                + sum ( select * coefficients.stride() ) ;
+      offset_iterator ofs = offsets.begin() ;  // offsets reflects the positions inside the subarray
+      result = _eval_linear<level,value_type>() ( base , ofs , tune ) ;
+    }
+    else
+    {
+      // general case. this works for degree 0 and 1 as well, but is less efficient
+
+      MultiArray < 2 , ele_type > weight ( Shape2 ( spline_order , dimension ) ) ;
+      
+      // now we call obtain_weights, which will fill in 'weight' with weights for the
+      // given set of fractional coordinate parts in 'tune'
+
+      obtain_weights ( weight , tune ) ;
+      eval ( select , weight , result ) ;   // delegate
+    }
+  }
+
+  /// this eval variant takes an unsplit coordinate and splits it into an integral
+  /// and a remainder part, using split(). The resulting 'split' coordinate is fed
+  /// to the routine above.
+  
+  void eval ( const nd_rc_type& c , // unsplit coordinate
+              value_type & result ) const
+  {
+    nd_ic_type select ;
+    nd_rc_type tune ;
+    
+    // split coordinate into integral and remainder parts
+    
+    split ( c , select , tune ) ;
+    
+    // we now have the split coordinate in 'select'
+    // and 'tune', and we delegate to the next level:
+
+    eval ( select , tune , result ) ;
+  }
+  
+  /// variant taking a plain rc_type for a coordinate. only for 1D splines!
+  /// This is to avoid hard-to-track errors which might ensue from allowing
+  /// broadcasting of single rc_types to nd_rc_types for D>1. We convert the
+  /// single coordinate (of rc_type) to an aggregate of 1 to fit the signature
+  /// of the nD code above, which will then be optimized away again by the
+  /// compiler.
+  
+  void eval ( const rc_type& c , // single 1D coordinate
+              value_type & result ) const
+  {
+    static_assert ( dimension == 1 ,
+                    "evaluation at a single real coordinate is only allowed for 1D splines" ) ;
+    nd_rc_type cc ( c ) ;
+    eval ( cc , result ) ;
+  }
+
+  /// variant of the above routine returning the result instead of depositing it
+  /// in a reference
+  
+  value_type eval ( const rc_type& c ) const
+  {
+    static_assert ( dimension == 1 ,
+                    "evaluation at a single real coordinate is only allowed for 1D splines" ) ;
+    nd_rc_type cc ( c ) ;
+    value_type result ;
+    eval ( cc , result ) ;
+    return result ;
+  }
+
+  /// alternative implementation of the last part of evaluation. Here, we calculate the
+  /// outer product of the component vectors in 'weight' and use flat_eval to obtain the
+  /// weighted sum over the coefficient window.
+
+  /// evaluation using the outer product of the weight component vectors. This is simple,
+  /// no need for recursion here, it's simply the sum of all values in the window, weighted
+  /// with their corresponding weights. note that there are no checks; if there aren't enough
+  /// weights the code may crash. Note that this code is formulated as a template and can be
+  /// used both for vectorized and unvectorized operation.
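+  ///
+  /// For two dimensions, the flat weights and the summation amount to the following
+  /// sketch (hypothetical names; 'wx' and 'wy' hold 'order' weights for axes 0 and 1,
+  /// and the flat weights are laid out with axis 0 varying fastest, matching the scan
+  /// order used to fill 'offsets' in the constructor):
+  ///
+  /// \code
+  /// void outer_product_2d ( const double * wx , const double * wy ,
+  ///                         int order , double * flat )
+  /// {
+  ///   for ( int y = 0 ; y < order ; y++ )
+  ///     for ( int x = 0 ; x < order ; x++ )
+  ///       *flat++ = wy [ y ] * wx [ x ] ;
+  /// }
+  ///
+  /// // weighted sum over a window laid out the same way ( n == order * order )
+  /// double flat_sum ( const double * window , const double * flat , int n )
+  /// {
+  ///   double result = 0.0 ;
+  ///   for ( int i = 0 ; i < n ; i++ )
+  ///     result += flat [ i ] * window [ i ] ;
+  ///   return result ;
+  /// }
+  /// \endcode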
+
+  template < class dtype , class weight_type >
+  void flat_eval ( const dtype * const & pdata ,
+                   const weight_type * const pweight ,
+                   value_type & result ) const
+  {
+    result = pweight[0] * pdata[offsets[0]] ;
+    for ( int i = 1 ; i < window_size ; i++ )
+      result += pweight[i] * pdata[offsets[i]] ;
+  }
+
+  /// calculation of the outer product of the component vectors of the weights:
+
+  template < int _level , class dtype >
+  struct outer_product
+  {
+    void operator() ( dtype * & target ,
+                      const MultiArrayView < 2 , dtype > & weight ,
+                      dtype factor
+                    ) const
+    {
+      if ( _level == level ) // both are template args, this conditional has no runtime cost
+      {
+        // if _level == level this is the outermost call, and factor is certainly
+        // 1.0. hence we can omit it:
+        for ( int i = 0 ; i < weight.shape(0) ; i++ )
+        {
+          outer_product < _level - 1 , dtype >() ( target ,
+                                                    weight ,
+                                                    weight [ Shape2 ( i , level ) ] ) ;
+        }
+      }
+      else
+      {
+        // otherwise we need it.
+        for ( int i = 0 ; i < weight.shape(0) ; i++ )
+        {
+          outer_product < _level - 1 , dtype >() ( target ,
+                                                    weight ,
+                                                    weight [ Shape2 ( i , _level ) ] * factor ) ;
+        }
+      }
+    }
+  } ;
+  
+  template < class dtype >
+  struct outer_product < 0 , dtype >
+  {
+    void operator() ( dtype * & target ,
+                      const MultiArrayView < 2 , dtype > & weight ,
+                      dtype factor
+                    ) const
+    {
+      if ( level == 0 )
+      {
+        // if level == 0, this is a 1D scenario, and all that is left is copying the weights
+        // verbatim to target. TODO: for this special case (only occurring in 1D)
+        // we could avoid this copy and use the data from 'weight' directly
+        for ( int i = 0 ; i < weight.shape(0) ; i++ )
+        {
+          *target = weight [ Shape2 ( i , 0 ) ] ;
+          ++target ;
+        }
+      }
+      else
+      {
+        for ( int i = 0 ; i < weight.shape(0) ; i++ )
+        {
+          *target = weight [ Shape2 ( i , 0 ) ] * factor ;
+          ++target ;
+        }
+      }
+    }
+  } ;
+
+  /// evaluation variant which first calculates the final weights and then applies them
+  /// using a simple summation loop. currently unused, since it is slower, but it can be
+  /// used instead of eval() with the same signature.
+  
+  void flat_eval ( const ic_type & select ,
+                   const MultiArrayView < 2 , ele_type > & weight ,
+                   value_type & result ) const
+  {
+    // get a pointer to the coefficient window's beginning
+    const value_type * base = coefficients.data() + select ;
+    // prepare space for the multiplied-out weights. window_size is only known at
+    // run time, so we use a MultiArray rather than a (nonstandard) variable-length array
+    MultiArray < 1 , ele_type > flat_weight ( window_size ) ;
+    // we need an instance of a pointer to these weights here, passing it in
+    // by reference to be manipulated by the code it's passed to
+    ele_type * iter = flat_weight.data() ;
+    // now we multiply out the weights using outer_product()
+    outer_product < level , ele_type >() ( iter , weight , 1.0 ) ;
+    // finally we delegate to flat_eval above
+    flat_eval ( base , flat_weight.data() , result ) ;
+  }
+
+#ifdef USE_VC
+
+  /// vectorized version of _eval. This works just about the same way as _eval, with the
+  /// only difference being the inner loop over the channels, which is necessary because
+  /// in the vector code we can't code for vectors of, say, pixels, but only for vectors of
+  /// elementary types, like float.
+  /// to operate with vsize values synchronously, we need a bit more indexing than in the
+  /// non-vectorized version. the second parameter, origin, constitutes a gather operand
+  /// which, applied to the base address, handles a set of windows to be processed in parallel.
+  /// if the gather operation is repeated with offset base addresses, the result vector is
+  /// built in the same way as the single result value in the unvectorized code above.
+  /// note that the vectorized routine couldn't function like this if it were to
+  /// evaluate unbraced splines: it relies on the bracing and can't do without it, because
+  /// it works with a fixed sequence of offsets, whereas the evaluation of an unbraced spline
+  /// would use a different offset sequence for values affected by the boundary condition.
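+  ///
+  /// What a single gather step achieves can be emulated with scalar code, as in this
+  /// sketch with hypothetical names: lane k picks up the value at origin [ k ] + ofs,
+  /// so repeating the step for every offset in the window processes vsize windows in
+  /// lockstep. In _v_eval below, the same is done by constructing an ele_v from a base
+  /// pointer and an index vector.
+  ///
+  /// \code
+  /// template < int vsize >
+  /// void gather_sketch ( const float * base ,
+  ///                      const int ( & origin ) [ vsize ] ,
+  ///                      int ofs ,
+  ///                      float ( & lanes ) [ vsize ] )
+  /// {
+  ///   for ( int k = 0 ; k < vsize ; k++ )
+  ///     lanes [ k ] = base [ origin [ k ] + ofs ] ;
+  /// }
+  /// \endcode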
+
+  typedef typename base_type::out_ele_v ele_v ;
+  typedef typename base_type::out_v mc_ele_v ;
+  typedef typename vector_traits < nd_ic_type , vsize > :: type nd_ic_v ;
+  typedef typename vector_traits < nd_rc_type , vsize > :: type nd_rc_v ;
+  
+  template < class dtype , int level >
+  struct _v_eval
+  {
+    dtype operator() ( const component_base_type& base , ///< base addresses of components
+                       const ic_v& origin ,              ///< offsets to evaluation window origins
+                       offset_iterator & ofs ,           ///< offsets to coefficients inside this window
+                       const MultiArrayView < 2 , ele_v > & weight ) const ///< weights to apply
+    {
+      dtype sum = dtype() ;    ///< to accumulate the result
+      dtype subsum ; ///< to pick up the result of the recursive call
+
+      for ( int i = 0 ; i < weight.shape ( 0 ) ; i++ )
+      {
+        subsum = _v_eval < dtype , level - 1 >() ( base , origin , ofs , weight );
+        for ( int ch = 0 ; ch < channels ; ch++ )
+        {
+          sum[ch] += weight [ Shape2 ( i , level ) ] * subsum[ch] ;
+        }
+      }
+      return sum ;
+    }  
+  } ;
+
+  /// the level 0 routine terminates the recursion
+  
+  template < class dtype >
+  struct _v_eval < dtype , 0 >
+  {
+    dtype operator() ( const component_base_type& base , ///< base addresses of components
+                       const ic_v& origin ,              ///< offsets to evaluation window origins
+                       offset_iterator & ofs ,           ///< offsets to coefficients in this window
+                       const MultiArrayView < 2 , ele_v > & weight ) const ///< weights to apply
+    {
+      dtype sum = dtype() ;
+
+      for ( int i = 0 ; i < weight.shape ( 0 ) ; i++ )
+      {
+        for ( int ch = 0 ; ch < channels ; ch++ )
+        {
+          sum[ch] += weight [ Shape2 ( i , 0 ) ] * ele_v ( base[ch] , origin + *ofs ) ;
+        }
+        ++ofs ;
+      }
+      return sum ;
+    }  
+  } ;
+
+  // for linear (degree=1) interpolation we use a specialized routine
+  // which is slightly faster, because it directly uses the 'tune' values
+  // instead of building an array of weights first and passing that in.
+
+  template < class dtype , int level >
+  struct _v_eval_linear
+  {
+    dtype operator() ( const component_base_type& base , // base addresses of components
+                       const ic_v& origin ,        // offsets to evaluation window origins
+                       offset_iterator & ofs ,     // offsets to coefficients inside this window
+                       const in_v& tune ) const       // weights to apply
+    {
+      dtype sum ;    ///< to accumulate the result
+      dtype subsum ;
+
+      sum = _v_eval_linear < dtype , level - 1 >() ( base , origin , ofs , tune ) ;
+      for ( int ch = 0 ; ch < channels ; ch++ )
+      {
+        sum[ch] *= ( rc_type ( 1.0 ) - tune [ level ] ) ;
+      }
+      
+      subsum = _v_eval_linear < dtype , level - 1 >() ( base , origin , ofs , tune );
+      for ( int ch = 0 ; ch < channels ; ch++ )
+      {
+        sum[ch] += ( tune [ level ] ) * subsum[ch] ;
+      }
+      
+      return sum ;
+    }  
+  } ;
+
+  /// the level 0 routine terminates the recursion
+  
+  template < class dtype >
+  struct _v_eval_linear < dtype , 0 >
+  {
+    dtype operator() ( const component_base_type& base , // base addresses of components
+                       const ic_v& origin ,              // offsets to evaluation window origins
+                       offset_iterator & ofs ,           // offsets to coefficients in this window
+                       const in_v& tune ) const       // weights to apply
+    {
+      dtype sum ;
+      auto o1 = *ofs ;
+      ++ofs ;
+      auto o2 = *ofs ;
+      ++ofs ;
+      
+      for ( int ch = 0 ; ch < channels ; ch++ )
+      {
+        sum[ch] = ( rc_type ( 1.0 ) - tune [ 0 ] )
+                  * ele_v ( base[ch] , origin + o1 ) ;
+                  
+        sum[ch] += tune [ 0 ]
+                   * ele_v ( base[ch] , origin + o2 ) ; 
+      }
+      
+      
+      return sum ;
+    }  
+  } ;
+
+  // vectorized variants of the evaluation routines:
+  
+  /// this is the vectorized version of the final delegate, calling _v_eval. The first
+  /// argument is a vector of offsets to windows of coefficients, the vectorized equivalent
+  /// of the single offset in the unvectorized routine.
+  /// The second argument is a 2D array of vectorized weights, the third a reference to
+  /// the result.
+  
+  void eval ( const ic_v& select ,  // offsets to lower corners of the subarrays
+              const MultiArrayView < 2 , ele_v > & weight , // vectorized weights
+              out_v & result ) const
+  {
+    // we need an instance of this iterator because it's passed into _v_eval by reference
+    // and manipulated by the code there:
+    
+    offset_iterator ofs = component_offsets.begin() ;
+    
+    // now we can call the recursive _v_eval routine yielding the result
+    
+    result = _v_eval < out_v , level >() ( component_base , select , ofs , weight ) ;
+  }
+
+  /// cdeval starts from a set of offsets to coefficient windows, so here
+  /// the nD integral indices to the coefficient windows have already been 'condensed'
+  /// into 1D offsets into the coefficient array's memory.
+  /// KFJ 2017-08-20 I'd like to change the code so that I can directly operate
+  /// on 1D data instead of having to use TinyVector<T,1>. For this case, this
+  /// eval overload and the next would have the same signature, so I changed the
+  /// name to cdeval for the time being. cdeval might be factored into the next eval
+  /// since I think it's not really necessary as a separate routine.
+  /// Here we have the specializations affected by the template argument 'specialize'
+  /// which activates more efficient code for degree 0 (nearest neighbour) and degree 1
+  /// (linear interpolation) splines. I draw the line here; one might add further
+  /// specializations, but from degree 2 onwards the weights are reused several times
+  /// so looking them up in a small table (as the general-purpose code for unspecialized
+  /// operation does) should be more efficient (TODO test).
+  
+  /// we have three variants, depending on 'specialize'. first is the specialization
+  /// for nearest-neighbour interpolation, which doesn't delegate further, since the
+  /// result can be obtained directly by gathering from the coefficients:
+  
+  void cdeval ( const ic_v& select ,  // offsets to coefficient windows
+                const in_v& tune ,    // fractional parts of the coordinates
+                out_v & result ,      // target
+                std::integral_constant < int , 0 > ) const
+  {
+    // nearest neighbour. here, no weights are needed and we can instantly
+    // gather the data from the coefficient array.
+
+    for ( int ch = 0 ; ch < channels ; ch++ )
+      result[ch].gather ( component_base[ch] , select ) ;
+  }
+  
+  /// linear interpolation. this uses a specialized _v_eval object which
+  /// directly processes 'tune' instead of creating an array of weights first.
+
+  void cdeval ( const ic_v& select ,  // offsets to coefficient windows
+                const in_v& tune ,    // fractional parts of the coordinates
+                out_v & result ,      // target
+                std::integral_constant < int , 1 > ) const
+  {
+    offset_iterator ofs = component_offsets.begin() ;
+  
+    result = _v_eval_linear < out_v , level >()
+              ( component_base , select , ofs , tune ) ;
+  }
+  
+  /// finally, the general uniform b-spline evaluation.
+  /// Passing any number apart from 0 or 1 as 'specialize' template argument results
+  /// in the use of this general-purpose code, which can also handle degree 0 and 1
+  /// splines, albeit less efficiently.
+
+  template < int anything >
+  void cdeval ( const ic_v& select ,  // offsets to coefficient windows
+                const in_v& tune ,    // fractional parts of the coordinates
+                out_v & result ,      // target
+                std::integral_constant < int , anything > ) const
+  {
+    MultiArray < 2 , ele_v > weight ( Shape2 ( spline_order , dimension ) ) ;
+
+    // obtain_weights is the same code as for unvectorized operation, the arguments
+    // suffice to pick the right template arguments
+  
+    obtain_weights ( weight , tune ) ;
+    
+    // having obtained the weights, we delegate to the final delegate.
+    
+    eval ( select , weight , result ) ;
+  }
+
+  /// here we transform incoming nD coordinates into offsets into the coefficient
+  /// array's memory. In my experiments I found that switching from nD indices
+  /// to offsets is best done sooner rather than later, even though one might suspect
+  /// that simply passing on the reference to the nD index and converting it 'further down'
+  /// shouldn't make much difference. Probably it's easier for the optimizer to
+  /// see the conversion closer to the emergence of the nD index from the nD real
+  /// coordinate coming in.
+  /// note that we use both the strides and offsets appropriate for an expanded array,
+  /// and component_base has pointers to the elementary type.
+
+  void eval ( const nd_ic_v & select , // nD coordinates to coefficient windows
+              const nd_rc_v & tune ,   // fractional parts of the coordinates
+              out_v & result ) const
+  {
+    // condense the nD index into an offset
+    ic_v origin = select[0] * ic_type ( expanded_stride [ 0 ] ) ;
+    for ( int d = 1 ; d < dimension ; d++ )
+      origin += select[d] * ic_type ( expanded_stride [ d ] ) ;
+    
+    // pass on to overload taking the offset, dispatching on 'specialize'
+    cdeval ( origin , tune , result , specialize() ) ;
+  }
+
+  /// This variant of eval() works directly on vector data (of unsplit coordinates).
+  /// This burdens the calling code with (de)interleaving the data. But often the calling
+  /// code performs a traversal of a large body of data and is therefore in a better position
+  /// to perform the (de)interleaving e.g. by a gather/scatter operation, or already receives
+  /// the data in simdized form. This latter case is actually quite common, because 'real'
+  /// applications will rarely use class evaluator directly. Instead, the evaluator will usually
+  /// be used as some 'inner' component of another functor, which is precisely why class
+  /// evaluator is implemented as a 'pure' functor, containing only state fixed at construction.
+  /// So whatever deinterleaving and preprocessing and postprocessing and reinterleaving
+  /// may be performed 'outside' is irrelevant here where we receive SIMD data only.
+  /// In this routine, we make the move from incoming real coordinates to separate
+  /// nD integral indices and fractional parts.
+  
+  void eval ( const in_v & input ,    // number of dimensions * coordinate vectors
+              out_v & result )  const // number of channels * value vectors
+  {
+    nd_ic_v select ;
+    in_v tune ;
+
+    // split the coordinate into an integral and a remainder part
+    
+    split ( input , select , tune ) ;
+    
+    // delegate to eval with split coordinates
+
+    eval ( select , tune , result ) ;
+  }
+
+  void eval ( const in_v & input ,        // number of dimensions * coordinate vectors
+              out_ele_v & result )  const // single value vector
+  {
+    static_assert ( dim_out == 1 ,
+                    "this evaluation routine is for single-channel data only" ) ;
+    out_v helper ;
+    eval ( input , helper ) ;
+    result = helper[0] ;
+  }
+
+  /// ditto, for 1D coordinates
+
+  void eval ( const in_ele_v & input , // single coordinate vector
+              out_v & result ) const   // simdized mD output
+  {
+    static_assert ( dim_in == 1 ,
+                    "this evaluation routine is for 1D coordinates only" ) ;
+    in_v helper ;
+    helper[0] = input ;
+    eval ( helper , result ) ;
+  }
+
+  void eval ( const in_ele_v & input ,    // single coordinate vector
+              out_ele_v & result )  const // single value vector
+  {
+    static_assert ( dim_in == 1 && dim_out == 1  ,
+                    "this evaluation routine is for single-channel data and 1D coordinates only") ;
+    in_v in_helper ;
+    in_helper[0] = input ;
+    out_v out_helper ;
+    eval ( in_helper , out_helper ) ;
+    result = out_helper[0] ;
+  }
+
+#endif // USE_VC
+
+  ~evaluator_policy()
+  {
+    // we don't want a memory leak!
+    for ( int d = 0 ; d < dimension ; d++ )
+    {
+      if ( fweight[d] != &wfd0 )
+        delete fweight[d] ;
+    }
+  }
+} ;
+
+/// the definition for vspline::evaluator incorporates the policy class
+/// into a vspline::unary_functor:
+
+template < typename _coordinate_type , // nD real coordinate
+           typename _value_type ,      // type of coefficient/result
+#ifdef USE_VC
+             // nr. of vector elements
+           int _vsize = vspline::vector_traits < _value_type > :: size ,
+#else
+           int _vsize = 1 ,
+#endif
+           typename specialize = std::integral_constant<int,-1>
+         >
+using evaluator = unary_functor < _coordinate_type ,
+                                  _value_type ,
+                                  _vsize ,
+                                  evaluator_policy ,
+                                  specialize > ;
+
+} ; // end of namespace vspline
+
+#endif // VSPLINE_EVAL_H
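The recursive evaluation in eval.h is easier to follow without SIMD types and bracing in the way. Below is a minimal scalar sketch of the same scheme for a 2D, degree-1 case, assuming plain `double` data that is not braced; all names (`eval_general`, `eval_linear`, `sketch_evaluator`, `demo`) are hypothetical. The nD index is condensed into a 1D offset via the strides, a precomputed list of window offsets is consumed by the recursion in order, and a `std::integral_constant` tag selects either the weight-array path or the shortcut that uses the fractional parts directly, mirroring the role of 'specialize'. One deliberate difference: here the recursion ends at a single coefficient (level -1) rather than at level 0, to keep the sketch short.

```cpp
// Hypothetical scalar 2D sketch of eval.h's recursive window evaluation (no SIMD, no bracing).
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <type_traits>
#include <vector>
constexpr int dimension = 2 ;
constexpr int order = 2 ;   // spline order = degree + 1, so 2 means linear
typedef std::vector < std::ptrdiff_t > :: const_iterator offset_iterator ;
typedef std::array < double , dimension > coord_type ;
typedef std::array < std::ptrdiff_t , dimension > index_type ;
typedef std::array < std::array < double , dimension > , order > weight_type ; // w[k][d]

// general case: each level sums 'order' weighted results of the level below
template < int level > struct eval_general
{
  double operator() ( const double * p , std::ptrdiff_t o , offset_iterator & ofs , const weight_type & w ) const
  {
    double sum = 0.0 ;
    for ( int i = 0 ; i < order ; i++ )
      sum += w[i][level] * eval_general < level - 1 >() ( p , o , ofs , w ) ;
    return sum ;
  }
} ;
// unlike eval.h, the recursion ends at a single coefficient (level -1), consuming one offset
template <> struct eval_general < -1 >
{
  double operator() ( const double * p , std::ptrdiff_t o , offset_iterator & ofs , const weight_type & ) const
  { return p [ o + *ofs++ ] ; }
} ;

// degree-1 shortcut: uses the fractional parts t directly, no weight array
template < int level > struct eval_linear
{
  double operator() ( const double * p , std::ptrdiff_t o , offset_iterator & ofs , const coord_type & t ) const
  {
    double sum = ( 1.0 - t[level] ) * eval_linear < level - 1 >() ( p , o , ofs , t ) ;
    return sum + t[level] * eval_linear < level - 1 >() ( p , o , ofs , t ) ;
  }
} ;
template <> struct eval_linear < -1 >
{
  double operator() ( const double * p , std::ptrdiff_t o , offset_iterator & ofs , const coord_type & ) const
  { return p [ o + *ofs++ ] ; }
} ;

struct sketch_evaluator
{
  const double * base ;
  index_type stride , shape ;
  std::vector < std::ptrdiff_t > window ;   // offsets inside the window, axis 0 fastest
  sketch_evaluator ( const double * b , index_type st , index_type sh ) : base ( b ) , stride ( st ) , shape ( sh )
  {
    for ( int j = 0 ; j < order ; j++ )
      for ( int i = 0 ; i < order ; i++ )
        window.push_back ( i * stride[0] + j * stride[1] ) ;
  }
  // tag dispatch as with 'specialize': for tag 1 the non-template overload wins
  double cdeval ( std::ptrdiff_t origin , const coord_type & t , std::integral_constant < int , 1 > ) const
  {
    offset_iterator ofs = window.begin() ;
    return eval_linear < dimension - 1 >() ( base , origin , ofs , t ) ;
  }
  // any other tag: fill a weight array, then run the general recursion
  template < int anything >
  double cdeval ( std::ptrdiff_t origin , const coord_type & t , std::integral_constant < int , anything > ) const
  {
    weight_type w ;
    for ( int d = 0 ; d < dimension ; d++ )
    { w[0][d] = 1.0 - t[d] ; w[1][d] = t[d] ; }   // degree-1 basis weights
    offset_iterator ofs = window.begin() ;
    return eval_general < dimension - 1 >() ( base , origin , ofs , w ) ;
  }
  // split the coordinate, condense the nD index into a 1D offset, dispatch on the tag
  template < int specialize > double at ( const coord_type & c ) const
  {
    std::ptrdiff_t origin = 0 ; coord_type t ;
    for ( int d = 0 ; d < dimension ; d++ )
    {
      double fl = std::floor ( c[d] ) ;
      assert ( fl >= 0 && fl + order - 1 < shape[d] ) ;   // whole window inside the data
      origin += std::ptrdiff_t ( fl ) * stride[d] ;
      t[d] = c[d] - fl ;
    }
    return cdeval ( origin , t , std::integral_constant < int , specialize >() ) ;
  }
} ;

// f(x,y) = 2x + 3y + xy on a 4 x 3 grid (x fastest); returns { specialized , general }
std::array < double , 2 > demo ( double x , double y )
{
  std::vector < double > data ;
  for ( int k = 0 ; k < 12 ; k++ ) data.push_back ( 2.0 * ( k % 4 ) + 3.0 * ( k / 4 ) + ( k % 4 ) * ( k / 4 ) ) ;
  sketch_evaluator ev ( data.data() , {{ 1 , 4 }} , {{ 4 , 3 }} ) ;
  coord_type c = {{ x , y }} ;
  return {{ ev.at < 1 > ( c ) , ev.at < -1 > ( c ) }} ;
}
```

Since linear interpolation reproduces functions of the form a + bx + cy + dxy, `demo ( 1.25 , 0.5 )` should give 4.625 on both paths, up to rounding.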
diff --git a/example/channels.cc b/example/channels.cc
new file mode 100644
index 0000000..f81f777
--- /dev/null
+++ b/example/channels.cc
@@ -0,0 +1,150 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015, 2016 by Kay F. Jahnke                     */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/// channels.cc
+///
+/// demonstrates the use of 'channel views'
+/// This example is derived from 'slice.cc' and uses the same volume
+/// as source data. But instead of producing an image, we create views
+/// to the three colour channels of the bspline object and assert that
+/// evaluating the channel views yields the same values as evaluating
+/// the 'mother' spline and picking the corresponding channel.
+///
+/// compile with:
+/// clang++ -std=c++11 -march=native -o channels -O3 -pthread -DUSE_VC channels.cc -lvigraimpex -lVc
+/// g++ also works.
+
+#include <vspline/vspline.h>
+
+#include <vigra/stdimage.hxx>
+#include <vigra/imageinfo.hxx>
+#include <vigra/impex.hxx>
+
+int main ( int argc , char * argv[] )
+{
+  // pixel_type is the result type, an RGB float pixel
+  typedef vigra::TinyVector < float , 3 > pixel_type ;
+  
+  // voxel_type is the source data type
+  typedef vigra::TinyVector < float , 3 > voxel_type ;
+  
+  // coordinate_type has a 3D coordinate
+  typedef vigra::TinyVector < float , 3 > coordinate_type ;
+  
+  // warp_type is a 2D array of coordinates
+  typedef vigra::MultiArray < 2 , coordinate_type > warp_type ;
+  
+  // target_type is a 2D array of pixels  
+  typedef vigra::MultiArray < 2 , pixel_type > target_type ;
+  
+  // we want a b-spline with natural boundary conditions
+  vigra::TinyVector < vspline::bc_code , 3 > bcv ( vspline::NATURAL ) ;
+  
+  // create quintic 3D b-spline object containing voxels
+  vspline::bspline < voxel_type , 3 >
+    space ( vigra::Shape3 ( 10 , 10 , 10 ) , 5 , bcv ) ;
+
+  // here we create the channel view. Since these are merely views
+  // to the same data, no data will be copied, and it doesn't matter
+  // whether we create these views before or after prefiltering.
+
+  auto red_channel = space.get_channel_view ( 0 ) ;
+  auto green_channel = space.get_channel_view ( 1 ) ;
+  auto blue_channel = space.get_channel_view ( 2 ) ;
+
+  // fill the b-spline's core with a three-way gradient
+
+  for ( int z = 0 ; z < 10 ; z++ )
+  {
+    for ( int y = 0 ; y < 10 ; y++ )
+    {
+      for ( int x = 0 ; x < 10 ; x++ )
+      {
+        voxel_type & c ( space.core [ vigra::Shape3 ( x , y , z ) ] ) ;
+        c[0] = 25.5 * x ;
+        c[1] = 25.5 * y ;
+        c[2] = 25.5 * z ;
+      }
+    }
+  }
+  
+  // prefilter the b-spline
+  space.prefilter() ;
+  
+  // now make a warp array with 1920X1080 3D coordinates
+  warp_type warp ( vigra::Shape2 ( 1920 , 1080 ) ) ;
+  
+  // we want the coordinates to follow this scheme:
+  // warp(x,y) = (x,1-x,y)
+  // scaled appropriately
+  
+  for ( int y = 0 ; y < 1080 ; y++ )
+  {
+    for ( int x = 0 ; x < 1920 ; x++ )
+    {
+      coordinate_type & c ( warp [ vigra::Shape2 ( x , y ) ] ) ;
+      c[0] = float ( x ) / 192.0 ;
+      c[1] = 10.0 - c[0] ;
+      c[2] = float ( y ) / 108.0 ;
+    }
+  }
+  
+  // get an evaluator for the b-spline
+
+  typedef vspline::evaluator < coordinate_type , voxel_type > ev_type ;
+  ev_type ev ( space ) ;
+  
+  // the evaluators of the channel views have their own type:
+  
+  typedef vspline::evaluator < coordinate_type , float > ch_ev_type ;
+  
+  // we create the three evaluators for the three channel views
+
+  ch_ev_type red_ev ( red_channel ) ;
+  ch_ev_type green_ev ( green_channel ) ;
+  ch_ev_type blue_ev ( blue_channel ) ;
+
+  // and make sure the evaluation results match
+
+  for ( int y = 0 ; y < 1080 ; y++ )
+  {
+    for ( int x = 0 ; x < 1920 ; x++ )
+    {
+      coordinate_type & c ( warp [ vigra::Shape2 ( x , y ) ] ) ;
+      assert ( ev ( c ) [ 0 ] == red_ev ( c ) ) ;
+      assert ( ev ( c ) [ 1 ] == green_ev ( c ) ) ;
+      assert ( ev ( c ) [ 2 ] == blue_ev ( c ) ) ;
+    }
+  }
+
+  std::cout << "success" << std::endl ;
+  exit ( 0 ) ;
+}
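channels.cc relies on a channel view being just a strided view into the interleaved coefficients, so evaluating it performs the same arithmetic on the same floats as evaluating the multichannel spline. The sketch below illustrates that idea with linear interpolation over a raw interleaved float buffer. `channel_view`, `make_channel_view`, `interpolate`, `interpolate_pixel` and `channels_agree` are hypothetical stand-ins, not vspline's API, and the comparison uses a small relative tolerance rather than the exact equality asserted in the example.

```cpp
// Hypothetical sketch of a channel view over interleaved RGB floats; not vspline's API.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// channel ch of pixel i is stored at rgb[3 * i + ch]
struct channel_view
{
  const float * data ;     // points at channel ch of pixel 0
  std::ptrdiff_t stride ;  // distance between pixels, counted in floats
  float operator[] ( std::ptrdiff_t i ) const { return data [ i * stride ] ; }
} ;

// a view only records a start pointer and a stride; no data are copied
channel_view make_channel_view ( const std::vector < float > & rgb , int ch )
{
  return channel_view { rgb.data() + ch , 3 } ;
}

// linear interpolation of one channel through its view
float interpolate ( const channel_view & v , double x )
{
  double fl = std::floor ( x ) ;
  float t = float ( x - fl ) ;
  std::ptrdiff_t i = std::ptrdiff_t ( fl ) ;
  return ( 1.0f - t ) * v[i] + t * v[i + 1] ;
}

// linear interpolation of a whole pixel, like evaluating the 'mother' spline
void interpolate_pixel ( const std::vector < float > & rgb , double x , float out[3] )
{
  double fl = std::floor ( x ) ;
  float t = float ( x - fl ) ;
  std::ptrdiff_t i = std::ptrdiff_t ( fl ) ;
  assert ( i >= 0 && 3 * ( i + 2 ) <= std::ptrdiff_t ( rgb.size() ) ) ;
  for ( int ch = 0 ; ch < 3 ; ch++ )
    out[ch] = ( 1.0f - t ) * rgb [ 3 * i + ch ] + t * rgb [ 3 * ( i + 1 ) + ch ] ;
}

// true if each channel view agrees with the matching channel of the pixel result
bool channels_agree ( const std::vector < float > & rgb , double x )
{
  float pixel[3] ;
  interpolate_pixel ( rgb , x , pixel ) ;
  for ( int ch = 0 ; ch < 3 ; ch++ )
  {
    float single = interpolate ( make_channel_view ( rgb , ch ) , x ) ;
    if ( std::fabs ( single - pixel[ch] ) > 1e-5f * ( 1.0f + std::fabs ( pixel[ch] ) ) )
      return false ;
  }
  return true ;
}
```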
diff --git a/example/complex.cc b/example/complex.cc
new file mode 100644
index 0000000..0a23f90
--- /dev/null
+++ b/example/complex.cc
@@ -0,0 +1,74 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015, 2016 by Kay F. Jahnke                     */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/// \file complex.cc
+///
+/// \brief demonstrate use of b-spline over std::complex data
+///
+/// vspline handles std::complex data like pairs of the complex
+/// type's value_type, and uses a vigra::TinyVector of two
+/// simdized value_types as the vectorized type.
+///
+/// compile: clang++ -std=c++11 -march=native -o complex -O3 -pthread -DUSE_VC complex.cc -lvigraimpex -lVc
+
+#include <iomanip>
+#include <assert.h>
+#include <complex>
+#include <vspline/multithread.h>
+#include <vspline/vspline.h>
+
+int main ( int argc , char * argv[] )
+{
+  vspline::bspline < std::complex < float > , 1 > bsp ( 100000 , 3 , vspline::MIRROR ) ;
+  auto v1 = bsp.core ;
+  v1 [ 50000 ] = std::complex<float> ( 1.0f , 1.0f ) ;
+  bsp.prefilter() ;
+
+  typedef vspline::evaluator < float , std::complex<float> > ev_type ;
+  
+  ev_type ev ( bsp ) ;
+  for ( float k = 49999.0f ; k < 50001.0f ; k += 0.125f ) // 0.125 is exact in float, 0.1 would drift here
+  {
+    std::cout << "ev(" << k << ") = " << ev(k) << std::endl ;
+  }
+  
+#ifdef USE_VC
+
+  for ( float k = 49999.0f ; k < 50001.0f ; k += 0.125f ) // 0.125 is exact in float, 0.1 would drift here
+  {
+    // feed the evaluator with vectors, just to show off. Note how the
+    // result appears as a vigra::TinyVector of two ev_type::out_v.
+    typename ev_type::in_v vk ( k ) ;
+    std::cout << "ev(" << vk << ") = " << ev(vk) << std::endl ;
+  }
+  
+#endif
+}
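The comment in complex.cc says vspline treats `std::complex` data like pairs of the complex type's value_type. The C++ standard does guarantee the memory layout this needs: an array of `std::complex<float>` may be accessed as an array of floats with real and imaginary parts alternating. The sketch below uses hypothetical helper names and linear interpolation instead of a cubic spline. It reads complex data that way and checks that interpolating the two parts separately matches interpolating the complex values directly.

```cpp
// Hypothetical sketch: complex<float> data read as interleaved floats, with
// linear interpolation done once on complex values and once per part.
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::vector < std::complex < float > > complex_array ;

// the standard guarantees array-oriented access: for an array z of complex<float>,
// reinterpret_cast<float*>(z)[2*i] is z[i].real() and [2*i+1] is z[i].imag()
const float * as_floats ( const complex_array & z )
{
  return reinterpret_cast < const float * > ( z.data() ) ;
}

// split x into an index i and a weight t; the pair (i, i+1) must be inside z
std::size_t split_coordinate ( const complex_array & z , double x , float & t )
{
  assert ( x >= 0.0 ) ;
  std::size_t i = std::size_t ( std::floor ( x ) ) ;
  assert ( i + 1 < z.size() ) ;
  t = float ( x - std::floor ( x ) ) ;
  return i ;
}

// interpolate the complex values directly
std::complex < float > lerp_complex ( const complex_array & z , double x )
{
  float t ;
  std::size_t i = split_coordinate ( z , x , t ) ;
  return ( 1.0f - t ) * z[i] + t * z[i + 1] ;
}

// interpolate real and imaginary parts as two interleaved float channels
std::complex < float > lerp_parts ( const complex_array & z , double x )
{
  float t ;
  std::size_t i = split_coordinate ( z , x , t ) ;
  const float * p = as_floats ( z ) ;
  float re = ( 1.0f - t ) * p [ 2 * i ] + t * p [ 2 * ( i + 1 ) ] ;
  float im = ( 1.0f - t ) * p [ 2 * i + 1 ] + t * p [ 2 * ( i + 1 ) + 1 ] ;
  return std::complex < float > ( re , im ) ;
}
```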
diff --git a/example/eval.cc b/example/eval.cc
new file mode 100644
index 0000000..ca6b11b
--- /dev/null
+++ b/example/eval.cc
@@ -0,0 +1,177 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015, 2016 by Kay F. Jahnke                     */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/// eval.cc
+///
+/// takes a set of knot point values from cin, calculates a 1D b-spline
+/// over them, and evaluates it at coordinates taken from cin.
+/// The output shows how the coordinate is split into an integral and a
+/// fractional part by the mapping used for the specified boundary condition,
+/// and the result of evaluating the spline at this point.
+///
+/// compile: clang++ -std=c++11 -o eval -pthread eval.cc
+
+#include <vspline/vspline.h>
+#include <iomanip>
+
+using namespace std ;
+using namespace vigra ;
+using namespace vspline ;
+
+int main ( int argc , char * argv[] )
+{
+  // get the spline degree and boundary conditions from the console
+
+  cout << "enter spline degree: " ;
+  int spline_degree ;
+  cin >> spline_degree ;
+  
+  int bci = -1 ;
+  bc_code bc ;
+  
+  while ( bci < 1 || bci > 4 )
+  {
+    cout << "choose boundary condition" << endl ;
+    cout << "1) MIRROR" << endl ;
+    cout << "2) PERIODIC" << endl ;
+    cout << "3) REFLECT" << endl ;
+    cout << "4) NATURAL" << endl ;
+    cin >> bci ;
+  }
+  
+  switch ( bci )
+  {
+    case 1 :
+      bc = MIRROR ;
+      break ;
+    case 2 :
+      bc = PERIODIC ;
+      break ;
+    case 3 :
+      bc = REFLECT ;
+      break ;
+    case 4 :
+      bc = NATURAL ;
+      break ;
+  }
+  // put the BC code into a TinyVector
+  TinyVector < bc_code , 1 > bcv ( bc ) ;
+
+//   int mci = -1 ;
+//   map_code mc ;
+//   
+//   while ( mci < 1 || mci > 6 )
+//   {
+//     cout << "choose mapping mode" << endl ;
+//     cout << "1) MAP_MIRROR" << endl ;
+//     cout << "2) MAP_PERIODIC" << endl ;
+//     cout << "3) MAP_REFLECT" << endl ;
+//     cout << "4) MAP_LIMIT" << endl ;
+//     cout << "5) MAP_REJECT" << endl ;
+//     cout << "6) MAP_RAW" << endl ;
+//     cin >> mci ;
+//   }
+//   
+//   switch ( mci )
+//   {
+//     case 1 :
+//       mc = MAP_MIRROR ;
+//       break ;
+//     case 2 :
+//       mc = MAP_PERIODIC ;
+//       break ;
+//     case 3 :
+//       mc = MAP_REFLECT ;
+//       break ;
+//     case 4 :
+//       mc = MAP_LIMIT ;
+//       break ;
+//     case 5 :
+//       mc = MAP_REJECT ;
+//       break ;
+//     case 6 :
+//       mc = MAP_RAW ;
+//       break ;
+//   }
+//   // put the mapping code into a TinyVector
+//   TinyVector < map_code , 1 > mcv ( mc ) ;
+
+  TinyVector < int , 1 > deriv_spec ( 0 ) ;
+  // obtain knot point values
+
+  double v ;
+  std::vector<double> dv ;
+  cout << "enter knot point values (end with EOF)" << endl ;
+  while ( cin >> v )
+    dv.push_back ( v ) ;
+
+  cin.clear() ;
+  
+  // put the size into a TinyVector
+  TinyVector < int , 1 > shape ( dv.size() ) ;
+  
+  // fix the type for the bspline object
+  typedef bspline < double , 1 > spline_type ;
+  spline_type bsp  ( shape , spline_degree , bcv ) ; // , EXPLICIT ) ;
+  cout << "created bspline object:" << endl << bsp << endl ;
+
+  // fill the data into the spline's 'core' area
+  for ( size_t i = 0 ; i < dv.size() ; i++ )
+    bsp.core[i] = dv[i] ;
+
+  // prefilter the data
+  bsp.prefilter() ;
+  
+  cout << fixed << showpoint << setprecision(12) ;
+  cout << "spline coefficients (with frame)" << endl ;
+  for ( auto& coeff : bsp.container )
+    cout << " " << coeff << endl ;
+
+  // fix the type for the evaluator and create it
+  typedef evaluator < double , double > eval_type ;
+  eval_type ev ( bsp , deriv_spec ) ; // , mcv ) ;
+//   auto map = ev.get_mapping() ;
+  int ic ;
+  double rc ;
+
+  cout << "enter coordinates to evaluate (end with EOF)" << endl ;
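+  while ( cin >> v )
+  {
+    // reading the coordinate in the loop condition ends the loop on EOF
+    // or invalid input instead of re-evaluating the previous value
+    double res = ev ( v ) ;
+    // split the coordinate into integral and fractional part to output
+    // that as well
+    ev.split ( v , ic , rc ) ;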
+
+    cout << v << " -> ( " << ic << " , " << rc << " ) -> " << res << endl ;
+  }
+}
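eval.cc prints the integral and fractional parts that `ev.split` produces. As an illustration of what such a split can look like for a mirroring boundary condition, here is a minimal sketch that folds a coordinate into the knot range by mirroring on the first and last knot and then splits it. `mirror_split` is hypothetical and is not vspline's mapping code; in particular, moving a coordinate that lands exactly on the last knot to ( n - 2 , 1.0 ) is a choice made for this sketch so that ic + 1 stays inside the range.

```cpp
// Hypothetical sketch of a MIRROR-style fold followed by a split; not vspline's code.
#include <cassert>
#include <cmath>
#include <utility>

// fold v into [0, n-1], mirroring on the first and last knot (period 2*(n-1)),
// then split into an integral part ic and a fractional part rc
std::pair < int , double > mirror_split ( double v , int n )
{
  assert ( n >= 2 ) ;
  const double period = 2.0 * ( n - 1 ) ;
  double m = std::fmod ( std::fabs ( v ) , period ) ;  // mirror at 0, then wrap
  if ( m > n - 1 )
    m = period - m ;                                   // mirror at n - 1
  int ic = int ( std::floor ( m ) ) ;
  double rc = m - ic ;
  if ( ic == n - 1 )                                   // landed on the last knot:
  {                                                    // keep ic + 1 in range
    ic = n - 2 ;
    rc = 1.0 ;
  }
  return std::make_pair ( ic , rc ) ;
}
```

With five knots, `mirror_split ( 4.5 , 5 )` folds 4.5 back to 3.5 and yields ( 3 , 0.5 ), and `mirror_split ( -0.5 , 5 )` yields ( 0 , 0.5 ).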
diff --git a/example/gradient.cc b/example/gradient.cc
new file mode 100644
index 0000000..e7ef513
--- /dev/null
+++ b/example/gradient.cc
@@ -0,0 +1,116 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015, 2016 by Kay F. Jahnke                     */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/// gradient.cc
+///
+/// If we create a b-spline over an array containing, at each grid point,
+/// the sum of the grid point's coordinates, each 1D row, column, etc will
+/// hold a linear gradient with first derivative == 1. If we use NATURAL
+/// BCs, evaluating the spline with real coordinates anywhere inside the
+/// defined range should produce precisely the sum of the coordinates.
+/// This is a good test for both the precision of the evaluation and its
+/// correct functioning, particularly with higher-D arrays.
+///
+/// compile: clang++ -O3 -DUSE_VC -march=native -std=c++11 -pthread -o gradient gradient.cc -lVc
+
+#include <vspline/vspline.h>
+#include <random>
+
+using namespace std ;
+
+int main ( int argc , char * argv[] )
+{
+  typedef vspline::bspline < double , 3 > spline_type ;
+  typedef typename spline_type::shape_type shape_type ;
+  typedef typename spline_type::view_type view_type ;
+  typedef typename spline_type::bcv_type bcv_type ;
+  
+  // let's have a knot point array with nicely odd shape
+
+  shape_type core_shape = { 35 , 43 , 19 } ;
+  
+  // we have to use a longish call to the constructor since we want to pass
+  // 0.0 to 'tolerance' and it's way down in the argument list, so we have to
+  // explicitly pass a few arguments which usually take default values before
+  // we have a chance to pass the tolerance
+
+  spline_type bspl ( core_shape ,                    // shape of knot point array
+                     3 ,                             // cubic b-spline
+                     bcv_type ( vspline::NATURAL ) , // natural boundary conditions
+                     vspline::BRACED ,               // implicit scheme, bracing coeffs
+                     -1 ,                            // default, not using EXPLICIT
+                     0.0 ) ;                         // tolerance 0.0 for this test!
+
+  // get a view to the bspline's core, to fill it with data
+
+  view_type core = bspl.core ;
+  
+  // create the gradient in each dimension
+
+  for ( int d = 0 ; d < bspl.dimension ; d++ )
+  {
+    for ( int c = 0 ; c < core_shape[d] ; c++ )
+      core.bindAt ( d , c ) += c ;
+  }
+  
+  // now prefilter the spline
+
+  bspl.prefilter() ;
+
+  // set up an evaluator
+
+  typedef vigra::TinyVector < double , 3 > coordinate_type ;
+  typedef vspline::evaluator < coordinate_type , double > evaluator_type ;
+  
+  evaluator_type ev ( bspl ) ;
+  
+  // we want to bombard the evaluator with random in-range coordinates
+  
+  std::random_device rd;
+  std::mt19937 gen(rd());
+  // std::mt19937 gen(12345);   // fix starting value for reproducibility
+
+  coordinate_type c ;
+  
+  // here comes our test, feed 100 random 3D coordinates and compare the
+  // evaluator's result with the expected value, which is precisely the
+  // sum of the coordinate's components
+
+  for ( int times = 0 ; times < 100 ; times++ )
+  {
+    for ( int d = 0 ; d < bspl.dimension ; d++ )
+      c[d] = ( core_shape[d] - 1 ) * std::generate_canonical<double, 20>(gen) ;
+    double result = ev ( c ) ;
+    double delta = result - sum ( c ) ;
+
+    cout << "eval(" << c << ") = " << result << " -> delta = " << delta << endl ;
+  }
+}
diff --git a/example/gsm.cc b/example/gsm.cc
new file mode 100644
index 0000000..b237130
--- /dev/null
+++ b/example/gsm.cc
@@ -0,0 +1,138 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015, 2016 by Kay F. Jahnke                     */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/// gsm.cc
+///
+/// Calculate the gradient squared magnitude of an image in a plain loop,
+/// using two evaluators for the horizontal and vertical derivatives,
+/// adding up the squared derivatives and writing the result to an
+/// image file.
+///
+/// compile with:
+/// clang++ -std=c++11 -march=native -o gsm -O3 -pthread -DUSE_VC gsm.cc -lvigraimpex -lVc
+///
+/// Invoke passing an image file; the result will be written to 'gsm.tif'.
+
+#include <vspline/vspline.h>
+
+#include <vigra/stdimage.hxx>
+#include <vigra/imageinfo.hxx>
+#include <vigra/impex.hxx>
+
+#ifdef USE_VC
+const int VSIZE = vspline::vector_traits < float > :: size ;
+#else
+const int VSIZE = 1 ;
+#endif
+
+// we silently assume we have a colour image
+typedef vigra::RGBValue<float,0,1,2> pixel_type; 
+
+// coordinate_type has a 2D coordinate
+typedef vigra::TinyVector < float , 2 > coordinate_type ;
+
+// target_type is a 2D array of pixels  
+typedef vigra::MultiArray < 2 , pixel_type > target_type ;
+
+// b-spline evaluator producing float pixels
+typedef vspline::evaluator < coordinate_type , // incoming coordinate's type
+                             pixel_type ,      // singular result data type
+                             VSIZE             // vector size
+                           > ev_type ;
+
+int main ( int argc , char * argv[] )
+{
+  vigra::ImageImportInfo imageInfo ( argv[1] ) ;
+
+  // we want a b-spline with natural boundary conditions
+  vigra::TinyVector < vspline::bc_code , 2 > bcv ( vspline::NATURAL ) ;
+  
+  // create cubic 2D b-spline object containing the image data
+  vspline::bspline < pixel_type , 2 > bspl ( imageInfo.shape() , 3 , bcv ) ;
+  
+  // load the image data into the b-spline's core
+  vigra::importImage ( imageInfo , bspl.core ) ;
+  
+  // prefilter the b-spline
+  bspl.prefilter() ;
+  
+  // we create two evaluators for the b-spline, one for the horizontal and
+  // one for the vertical gradient. The derivatives for a b-spline are requested
+  // by passing a TinyVector with as many elements as the spline's dimension
+  // with the desired derivative degree for each dimension. Here we want the
+  // first derivative in x and y direction:
+  
+  const vigra::TinyVector < float , 2 > dx1_spec { 1 , 0 } ;
+  const vigra::TinyVector < float , 2 > dy1_spec { 0 , 1 } ;
+  
+  // we pass the derivative specifications to the two evaluators' constructors
+  
+  ev_type xev ( bspl , dx1_spec ) ;
+  ev_type yev ( bspl , dy1_spec ) ;
+  
+  // this is where the result should go:
+
+  target_type target ( imageInfo.shape() ) ;
+
+  // quick-shot solution, iterating in a loop, not vectorized
+  
+  auto start = vigra::createCoupledIterator ( target ) ;
+  auto end = start.getEndIterator() ;
+  
+  for ( auto it = start ; it < end ; ++it )
+  {
+    // we fetch the discrete coordinate from the coupled iterator
+    // and instantiate a coordinate_type from it. Note that we can't pass
+    // the discrete coordinate directly to the evaluator's operator()
+    // because the call would be ambiguous.
+    
+    coordinate_type crd ( it.get<0>() ) ;
+    
+    // now we get the two gradients by evaluating the gradient evaluators
+    // at the given coordinate
+    
+    pixel_type dx = xev ( crd ) ;
+    pixel_type dy = yev ( crd ) ;
+    
+    // and conclude by writing the sum of the squared gradients to target
+
+    it.get<1>() = dx * dx + dy * dy ;
+  }
+  
+  // store the result with vigra impex
+  vigra::ImageExportInfo eximageInfo ( "gsm.tif" );
+  
+  vigra::exportImage ( target ,
+                       eximageInfo
+                       .setPixelType("UINT8") ) ;
+  
+  exit ( 0 ) ;
+}
diff --git a/example/gsm2.cc b/example/gsm2.cc
new file mode 100644
index 0000000..7f43107
--- /dev/null
+++ b/example/gsm2.cc
@@ -0,0 +1,197 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015, 2016 by Kay F. Jahnke                     */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/// gsm2.cc
+///
+/// alternative implementation of gsm.cc, performing the calculation of the
+/// gradient squared magnitude with a functor and index_remap, which is faster since
+/// the whole operation is multithreaded and potentially vectorized.
+///
+/// compile with:
+/// clang++ -std=c++11 -march=native -o gsm2 -O3 -pthread -DUSE_VC gsm2.cc -lvigraimpex -lVc
+///
+/// Invoke passing an image file; the result will be written to 'gsm2.tif'.
+
+#include <vspline/vspline.h>
+
+#include <vigra/stdimage.hxx>
+#include <vigra/imageinfo.hxx>
+#include <vigra/impex.hxx>
+
+#ifdef USE_VC
+const int VSIZE = vspline::vector_traits < float > :: size ;
+#else
+const int VSIZE = 1 ;
+#endif
+
+// we silently assume we have a colour image
+typedef vigra::RGBValue<float,0,1,2> pixel_type; 
+
+// coordinate_type has a 2D coordinate
+typedef vigra::TinyVector < float , 2 > coordinate_type ;
+
+// type of b-spline object
+typedef vspline::bspline < pixel_type , 2 > spline_type ;
+
+// target_type is a 2D array of pixels  
+typedef vigra::MultiArray < 2 , pixel_type > target_type ;
+
+// b-spline evaluator producing float pixels
+typedef vspline::evaluator < coordinate_type , // incoming coordinate's type
+                             pixel_type ,      // singular result data type
+                             VSIZE             // vector size
+                           > ev_type ;
+
+/// we build a vspline::unary_functor which calculates the gradient squared
+/// magnitude. The code here isn't generic; a general-purpose gsm evaluator
+/// would take more coding effort, but we only want to demonstrate the
+/// principle here.
+/// Note how the 'compound evaluator' we construct follows this pattern:
+/// - write a policy class and pass it to vspline::unary_functor
+/// - keep the 'inner' evaluators as const members
+/// - initialize these in the constructor, yielding a 'pure' functor
+/// - if the vector code is identical to the unvectorized code, implement
+///   eval() with a template
+
+template < typename coordinate_type ,
+           typename pixel_type ,
+           int vsize >
+struct gsm_policy
+{
+  // we create two evaluators for the b-spline, one for the horizontal and
+  // one for the vertical gradient. The derivatives for a b-spline are requested
+  // by passing a TinyVector with as many elements as the spline's dimension
+  // with the desired derivative degree for each dimension. Here we want the
+  // first derivative in x and y direction:
+
+  const vigra::TinyVector < float , 2 > dx1_spec { 1 , 0 } ;
+  const vigra::TinyVector < float , 2 > dy1_spec { 0 , 1 } ;
+
+  // we keep two 'inner' evaluators, one for each direction
+  
+  const ev_type xev , yev ;
+  
+  // which are initialized in the constructor, using the bspline and the
+  // derivative specifiers
+  
+  gsm_policy ( const spline_type & bspl )
+  : xev ( bspl , dx1_spec ) ,
+    yev ( bspl , dy1_spec )
+  { } ;
+  
+  // since the code is the same for vectorized and unvectorized
+  // operation, we can write a template:
+  
+  template < class IN , class OUT >
+  void eval ( const IN & c ,
+                    OUT & result ) const
+  {
+    auto dx = xev ( c ) ; // get the gradient in x direction
+    auto dy = yev ( c ) ; // get the gradient in y direction
+    
+    // TODO: really, we'd like to write:
+    // result = dx * dx + dy * dy ;
+    // but fail due to problems with type inference, so we need to be
+    // a bit more explicit:
+    
+    dx *= dx ;            // square the gradients
+    dy *= dy ;
+    dx += dy ;
+    
+    result = dx ;         // assign to result
+  } 
+  
+} ;
+
+typedef vspline::unary_functor < coordinate_type ,
+                                 pixel_type ,
+                                 VSIZE ,
+                                 gsm_policy > ev_gsm ;
+                                 
+int main ( int argc , char * argv[] )
+{
+  // get the image file name
+  
+  vigra::ImageImportInfo imageInfo ( argv[1] ) ;
+
+  // we want a b-spline with natural boundary conditions
+  
+  vigra::TinyVector < vspline::bc_code , 2 > bcv ( vspline::NATURAL ) ;
+  
+  // create cubic 2D b-spline object containing the image data
+  
+  spline_type bspl ( imageInfo.shape() , // the shape of the data for the spline
+                     3 ,                 // degree 3 == cubic spline
+                     bcv                 // specifies natural BCs along both axes
+                   ) ;
+  
+  // load the image data into the b-spline's core. This is a common idiom:
+  // the spline's 'core' is a MultiArrayView to that part of the spline's
+  // data container which is meant to hold the input data. This saves loading
+  // the image to some memory first and then transferring the data into
+  // the spline. Since the core is a vigra::MultiArrayView, we can pass it
+  // to importImage as the desired target for loading the image from disk.
+                   
+  vigra::importImage ( imageInfo , bspl.core ) ;
+  
+  // prefilter the b-spline
+
+  bspl.prefilter() ;
+  
+  // now we can construct the gsm evaluator
+  
+  ev_gsm ev ( bspl ) ;
+  
+  // this is where the result should go:
+  
+  target_type target ( imageInfo.shape() ) ;
+
+  // now we obtain the result by performing an index_remap. index_remap
+  // feeds each discrete index of the target array to the evaluator it is
+  // invoked with and stores the evaluator's result at that same index.
+  // This is done multithreaded and vectorized automatically, which is very
+  // convenient once the evaluator is at hand. So here we have invested
+  // moderately more coding effort in the evaluator and are rewarded by
+  // being able to use it with vspline's high-level code for a very fast
+  // implementation of our gsm problem.
+  
+  vspline::index_remap < ev_gsm > ( ev , target ) ;
+
+  // store the result with vigra impex
+
+  vigra::ImageExportInfo eximageInfo ( "gsm2.tif" );
+  
+  vigra::exportImage ( target ,
+                       eximageInfo
+                       .setPixelType("UINT8") ) ;
+  
+  exit ( 0 ) ;
+}
diff --git a/example/impulse_response.cc b/example/impulse_response.cc
new file mode 100644
index 0000000..12d5e5a
--- /dev/null
+++ b/example/impulse_response.cc
@@ -0,0 +1,134 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015, 2016 by Kay F. Jahnke                     */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/// \file impulse_response.cc
+///
+/// \brief get the impulse response of a b-spline prefilter
+/// 
+/// filter a unit pulse with a b-spline prefilter of a given degree
+/// and display the central section of the result
+///
+/// compile with:
+/// g++ -std=c++11 -o impulse_response -O3 -pthread -DUSE_VC=1 impulse_response.cc -lVc
+///
+/// to get the central section for a degree 5 b-spline, keeping values whose magnitude exceeds 0.0042:
+///
+/// impulse_response 5 .0042
+///
+/// producing this output:
+/// 
+/// double ir_5[] = {
+/// -0.0084918610197410 ,
+/// +0.0197222540252632 ,
+/// -0.0458040841925519 ,
+/// +0.1063780046433000 ,
+/// -0.2470419274022756 ,
+/// +0.5733258709616592 ,
+/// -1.3217294729875093 ,
+/// +2.8421709220216247 ,
+/// -1.3217294729875098 ,
+/// +0.5733258709616593 ,
+/// -0.2470419274022757 ,
+/// +0.1063780046433000 ,
+/// -0.0458040841925519 ,
+/// +0.0197222540252632 ,
+/// -0.0084918610197410 ,
+///  } ;
+///
+/// which, when used as a convolution kernel, will have approximately the same effect on a
+/// signal as applying the recursive filter itself, with reduced precision due to windowing.
+///
+/// Note that three different ways of getting the result are given; the two variants
+/// using lower-level access to the filter are commented out.
+///
+/// The array size used here may seem overly large, but this program also serves as
+/// a test for prefiltering 1D arrays with 'fake 2D processing' which only occurs
+/// with large 1D arrays; see filter.h for more on the topic.
+
+#include <iomanip>
+#include <assert.h>
+#include <vspline/multithread.h>
+#include <vspline/vspline.h>
+
+int main ( int argc , char * argv[] )
+{
+  int degree = std::atoi ( argv[1] ) ;
+  double cutoff = std::atof ( argv[2] ) ;
+  
+  assert ( degree >= 0 && degree < 25 ) ;
+  
+  int npoles = degree / 2 ;
+  const double * poles = vspline_constants::precomputed_poles [ degree ] ;
+
+// using the highest-level access to prefiltering, we code:
+
+  vspline::bspline < double , 1 > bsp ( 100001 , degree , vspline::MIRROR ) ;
+  auto v1 = bsp.core ;
+  v1 [ 50000 ] = 1.0 ;
+  bsp.prefilter() ;
+
+// using slightly lower-level access to the prefiltering code, we could achieve
+// the same result with:
+// 
+//   typedef vigra::MultiArray < 1 , double > array_t ;
+//   vigra::TinyVector < vspline::bc_code , 1 > bcv ;
+//   bcv[0] = vspline::MIRROR ;
+//   array_t v1 ( 100001 ) ;
+//   v1[50000] = 1.0 ;
+//   vspline::solve < array_t , array_t , double >
+//     ( v1 , v1 , bcv , degree , 0.000000000001 ) ;
+  
+// and, going yet one level lower, this code also produces the same result:
+  
+//   vigra::MultiArray < 1 , double > v1 ( 100001 ) ;
+//   v1[50000] = 1.0 ;
+//   typedef decltype ( v1.begin() ) iter_type ;
+//   vspline::filter < iter_type , iter_type , double >
+//     f ( v1.size() ,
+//         vspline::overall_gain ( npoles , poles ) ,
+//         vspline::MIRROR ,
+//         npoles ,
+//         poles ,
+//         0.000000000001 ) ;
+//   f.solve ( v1.begin() ) ;
+        
+  std::cout << "double ir_" << degree << "[] = {" << std::endl ;
+  std::cout << std::fixed << std::showpos << std::showpoint << std::setprecision(16);
+
+  for ( int k = 0 ; k < 100001 ; k++ )
+  {
+    if ( fabs ( v1[k] ) > cutoff )
+    {
+      std::cout << v1[k] << " ," << std::endl ;
+    }
+  }
+  std::cout << "} ;" << std::endl ;
+}
diff --git a/example/roundtrip.cc b/example/roundtrip.cc
new file mode 100644
index 0000000..137b969
--- /dev/null
+++ b/example/roundtrip.cc
@@ -0,0 +1,393 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015, 2016 by Kay F. Jahnke                     */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/// roundtrip.cc
+///
+/// load an image, create a b-spline from it, and restore the original data,
+/// both by normal evaluation and by convolution with the reconstruction kernel.
+/// All of this is done 16 times, with different boundary conditions, spline
+/// degrees, and in float and double arithmetic. The processing times and the
+/// differences between input and restored signal are printed to cout.
+///
+/// Obviously this is not a useful program as such; its purpose is to make sure the engine
+/// functions as intended and that all combinations of float and double as values and
+/// coordinates compile and work correctly, while also giving an impression of the speed
+/// of processing. On my system, vectorization with double values doesn't perform
+/// better than the unvectorized code.
+///
+/// compile:
+/// clang++ -std=c++11 -march=native -o roundtrip -O3 -pthread -DUSE_VC=1 roundtrip.cc -lvigraimpex -lVc
+///
+/// invoke: roundtrip <image file>
+///
+/// there is no image output.
+
+#include <vspline/vspline.h>
+
+#include <vigra/stdimage.hxx>
+#include <vigra/imageinfo.hxx>
+#include <vigra/impex.hxx>
+#include <vigra/accumulator.hxx>
+#include <vigra/multi_math.hxx>
+
+#define PRINT_ELAPSED
+
+#ifdef PRINT_ELAPSED
+#include <ctime>
+#include <chrono>
+#endif
+
+using namespace std ;
+
+using namespace vigra ;
+
+/// check for differences between two arrays
+
+template < class view_type >
+void check_diff ( const view_type& a , const view_type& b )
+{
+  using namespace vigra::multi_math ;
+  using namespace vigra::acc;
+  
+  typedef typename view_type::value_type value_type ;
+  typedef typename vigra::ExpandElementResult < value_type > :: type real_type ;
+  typedef MultiArray<2,real_type> error_array ;
+
+  error_array ea ( vigra::multi_math::squaredNorm ( b - a ) ) ;
+  AccumulatorChain<real_type,Select< Mean, Maximum> > ac ;
+  extractFeatures(ea.begin(), ea.end(), ac);
+  std::cout << "warped image diff Mean: " << sqrt(get<Mean>(ac)) << std::endl;
+  std::cout << "warped image diff Maximum: " << sqrt(get<Maximum>(ac)) << std::endl;
+}
+
+template < class view_type , typename real_type , typename rc_type >
+void run_test ( view_type & data ,
+                vspline::bc_code bc ,
+                int DEGREE ,
+                int TIMES = 32 )
+{
+  typedef typename view_type::value_type pixel_type ;
+  typedef typename view_type::difference_type Shape;
+  typedef MultiArray < 2 , pixel_type > array_type ;
+  typedef int int_type ;
+
+#ifdef USE_VC
+  
+  // we use simdized types with as many elements as vector_traits
+  // considers appropriate for a given real_type, which is the elementary
+  // type of the (pixel) data we process:
+  
+  const int vsize = vspline::vector_traits < real_type > :: size ;
+  
+  // for vectorized coordinates, we use simdized coordinates with as many
+  // elements as the simdized values hold:
+
+  typedef typename vspline::vector_traits < rc_type , vsize > :: type rc_v ;
+  
+#else
+  
+  const int vsize = 1 ;
+  
+#endif
+
+  TinyVector < vspline::bc_code , 2 > bcv ( bc ) ;
+  
+  int Nx = data.width() ;
+  int Ny = data.height() ;
+//   cout << "Nx: " << Nx << " Ny: " << Ny << endl ;
+
+  vspline::bspline < pixel_type , 2 > bsp ( data.shape() , DEGREE , bcv ) ; // , vspline::EXPLICIT ) ;
+  bsp.core = data ;
+//   cout << "created bspline object:" << endl << bsp << endl ;
+  
+  // first test: time prefilter
+
+#ifdef PRINT_ELAPSED
+  std::chrono::system_clock::time_point start = std::chrono::system_clock::now();
+  std::chrono::system_clock::time_point end ;
+#endif
+  
+  for ( int times = 0 ; times < TIMES ; times++ )
+    bsp.prefilter() ;
+  
+#ifdef PRINT_ELAPSED
+  end = std::chrono::system_clock::now();
+  cout << "avg " << TIMES << " x prefilter:........................ "
+       << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() / float(TIMES)
+       << " ms" << endl ;
+#endif
+  
+  // do it again: after TIMES passes of prefiltering, the coefficients above are useless
+  bsp.core = data ;
+  bsp.prefilter() ;
+
+  // get a view to the core coefficients (those which aren't part of the brace)
+  view_type cfview = bsp.core ;
+
+  // set the coordinate type
+  typedef vigra::TinyVector < rc_type , 2 > coordinate_type ;
+  
+  // set the evaluator type
+  typedef vspline::evaluator<coordinate_type,pixel_type> eval_type ;
+
+  // create the evaluator. Earlier versions of this test created the evaluator
+  // with a mapping matching the boundary conditions (MAP_LIMIT for natural
+  // boundary conditions) instead of the default 'REJECT' mapping. Class evaluator
+  // no longer does any mapping, so that code is kept below, commented out.
+  
+// KFJ 2017-08-10 no more mapping in class evaluator!
+//   vspline::map_code mc ;
+//   switch ( bc )
+//   {
+//     case vspline::MIRROR:
+//       mc = vspline::MAP_MIRROR ;
+//       break ;
+//     case vspline::NATURAL:
+//       mc = vspline::MAP_LIMIT ;
+//       break ;
+//     case vspline::REFLECT:
+//       mc = vspline::MAP_REFLECT ;
+//       break ;
+//     case vspline::PERIODIC:
+//       mc = vspline::MAP_PERIODIC ;
+//       break ;
+//     default:
+//       mc = vspline::MAP_REJECT ;
+//       break ;
+//   }
+  
+  // create the evaluator for the b-spline, using plain evaluation (no derivatives);
+  // the commented-out arguments show the former derivative and mapping specification:
+  
+  eval_type ev ( bsp ) ;           // spline
+//                  { 0 , 0 } ,     // (no) derivatives
+//                  { mc , mc } ) ; // mapping code as per switch above
+
+  // type for coordinate array
+  typedef vigra::MultiArray<2, coordinate_type> coordinate_array ;
+  
+  int Tx = Nx ;
+  int Ty = Ny ;
+
+  // now we create a warp array of coordinates at which the spline will be evaluated.
+  // Also create a target array to contain the result.
+
+  coordinate_array fwarp ( Shape ( Tx , Ty ) ) ;
+  array_type _target ( Shape(Tx,Ty) ) ;
+  view_type target ( _target ) ;
+  
+  rc_type dfx = 0.0 , dfy = 0.0 ; // currently evaluating right at knot point locations
+  
+  for ( int times = 0 ; times < 1 ; times++ )
+  {
+    for ( int y = 0 ; y < Ty ; y++ )
+    {
+      rc_type fy = (rc_type)(y) + dfy ;
+      for ( int x = 0 ; x < Tx ; x++ )
+      {
+        rc_type fx = (rc_type)(x) + dfx ;
+        // store the coordinate to fwarp[x,y]
+        fwarp [ Shape ( x , y ) ] = coordinate_type ( fx , fy ) ;
+      }
+    }
+  }
+ 
+  // second test: perform a remap using fwarp as warp array. Since fwarp contains
+  // the discrete coordinates of the knot points, converted to rc_type, the result
+  // should be the same as the input within the given precision
+
+#ifdef PRINT_ELAPSED
+  start = std::chrono::system_clock::now();
+#endif
+  
+  for ( int times = 0 ; times < TIMES ; times++ )
+    vspline::remap < eval_type , 2 >
+      ( ev , fwarp , target ) ;
+
+  
+#ifdef PRINT_ELAPSED
+  end = std::chrono::system_clock::now();
+  cout << "avg " << TIMES << " x remap from unsplit coordinates:... "
+       << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() / float(TIMES)
+       << " ms" << endl ;
+#endif
+       
+  check_diff<view_type> ( target , data ) ;
+
+  // third test: do the same with the remap routine which internally creates
+  // a b-spline ('one-shot remap')
+
+#ifdef PRINT_ELAPSED
+  start = std::chrono::system_clock::now();
+#endif
+  
+  for ( int times = 0 ; times < TIMES ; times++ )
+    vspline::remap < coordinate_type , pixel_type , 2 >
+      ( data , fwarp , target , bcv , DEGREE ) ;
+
+ 
+#ifdef PRINT_ELAPSED
+  end = std::chrono::system_clock::now();
+  cout << "avg " << TIMES << " x remap with internal spline:....... "
+       << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() / float(TIMES)
+       << " ms" << endl ;
+#endif
+
+  check_diff<view_type> ( target , data ) ;
+
+  // fourth test: perform an index_remap, using the b-spline evaluator directly
+  // as the index_remap's functor. Again the result should equal the input,
+  // because it evaluates at all discrete positions, but this time no warp
+  // array is needed: the index_remap feeds the evaluator the discrete coordinates.
+
+#ifdef PRINT_ELAPSED
+  start = std::chrono::system_clock::now();
+#endif
+  
+  for ( int times = 0 ; times < TIMES ; times++ )
+    vspline::index_remap < eval_type >
+      ( ev , target ) ;
+
+#ifdef PRINT_ELAPSED
+  end = std::chrono::system_clock::now();
+  cout << "avg " << TIMES << " x index_remap ..................... "
+       << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() / float(TIMES)
+       << " ms" << endl ;
+#endif
+
+  cout << "difference original data/restored data:" << endl ;
+  check_diff<view_type> ( target , data ) ;
+
+  // fifth test: use 'restore' which internally delegates to grid_eval. This is
+  // usually slightly faster than the previous way to restore the original data,
+  // but otherwise makes no difference.
+
+#ifdef PRINT_ELAPSED
+  start = std::chrono::system_clock::now();
+#endif
+  
+  for ( int times = 0 ; times < TIMES ; times++ )
+    vspline::restore < 2 , pixel_type , rc_type > ( bsp , target ) ;
+  
+#ifdef PRINT_ELAPSED
+  end = std::chrono::system_clock::now();
+  cout << "avg " << TIMES << " x restore original data: .......... "
+       << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() / float(TIMES)
+       << " ms" << endl ;
+#endif
+       
+  cout << "difference original data/restored data:" << endl ;
+  check_diff<view_type> ( data , target ) ;
+  cout << endl ;
+}
+
+template < class real_type , class rc_type >
+void process_image ( char * name )
+{
+  cout << fixed << showpoint ; //  << setprecision(32) ;
+  
+  // the import and info-displaying code is taken from vigra:
+
+  vigra::ImageImportInfo imageInfo(name);
+  // print some information
+  std::cout << "Image information:\n";
+  std::cout << "  file format: " << imageInfo.getFileType() << std::endl;
+  std::cout << "  width:       " << imageInfo.width() << std::endl;
+  std::cout << "  height:      " << imageInfo.height() << std::endl;
+  std::cout << "  pixel type:  " << imageInfo.getPixelType() << std::endl;
+  std::cout << "  color image: ";
+  if (imageInfo.isColor())    std::cout << "yes (";
+  else                        std::cout << "no  (";
+  std::cout << "number of channels: " << imageInfo.numBands() << ")\n";
+
+  typedef vigra::RGBValue<real_type,0,1,2> pixel_type; 
+  typedef vigra::MultiArray<2, pixel_type> array_type ;
+  typedef vigra::MultiArrayView<2, pixel_type> view_type ;
+
+  // to test that strided data are processed correctly, we can load the image
+  // into an inner subarray of containArray:
+  
+//   array_type containArray(imageInfo.shape()+vigra::Shape2(3,5));
+//   view_type imageArray = containArray.subarray(vigra::Shape2(1,2),vigra::Shape2(-2,-3)) ;
+  
+  // alternatively, just use the same array for both, as done here:
+  
+  array_type containArray ( imageInfo.shape() );
+  view_type imageArray ( containArray ) ;
+  
+  vigra::importImage(imageInfo, imageArray);
+  
+  // test these bc codes:
+
+  vspline::bc_code bcs[] =
+  {
+    vspline::MIRROR ,
+    vspline::REFLECT ,
+    vspline::NATURAL ,
+    vspline::PERIODIC
+  } ;
+
+  for ( int b = 0 ; b < 4 ; b++ )
+  {
+    vspline::bc_code bc = bcs[b] ;
+    for ( int spline_degree = 2 ; spline_degree < 8 ; spline_degree++ )
+    {
+#ifdef USE_VC
+      cout << "testing bc code " << vspline::bc_name[bc]
+           << " spline degree " << spline_degree << " using Vc" << endl ;
+#else
+      cout << "testing bc code " << vspline::bc_name[bc]
+           << " spline degree " << spline_degree << endl ;
+#endif
+      run_test < view_type , real_type , rc_type > ( imageArray , bc , spline_degree ) ;
+    }
+  }
+}
+
+int main ( int argc , char * argv[] )
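+{
+  if ( argc < 2 ) { std::cerr << "usage: " << argv[0] << " <image file>" << std::endl ; exit ( 1 ) ; }
+  cout << "testing float data, float coordinates" << endl ;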
+  process_image<float,float> ( argv[1] ) ;
+
+  cout << endl << "testing double data, double coordinates" << endl ;
+  process_image<double,double> ( argv[1] ) ;
+  
+  cout << "testing float data, double coordinates" << endl ;
+  process_image<float,double> ( argv[1] ) ;
+  
+  cout << endl << "testing double data, float coordinates" << endl ;
+  process_image<double,float> ( argv[1] ) ;
+  
+  cout << "reached end" << std::endl ;
+  // oops... we hang here, failing to terminate
+  
+  exit ( 0 ) ;
+}
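The four timed sections of roundtrip.cc repeat the same PRINT_ELAPSED pattern. The following is a minimal sketch of how that pattern could be factored into a helper, together with a scalar analogue of check_diff. The names `average_ms` and `max_abs_diff` are hypothetical and not part of vspline, and check_diff's actual behaviour and output are not reproduced. The sketch uses `std::chrono::steady_clock`, which is monotonic, and a `double`-valued millisecond duration, so the total is not truncated to whole milliseconds before the division by the number of runs, as `duration_cast<milliseconds>` does in the original.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: run f `times` times and return the average
// wall-clock duration of a single run in milliseconds.
template <class F>
double average_ms(int times, F&& f)
{
  assert(times > 0);
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < times; ++i)
    f();
  auto end = std::chrono::steady_clock::now();
  // double-valued milliseconds keep sub-millisecond resolution
  std::chrono::duration<double, std::milli> elapsed = end - start;
  return elapsed.count() / times;
}

// Hypothetical scalar analogue of check_diff: the largest absolute
// difference between two equally sized sequences.
inline double max_abs_diff(const std::vector<double>& a,
                           const std::vector<double>& b)
{
  assert(a.size() == b.size());
  double m = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    m = std::max(m, std::fabs(a[i] - b[i]));
  return m;
}
```

With such a helper, each timed block collapses to a single call that wraps the loop body in a lambda, which keeps the measurement code identical across all tests.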
diff --git a/example/slice.cc b/example/slice.cc
new file mode 100644
index 0000000..4502330
--- /dev/null
+++ b/example/slice.cc
@@ -0,0 +1,127 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015, 2016 by Kay F. Jahnke                     */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/// slice.cc
+///
+/// build a 3D volume from samples of the RGB colour space
+/// build a spline over it and extract a 2D slice
+///
+/// compile with:
+/// clang++ -std=c++11 -march=native -o slice -O3 -pthread -DUSE_VC=1 slice.cc -lvigraimpex -lVc
+/// g++ also works.
+
+#include <vspline/vspline.h>
+
+#include <vigra/stdimage.hxx>
+#include <vigra/imageinfo.hxx>
+#include <vigra/impex.hxx>
+
+int main ( int argc , char * argv[] )
+{
+  // pixel_type is the result type, an RGB float pixel
+  typedef vigra::TinyVector < float , 3 > pixel_type ;
+  
+  // voxel_type is the source data type
+  typedef vigra::TinyVector < float , 3 > voxel_type ;
+  
+  // coordinate_type is a 3D coordinate
+  typedef vigra::TinyVector < float , 3 > coordinate_type ;
+  
+  // warp_type is a 2D array of coordinates
+  typedef vigra::MultiArray < 2 , coordinate_type > warp_type ;
+  
+  // target_type is a 2D array of pixels  
+  typedef vigra::MultiArray < 2 , pixel_type > target_type ;
+  
+  // we want a b-spline with natural boundary conditions
+  vigra::TinyVector < vspline::bc_code , 3 > bcv ( vspline::NATURAL ) ;
+  
+  // create a quintic 3D b-spline object containing voxels
+  vspline::bspline < voxel_type , 3 >
+    space ( vigra::Shape3 ( 10 , 10 , 10 ) , 5 , bcv ) ;
+  
+  // fill the b-spline's core with a three-way gradient
+  for ( int z = 0 ; z < 10 ; z++ )
+  {
+    for ( int y = 0 ; y < 10 ; y++ )
+    {
+      for ( int x = 0 ; x < 10 ; x++ )
+      {
+        voxel_type & c ( space.core [ vigra::Shape3 ( x , y , z ) ] ) ;
+        c[0] = 25.5 * x ;
+        c[1] = 25.5 * y ;
+        c[2] = 25.5 * z ;
+      }
+    }
+  }
+  
+  // prefilter the b-spline
+  space.prefilter() ;
+  
+  // get an evaluator for the b-spline
+  typedef vspline::evaluator < coordinate_type , voxel_type > ev_type ;
+  ev_type ev ( space ) ;
+  
+  // now make a warp array with 1920x1080 3D coordinates
+  warp_type warp ( vigra::Shape2 ( 1920 , 1080 ) ) ;
+  
+  // we want the coordinates to follow the scheme (x,1-x,y), scaled
+  // to the extent of the volume; concretely:
+  // warp(x,y) = ( x / 192 , 10 - x / 192 , y / 108 )
+  
+  for ( int y = 0 ; y < 1080 ; y++ )
+  {
+    for ( int x = 0 ; x < 1920 ; x++ )
+    {
+      coordinate_type & c ( warp [ vigra::Shape2 ( x , y ) ] ) ;
+      c[0] = float ( x ) / 192.0 ;
+      c[1] = 10.0 - c[0] ;
+      c[2] = float ( y ) / 108.0 ;
+    }
+  }
+  
+  // this is where the result should go:
+  target_type target ( vigra::Shape2 ( 1920 , 1080 ) ) ;
+
+  // now we perform the remap, yielding the result
+  vspline::remap < ev_type , 2 > ( ev , warp , target ) ;
+
+  // store the result with vigra impex
+  vigra::ImageExportInfo imageInfo ( "slice.tif" );
+  
+  vigra::exportImage ( target ,
+                      imageInfo
+                      .setPixelType("UINT8")
+                      .setCompression("100")
+                      .setForcedRangeMapping ( 0 , 255 , 0 , 255 ) ) ;
+  
+  exit ( 0 ) ;
+}
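The remap call in slice.cc pairs every coordinate in the warp array with the target element at the same position and stores the functor's result there. The sketch below illustrates only that contract, sequentially and without vectorization, over flat `std::vector`s. `toy_remap`, `slice_coordinate` and `coord3` are hypothetical names standing in for vspline::remap, the fill loop and vigra::TinyVector; this is not vspline's implementation. Note that at x = 0 the scheme yields a second component of 10, beyond the last knot index 9 of the 10x10x10 volume; how such coordinates are treated depends on the evaluator's boundary handling.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <vector>

// stand-in for vigra::TinyVector<float,3>
using coord3 = std::array<float, 3>;

// the coordinate scheme used in slice.cc:
// warp(x,y) = ( x / 192 , 10 - x / 192 , y / 108 )
inline coord3 slice_coordinate(int x, int y)
{
  float cx = float(float(x) / 192.0);
  return coord3{ cx, float(10.0 - cx), float(float(y) / 108.0) };
}

// hypothetical sequential remap: target[i] = f(warp[i])
template <class F, class In, class Out>
void toy_remap(const F& f, const std::vector<In>& warp,
               std::vector<Out>& target)
{
  if (warp.size() != target.size())
    throw std::invalid_argument("warp and target must have the same size");
  for (std::size_t i = 0; i < warp.size(); ++i)
    target[i] = f(warp[i]);
}
```

The centre of the 1920x1080 frame, (960, 540), maps to the centre of the volume, (5, 5, 5).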
diff --git a/example/slice2.cc b/example/slice2.cc
new file mode 100644
index 0000000..7a46432
--- /dev/null
+++ b/example/slice2.cc
@@ -0,0 +1,192 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015, 2016 by Kay F. Jahnke                     */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/// slice2.cc
+///
+/// build a 3D volume from samples of the RGB colour space
+/// build a spline over it and extract a 2D slice
+///
+/// The result is just about the same as the one we get from slice.cc, but
+/// here our volume contains double-precision voxels. Since the evaluator over
+/// the b-spline representing the volume produces double-precision voxels
+/// (the same as the source data type), but our target takes float pixels,
+/// we have to wrap the evaluator in an outer class which handles the
+/// conversion from double voxels to float pixels. This is done with a
+/// class derived from vspline::unary_functor. This is quite a mouthful for
+/// a simple double-to-float conversion, but it shows the wrapping mechanism
+/// clearly. Wrapping like this is verbose, but it produces a resulting
+/// functor which is maximally efficient.
+///
+/// compile with:
+/// clang++ -std=c++11 -march=native -o slice2 -O3 -pthread -DUSE_VC=1 slice2.cc -lvigraimpex -lVc
+/// g++ also works.
+
+#include <vspline/vspline.h>
+
+#include <vigra/stdimage.hxx>
+#include <vigra/imageinfo.hxx>
+#include <vigra/impex.hxx>
+
+// pixel_type is the result type, an RGB float pixel
+typedef vigra::TinyVector < float , 3 > pixel_type ;
+
+// voxel_type is the source data type
+typedef vigra::TinyVector < double , 3 > voxel_type ;
+
+// coordinate_type is a 3D coordinate
+typedef vigra::TinyVector < float , 3 > coordinate_type ;
+
+// warp_type is a 2D array of coordinates
+typedef vigra::MultiArray < 2 , coordinate_type > warp_type ;
+
+// target_type is a 2D array of pixels  
+typedef vigra::MultiArray < 2 , pixel_type > target_type ;
+
+// b-spline evaluator producing double precision voxels
+typedef vspline::evaluator < coordinate_type , voxel_type > ev_type ;
+
+// to move from the b-spline's result type (double-precision voxel) to the
+// target type (float pixel) we wrap the b-spline evaluator with a class
+// derived from vspline::unary_functor:
+
+template < typename coordinate_type ,
+           typename pixel_type ,
+           int _vsize ,
+           class ev_type >
+struct downcast_policy
+: public vspline::uf_types < coordinate_type , pixel_type , ev_type::vsize >
+{
+public:
+
+  // we want to access facilities of the base class (vspline::uf_types<...>)
+  // so we use a typedef for the base class.
+
+  typedef vspline::uf_types
+    < coordinate_type , pixel_type , ev_type::vsize > base_type ;
+
+  // pull in standard evaluation type system with this macro:
+
+  using_unary_functor_types ( base_type ) ;
+  
+  // we keep a const reference to the wrappee
+  const ev_type & inner ;
+  
+  // which is initialized in the constructor
+  downcast_policy ( const ev_type & _inner )
+  : inner ( _inner )
+  { } ;
+  
+  template < class IN ,
+             class OUT = typename base_type::template out_type_of<IN> >
+  void eval ( const IN & c ,
+                    OUT & result ) const
+  {
+    auto intermediate = inner ( c ) ;
+    result = OUT ( intermediate ) ;
+  } ;
+} ;
+
+template < class ev_type >
+using downcast_type = vspline::unary_functor < coordinate_type ,
+                                               pixel_type ,
+                                               ev_type::vsize ,
+                                               downcast_policy ,
+                                               ev_type
+                                              > ;
+
+int main ( int argc , char * argv[] )
+{
+  // we want a b-spline with natural boundary conditions
+  vigra::TinyVector < vspline::bc_code , 3 > bcv ( vspline::NATURAL ) ;
+  
+  // create a quintic 3D b-spline object containing voxels
+  vspline::bspline < voxel_type , 3 >
+    space ( vigra::Shape3 ( 10 , 10 , 10 ) , 5 , bcv ) ;
+  
+  // fill the b-spline's core with a three-way gradient
+  for ( int z = 0 ; z < 10 ; z++ )
+  {
+    for ( int y = 0 ; y < 10 ; y++ )
+    {
+      for ( int x = 0 ; x < 10 ; x++ )
+      {
+        voxel_type & c ( space.core [ vigra::Shape3 ( x , y , z ) ] ) ;
+        c[0] = 25.5 * x ;
+        c[1] = 25.5 * y ;
+        c[2] = 25.5 * z ;
+      }
+    }
+  }
+  
+  // prefilter the b-spline
+  space.prefilter() ;
+  
+  // get an evaluator for the b-spline
+  ev_type ev ( space ) ;
+  
+  // wrap it in a downcast_type
+  downcast_type<ev_type> dc ( ev ) ;
+  
+  // now make a warp array with 1920x1080 3D coordinates
+  warp_type warp ( vigra::Shape2 ( 1920 , 1080 ) ) ;
+  
+  // we want the coordinates to follow the scheme (x,1-x,y), scaled
+  // to the extent of the volume; concretely:
+  // warp(x,y) = ( x / 192 , 10 - x / 192 , y / 108 )
+  
+  for ( int y = 0 ; y < 1080 ; y++ )
+  {
+    for ( int x = 0 ; x < 1920 ; x++ )
+    {
+      coordinate_type & c ( warp [ vigra::Shape2 ( x , y ) ] ) ;
+      c[0] = float ( x ) / 192.0 ;
+      c[1] = 10.0 - c[0] ;
+      c[2] = float ( y ) / 108.0 ;
+    }
+  }
+  
+  // this is where the result should go:
+  target_type target ( vigra::Shape2 ( 1920 , 1080 ) ) ;
+
+  // now we perform the remap, yielding the result
+  vspline::remap < downcast_type<ev_type> , 2 > ( dc , warp , target ) ;
+
+  // store the result with vigra impex
+  vigra::ImageExportInfo imageInfo ( "slice.tif" );
+  
+  vigra::exportImage ( target ,
+                      imageInfo
+                      .setPixelType("UINT8")
+                      .setCompression("100")
+                      .setForcedRangeMapping ( 0 , 255 , 0 , 255 ) ) ;
+  
+  exit ( 0 ) ;
+}
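Stripped of vspline's type machinery (uf_types, the using_unary_functor_types macro and the vectorized code path), the wrapping idea in slice2.cc reduces to a functor that holds a const reference to an inner functor and converts its result. The following minimal sketch shows only that scalar idea with standard-library types; `downcast`, `coord3`, `voxel3` and `pixel3` are hypothetical names, and it is not a replacement for downcast_policy.

```cpp
#include <array>

using coord3 = std::array<float, 3>;   // 3D coordinate
using voxel3 = std::array<double, 3>;  // double-precision source voxel
using pixel3 = std::array<float, 3>;   // float target pixel

// hypothetical wrapper: calls an inner functor that yields double voxels
// and converts its result to float pixels
template <class Inner>
struct downcast
{
  const Inner& inner;  // the wrappee must outlive this wrapper

  explicit downcast(const Inner& in) : inner(in) {}

  pixel3 operator()(const coord3& c) const
  {
    voxel3 v = inner(c);
    return pixel3{ float(v[0]), float(v[1]), float(v[2]) };
  }
};
```

As with downcast_policy, only a reference is stored, so the wrapper is cheap to construct but must not outlive the evaluator it wraps.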
diff --git a/example/slice3.cc b/example/slice3.cc
new file mode 100644
index 0000000..8080756
--- /dev/null
+++ b/example/slice3.cc
@@ -0,0 +1,147 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015, 2016 by Kay F. Jahnke                     */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/// slice3.cc
+///
+/// build a 3D volume from samples of the RGB colour space
+/// build a spline over it and extract a 2D slice
+///
+/// Here we take a quick shot without elaborate, possibly vectorized, wrapper
+/// classes. Often all it takes is a single run of an interpolation with as
+/// little programming effort as possible, never mind the performance.
+/// Again we use a b-spline with double-precision voxels as value_type,
+/// but instead of using vspline::remap, which requires a suitable functor
+/// yielding pixel_type, we simply call the evaluator's operator() and
+/// implicitly cast the result to pixel_type.
+///
+/// compile with:
+/// clang++ -std=c++11 -o slice3 -O3 -pthread slice3.cc -lvigraimpex
+/// g++ also works.
+
+#include <vspline/vspline.h>
+
+#include <vigra/stdimage.hxx>
+#include <vigra/imageinfo.hxx>
+#include <vigra/impex.hxx>
+
+// pixel_type is the result type, here we use a vigra::RGBValue for a change
+typedef vigra::RGBValue < unsigned char , 0 , 1 , 2 > pixel_type ;
+
+// voxel_type is the source data type
+typedef vigra::TinyVector < double , 3 > voxel_type ;
+
+// coordinate_type is a 3D coordinate
+typedef vigra::TinyVector < float , 3 > coordinate_type ;
+
+// warp_type is a 2D array of coordinates
+typedef vigra::MultiArray < 2 , coordinate_type > warp_type ;
+
+// target_type is a 2D array of pixels  
+typedef vigra::MultiArray < 2 , pixel_type > target_type ;
+
+// b-spline evaluator producing double precision voxels
+typedef vspline::evaluator < coordinate_type , voxel_type > ev_type ;
+
+int main ( int argc , char * argv[] )
+{
+  // we want a b-spline with natural boundary conditions
+  vigra::TinyVector < vspline::bc_code , 3 > bcv ( vspline::NATURAL ) ;
+  
+  // create a quintic 3D b-spline object containing voxels
+  vspline::bspline < voxel_type , 3 >
+    space ( vigra::Shape3 ( 10 , 10 , 10 ) , 5 , bcv ) ;
+  
+  // fill the b-spline's core with a three-way gradient
+  for ( int z = 0 ; z < 10 ; z++ )
+  {
+    for ( int y = 0 ; y < 10 ; y++ )
+    {
+      for ( int x = 0 ; x < 10 ; x++ )
+      {
+        voxel_type & c ( space.core [ vigra::Shape3 ( x , y , z ) ] ) ;
+        c[0] = 25.5 * x ;
+        c[1] = 25.5 * y ;
+        c[2] = 25.5 * z ;
+      }
+    }
+  }
+  
+  // prefilter the b-spline
+  space.prefilter() ;
+  
+  // get an evaluator for the b-spline
+  ev_type ev ( space ) ;
+  
+  // now make a warp array with 1920x1080 3D coordinates
+  warp_type warp ( vigra::Shape2 ( 1920 , 1080 ) ) ;
+  
+  // we want the coordinates to follow the scheme (x,1-x,y), scaled
+  // to the extent of the volume; concretely:
+  // warp(x,y) = ( x / 192 , 10 - x / 192 , y / 108 )
+  
+  for ( int y = 0 ; y < 1080 ; y++ )
+  {
+    for ( int x = 0 ; x < 1920 ; x++ )
+    {
+      coordinate_type & c ( warp [ vigra::Shape2 ( x , y ) ] ) ;
+      c[0] = float ( x ) / 192.0 ;
+      c[1] = 10.0 - c[0] ;
+      c[2] = float ( y ) / 108.0 ;
+    }
+  }
+  
+  // this is where the result should go:
+  target_type target ( vigra::Shape2 ( 1920 , 1080 ) ) ;
+
+  // we know warp and target have the same shape, so we perform the remap
+  // 'manually' by iterating over warp and target in lockstep: an iterator
+  // walks warp while a range-based for loop walks target.
+
+  auto coordinate_iter = warp.begin() ;
+  
+  for ( auto & trg : target )
+  {
+    // here we implicitly cast the result of the evaluator down to pixel_type:
+    trg = ev ( *coordinate_iter ) ;
+    ++coordinate_iter ;
+  }
+
+  // store the result with vigra impex
+  vigra::ImageExportInfo imageInfo ( "slice.tif" );
+  
+  vigra::exportImage ( target ,
+                      imageInfo
+                      .setPixelType("UINT8")
+                      .setCompression("100")
+                      .setForcedRangeMapping ( 0 , 255 , 0 , 255 ) ) ;
+  
+  exit ( 0 ) ;
+}
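slice3.cc relies on an implicit conversion from a double-precision voxel to an unsigned char RGBValue. Whether that conversion rounds, truncates or clamps is governed by vigra's conversion rules, which this note does not restate. If the behaviour should be explicit, a small helper like the following could be applied per channel. It is a sketch under the assumption that round-to-nearest with clamping to [0, 255] is what is wanted; `to_uchar` and `to_pixel` are hypothetical names, not vigra or vspline API.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

using voxel3 = std::array<double, 3>;
using pixel_u8 = std::array<unsigned char, 3>;

// clamp to [0, 255], then round to nearest (halves away from zero)
inline unsigned char to_uchar(double v)
{
  double clamped = std::min(255.0, std::max(0.0, v));
  return static_cast<unsigned char>(std::lround(clamped));
}

// apply the conversion to all three channels of a voxel
inline pixel_u8 to_pixel(const voxel3& v)
{
  return pixel_u8{ to_uchar(v[0]), to_uchar(v[1]), to_uchar(v[2]) };
}
```

Inside the manual loop this would read `trg = to_pixel ( ev ( *coordinate_iter ) )`, with the RGBValue constructed from the three channels; that line is an untested suggestion.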
diff --git a/example/splinus.cc b/example/splinus.cc
new file mode 100644
index 0000000..b84abf2
--- /dev/null
+++ b/example/splinus.cc
@@ -0,0 +1,99 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015, 2016 by Kay F. Jahnke                     */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/// \file splinus.cc
+///
+/// \brief compare a periodic b-spline with a sine
+/// 
+/// This is a deliberately trivial example using a periodic b-spline
+/// over just two values: 1 and -1. This spline is used to approximate
+/// a sine function. You pass the spline's desired degree on the command
+/// line. Then you enter a number (interpreted as degrees) and the program
+/// outputs the sine and the 'splinus' of the given angle and their difference.
+/// As you can see when playing with higher degrees, the higher the spline's
+/// degree, the closer the match with the sine. So apart from serving as a
+/// very simple demonstration of using a 1D periodic b-spline, it shows that
+/// a periodic b-spline over just two values approximates a sine closely.
+///
+/// compile with: clang++ -pthread -O3 -std=c++11 splinus.cc -o splinus
+
+#include <assert.h>
+#include <vspline/vspline.h>
+
+int main ( int argc , char * argv[] )
+{
+  assert ( argc > 1 ) ;
+  
+  int degree = std::atoi ( argv[1] ) ;
+  
+  assert ( degree >= 0 && degree < 25 ) ;
+  
+  // create the bspline object
+  
+  vspline::bspline < double ,   // spline's data type
+                     1 >        // one dimension
+    bsp ( 2 ,                   // two values
+          degree ,              // degree as per command line
+          vspline::PERIODIC ) ; // periodic boundary conditions
+          
+  // the bspline object's 'core' is a MultiArrayView to the knot point
+  // data, which we set one by one for this simple example:
+  
+  bsp.core[0] = 1.0 ;
+  bsp.core[1] = -1.0 ;
+  
+  // now we prefilter the data
+  
+  bsp.prefilter() ;
+  
+  // the spline is now ready for use, we create an evaluator
+  // to obtain interpolated values
+  
+  vspline::evaluator < double ,       // evaluator taking double coordinates
+                       double         // and producing double results
+                       > ev ( bsp ) ; // from the bspline object we just made
+
+  while ( true )
+  {
+    std::cout << " > " ;
+    double x ;
+    if ( ! ( std::cin >> x ) ) break ; // get an angle, stop at end of input
+    double xs = x * M_PI / 180.0 ; // sin() uses radians
+    double xx = x / 180.0 - .5 ;   // 'splinus' has period 2 (= 360 degrees) and is shifted .5 (= 90 degrees)
+    
+    // finally we can produce both results. Note how we can use ev, the evaluator,
+    // like an ordinary function.
+
+    std::cout << "sin(" << x << ") = " << sin(xs)
+              << " splin(" << x << ") = " << ev(xx)
+              << " difference: " << sin(xs) - ev(xx) << std::endl ;
+  }
+}
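For the cubic case, the splinus can be worked out by hand, which makes the example's claim easy to check without vspline. The prefilter condition for a cubic b-spline at the knots is (c[k-1] + 4 c[k] + c[k+1]) / 6 = f[k]; with period 2 and f = {1, -1} this gives c = {3, -3}. The sketch below evaluates that spline with the standard cubic b-spline basis and the same angle mapping as splinus.cc. The names `bspline3`, `splinus3` and `splinus_deg` are hypothetical, and this direct evaluation is an illustration, not vspline's evaluation code.

```cpp
#include <cmath>

// cubic b-spline basis function, nonzero for |t| < 2
inline double bspline3(double t)
{
  t = std::fabs(t);
  if (t < 1.0) return 2.0 / 3.0 - t * t + 0.5 * t * t * t;
  if (t < 2.0) { double u = 2.0 - t; return u * u * u / 6.0; }
  return 0.0;
}

// periodic cubic spline over the knot values {1, -1}; the coefficients
// {3, -3} solve the prefilter equations for this case
inline double splinus3(double x)
{
  const double c[2] = { 3.0, -3.0 };
  double base = std::floor(x);
  double sum = 0.0;
  for (int j = -1; j <= 2; ++j)  // the four knots whose basis covers x
  {
    double k = base + j;
    long idx = static_cast<long>(k) % 2;  // periodic index, period 2
    if (idx < 0) idx += 2;
    sum += c[idx] * bspline3(x - k);
  }
  return sum;
}

// angle in degrees -> spline coordinate, as in splinus.cc
inline double splinus_deg(double deg) { return splinus3(deg / 180.0 - 0.5); }
```

At 90, 180 and 270 degrees this reproduces the sine exactly; in between, the cubic splinus deviates from the sine by roughly 0.02 at most, for example 0.6875 against about 0.7071 at 135 degrees. According to the program's description, higher degrees narrow that gap further.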
diff --git a/example/use_map.cc b/example/use_map.cc
new file mode 100644
index 0000000..586b424
--- /dev/null
+++ b/example/use_map.cc
@@ -0,0 +1,113 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015 - 2017 by Kay F. Jahnke                    */
+/*                                                                      */
+/*    The git repository for this software is at                        */
+/*                                                                      */
+/*    https://bitbucket.org/kfj/vspline                                 */
+/*                                                                      */
+/*    Please direct questions, bug reports, and contributions to        */
+/*                                                                      */
+/*    kfjahnke+vspline at gmail.com                                        */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/*! \file use_map.cc
+
+    \brief test program for code in map.h
+*/
+
+#include <iostream>
+#include <vspline/map.h>
+
+// Little tester program using map.h
+
+// TODO: expand this to a unit test
+
+template < class gate_type >
+void test ( float x , const char * mode )
+{
+  std::cout << mode << "\t" ;
+
+  typedef vspline::mapper < float , 4 , gate_type >
+    mapper_type ;
+    
+  gate_type gx ( 0.0 , 1.0 ) ;
+  
+  mapper_type tester ( gx ) ;
+  
+  typedef float crd_type ;
+
+  const crd_type crd { x } ;
+  crd_type res ;
+  
+  tester.eval ( crd , res ) ;
+  std::cout << crd << " -> " << res << " " ;
+  
+#ifdef USE_VC
+
+  typedef Vc::SimdArray<float,4> crd_v ;
+
+  crd_v inv = crd ;
+  crd_v resv ;
+  
+  tester.eval ( inv , resv ) ;
+  std::cout << inv << " -> " << resv ;
+  
+#endif
+  
+  std::cout << std::endl ;
+}
+
+int main ( int argc , char * argv[] )
+{
+  float x ;
+  
+  while ( true )
+  {
+    std::cout << "enter a coordinate to map into [ 0.0 : 1.0 ]" << std::endl ;
+
+    // stop at end of input or on input which can't be read as a float;
+    // without this check, the loop would spin forever once std::cin has failed
+
+    if ( ! ( std::cin >> x ) )
+      break ;
+
+    try
+    {
+      test < vspline::gate_type < float , 4 , vspline::limit_policy > >
+        ( x , "LIMIT:   " ) ;
+      test < vspline::gate_type < float , 4 , vspline::constant_policy > >
+        ( x , "CONSTANT:" ) ;
+      test < vspline::gate_type < float , 4 , vspline::mirror_policy > >
+        ( x , "MIRROR:  " ) ;
+      test < vspline::gate_type < float , 4 , vspline::periodic_policy > >
+        ( x , "PERIODIC:" ) ;
+      test < vspline::gate_type < float , 4 , vspline::reject_policy > >
+        ( x , "REJECT:  " ) ;
+    }
+    catch ( const vspline::out_of_bounds & )
+    {
+      std::cout << "exception out_of_bounds" << std::endl ;
+    }
+  }
+}
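use_map.cc feeds one coordinate through five gate policies. As a rough mental model only, the sketch below shows how four of them might bring a coordinate into [ lo : hi ]. The function names are hypothetical and their behaviour is an assumption inferred from the policy names, not a transcription of map.h; constant_policy is left out because its behaviour can't be inferred from this test program.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <stdexcept>

// Hypothetical helpers: one plausible reading of the gate policy names,
// not the code in vspline's map.h.

// 'limit': clamp to the nearer bound
float clamp_coordinate ( float x , float lo , float hi )
{
  return std::min ( std::max ( x , lo ) , hi ) ;
}

// 'mirror': reflect at the bounds; the pattern repeats with period 2 * ( hi - lo )
float mirror_coordinate ( float x , float lo , float hi )
{
  assert ( hi > lo ) ;
  const float w = hi - lo ;
  float t = std::fmod ( x - lo , 2.0f * w ) ;
  if ( t < 0.0f )
    t += 2.0f * w ;
  if ( t > w )
    t = 2.0f * w - t ;
  return lo + t ;
}

// 'periodic': wrap around with period hi - lo
float periodic_coordinate ( float x , float lo , float hi )
{
  assert ( hi > lo ) ;
  const float w = hi - lo ;
  float t = std::fmod ( x - lo , w ) ;
  if ( t < 0.0f )
    t += w ;
  return lo + t ;
}

// 'reject': refuse coordinates outside [ lo : hi ]
float reject_coordinate ( float x , float lo , float hi )
{
  if ( x < lo || x > hi )
    throw std::out_of_range ( "coordinate outside [ lo : hi ]" ) ;
  return x ;
}
```

With lo = 0 and hi = 1, an input of 1.25 would come out as 1.0 (limit), 0.75 (mirror) and 0.25 (periodic), and be refused by reject.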
diff --git a/filter.h b/filter.h
new file mode 100644
index 0000000..f85064b
--- /dev/null
+++ b/filter.h
@@ -0,0 +1,1905 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015 - 2017 by Kay F. Jahnke                    */
+/*                                                                      */
+/*    The git repository for this software is at                        */
+/*                                                                      */
+/*    https://bitbucket.org/kfj/vspline                                 */
+/*                                                                      */
+/*    Please direct questions, bug reports, and contributions to        */
+/*                                                                      */
+/*    kfjahnke+vspline at gmail.com                                        */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/*! \file filter.h
+
+    \brief generic implementation of an n-pole forward-backward IIR filter for nD arrays
+    
+    This code was initially part of vspline's prefilter.h, but I factored it out
+    and disentangled it from the remainder of the code, since it's more general and
+    not specific to B-splines.
+    
+    The code in this file provides efficient filtering of nD arrays with an n-pole
+    forward-backward recursive filter, accepting a variety of boundary conditions and
+    optionally using multithreading and/or vectorization to speed things up.
+
+    The data have to be presented as vigra MultiArrayViews of elementary floating point
+    types or their 'aggregates' (TinyVectors, pixels, etc.). The code is dimension-agnostic,
+    but templated on the array types used, so the dimensionality is not a run time parameter.
+
+    Note that the code is organized bottom-up, so the highest-level code comes last.
+    Most code using filter.h will only call the final routine, filter_nd.
+
+    While the initial purpose of the code in this file was, of course, b-spline prefiltering,
+    the generalized version I present here can be used for arbitrary filters. There is probably
+    one other filter which is most useful in the context of vspline: passing a single positive
+    pole in the range ] 0 , 1 [ smooths the signal very efficiently.
+*/
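The smoothing filter mentioned above can be tried in isolation. The sketch below is not vspline code: `smooth_single_pole` is a hypothetical stand-alone function working in place on a std::vector<double>. It applies one pole p in ] 0 , 1 [ with the recursions class filter uses (forward x[n]' = x[n] + p * x[n-1]', backward x[n]'' = p * ( x[n+1]'' - x[n]' )), pre-applies the gain ( 1 - p ) * ( 1 - 1 / p ) as overall_gain would, starts the forward pass as if x[0] were repeated to the left (as icc_guess does) and starts the backward pass with the formula from iacc_mirror.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: forward-backward first-order IIR smoothing with a
// single pole 0 < p < 1, working in place on x.
void smooth_single_pole ( std::vector<double> & x , double p )
{
  assert ( x.size() > 1 && p > 0.0 && p < 1.0 ) ;
  const std::size_t M = x.size() ;

  // gain as computed by overall_gain for one pole; negative for 0 < p < 1
  const double gain = ( 1.0 - p ) * ( 1.0 - 1.0 / p ) ;

  // causal pass: start as if x[0] were repeated to the left (like icc_guess),
  // applying the gain to each input value as it is consumed
  double X = gain * x[0] / ( 1.0 - p ) ;
  x[0] = X ;
  for ( std::size_t n = 1 ; n < M ; n++ )
    x[n] = X = gain * x[n] + p * X ;

  // anticausal start, using the formula from iacc_mirror
  X = ( p / ( p * p - 1.0 ) ) * ( x[M-1] + p * x[M-2] ) ;
  x[M-1] = X ;

  // anticausal pass: x[n]'' = p * ( x[n+1]'' - x[n]' ), for n = M-2 down to 0
  for ( std::size_t n = M - 1 ; n-- > 0 ; )
    x[n] = X = p * ( X - x[n] ) ;
}
```

With this gain, a constant signal passes through unchanged, and away from the ends an impulse is spread into ( 1 - p ) / ( 1 + p ) * p^|k|, i.e. 1/3, 1/6, 1/12, ... for p = 0.5.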
+
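+// include common.h for the boundary condition codes. The standard headers provide
+// std::vector, std::is_same, assert, std::cout and the math functions (pow, log,
+// ceil, fabs) used below.
+
+#include <vector>
+#include <type_traits>
+#include <cassert>
+#include <cmath>
+#include <iostream>
+#include "common.h"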
+
+#ifndef VSPLINE_FILTER_H
+#define VSPLINE_FILTER_H
+
+namespace vspline {
+
+/// overall_gain is a helper routine:
+/// Simply executing the filtering code by itself will attenuate the signal. Here
+/// we calculate the gain which, pre-applied to the signal, will cancel this effect.
+/// While this code was initially part of the filter's constructor, I took it out
+/// to gain some flexibility by passing in the gain as a parameter.
+
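+// declared inline, since this function is defined in a header which may be
+// included in several translation units
+inline double overall_gain ( const int & nbpoles , const double * const pole )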
+{
+  double lambda = 1.0 ;
+
+  for ( int k = 0 ; k < nbpoles ; k++ )
+
+    lambda = lambda * ( 1.0 - pole[k] ) * ( 1.0 - 1.0 / pole[k] ) ;
+  
+  return lambda ;
+}
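As a quick check of overall_gain: the cubic B-spline prefilter has the single pole z = sqrt(3) - 2, a root of z^2 + 4z + 1, so z + 1/z = -4 and the gain is ( 1 - z ) ( 1 - 1/z ) = 2 - ( z + 1/z ) = 6. The sketch copies the loop into a hypothetical free function, `overall_gain_sketch`, so it compiles without vspline.

```cpp
#include <cassert>
#include <cmath>

// stand-alone copy of the loop in overall_gain (hypothetical name)
double overall_gain_sketch ( int nbpoles , const double * pole )
{
  double lambda = 1.0 ;
  for ( int k = 0 ; k < nbpoles ; k++ )
  {
    assert ( pole[k] != 0.0 ) ; // 1 / pole[k] is taken below
    lambda *= ( 1.0 - pole[k] ) * ( 1.0 - 1.0 / pole[k] ) ;
  }
  return lambda ;
}

// the single pole of the cubic B-spline prefilter, a root of z^2 + 4z + 1
const double cubic_pole = std::sqrt ( 3.0 ) - 2.0 ;
```

For a 2D spline, the 'potentiated' gain applied once along the first axis would then be 6^2 = 36.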
+  
+/// for each pole passed in, this filter will perform a forward-backward
+/// first order IIR filter, initially on the data passed in via in_iter, subsequently
+/// on the result of the application of the previous pole, using these recursions:
+/// 
+/// forward filter:
+///   
+/// x[n]' = x[n] + p * x[n-1]'
+/// 
+/// backward filter:
+/// 
+/// x[n]'' = p * ( x[n+1]'' - x[n]' )
+/// 
+/// the result will be deposited via out_iter, which may be an iterator over
+/// the same data in_iter iterates over, in which case operation is in-place.
+/// in_iter can be a const iterator; it's never used for writing data.
+///
+/// class filter needs three template arguments: one for the type of iterator over the
+/// incoming data, one for the type of iterator to the resultant coefficients, and one
+/// for the real type used in arithmetic operations. The iterators' types will usually
+/// be the same, but formulating the code with two separate types makes it more
+/// versatile. The third template argument will usually be the elementary type of the
+/// iterators' value_type. When the value_types are vigra aggregates (TinyVectors etc.),
+/// vigra's ExpandElementResult mechanism can provide it, but at times we may wish to be
+/// explicit here, e.g. when iterating over simdized types.
+
+template < typename in_iter ,   // iterator over the knot point values
+           typename out_iter ,  // iterator over the coefficient array
+           typename real_type > // type for single real value for calculations
+class filter
+{
+  // both iterators must define value_type and have the same value_type
+
+  typedef typename in_iter::value_type value_type ;
+  
+  static_assert ( std::is_same < typename out_iter::value_type , value_type > :: value ,
+                  "prefilter input and output iterator must have the same value_type" ) ;
+  
+//   // both iterators should be random access iterators.
+//   // currently not enforced
+//   typedef typename std::iterator_traits < in_iter > :: iterator_category in_cat ;
+//   static_assert ( std::is_same < in_cat , std::random_access_iterator_tag > :: value ,
+//                   "prefilter input iterator must be random access iterator"  ) ;
+//                   
+//   typedef typename std::iterator_traits < out_iter > :: iterator_category out_cat ;
+//   static_assert ( std::is_same < out_cat , std::random_access_iterator_tag > :: value ,
+//                   "prefilter output iterator must be random access iterator" ) ;
+  
+  /// typedef the fully qualified type for brevity, to make the typedefs below
+  /// a bit more legible
+
+  typedef filter < in_iter , out_iter , real_type > filter_type ;
+
+  const double* pole ;               ///< poles of the IIR filter
+  std::vector<int> horizon ;         ///< corresponding horizon values
+  const real_type lambda ;           ///< (potentiated) overall gain.  
+  const int npoles ;                 ///< Number of filter poles
+  const int M ;                      ///< length of the data
+
+  /// the solving routine and the routines finding the initial coefficients are called
+  /// via method pointers. These pointers are typedefed for better legibility:
+  
+  typedef void       ( filter_type::*p_solve )  ( in_iter  input , out_iter output ) ;
+  typedef value_type ( filter_type::*p_icc1 )  ( in_iter  input , int k ) ;
+  typedef value_type ( filter_type::*p_icc2 )  ( out_iter input , int k ) ;
+  typedef value_type ( filter_type::*p_iacc )  ( out_iter input , int k ) ;
+
+  
+  // these are the method pointers used:
+  
+  p_solve _p_solve ; ///< pointer to the solve method
+  p_icc1  _p_icc1 ;  ///< pointer to the initial causal coefficient routine, reading via in_iter
+  p_icc2  _p_icc2 ;  ///< pointer to the initial causal coefficient routine, reading via out_iter
+  p_iacc  _p_iacc ;  ///< pointer to the initial anticausal coefficient routine
+  
+public:
+
+ /// solve() takes two iterators, one to the input data and one to the output space.
+ /// The containers must have the same size. It's safe to use solve() in-place.
+
+ void solve ( in_iter input , out_iter output )
+ {
+   (this->*_p_solve) ( input , output ) ;
+ }
+ 
+ /// for in-place operation we use the same filter routine.
+ /// I checked: a handcoded in-place routine using only a single
+ /// iterator is not noticeably faster than using one with two separate iterators.
+ 
+ void solve ( out_iter data )
+ {
+   (this->*_p_solve) ( data , data ) ;
+ }
+ 
+// I use adapted versions of P. Thevenaz' code to calculate the initial causal and
+// anticausal coefficients for the filter. The code is changed just a little to work
+// with an iterator instead of a C vector.
+
+private:
+
+/// The code for mirrored BCs is adapted from P. Thevenaz' code, the other routines are my
+/// own doing, with aid from a digest of spline formulae I received from P. Thevenaz and which
+/// were helpful to verify the code against a trusted source.
+///
+/// note how, in the routines to find the initial causal coefficient, there are two different
+/// cases: first the 'accelerated loop', which is used when the theoretically infinite sum of
+/// terms reaches sufficient precision before the data run out (horizon[k] < M), and the
+/// 'full loop', which implements the mathematically precise representation of the limit of
+/// the infinite sum as the number of terms goes to infinity. This limit happens to be
+/// calculable because the absolute value of all poles is < 1 and
+///
+///  lim     n                a
+///         sum a * q ^ k =  ---
+/// n->inf  k=0              1-q
+///
+/// first are mirror BCs. This is mirroring 'on bounds',
+/// f(-x) == f(x) and f(n-1 + x) == f(n-1 - x)
+///
+/// note how mirror BCs are equivalent to requiring the first derivative to be zero in the
+/// linear algebra approach. Obviously with mirrored data this has to be the case: the location
+/// where mirroring occurs always has zero slope. So this case covers 'FLAT' BCs as well.
+///
+/// the initial causal coefficient routines are templated by iterator type, because depending
+/// on the circumstances, they may be used either on the input or the output iterator.
+  
+template < class IT >
+value_type icc_mirror ( IT c , int k )
+{
+  value_type z = value_type ( pole[k] ) ;
+  value_type zn, z2n, iz;
+  value_type Sum ;
+  int  n ;
+
+  if (horizon[k] < M) {
+    /* accelerated loop */
+    zn = z;
+    Sum = c[0];
+    for (n = 1; n < horizon[k]; n++)
+    {
+      Sum += zn * c[n];
+      zn *= z;
+    }
+  }
+  else {
+    /* full loop */
+    zn = z;
+    iz = value_type(1.0) / z;
+    z2n = value_type ( pow(double(pole[k]), double(M - 1)) );
+    Sum = c[0] + z2n * c[M - 1];
+    z2n *= z2n * iz;
+    for (n = 1; n <= M - 2; n++)
+    {
+      Sum += (zn + z2n) * c[n];
+      zn *= z;
+      z2n *= iz;
+    }
+    Sum /= (value_type(1.0) - zn * zn);
+  } 
+//  cout << "icc_mirror: " << Sum << endl ;
+ return(Sum);
+}
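The closed form in the full loop can be checked numerically. This sketch transcribes both loops of icc_mirror for doubles (hypothetical free functions standing in for the member template) and compares them with a brute-force sum of z^k * f(-k) over the mirrored extension, which repeats with period 2M-2. The horizon is computed as in the constructor, ceil ( log ( tolerance ) / log ( |z| ) ); the accelerated loop only applies when that horizon is smaller than the data length.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// fold an arbitrary index into [ 0 , M-1 ] by mirroring 'on bounds'
int mirror_index ( int i , int M )
{
  assert ( M > 1 ) ;
  const int period = 2 * M - 2 ;
  i %= period ;
  if ( i < 0 )
    i += period ;
  return i < M ? i : period - i ;
}

// brute force: sum of z^k * f(-k) over the mirrored extension, k < terms
double icc_mirror_brute ( const std::vector<double> & c , double z , int terms )
{
  const int M = int ( c.size() ) ;
  double sum = 0.0 , zk = 1.0 ;
  for ( int k = 0 ; k < terms ; k++ )
  {
    sum += zk * c [ mirror_index ( -k , M ) ] ;
    zk *= z ;
  }
  return sum ;
}

// the 'full loop' of icc_mirror, transcribed for doubles
double icc_mirror_full ( const std::vector<double> & c , double z )
{
  const int M = int ( c.size() ) ;
  double zn = z , iz = 1.0 / z ;
  double z2n = std::pow ( z , double ( M - 1 ) ) ;
  double sum = c[0] + z2n * c[M-1] ;
  z2n *= z2n * iz ;                        // z2n == z^(2M-3)
  for ( int n = 1 ; n <= M - 2 ; n++ )
  {
    sum += ( zn + z2n ) * c[n] ;
    zn *= z ;
    z2n *= iz ;
  }
  return sum / ( 1.0 - zn * zn ) ;         // zn == z^(M-1) here
}

// the 'accelerated loop': only the first 'horizon' terms ( horizon < M )
double icc_mirror_accelerated ( const std::vector<double> & c , double z , int horizon )
{
  double sum = c[0] , zn = z ;
  for ( int n = 1 ; n < horizon ; n++ )
  {
    sum += zn * c[n] ;
    zn *= z ;
  }
  return sum ;
}

// number of terms after which |z|^n has dropped to 'tolerance'
int horizon_for ( double z , double tolerance )
{
  return int ( std::ceil ( std::log ( tolerance ) / std::log ( std::fabs ( z ) ) ) ) ;
}
```

For the cubic pole and a tolerance of 1e-10 the horizon comes out as 18, and the accelerated sum then differs from the full loop by less than 1e-10 / ( 1 - |z| ) for data in [ -1 , 1 ].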
+
+/// the initial anticausal coefficient routines are always called with the output iterator,
+/// so they needn't be templated like the icc routines.
+///
+/// I still haven't understood the 'magic' which allows one to calculate the initial anticausal
+/// coefficient from just two results of the causal filter, but I assume it's some exploitation
+/// of the symmetry of the data. This code is adapted from P. Thevenaz'.
+
+value_type iacc_mirror ( out_iter c , int k )
+{
+  value_type z = value_type ( pole[k] ) ;
+
+  return( value_type( z / ( z * z - value_type(1.0) ) ) * ( c [ M - 1 ] + z * c [ M - 2 ] ) );
+}
+
+/// next are 'antimirrored' BCs. This is the same as 'natural' BCs: the signal is
+/// extrapolated via point mirroring at the ends, resulting in point-symmetry at the ends,
+/// which is equivalent to the second derivative being zero, the constraint used in
+/// the linear algebra approach to calculate 'natural' BCs:
+///
+/// f(x) - f(0) == f(0) - f(-x); f(x+n-1) - f(n-1) == f(n-1) - f (n-1-x)
+
+template < class IT >
+value_type icc_natural ( IT c , int k )
+{
+  value_type z = value_type ( pole[k] ) ;
+  value_type zn, z2n, iz;
+  value_type Sum , c02 ;
+  int  n ;
+
+  // f(x) - f(0) == f(0) - f(-x)
+  // f(-x) == 2 * f(0) - f(x)
+  
+  if (horizon[k] < M)
+  {
+    c02 = c[0] + c[0] ;
+    zn = z;
+    Sum = c[0];
+    for (n = 1; n < horizon[k]; n++)
+    {
+      Sum += zn * ( c02 - c[n] ) ;
+      zn *= z;
+    }
+    return(Sum);
+  }
+  else {
+    zn = z;
+    iz = value_type(1.0) / z;
+    z2n = value_type ( pow(double(pole[k]), double(M - 1)) );
+    Sum = value_type( ( value_type(1.0) + z ) / ( value_type(1.0) - z ) )
+          * ( c[0] - z2n * c[M - 1] );
+    z2n *= z2n * iz;                                                   // z2n == z^2M-3
+    for (n = 1; n <= M - 2; n++)
+    {
+      Sum -= (zn - z2n) * c[n];
+      zn *= z;
+      z2n *= iz;
+    }
+    return(Sum / (value_type(1.0) - zn * zn));
+  } 
+}
+
+/// I still haven't understood the 'magic' which allows one to calculate the initial anticausal
+/// coefficient from just two results of the causal filter, but I assume it's some exploitation
+/// of the symmetry of the data. This code is adapted from P. Thevenaz' formula.
+
+value_type iacc_natural ( out_iter c , int k )
+{
+  value_type z = value_type ( pole[k] ) ;
+
+  return - value_type( z / ( ( value_type(1.0) - z ) * ( value_type(1.0) - z ) ) ) * ( c [ M - 1 ] - z * c [ M - 2 ] ) ;
+}
+
+/// next are reflective BCs. This is mirroring 'between bounds':
+///
+/// f ( -1 - x ) == f ( x ) and f ( n + x ) == f ( n-1 - x )
+///
+/// I took Thevenaz' routine for mirrored data as a template and adapted it.
+/// 'reflective' BCs have some nice properties which make them more suited than mirror BCs in
+/// some situations:
+/// - the artificial discontinuity is 'pushed out' half a unit spacing
+/// - the extrapolated data are just as long as the source data
+/// - they play well with even splines
+
+template < class IT >
+value_type icc_reflect ( IT c , int k )
+{
+  value_type z = value_type ( pole[k] ) ;
+  value_type zn, z2n, iz;
+  value_type Sum ;
+  int  n ;
+
+  if (horizon[k] < M)
+  {
+    zn = z;
+    Sum = c[0];
+    for (n = 0; n < horizon[k]; n++)
+    {
+      Sum += zn * c[n];
+      zn *= z;
+    }
+    return(Sum);
+  }
+  else
+  {
+    zn = z;
+    iz = value_type(1.0) / z;
+    z2n = value_type ( pow(double(pole[k]), double(2 * M)) );
+    Sum = 0 ;
+    for (n = 0; n < M - 1 ; n++)
+    {
+      Sum += (zn + z2n) * c[n];
+      zn *= z;
+      z2n *= iz;
+    }
+    Sum += (zn + z2n) * c[n];
+    return c[0] + Sum / (value_type(1.0) - zn * zn) ;
+  } 
+}
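The same kind of numerical check works for reflective BCs, where the extension repeats with period 2M and f(-1-n) == c[n], which is why the accelerated loop of icc_reflect starts at n = 0 with z * c[0]. The function names are hypothetical; doubles stand in for value_type.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// fold an arbitrary index into [ 0 , M-1 ] by mirroring 'between bounds':
// f(-1-x) == f(x), f(M+x) == f(M-1-x), period 2M
int reflect_index ( int i , int M )
{
  assert ( M > 0 ) ;
  const int period = 2 * M ;
  i %= period ;
  if ( i < 0 )
    i += period ;
  return i < M ? i : period - 1 - i ;
}

// brute force: sum of z^k * f(-k) over the reflected extension, k < terms
double icc_reflect_brute ( const std::vector<double> & c , double z , int terms )
{
  const int M = int ( c.size() ) ;
  double sum = 0.0 , zk = 1.0 ;
  for ( int k = 0 ; k < terms ; k++ )
  {
    sum += zk * c [ reflect_index ( -k , M ) ] ;
    zk *= z ;
  }
  return sum ;
}

// 'accelerated loop' of icc_reflect ( horizon < M )
double icc_reflect_accelerated ( const std::vector<double> & c , double z , int horizon )
{
  double sum = c[0] , zn = z ;
  for ( int n = 0 ; n < horizon ; n++ )
  {
    sum += zn * c[n] ;
    zn *= z ;
  }
  return sum ;
}

// 'full loop' of icc_reflect, transcribed for doubles
double icc_reflect_full ( const std::vector<double> & c , double z )
{
  const int M = int ( c.size() ) ;
  double zn = z , iz = 1.0 / z ;
  double z2n = std::pow ( z , double ( 2 * M ) ) ;
  double sum = 0.0 ;
  int n ;
  for ( n = 0 ; n < M - 1 ; n++ )
  {
    sum += ( zn + z2n ) * c[n] ;   // coefficient z^(n+1) + z^(2M-n)
    zn *= z ;
    z2n *= iz ;
  }
  sum += ( zn + z2n ) * c[n] ;     // n == M-1, zn == z^M
  return c[0] + sum / ( 1.0 - zn * zn ) ;
}
```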
+
+/// I still haven't understood the 'magic' which allows one to calculate the initial anticausal
+/// coefficient from just one result of the causal filter, but I assume it's some exploitation
+/// of the symmetry of the data. I have to thank P. Thevenaz for his formula which let me code:
+
+value_type iacc_reflect ( out_iter c , int k )
+{
+  value_type z = value_type ( pole[k] ) ;
+
+  return c[M - 1] / ( value_type(1.0) - value_type(1.0) / z ) ;
+}
+
+/// next are periodic BCs, so f(x) == f(x+M)
+///
+/// Implementing this is more straightforward than implementing the various mirrored types.
+/// The mirrored types are, in fact, also periodic, but with a period about twice as large
+/// (2M-2 for mirror BCs, 2M for reflective BCs), since they repeat only after the first
+/// reflection. So especially the code for the full loop is more complex for mirrored types.
+/// The downside here is the lack of symmetry to exploit, which made me code a loop for the
+/// initial anticausal coefficient as well.
+
+template < class IT >
+value_type icc_periodic ( IT c , int k )
+{
+  value_type z = value_type ( pole[k] ) ;
+  value_type zn ;
+  value_type Sum ;
+  int  n ;
+
+  if (horizon[k] < M)
+  {
+    zn = z ;
+    Sum = c[0] ;
+    for ( n = M - 1 ; n > ( M - horizon[k] ) ; n-- )
+    {
+      Sum += zn * c[n];
+      zn *= z;
+    }
+   }
+  else
+  {
+    zn = z;
+    Sum = c[0];
+    for ( n = M - 1 ; n > 0 ; n-- )
+    {
+      Sum += zn * c[n];
+      zn *= z;
+    }
+    Sum /= ( value_type(1.0) - zn ) ;
+  }
+ return Sum ;
+}
+
+// TODO doublecheck this routine!
+
+value_type iacc_periodic ( out_iter c , int k )
+{
+  value_type z = value_type ( pole[k] ) ;
+  value_type zn ;
+  value_type Sum ;
+
+  if (horizon[k] < M)
+  {
+    zn = z ;
+    Sum = c[M-1] * z ;
+    for ( int n = 0 ; n < horizon[k] ; n++ )
+    {
+      zn *= z;
+      Sum += zn * c[n];
+    }
+    Sum = -Sum ;
+  }
+  else
+  {
+    zn = z;
+    Sum = c[M-1];
+    for ( int n = 0 ; n < M - 1 ; n++ )
+    {
+      Sum += zn * c[n];
+      zn *= z;
+    }
+    Sum = z * Sum / ( zn - value_type(1.0) );
+  }
+  return Sum ;
+}
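The periodic routines, including iacc_periodic with its TODO above, can be compared against brute-force sums over the periodic extension. The anticausal reference below assumes x''[M-1] = -( z x'[M-1] + z^2 x'[M] + z^3 x'[M+1] + ... ) with x' periodic, which is what unrolling the anticausal recursion x[n]'' = z * ( x[n+1]'' - x[n]' ) gives; this checks the closed forms against that assumption only. Function names are hypothetical and doubles stand in for value_type.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// index into a signal repeated with period M
int periodic_index ( int i , int M )
{
  assert ( M > 0 ) ;
  i %= M ;
  return i < 0 ? i + M : i ;
}

// brute-force causal start: sum of z^k * f(-k), k < terms
double icc_periodic_brute ( const std::vector<double> & c , double z , int terms )
{
  const int M = int ( c.size() ) ;
  double sum = 0.0 , zk = 1.0 ;
  for ( int k = 0 ; k < terms ; k++ )
  {
    sum += zk * c [ periodic_index ( -k , M ) ] ;
    zk *= z ;
  }
  return sum ;
}

// brute-force anticausal start: -( z x[M-1] + z^2 x[M] + ... ), x periodic
double iacc_periodic_brute ( const std::vector<double> & x , double z , int terms )
{
  const int M = int ( x.size() ) ;
  double sum = 0.0 , zk = z ;
  for ( int j = 0 ; j < terms ; j++ )
  {
    sum += zk * x [ periodic_index ( M - 1 + j , M ) ] ;
    zk *= z ;
  }
  return - sum ;
}

// full loops of icc_periodic and iacc_periodic, transcribed for doubles
double icc_periodic_full ( const std::vector<double> & c , double z )
{
  const int M = int ( c.size() ) ;
  double zn = z , sum = c[0] ;
  for ( int n = M - 1 ; n > 0 ; n-- )
  {
    sum += zn * c[n] ;
    zn *= z ;
  }
  return sum / ( 1.0 - zn ) ;          // zn == z^M here
}

double iacc_periodic_full ( const std::vector<double> & x , double z )
{
  const int M = int ( x.size() ) ;
  double zn = z , sum = x[M-1] ;
  for ( int n = 0 ; n < M - 1 ; n++ )
  {
    sum += zn * x[n] ;
    zn *= z ;
  }
  return z * sum / ( zn - 1.0 ) ;      // zn == z^M here
}
```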
+
+/// guess the initial coefficient. This tries to minimize the effect
+/// of starting out with a hard discontinuity, as it occurs with zero-padding,
+/// while at the same time requiring little arithmetic effort.
+///
+/// for the forward filter, we guess an extrapolation of the signal to the left
+/// repeating c[0] indefinitely, which is cheap to compute:
+
+template < class IT >
+value_type icc_guess ( IT c , int k )
+{
+  return c[0] * value_type ( 1.0 / ( 1.0 - pole[k] ) ) ;
+}
+
+// for the backward filter, we assume mirror BC, which is also cheap to compute:
+
+value_type iacc_guess ( out_iter c , int k )
+{
+  return iacc_mirror ( c , k ) ;
+}
+
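+/// trivial initial coefficients, returning the first and the last value, respectively.
+/// The constructor selects these for ZEROPAD BCs.
+
+template < class IT >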
+value_type icc_identity ( IT c , int k )
+{
+  return c[0] ;
+}
+
+value_type iacc_identity ( out_iter c , int k )
+{
+  return c[M-1] ;
+}
+
+/// Now we come to the solving, or prefiltering, code itself.
+/// There are some variants: a bit of code bloat due to the explicit handling of a few
+/// distinct cases. Since this is core code, I have opted to suffer some code duplication
+/// in exchange for maximum efficiency.
+/// The code itself is adapted from P. Thevenaz' code.
+///
+/// This variant uses a 'carry' element, 'X', to carry the result of the recursion
+/// from one iteration to the next instead of using the direct implementation of the
+/// recursion formula, which would read the previous value of the recursion from memory
+/// by accessing x[n-1], or, x[n+1], respectively.
+
+void solve_gain_inlined ( in_iter c , out_iter x )
+{
+  assert ( M > 1 ) ;
+  
+  // use a buffer of one value_type for the recursion (see below)
+
+  value_type X ;
+  real_type p = real_type ( pole[0] ) ;
+  
+  // process first pole, applying overall gain in the process
+  // of consuming the input. This gain may be a power of the 'orthodox'
+  // lambda from Thevenaz' code. This is done when the input is multidimensional,
+  // in which case it's wasteful to apply lambda in each dimension. In this situation
+  // it makes more sense to apply pow(lambda,dimensions) when solving along the
+  // first axis and apply no gain when solving along the other axes.
+  // Also note that the application of the gain is performed during the processing
+  // of the first (maybe the only) pole of the filter, instead of running a separate
+  // loop over the input to apply it before processing starts.
+  
+  // note how the gain is applied to the initial causal coefficient. This is
+  // equivalent to first applying the gain to the input and then calculating
+  // the initial causal coefficient from the amplified input.
+  
+  // note the seemingly strange = X clause in the assignment. By performing this
+  // assignment, we buffer the result of the current filter step to be used in the
+  // next iteration instead of fetching it again from memory. In my trials, this
+  // performed better, especially on SIMD data.
+  
+  x[0] = X = value_type ( lambda ) * (this->*_p_icc1) (c, 0);
+
+  /* causal recursion */
+  // the gain is applied to each input value as it is consumed
+  
+  for (int n = 1; n < M; n++)
+  {
+    x[n] = X = value_type ( lambda ) * c[n] + value_type ( p ) * X ;
+  }
+  
+  // now the input is used up and won't be looked at any more; all subsequent
+  // processing operates on the output.
+  
+  /* anticausal initialization */
+  
+  x[M - 1] = X = (this->*_p_iacc)(x, 0);
+
+  /* anticausal recursion */
+  for (int n = M - 2; 0 <= n; n--)
+  {
+    x[n] = X = value_type ( p ) * ( X - x[n]);
+  }
+  
+  // for the remaining poles, if any, don't apply the gain
+  // and process the result from applying the first pole
+  
+  for (int k = 1; k < npoles; k++)
+  {
+    p = pole[k] ;
+    /* causal initialization */
+    x[0] = X = (this->*_p_icc2)(x, k);
+    
+    /* causal recursion */
+    for (int n = 1; n < M; n++)
+    {
+      x[n] = X = x[n] + value_type ( p ) * X ;
+    }
+    
+    /* anticausal initialization */
+    x[M - 1] = X = (this->*_p_iacc)(x, k);
+    
+    /* anticausal recursion */
+    for (int n = M - 2; 0 <= n; n--)
+    {
+      x[n] = X = value_type ( p ) * ( X - x[n] );
+    }
+  }
+}
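To see one complete pass: the sketch below follows solve_gain_inlined for the single pole of the cubic B-spline, z = sqrt(3) - 2, with mirror BCs, applying the gain to the initial causal coefficient and to each input value as it is consumed, and carrying the recursion in X. `prefilter_cubic_mirror` is a hypothetical stand-alone function on doubles; unlike class filter, it always takes the full loop for the initial causal coefficient. If the coefficients are right, evaluating the spline at the knots gives back the data: ( c[n-1] + 4 * c[n] + c[n+1] ) / 6 == f[n], with c mirrored at the ends.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical sketch of solve_gain_inlined for one pole and mirror BCs.
std::vector<double> prefilter_cubic_mirror ( const std::vector<double> & f )
{
  assert ( f.size() > 1 ) ;
  const int M = int ( f.size() ) ;
  const double z = std::sqrt ( 3.0 ) - 2.0 ;
  const double lambda = ( 1.0 - z ) * ( 1.0 - 1.0 / z ) ; // == 6
  std::vector<double> x ( M ) ;

  // initial causal coefficient: full loop of icc_mirror, on the input
  double zn = z , iz = 1.0 / z ;
  double z2n = std::pow ( z , double ( M - 1 ) ) ;
  double sum = f[0] + z2n * f[M-1] ;
  z2n *= z2n * iz ;
  for ( int n = 1 ; n <= M - 2 ; n++ )
  {
    sum += ( zn + z2n ) * f[n] ;
    zn *= z ;
    z2n *= iz ;
  }
  sum /= ( 1.0 - zn * zn ) ;

  // causal pass: gain applied to the initial coefficient and to each input value
  double X = lambda * sum ;
  x[0] = X ;
  for ( int n = 1 ; n < M ; n++ )
    x[n] = X = lambda * f[n] + z * X ;

  // anticausal initialization (iacc_mirror) and anticausal pass
  x[M-1] = X = ( z / ( z * z - 1.0 ) ) * ( x[M-1] + z * x[M-2] ) ;
  for ( int n = M - 2 ; n >= 0 ; n-- )
    x[n] = X = z * ( X - x[n] ) ;

  return x ;
}
```

With several poles, the causal and anticausal passes would simply be repeated on x for each further pole, without the gain, as the loop over k above does.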
+
+/// solve routine without application of any gain, it is assumed that this has been
+/// done already during an initial run with the routine above, or in some other way.
+
+void solve_no_gain ( in_iter c , out_iter x )
+{
+  assert ( M > 1 ) ;
+
+  value_type X ;
+  real_type p = real_type ( pole[0] ) ;
+  
+  // process first pole, consuming the input
+  
+  /* causal initialization */
+  x[0] = X = (this->*_p_icc1)(c, 0);
+  
+  /* causal recursion */
+  for ( int n = 1; n < M; n++)
+  {
+    x[n] = X = c[n] + value_type ( p ) * X ;
+  }
+  
+  /* anticausal initialization */
+  x[M - 1] = X = (this->*_p_iacc)(x, 0);
+  
+  /* anticausal recursion */
+  for ( int n = M - 2; 0 <= n; n--)
+  {
+    x[n] = X = value_type ( p ) * ( X - x[n]);
+  }
+  
+  // for the remaining poles, if any, work on the result
+  // of processing the first pole
+  
+  for ( int k = 1 ; k < npoles; k++)
+  {
+    p = pole[k] ;
+    /* causal initialization */
+    x[0] = X = (this->*_p_icc2)(x, k);
+    
+    /* causal recursion */
+    for (int n = 1; n < M; n++)
+    {
+      x[n] = X = x[n] + value_type ( p ) * X ;
+    }
+    
+    /* anticausal initialization */
+    x[M - 1] = X = (this->*_p_iacc)(x, k);
+    
+    /* anticausal recursion */
+    for (int n = M - 2; 0 <= n; n--)
+    {
+      x[n] = X = value_type ( p ) * ( X - x[n] );
+    }
+  }
+}
+
+/// shortcircuit routine, copies input to output
+///
+/// this routine can also be used for splines of degree 0 and 1, for simplicity's sake
+
+void solve_identity ( in_iter c , out_iter x )
+{
+  if ( x == c ) // if operation is in-place we needn't do anything
+    return ;
+  for ( int n = 0 ; n < M ; n++ ) // otherwise, copy input to output
+    x[n] = c[n] ;
+}
+
+/// The last bit of work left in class filter is the constructor.
+/// The number of input/output values is passed into the constructor, limiting the
+/// filter to operate on data of precisely this length. The gain parameter isn't a mere
+/// flag: it is the gain to apply to the signal, which may already be raised to a power.
+/// If, for example, a 2D spline is built, one might pass in the squared gain for the
+/// first dimension and 1.0 (meaning no gain) for the second. This way, one set of
+/// multiplications is saved, at the cost of slightly reduced accuracy for large spline
+/// degrees. For high spline degrees and higher dimensions, it's advisable not to use
+/// this mechanism and to pass in the plain gain for all dimensions; the calling code
+/// in filter.h decides this with a heuristic.
+/// The number of poles and a pointer to the poles themselves are passed in with the
+/// parameters _npoles and _pole, respectively.
+/// Finally, the last parameter, tolerance, gives a measure of the acceptable error.
+
+public:
+  
+filter ( int _M ,               ///< number of input/output elements (DataLength)
+         double gain ,          ///< gain to apply to the signal to cancel attenuation
+         bc_code bc ,           ///< boundary conditions for this filter
+         int _npoles ,          ///< number of poles
+         const double * _pole , ///< pointer to _npoles doubles holding the filter poles
+         double tolerance )     ///< acceptable loss of precision, absolute value
+: pole ( _pole ) ,
+  lambda ( gain ) ,
+  npoles ( _npoles ) ,
+  M ( _M )
+{
+  if ( npoles < 1 )
+  {
+    // zero poles means there's nothing to do but possibly
+    // copying the input to the output, which solve_identity
+    // will do if the operation isn't in-place
+    _p_solve = & filter_type::solve_identity ;
+    return ;
+  }
+  
+  // calculate the horizon for each pole: this is the number of iterations
+  // the filter must perform on a unit impulse (TODO doublecheck) for it to
+  // decay below 'tolerance'
+
+  for ( int i = 0 ; i < npoles ; i++ )
+  {
+    if ( tolerance )
+      horizon.push_back ( ceil ( log ( tolerance ) / log ( fabs ( pole[i] ) ) ) ) ;
+    else
+      horizon.push_back ( M ) ;
+  }
+
+  if ( gain == 1.0 )
+  {
+    // gain == 1.0 has no effect, we can use this solve variant, applying no gain:
+    _p_solve = & filter_type::solve_no_gain ;
+  }
+  else
+  {
+    // if gain isn't 1.0, we use the solve variant which applies it
+    // to the signal as it goes along.
+    _p_solve = & filter_type::solve_gain_inlined ;
+  }
+
+  // while the forward/backward IIR filter in the solve_... routines is the same for all
+  // boundary conditions, the calculation of the initial causal and anticausal coefficients
+  // depends on the boundary conditions and is handled by a call through a method pointer
+  // in the solve_... routines. Here we fix these method pointers:
+  
+  if ( bc == MIRROR )
+  {     
+    _p_icc1 = & filter_type::icc_mirror<in_iter> ;
+    _p_icc2 = & filter_type::icc_mirror<out_iter> ;
+    _p_iacc = & filter_type::iacc_mirror ;
+  }
+  else if ( bc == NATURAL )
+  {     
+    _p_icc1 = & filter_type::icc_natural<in_iter> ;
+    _p_icc2 = & filter_type::icc_natural<out_iter> ;
+    _p_iacc = & filter_type::iacc_natural ;
+  }
+  else if ( bc == PERIODIC )
+  {
+    _p_icc1 = & filter_type::icc_periodic<in_iter> ;
+    _p_icc2 = & filter_type::icc_periodic<out_iter> ;
+    _p_iacc = & filter_type::iacc_periodic ;
+  }
+  else if ( bc == REFLECT )
+  {
+    _p_icc1 = & filter_type::icc_reflect<in_iter> ;
+    _p_icc2 = & filter_type::icc_reflect<out_iter> ;
+    _p_iacc = & filter_type::iacc_reflect ;
+  }
+  else if ( bc == ZEROPAD )
+  {
+    _p_icc1 = & filter_type::icc_identity<in_iter> ;
+    _p_icc2 = & filter_type::icc_identity<out_iter> ;
+    _p_iacc = & filter_type::iacc_identity ;
+  }
+  else if ( bc == IDENTITY )
+  {
+    _p_solve = & filter_type::solve_identity ;
+  }
+  else if ( bc == GUESS )
+  {
+    _p_icc1 = & filter_type::icc_guess<in_iter> ;
+    _p_icc2 = & filter_type::icc_guess<out_iter> ;
+    _p_iacc = & filter_type::iacc_guess ;
+  }
+  else
+  {
+    std::cout << "boundary condition " << bc << " not supported by vspline::filter" << std::endl ;
+    throw not_supported ( "boundary condition not supported by vspline::filter" ) ;
+  }
+}
+
+} ; // end of class filter
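class filter settles on its boundary treatment once, in the constructor, by storing pointers to member functions, so the solve_... loops call through a pointer instead of branching on bc. A minimal stand-alone sketch of that pattern follows; `toy_filter`, `bc_kind` and the two start-value routines are hypothetical and much simpler than vspline's.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

enum class bc_kind { identity , guess } ;

class toy_filter
{
  double p ;

  // pointer to the member function computing the initial causal value
  typedef double ( toy_filter::*p_icc ) ( const std::vector<double> & ) const ;
  p_icc _p_icc ;

  double icc_identity ( const std::vector<double> & c ) const { return c[0] ; }
  double icc_guess ( const std::vector<double> & c ) const { return c[0] / ( 1.0 - p ) ; }

public:

  toy_filter ( double pole , bc_kind bc ) : p ( pole )
  {
    // decide once, here, instead of branching inside the loop
    if ( bc == bc_kind::identity )
      _p_icc = & toy_filter::icc_identity ;
    else if ( bc == bc_kind::guess )
      _p_icc = & toy_filter::icc_guess ;
    else
      throw std::invalid_argument ( "unsupported boundary condition" ) ;
  }

  // causal pass only, in place, starting via the stored method pointer
  void causal ( std::vector<double> & x ) const
  {
    assert ( ! x.empty() ) ;
    double X = ( this->*_p_icc ) ( x ) ;
    x[0] = X ;
    for ( std::size_t n = 1 ; n < x.size() ; n++ )
      x[n] = X = x[n] + p * X ;
  }
} ;
```

With the 'guess' start, a constant input of 2 stays constant at 2 / ( 1 - 0.5 ) == 4 for p = 0.5, while the 'identity' start turns an impulse into 1, 0.5, 0.25, ...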
+
+// Now that we have generic code for 1D filtering, we want to apply this code to
+// n-dimensional arrays. We use the following strategy:
+// - perform the prefiltering collinear to each axis separately
+// - when processing a specific axis, split the array(s) into chunks and use one job per chunk
+// - perform a traverse on each chunk, copying out subsets collinear to the processing axis
+//   to a buffer
+// - perform the filter on the buffer
+// - copy the filtered data to the target
+// The code is organized bottom-up, with the highest-level routines furthest down, saving
+// on forward declarations. The section of code immediately following doesn't use vectorization,
+// the vector code follows.
+
+/// 'monadic' gather and scatter. gather picks up count values of source_type which are
+/// stride apart, starting at source, and deposits them compactly at target. scatter performs
+/// the reverse operation. source_type and target_type can be different; on assignment,
+/// source_type is simply cast to target_type.
+/// index_type is passed in as a template argument, allowing for wider types than int,
+/// so these routines can also operate on very large areas of memory.
+
+template < typename source_type ,
+           typename target_type = source_type ,
+           typename index_type = int >
+void gather ( const source_type* source ,
+              target_type* target ,
+              const index_type & stride ,
+              index_type count
+            )
+{
+  while ( count-- )
+  {
+    *target = target_type ( *source ) ;
+    source += stride ;
+    ++target ;
+  }
+}
+
+template < typename source_type ,
+           typename target_type = source_type ,
+           typename index_type = int >
+void scatter ( const source_type* source ,
+               target_type* target ,
+               const index_type & stride ,
+               index_type count
+             )
+{
+  while ( count-- )
+  {
+    *target = target_type ( *source ) ;
+    ++source ;
+    target += stride ;
+  }
+}
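To illustrate what gather and scatter do for the filter: in a row-major 3 x 4 array, column 1 consists of elements 1, 5 and 9, i.e. it has stride 4. The sketch copies the two templates under hypothetical names, `gather_sketch` and `scatter_sketch`, so it compiles without vspline; the test gathers that column from float storage into a double buffer, modifies it, and scatters it back, leaving the other columns alone.

```cpp
#include <cassert>

// copies of the monadic gather/scatter above, renamed to stand alone
template < typename source_type ,
           typename target_type = source_type ,
           typename index_type = int >
void gather_sketch ( const source_type * source , target_type * target ,
                     const index_type & stride , index_type count )
{
  assert ( source != nullptr && target != nullptr ) ;
  while ( count-- )
  {
    *target = target_type ( *source ) ; // strided read, compact write
    source += stride ;
    ++target ;
  }
}

template < typename source_type ,
           typename target_type = source_type ,
           typename index_type = int >
void scatter_sketch ( const source_type * source , target_type * target ,
                      const index_type & stride , index_type count )
{
  assert ( source != nullptr && target != nullptr ) ;
  while ( count-- )
  {
    *target = target_type ( *source ) ; // compact read, strided write
    ++source ;
    target += stride ;
  }
}
```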
+
+/// nonaggregating_filter successively copies all 1D subarrays of source collinear to axis
+/// into a 1D buffer, performs the filter 'solver' on the buffer, then writes the filtered
+/// data to the corresponding 1D subarray of target (which may be the same as source).
+/// While the buffering consumes some time, it saves time on the actual filter calculation,
+/// especially with higher-order filters. On my system, buffering already broke even with only
+/// one pole, so there is no special treatment here for low-order filtering (TODO confirm).
+/// Note the use of range_type<T>, which is from multithread.h.
+/// We derive the index type for the call to the monadic gather/scatter routines
+/// automatically, so here it comes out as vigra's difference_type_1.
+
+template < class source_view_type ,
+           class target_view_type ,
+           class math_type >
+void nonaggregating_filter ( vspline::range_type
+                              < typename source_view_type::difference_type > range ,
+                             source_view_type * p_original_source ,
+                             target_view_type * p_original_target ,
+                             int axis ,
+                             double gain ,
+                             bc_code bc ,
+                             int nbpoles ,
+                             const double * pole ,
+                             double tolerance
+                           )
+{
+  typedef typename source_view_type::value_type source_type ;
+  typedef typename target_view_type::value_type target_type ;
+
+  // we're in the single-threaded code now. multithread() has simply forwarded
+  // the source and target MultiArrayViews and a range, here we use the range
+  // to pick out the subarrays of original_source and original_target which we
+  // are meant to process in this thread:
+
+  const auto source = p_original_source->subarray ( range[0] , range[1] ) ;
+  auto target = p_original_target->subarray ( range[0] , range[1] ) ;
+  
+  auto count = source.shape ( axis ) ; 
+
+  // we use a buffer holding count values of math_type
+
+  vigra::MultiArray < 1 , math_type > buffer ( count ) ;
+
+  // avoiding being specific about the iterator's type allows us to slot in
+  // any old iterator we can get by calling begin() on buffer 
+  
+  typedef decltype ( buffer.begin() ) iter_type ;
+  typedef filter < iter_type , iter_type , math_type > filter_type ;
+  filter_type solver ( count , gain , bc , nbpoles , pole , tolerance ) ;
+
+  // next slice is this far away:
+
+  auto source_stride = source.stride ( axis ) ;
+
+  auto source_base_adress = source.data() ;
+  auto buffer_base_adress = buffer.data() ;
+  auto target_base_adress = target.data() ;
+
+  if ( source.stride() == target.stride() )
+  {
+    // we already know that both arrays have the same shape. If the strides are also the same,
+    // both arrays have the same structure in memory.
+    // If both arrays have the same structure, we can save ourselves the index calculations
+    // for the second array, since the indices would come out the same. target_base_adress
+    // may be the same as source_base_adress, in which case the operation is in-place, but
+    // we can't derive any performance benefit from the fact.
+
+    // pick the first slice of source along the processing axis
+
+    auto source_slice = source.bindAt ( axis , 0 ) ;
+
+    // we permute the slice's strides to ascending order to make the memory access
+    // as efficient as possible.
+
+    auto permuted_slice = source_slice.permuteStridesAscending() ;
+    
+    // we iterate over the elements in this slice - not to access them, but to
+    // calculate their offset from the first one. This may not be the most efficient
+    // way, but it's simple and foolproof and is only needed once per line of count values.
+
+    auto source_sliter = permuted_slice.begin() ;
+    auto source_sliter_end = permuted_slice.end() ;
+
+    while ( source_sliter < source_sliter_end )
+    {
+      // copy from the array to the buffer with a monadic gather, casting to
+      // math_type in the process
+      
+      auto source_index = &(*source_sliter) - source_base_adress ;
+      
+      gather < source_type , math_type > ( source_base_adress + source_index ,
+                                           buffer_base_adress ,
+                                           source_stride ,
+                                           count ) ;
+                              
+      // finally (puh): apply the prefilter, using the solver in-place, iterating over
+      // the values in buffer with maximum efficiency.
+                              
+      solver.solve ( buffer.begin() ) ;
+      
+      // and perform a monadic scatter to write the filtered data to the destination,
+      // casting to target_type in the process
+
+      scatter< math_type , target_type > ( buffer_base_adress ,
+                                           target_base_adress + source_index ,
+                                           source_stride ,
+                                           count ) ;
+      ++source_sliter ;
+    }
+  }
+  else
+  {
+    // pretty much the same as the previous operation, with the distinction that
+    // copying the filtered data from the buffer to the target now needs its own
+    // index etc., since all these may be different.
+    // TODO we might permute source_slice's strides to ascending and apply the same
+    // permutation to target_slice.
+    
+    auto source_slice = source.bindAt ( axis , 0 ) ;
+    auto source_sliter = source_slice.begin() ;
+    auto source_sliter_end = source_slice.end() ;
+
+    auto target_slice = target.bindAt ( axis , 0 ) ;
+    auto target_stride = target.stride ( axis ) ;
+    auto target_sliter = target_slice.begin() ;
+
+    while ( source_sliter < source_sliter_end )
+    {
+      auto source_index = &(*source_sliter) - source_base_adress ;
+      auto target_index = &(*target_sliter) - target_base_adress ;
+      
+      gather < source_type , math_type > ( source_base_adress + source_index ,
+                                           buffer_base_adress ,
+                                           source_stride ,
+                                           count ) ;
+                                           
+      solver.solve ( buffer.begin() ) ;
+      
+      scatter< math_type , target_type > ( buffer_base_adress ,
+                                           target_base_adress + target_index ,
+                                           target_stride ,
+                                           count ) ;
+      ++source_sliter ;
+      ++target_sliter ;
+    }
+  }
+}
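When source and target differ in memory layout, each line needs its own start offset in both arrays. The else branch above handles exactly this case. The following sketch reads a row-major array and writes its filtered rows into a column-major target. It rests on these assumptions:

- `layout` and `filter_lines` are hypothetical names;
- a running sum again stands in for the real solver.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strides (in elements) of a 2D array whose shape is rows x cols.
struct layout { std::ptrdiff_t stride[2]; };

// Filter every line along 'axis' of a rows x cols array. The input is read from
// 'src' (laid out as 'sl') and the output written to 'dst' (laid out as 'dl').
// Because the two layouts may differ, each line gets separate start offsets.
void filter_lines(const double* src, layout sl, double* dst, layout dl,
                  std::ptrdiff_t rows, std::ptrdiff_t cols, int axis) {
  const std::ptrdiff_t shape[2] = {rows, cols};
  const int other = 1 - axis;
  const std::ptrdiff_t count = shape[axis];
  std::vector<double> buffer(count);
  for (std::ptrdiff_t line = 0; line < shape[other]; ++line) {
    const std::ptrdiff_t s0 = line * sl.stride[other];  // source line start
    const std::ptrdiff_t t0 = line * dl.stride[other];  // target line start
    for (std::ptrdiff_t k = 0; k < count; ++k)
      buffer[k] = src[s0 + k * sl.stride[axis]];
    for (std::ptrdiff_t k = 1; k < count; ++k)          // stand-in filter
      buffer[k] += buffer[k - 1];
    for (std::ptrdiff_t k = 0; k < count; ++k)
      dst[t0 + k * dl.stride[axis]] = buffer[k];
  }
}
```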
+
+// the use of Vc has to be switched on with the flag USE_VC.
+// before we can code the vectorized analogue of nonaggregating_filter, we need
+// some more infrastructure code:
+
+#ifdef USE_VC
+
+/// extended gather and scatter routines taking 'extrusion parameters'
+/// which handle how many times and with which stride the gather/scatter
+/// operation is repeated. With these routines, strided memory can be
+/// copied to a compact chunk of properly aligned memory and back.
+/// The gather routine gathers from source, which points to strided memory,
+/// and deposits in target, which is compact.
+/// The scatter routine scatters from source, which points to compact memory,
+/// and deposits in target, which points to strided memory.
+/// Initially I coded using load/store operations to access the 'non-compact'
+/// memory as well, if the indexes were contiguous, but surprisingly, this was
+/// slower. I like the concise expression with this code - instead of having
+/// variants for load/store vs. gather/scatter and masked/unmasked operation,
+/// the modus operandi is determined by the indices and mask passed, which is
+/// relatively cheap as it occurs only once, while the inner loop can just
+/// rip away.
+/// By default, the type used for gather/scatter indices (gs_indexes_type)
+/// will be what Vc deems appropriate. This comes out as a SIMD type composed
+/// of int, and ought to result in the fastest code on the machine level.
+/// But since the only *requirement* on gather/scatter indices is that they
+/// offer a subscript operator (and hold enough indices), other types can be
+/// used as gs_indexes_type as well. Below, I make the distinction and pass in
+/// a TinyVector of ptrdiff_t if int isn't sufficiently large to hold the
+/// intended indices. On my system, this is actually faster.
+
+template < typename source_type ,     // (singular) source type
+           typename target_type ,     // (simdized) target type
+           typename index_type ,      // (singular) index type for stride, count
+           typename gs_indexes_type > // type for gather/scatter indices
+void
+gather ( const source_type * source ,
+         target_type * target ,
+         const gs_indexes_type & indexes ,
+         const typename target_type::Mask & mask ,
+         const index_type & stride ,
+         index_type count
+       )
+{
+  // fix the type into which to gather source data
+  enum { vsize = target_type::Size } ;
+  typedef typename Vc::SimdArray < source_type , vsize > simdized_source_type ;
+
+  // if the mask is all-true, load the data with an unmasked gather operation
+  if ( mask.isFull() )
+  {
+    while ( count-- )
+    {
+// while Vc hasn't yet implemented gathering using intrinsics (from AVX2),
+// I played with using them directly to see if I could get better performance.
+// So far it looks as if the prefiltering code doesn't benefit.
+//       __m256i ix = _mm256_loadu_si256 ( (const __m256i *)&(indexes) ) ;
+//       __m256 fv = _mm256_i32gather_ps (source, ix, 4) ;
+      simdized_source_type x ( source , indexes ) ;
+      * target = target_type ( x ) ;
+      source += stride ;
+      ++ target ;
+    }
+  }
+  else
+  {
+    // if there is a partially filled mask, perform a masked gather operation
+    while ( count-- )
+    {
+      simdized_source_type x ( source , indexes , mask ) ;
+      * target = target_type ( x ) ;
+      source += stride ;
+      ++ target ;
+    }
+  }
+}
+
+template < typename source_type ,     // (simdized) source type
+           typename target_type ,     // (singular) target type
+           typename index_type ,      // (singular) index type for stride, count
+           typename gs_indexes_type > // type for gather/scatter indices
+void
+scatter ( const source_type * source ,
+          target_type * target ,
+          const gs_indexes_type & indexes ,
+          const typename source_type::Mask & mask ,
+          const index_type & stride ,
+          index_type count
+        )
+{
+  // fix the type from which to scatter target data
+  enum { vsize = source_type::Size } ;
+  typedef typename Vc::SimdArray < target_type , vsize > simdized_target_type ;
+
+  // if the mask is full, deposit with an unmasked scatter
+  if ( mask.isFull() )
+  {
+    while ( count-- )
+    {
+      simdized_target_type x ( *source ) ;
+      x.scatter ( target , indexes ) ;
+      ++ source ;
+      target += stride ;
+    }
+  }
+  else
+  {
+    // if there is a partially filled mask, perform a masked scatter operation
+    while ( count-- )
+    {
+      simdized_target_type x ( *source ) ;
+      x.scatter ( target , indexes , mask ) ;
+      ++ source ;
+      target += stride ;
+    }
+  }
+}
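The extended gather/scatter can be emulated with scalar code to show what the indices, the mask and the 'extrusion' parameters (stride, count) do. This sketch makes the following assumptions:

- `std::array<float, N>` stands in for a Vc vector, and `std::array<bool, N>` for its mask;
- N = 4 and the function names are assumptions;
- inactive lanes are simply left untouched in this emulation, and no claim is made about what Vc puts there.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr int N = 4;                       // hypothetical lane count (vsize)
using vec = std::array<float, N>;          // stand-in for a SIMD vector
using mask = std::array<bool, N>;          // stand-in for its mask
using index_set = std::array<std::ptrdiff_t, N>;

// Emulated extended gather. For each of 'count' steps, fill one vec from
// source[indexes[lane]] for every active lane, then advance source by 'stride'.
void gather_ext(const float* source, vec* target, const index_set& indexes,
                const mask& m, std::ptrdiff_t stride, std::ptrdiff_t count) {
  while (count--) {
    for (int e = 0; e < N; ++e)
      if (m[e]) (*target)[e] = source[indexes[e]];
    source += stride;
    ++target;
  }
}

// Emulated extended scatter: the reverse operation, from compact vecs to
// strided memory.
void scatter_ext(const vec* source, float* target, const index_set& indexes,
                 const mask& m, std::ptrdiff_t stride, std::ptrdiff_t count) {
  while (count--) {
    for (int e = 0; e < N; ++e)
      if (m[e]) target[indexes[e]] = (*source)[e];
    ++source;
    target += stride;
  }
}
```

Here the lanes are adjacent columns of a 3 x 5 row-major array. The extrusion runs down the rows (stride 5, count 3), so each buffered vec holds one row's worth of lanes, collinear data ending up in the same lane.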
+
+/// aggregating_filter keeps a buffer of vector-aligned memory, which it fills from
+/// vsize 1D subarrays of the source array which are collinear to the processing axis.
+/// Note that the vectorization, or aggregation axis is *orthogonal* to the processing
+/// axis, since the adjacency of neighbours along the processing axis needs to be
+/// preserved for filtering.
+/// The buffer is then submitted to vectorized forward-backward recursive filtering
+/// and finally stored back to the corresponding memory area in target, which may
+/// be the same as source, in which case the operation is seemingly performed
+/// in-place (while in fact the buffer is still used). Buffering takes the bulk
+/// of the processing time (on my system); the vectorized maths are fast by
+/// comparison. Depending on data type, array size and spline degree, sometimes the
+/// nonvectorized code is faster. But as array size and spline degree grow, the
+/// vectorized, buffering code soon comes out on top.
+/// ele_aggregating_filter is a subroutine processing arrays of elementary value_type.
+/// It's used by aggregating_filter, after element-expanding the array(s).
+/// With this vectorized routine and the size of gather/scatter indices used by Vc,
+/// numeric overflow could occur: the index type is only int, while it's assigned a
+/// ptrdiff_t, which it may not be able to represent. The overflow can happen when
+/// a gather/scatter spans an overly large memory area. The gather/scatter indices will
+/// be set up so that the first index is always 0 (by using the address of the first
+/// element gathered, not the array base address), but even though this makes the
+/// overflow less likely to occur, it still can happen. In this case the code falls back
+/// to using a vigra::TinyVector < ptrdiff_t > as gather/scatter index type, which
+/// may cause Vc to use less performant code for the gather/scatter operations but
+/// is safe.
+// TODO: using different vsize for different axes might be faster.
+
+template < typename source_view_type ,
+           typename target_view_type ,
+           typename math_type >
+void ele_aggregating_filter ( source_view_type &source ,
+                              target_view_type &target ,
+                              int axis ,
+                              double gain ,
+                              bc_code bc ,
+                              int nbpoles ,
+                              const double * pole ,
+                              double tolerance
+                            )
+{
+  // for prefiltering, using Vc::Vectors seems faster than using SimdArrays of twice the size,
+  // which are used as simdized type in evaluation
+
+  typedef typename Vc::Vector < math_type > simdized_math_type ;
+  // number of math_type in a simdized_math_type
+  const int vsize ( simdized_math_type::Size ) ;
+  
+  typedef typename source_view_type::value_type source_type ;
+  typedef typename Vc::SimdArray < source_type , vsize > simdized_source_type ;
+  
+  typedef typename target_view_type::value_type target_type ;
+  typedef typename Vc::SimdArray < target_type , vsize > simdized_target_type ;
+  
+  // indexes for gather/scatter. first the 'optimal' type, which Vc produces as
+  // the IndexType for simdized_math_type. Next a wider type composed of std::ptrdiff_t,
+  // to be used initially when calculating the indices, and optionally later for the
+  // actual gather/scatter operations if gs_indexes_type isn't wide enough.
+
+  typedef typename simdized_math_type::IndexType gs_indexes_type ;
+  typedef vigra::TinyVector < std::ptrdiff_t , vsize > comb_type ;
+  
+  // mask type for masked operation
+  typedef typename simdized_math_type::MaskType mask_type ;
+  
+  auto count = source.shape ( axis ) ; // number of vectors we'll process
+
+  // I initially tried to use Vc::Memory, but the semantics of the iterator obtained
+  // by buffer.begin() didn't work for me.
+  // anyway, a MultiArray with the proper allocator works just fine, and the dereferencing
+  // of the iterator needed in the solver works without further ado. 
+  
+  vigra::MultiArray < 1 , simdized_math_type , Vc::Allocator<simdized_math_type> >
+    buffer ( count ) ;
+
+  // avoiding being specific about the iterator's type allows us to slot in
+  // any old iterator we can get by calling begin() on buffer 
+  
+  typedef decltype ( buffer.begin() ) viter_type ;
+
+  // set of offsets into the source slice which will be used for gather/scatter
+
+  comb_type source_indexes ;
+  
+  // until we reach the last few leftover 1D subarrays, the mask is all-true
+
+  mask_type mask ( true ) ;
+  
+  // next slice is this far away:
+
+  auto source_stride = source.stride ( axis ) ;
+
+  // we want to use the extended gather/scatter (with 'extrusion'), so we need the
+  // source and target pointers. Casting buffer's data pointer to math_type is safe,
+  // since the simdized_type objects stored there are merely raw math_type data
+  // in disguise.
+
+  auto source_base_adress = source.data() ;
+  auto buffer_base_adress = buffer.data() ;
+  auto target_base_adress = target.data() ;
+
+  gs_indexes_type source_gs_indexes ;
+  gs_indexes_type target_gs_indexes ;      
+
+  // we create a solver object capable of handling the iterator producing the successive
+  // simdized_types from the buffer. While the unvectorized code can omit passing the third
+  // template argument (the elementary type used inside the solver) we pass it here, as we
+  // don't define an element-expansion via vigra::ExpandElementResult for simdized_type.
+
+  typedef filter < viter_type , viter_type , math_type > filter_type ;
+  filter_type solver ( count , gain , bc , nbpoles , pole , tolerance ) ;
+
+  if ( source.stride() == target.stride() )
+  {
+    // we already know that both arrays have the same shape. If the strides are also the same,
+    // both arrays have the same structure in memory.
+    // If both arrays have the same structure, we can save ourselves the index calculations
+    // for the second array, since the indexes would come out the same. target_base_adress
+    // may be the same as source_base_adress, in which case the operation is in-place, but
+    // we can't derive any performance benefit from the fact.
+
+    // pick the first slice of source along the processing axis
+
+    auto source_slice = source.bindAt ( axis , 0 ) ;
+
+    // we permute the slice's strides to ascending order to make the memory access
+    // as efficient as possible.
+
+    auto permuted_slice = source_slice.permuteStridesAscending() ;
+    
+    // we iterate over the elements in this slice - not to access them, but to
+    // calculate their offset from the first one. This may not be the most efficient
+    // way, but it's simple and foolproof and is only needed once per line of count values.
+
+    auto source_sliter = permuted_slice.begin() ;
+    auto source_sliter_end = permuted_slice.end() ;
+    
+    while ( source_sliter < source_sliter_end )
+    {
+      // try loading vsize successive offsets into a comb_type
+      
+      int e ;
+      
+      // we set up the offsets so that the first entry in source_indexes
+      // comes out as 0.
+  
+      auto first_source_adress = &(*source_sliter) ;
+      auto offset = first_source_adress - source_base_adress ;
+      auto first_target_adress = target_base_adress + offset ;
+      
+      for ( e = 0 ; e < vsize && source_sliter < source_sliter_end ; ++e , ++source_sliter )
+        
+        source_indexes[e] = &(*source_sliter) - first_source_adress ;
+      
+      if ( e < vsize )
+        
+        // have got less than vsize? must be the last few items.
+        // mask was all-true before, so now we limit it to the first e fields:
+        
+        mask = ( simdized_math_type::IndexesFromZero() < e ) ;
+
+      // next we assign the indices (which are ptrdiff_t) to the intended type
+      // for gather/scatter indices - which is what Vc deems appropriate. This should
+      // be the optimal choice in terms of performance. Yet we can't be certain that
+      // the ptrdiff_t values actually fit into this type, which is usually composed of
+      // int only. So we test if each assigned value compares equal to its ptrdiff_t original.
+      // If the test fails for any of the indices, we switch to code using a
+      // vigra::TinyVector < ptrdiff_t > for the indices, which is permissible, since
+      // TinyVector offers operator[], but may be less efficient.
+      // Note: Vc hasn't implemented the gather with intrinsics for AVX2, which is why
+      // using gs_indexes_type can't yet have a speedup effect.
+      // Note: since the gathers are often from widely spaced locations, not much
+      // benefit is to be expected anyway.
+      
+      bool fits = true ;
+      for ( e = 0 ; fits && ( e < vsize ) ; e++ )
+      {
+        source_gs_indexes[e] = source_indexes[e] ;
+        if ( source_gs_indexes[e] != source_indexes[e] )
+          fits = false ;
+      }
+      
+      if ( fits )
+      {
+        // perform extended gather with extrusion parameters to transport the unfiltered data
+        // to the buffer, passing in source_gs_indexes for best performance.
+        
+        gather
+          ( first_source_adress ,
+            buffer_base_adress ,
+            source_gs_indexes ,
+            mask ,
+            source_stride ,
+            count ) ;
+                                
+        // finally (puh): apply the prefilter, using the solver in-place, iterating over
+        // the vectors in buffer with maximum efficiency.
+                                
+        solver.solve ( buffer.begin() ) ;
+        
+        // and perform extended scatter with extrusion parameters to write the filtered data
+        // to the destination
+
+        scatter
+          ( buffer_base_adress ,
+            first_target_adress ,
+            source_gs_indexes ,
+            mask ,
+            source_stride ,
+            count ) ;
+      }
+      else
+      {
+        // Since the indices did not fit into the optimal type for gather/scatter
+        // indices, we pass in a wider type, which may reduce performance, but is
+        // necessary under the circumstances. But this should rarely happen:
+        // it would mean a gather/scatter spanning several GB.
+        
+        gather
+          ( first_source_adress ,
+            buffer_base_adress ,
+            source_indexes ,
+            mask ,
+            source_stride ,
+            count ) ;
+                                
+        solver.solve ( buffer.begin() ) ;
+        
+        scatter
+          ( buffer_base_adress ,
+            first_target_adress ,
+            source_indexes ,
+            mask ,
+            source_stride ,
+            count ) ;
+      }
+    }
+  }
+  else
+  {
+    // pretty much the same as the if(...) case, with the distinction that copying
+    // the filtered data from the buffer to the target now needs its own set of
+    // indexes etc., since all these may be different.
+
+    // TODO we might permute source_slice's strides to ascending and apply the same
+    // permutation to target_slice.
+    
+    auto source_slice = source.bindAt ( axis , 0 ) ;
+    auto source_sliter = source_slice.begin() ;
+    auto source_sliter_end = source_slice.end() ;
+
+    auto target_slice = target.bindAt ( axis , 0 ) ;
+    auto target_stride = target.stride ( axis ) ;
+    auto target_sliter = target_slice.begin() ;
+    comb_type target_indexes ;
+
+    while ( source_sliter < source_sliter_end )
+    {
+      int e ;
+      auto first_source_adress = &(*source_sliter) ;
+      auto first_target_adress = &(*target_sliter) ;
+      
+      for ( e = 0 ;
+           e < vsize && source_sliter < source_sliter_end ;
+           ++e , ++source_sliter , ++target_sliter )
+      {
+        source_indexes[e] = &(*source_sliter) - first_source_adress ;
+        target_indexes[e] = &(*target_sliter) - first_target_adress ;
+      }
+      if ( e < vsize )
+        mask = ( simdized_math_type::IndexesFromZero() < e ) ;
+      
+      // similar code here for the indexes, see notes above.
+
+      bool fits = true ;
+      for ( e = 0 ; fits && ( e < vsize ) ; e++ )
+      {
+        source_gs_indexes[e] = source_indexes[e] ;
+        target_gs_indexes[e] = target_indexes[e] ;
+        if (    source_gs_indexes[e] != source_indexes[e]
+             || target_gs_indexes[e] != target_indexes[e] )
+          fits = false ;
+      }
+
+      if ( fits )
+      {
+        gather
+          ( first_source_adress ,
+            buffer_base_adress ,
+            source_gs_indexes ,
+            mask ,
+            source_stride ,
+            count ) ;
+        solver.solve ( buffer.begin() ) ;
+        scatter
+          ( buffer_base_adress ,
+            first_target_adress ,
+            target_gs_indexes ,
+            mask ,
+            target_stride ,
+            count ) ;
+      }
+      else
+      {
+        gather
+          ( first_source_adress ,
+            buffer_base_adress ,
+            source_indexes ,
+            mask ,
+            source_stride ,
+            count ) ;
+        solver.solve ( buffer.begin() ) ;
+        scatter
+          ( buffer_base_adress ,
+            first_target_adress ,
+            target_indexes ,
+            mask ,
+            target_stride ,
+            count ) ;
+      }
+    }
+  }
+}
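Two details from ele_aggregating_filter are shown here in isolation:

- the check whether all line offsets of a group fit into int;
- the mask for the last, partially filled group.

This sketch uses an explicit range check against `std::numeric_limits<int>` rather than the assign-and-compare used above. The two are meant to decide the same thing. `vsize = 8` and the function names are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

constexpr int vsize = 8;   // hypothetical lane count

// Offsets of up to vsize line starts, measured from the first one. Returns
// true if every offset is representable as int, i.e. if the narrow (and
// presumably faster) gather/scatter index type may be used.
bool fits_in_int(const std::array<std::ptrdiff_t, vsize>& offsets) {
  for (std::ptrdiff_t o : offsets)
    if (o < std::numeric_limits<int>::min() || o > std::numeric_limits<int>::max())
      return false;
  return true;
}

// Mask for a final group holding only e lines: lane i is active iff i < e,
// mirroring 'IndexesFromZero() < e'.
std::array<bool, vsize> partial_mask(int e) {
  std::array<bool, vsize> m{};
  for (int i = 0; i < vsize; ++i) m[i] = i < e;
  return m;
}
```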
+
+/// here we provide a common routine 'aggregating_filter', which works for elementary
+/// value_types and also for aggregate value_types. Processing is different for these
+/// two cases, because the vector code can only process elementary types, and if
+/// value_type isn't elementary, we need to element-expand the source and target
+/// arrays. Since this routine is the functor passed to multithread() and therefore
+/// receives a range parameter to pick out a subset of the data to process in the
+/// single thread, we also take the opportunity here to pick out the subarrays
+/// for further processing.
+
+template < class source_type ,
+           class target_type ,
+           typename math_type >
+void aggregating_filter ( range_type < typename source_type::difference_type > range ,
+                          source_type * p_original_source ,
+                          target_type * p_original_target ,
+                          int axis ,
+                          double gain ,
+                          bc_code bc ,
+                          int nbpoles ,
+                          const double * pole ,
+                          double tolerance
+                        )
+{
+  const int dim = source_type::actual_dimension ;
+  typedef typename source_type::value_type value_type ;
+  static_assert ( std::is_same < value_type , typename target_type::value_type > :: value ,
+    "aggregating_filter: both arrays must have the same value_type" ) ;
+  typedef typename vigra::ExpandElementResult < value_type > :: type ele_type ;
+
+  // continue processing on the subarrays of source and target specified by 'range':
+
+  auto source = p_original_source->subarray ( range[0] , range[1] ) ;
+  auto target = p_original_target->subarray ( range[0] , range[1] ) ;
+  
+  // value_type may be an aggregate type, but we want to operate on elementary types
+  // so we element-expand the array and call ele_aggregating_filter, which works on
+  // arrays with elementary types. If value_type is elementary already, the call to
+  // expandElements inserts a singleton dimension, but this has next to no performance
+  // impact, so contrary to my initial implementation I don't handle the 1-channel
+  // case separately any more.
+
+  auto expanded_source = source.expandElements ( 0 ) ;
+  auto expanded_target = target.expandElements ( 0 ) ;
+
+  // with the element-expanded arrays at hand, we can now delegate to ele_aggregating_filter:
+  
+  ele_aggregating_filter < decltype ( expanded_source ) ,
+                           decltype ( expanded_target ) ,
+                           math_type >
+              ( expanded_source ,
+                expanded_target ,
+                axis + 1 ,
+                gain ,
+                bc ,
+                nbpoles ,
+                pole ,
+                tolerance ) ;
+}
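This sketch illustrates what element expansion does to a view's geometry by showing the stride arithmetic. It assumes, as I understand vigra's `expandElements(0)`, that:

- the new channel axis is inserted in front with stride 1;
- the existing strides are scaled by the number of channels.

Under that assumption, original axis `a` becomes axis `a + 1`, which matches the `axis + 1` passed down to ele_aggregating_filter. The `view2`/`view3` types are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Shape and strides of a 2D view, strides counted in units of its value_type.
struct view2 { std::array<std::ptrdiff_t, 2> shape, stride; };
// The same memory seen as a 3D view of the elementary type.
struct view3 { std::array<std::ptrdiff_t, 3> shape, stride; };

// Expand a view of aggregates with 'channels' elements each (e.g. RGB pixels,
// channels == 3) into a view of elementary values. The channel axis goes in
// front with stride 1; every original stride is multiplied by 'channels'.
view3 expand_elements_0(const view2& v, std::ptrdiff_t channels) {
  return view3{{channels, v.shape[0], v.shape[1]},
               {1, channels * v.stride[0], channels * v.stride[1]}};
}
```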
+
+#endif
+
+/// Now that we have the routines which perform the buffering and filtering for a chunk of
+/// data, we add code for multithreading. This is done by using utility code from multithread.h.
+///
+/// filter_1d, which is the routine processing nD arrays along a specific axis, might as well
+/// be a function. But functions can't be partially specialized (at least not with my compiler)
+/// so I use a functor instead, which, as a class, can be partially specialized. We'll want a
+/// partial specialization for 1D arrays, where all of our usual schemes of multithreading and
+/// vectorization don't intrinsically work and we have to employ a different method, see there.
+
+template < typename input_array_type ,  ///< type of array with knot point data
+           typename output_array_type , ///< type of array for coefficients (may be the same)
+           typename math_type ,         ///< real data type used for calculations inside the filter
+           int dim >
+class filter_1d
+{
+public:
+  void operator() ( input_array_type &input ,    ///< source data. can also operate in-place,
+                    output_array_type &output ,  ///< where input == output.
+                    int axis ,
+                    double gain ,
+                    bc_code bc ,                 ///< boundary treatment for this solver
+                    int nbpoles ,
+                    const double * pole ,
+                    double tolerance ,
+                    int njobs = default_njobs )  ///< number of jobs to use when multithreading
+{
+  typedef typename input_array_type::value_type value_type ;
+
+  // depending on whether Vc is used or not, we choose the appropriate (single-threaded)
+  // filtering routine, which is to be passed to multithread()
+
+#ifdef USE_VC
+
+  typedef typename vigra::ExpandElementResult < value_type > :: type ele_type ;
+
+  auto pf = & aggregating_filter < input_array_type ,
+                                   output_array_type ,
+                                   ele_type > ;
+
+#else
+
+  auto pf = & nonaggregating_filter < input_array_type ,
+                                      output_array_type ,
+                                      value_type > ;
+  
+#endif
+  
+  // obtain a partitioning of the data array into subranges. We do this 'manually' here
+  // because we must instruct shape_splitter not to chop up the current processing axis
+  // (by passing axis as the 'forbid' parameter)
+
+  auto partitioning = shape_splitter<dim>::part ( input.shape() , njobs , axis ) ;
+  
+  // now use multithread() to distribute ranges of data to individual jobs which are
+  // executed by its thread pool.
+  
+  multithread ( pf ,
+                partitioning ,
+                &input ,
+                &output ,
+                axis ,
+                gain ,
+                bc ,
+                nbpoles ,
+                pole ,
+                tolerance ) ;
+}
+} ;
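This sketch illustrates why a functor class is used rather than a function. A class template can be partially specialized on the dimension alone while its other template parameters stay open. C++ does not allow partial specialization of function templates. The names below are hypothetical:

```cpp
#include <cassert>
#include <string>

// Primary template: the general nD case.
template <typename value_type, int dim>
struct axis_filter {
  std::string strategy() const { return "nD"; }
};

// Partial specialization for dim == 1: only 'dim' is fixed, value_type remains
// a template parameter. A function template cannot be specialized this way.
template <typename value_type>
struct axis_filter<value_type, 1> {
  std::string strategy() const { return "1D"; }
};
```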
+
+/// now here's the specialization for *1D arrays*. It may come as a surprise that it looks
+/// nothing like the nD routine. This is due to the fact that we follow a specific strategy:
+/// We 'fold up' the 1D array into a 'fake 2D' array, process this 2D array with the nD code
+/// which is very efficient, and 'mend' the stripes along the margins of the fake 2D array
+/// which contain wrong results due to the fact that some boundary condition appropriate
+/// for the 2D case was applied.
+/// With this 'cheat' we can handle 1D arrays with full multithreading and vectorization,
+/// while the 'orthodox' approach would have to process the data in linear order with
+/// a single thread. Cleaning up the 'dirty' margins is cheap for large arrays.
+/// The code estimates whether it's worthwhile to follow this strategy; the array
+/// has to be quite large before 'fake 2D processing' is actually applied.
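This sketch shows the size estimate used in the code that follows, with hypothetical inputs. The horizon is the smallest n with |pole|^n <= tolerance, computed as `ceil(log(tolerance) / log(|pole|))`. The minimum length uses the formula `4 * runup * lanes + 2 * runup`. The pole sqrt(3) - 2 is that of the cubic B-spline prefilter. The tolerance of 1e-6 and the 4 jobs times 8 lanes are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>

// Smallest n for which |pole|^n <= tolerance, i.e. the number of steps after
// which the recursive filter's impulse response has decayed below tolerance.
int horizon(double pole, double tolerance) {
  return int(std::ceil(std::log(tolerance) / std::log(std::fabs(pole))));
}

// Minimum 1D length for the 'fake 2D' scheme, using the same formula as the
// filter_1d specialization below.
int min_length(int runup, int lanes) { return 4 * runup * lanes + 2 * runup; }
```

With these values the horizon is 11, and the array needs at least 1430 elements before the fake 2D scheme is considered.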
+
+template < typename input_array_type ,  ///< type of array with knot point data
+           typename output_array_type , ///< type of array for coefficients (may be the same)
+           typename math_type >         ///< type for calculations inside filter
+class filter_1d < input_array_type ,
+                  output_array_type ,
+                  math_type ,
+                  1 >                 // specialize for 1D
+{
+public:
+  void operator() ( input_array_type &input ,    ///< source data. can operate in-place
+                    output_array_type &output ,  ///< where input == output.
+                    int axis ,
+                    double gain ,
+                    bc_code bc ,                 ///< boundary treatment for this solver
+                    int nbpoles ,
+                    const double * pole ,
+                    double tolerance ,
+                    int njobs = default_njobs )  ///< number of jobs to use
+{
+  typedef typename input_array_type::value_type value_type ;
+  typedef decltype ( input.begin() ) input_iter_type ;
+  typedef decltype ( output.begin() ) output_iter_type ;
+  typedef vspline::filter < input_iter_type , output_iter_type , double > filter_type ;
+  typedef typename vigra::ExpandElementResult < value_type > :: type ele_type ;
+
+  const int bands = vigra::ExpandElementResult < value_type > :: size ;
+  int runup ;
+
+  // if we can multithread, start out with as many lanes as the desired number of threads
+
+  int lanes = njobs ;
+  
+#ifdef USE_VC
+ 
+//   const int vsize = vector_traits < ele_type > :: vsize ;
+  const int vsize = Vc::Vector < ele_type > :: Size ;
+  
+  // if we can use vector code, the number of lanes is multiplied by the
+  // number of elements a simdized type inside the vector code can handle
+
+  lanes *= vsize ;
+
+#endif
+
+  // we give the filter some space to run up to precision
+  
+  if ( tolerance <= 0.0 )
+  {
+    // we can't use the fake_2d method if the tolerance is 0.0
+    lanes = 1 ;
+  }
+  else
+  {
+    // there are minimum requirements for using the fake 2D filter. First find
+    // the horizon at the given tolerance
+    
+    int horizon = ceil ( log ( tolerance ) / log ( fabs ( pole[0] ) ) ) ;
+    
+    // this is just as much as we want for the filter to run up to precision
+    // starting with BC code 'ZEROPAD' at the margins
+    
+    runup = horizon ;
+    
+    // the absolute minimum to successfully run the fake 2D filter is this:
+    // TODO: we might raise the threshold, min_length, here
+    
+    int min_length = 4 * runup * lanes + 2 * runup ;
+    
+    // input is too short to bother with fake 2D, just single-lane it
+    
+    if ( input.shape(0) < min_length )
+    {
+      lanes = 1 ;
+    }
+    else
+    {
+      // input is larger than the absolute minimum, maybe we can even increase
+      // the number of lanes some more? we'd like to do this if the input is
+      // very large, since we use buffering and don't want the buffers to become
+      // overly large. But the smaller the run along the split x axis, the more
+      // incorrect margin values we have to mend, so we need a compromise.
+      // assume a 'good' length for input: some length where further splitting
+      // would not be wanted anymore. TODO: do some testing, find a good value
+      
+      int good_length = 64 * runup * lanes + 2 * runup ;
+      
+      int split = 1 ;
+      
+      // suppose we split input.shape(0) into ( 2 * split ) parts: is each part
+      // still at least this 'good' length? If not, leave the split factor as it is.
+      
+      while ( input.shape(0) / ( 2 * split ) >= good_length )
+      {  
+        // if yes, double split factor, try again
+        split *= 2 ;
+      }
+      
+      lanes *= split ; // increase number of lanes by additional split
+    }
+    
+  }
+  
+  // if there's only one lane we just use this simple code:
+
+  if ( lanes == 1 )
+  {
+    // this is a simple single-threaded implementation
+    filter_type solver ( input.shape(0) ,
+                         gain ,
+                         bc ,
+                         nbpoles ,
+                         pole ,
+                         0.0 ) ;
+    solver.solve ( input.begin() , output.begin() ) ;
+    return ; // return prematurely, saving us an else clause
+  }
+  
+  // the input qualifies for fake 2D processing.
+
+//   std::cout << "fake 2D processing with " << lanes << " lanes" << std::endl ;
+  
+  // we want as many chunks as we have lanes. There may be some data left
+  // beyond the chunks (tail_size of value_type)
+  
+  int core_size = input.shape(0) ;
+  int chunk_size = core_size / lanes ;
+  core_size = lanes * chunk_size ;
+  int tail_size = input.shape(0) - core_size ;
+  
+  // just double-check
+
+  assert ( core_size + tail_size == input.shape(0) ) ;
+  
+  // now here's the strategy: we treat the data as if they were 2D. This will
+  // introduce errors along the 'vertical' margins, since there the 2D treatment
+  // will start with some boundary condition along the x axis instead of looking
+  // at the neighbouring line where the actual continuation is.
+  
+  // create buffers for head and tail
+  
+  vigra::MultiArray < 1 , value_type > head ( 2 * runup ) ;
+  vigra::MultiArray < 1 , value_type > tail ( tail_size + 2 * runup ) ;
+  
+  // filter the beginning and end of the signal into these buffers. Note how
+  // we call this filter with the boundary condition passed in
+
+  filter_type head_solver ( head.size() ,
+                            gain ,
+                            bc ,
+                            nbpoles ,
+                            pole ,
+                            0.0 ) ;
+                            
+  head_solver.solve ( input.begin() , head.begin() ) ;
+  
+  filter_type tail_solver ( tail.size() ,
+                            gain ,
+                            bc ,
+                            nbpoles ,
+                            pole ,
+                            0.0 ) ;
+
+  tail_solver.solve ( input.end() - tail.shape(0) , tail.begin() ) ;
+  
+  // head now has runup correct values at the beginning, followed by runup invalid
+  // values, and tail has tail_size + runup correct values at the end, preceded by
+  // runup unusable values which were only needed to run the filter up to
+  // precision. We'll use these correct data later to make up for the fact that
+  // the margin treatment omits the beginning and end of the data.
+
+  // now we create a fake 2D view to the margin of the data. Note how we let the
+  // view begin 2 * runup before the end of the first line, capturing the 'wraparound'
+  // right in the middle of the view
+  
+  typedef vigra::MultiArrayView < 2 , value_type > fake_2d_type ;
+  
+  fake_2d_type
+    fake_2d_margin ( vigra::Shape2 ( 4 * runup , lanes - 1 ) ,
+                     vigra::Shape2 ( input.stride(0) , input.stride(0) * chunk_size ) ,
+                     input.data() + chunk_size - 2 * runup ) ;
+ 
+  // again we create a buffer and filter into the buffer
+
+  vigra::MultiArray < 2 , value_type > margin_buffer ( fake_2d_margin.shape() ) ;
+  
+  filter_1d < fake_2d_type , fake_2d_type , math_type , 2 > ()
+    ( fake_2d_margin ,
+      margin_buffer ,
+      0 ,
+      gain ,
+      GUESS ,
+      nbpoles ,
+      pole ,
+      tolerance ,
+      1 ) ;
+ 
+  // now we have filtered data for the margins in margin_buffer, of which the central half
+  // is usable, the remainder being runup data which we'll ignore. Here's a view to the
+  // central half:
+  
+  vigra::MultiArrayView < 2 , value_type > margin
+  = margin_buffer.subarray ( vigra::Shape2 ( runup , 0 ) ,
+                             vigra::Shape2 ( 3 * runup , lanes - 1 ) ) ;
+  
+  // we already create a view to the target array's margin which we intend to overwrite,
+  // but the data will only be copied in from margin after the treatment of the core.
+
+  vigra::MultiArrayView < 2 , value_type >
+    margin_target ( vigra::Shape2 ( 2 * runup , lanes - 1 ) ,
+                    vigra::Shape2 ( output.stride(0) , output.stride(0) * chunk_size ) ,
+                    output.data() + chunk_size - runup ) ;
+                    
+  // next we fake a 2D array from input and filter it to output, this may be an
+  // in-place operation, since we've extracted all margin information earlier and
+  // deposited what we need in buffers
+  
+  fake_2d_type
+    fake_2d_source ( vigra::Shape2 ( chunk_size , lanes ) ,
+                     vigra::Shape2 ( input.stride(0) , input.stride(0) * chunk_size ) ,
+                     input.data() ) ;
+
+  fake_2d_type
+    fake_2d_target ( vigra::Shape2 ( chunk_size , lanes ) ,
+                     vigra::Shape2 ( output.stride(0) , output.stride(0) * chunk_size ) ,
+                     output.data() ) ;
+  
+  // now we filter the fake 2D source to the fake 2D target
+
+  filter_1d < fake_2d_type , fake_2d_type , math_type , 2 > ()
+    ( fake_2d_source ,
+      fake_2d_target ,
+      0 ,
+      gain ,
+      GUESS ,
+      nbpoles ,
+      pole ,
+      tolerance ,
+      njobs ) ;
+
+  // we now have filtered data in target, but the stripes along the margin
+  // in x-direction (1 runup wide) are wrong, because we applied GUESS BC.
+  // This is why we have the data in 'margin', and we now copy them to the
+  // relevant section of 'target'
+               
+  margin_target = margin ;
+  
+  // finally we have to fix the first and last few values, which weren't touched
+  // by the margin operation (due to margin's offset and length)
+  
+  typedef vigra::Shape1 dt ;
+  
+  output.subarray ( dt(0) , dt(runup) )
+    = head.subarray ( dt(0) , dt(runup) ) ;
+
+  output.subarray ( dt(output.size() - tail_size - runup ) , dt(output.size()) )
+    = tail.subarray ( dt(tail.size() - tail_size - runup ) , dt(tail.size()) ) ;
+}
+} ;
+
+/// This routine calls the 1D filtering routine for all axes in turn. This is the
+/// highest-level routine in filter.h, and the only routine used by other code in
+/// vspline. It has no code specific to b-splines, any set of poles will be processed.
+/// To use this routine for b-splines, the correct poles have to be passed in, which
+/// is done in prefilter.h, where the code for prefiltering the knot point data
+/// calls filter_nd with the poles needed for a b-spline.
+///
+/// This routine takes the following parameters:
+///
+/// - input, output: MultiArrayViews of the source and target array
+/// - bc: TinyVector of boundary condition codes, allowing separate values for each axis
+/// - nbpoles: number of filter poles
+/// - pole: pointer to nbpoles doubles containing the filter poles
+/// - tolerance: acceptable error
+/// - njobs: number of jobs to use when multithreading
+
+template < typename input_array_type ,  // type of array with knot point data
+           typename output_array_type , // type of array for coefficients (may be the same)
+           typename math_type >         // type used for arithmetic operations in filter
+void filter_nd ( input_array_type & input ,
+                 output_array_type & output ,
+                 vigra::TinyVector<bc_code,input_array_type::actual_dimension> bc ,
+                 int nbpoles ,
+                 const double * pole ,
+                 double tolerance ,
+                 int njobs = default_njobs )
+{
+  // check if operation is in-place. I assume that the test performed here
+  // is sufficient to determine if the operation is in-place.
+  
+  bool in_place = false ;
+  
+  if ( (void*)(input.data()) == (void*)(output.data()) )
+    in_place = true ;
+
+  // if input == output, with degree <= 1 we needn't do anything at all.
+  
+  if ( in_place && nbpoles < 1 )
+    return ;
+
+  // do a bit of compatibility checking
+  
+  const int dim = input_array_type::actual_dimension ;
+  
+  if ( output_array_type::actual_dimension != dim )
+  {
+    throw dimension_mismatch ( "input and output array must have the same dimension" ) ;
+  }
+  
+  typedef typename input_array_type::difference_type diff_t ;
+  diff_t shape = input.shape() ;
+  if ( output.shape() != shape )
+  {
+    throw shape_mismatch ( "input and output array must have the same shape" ) ;
+  }
+
+  // normally the gain is the same for all dimensions.
+
+  double gain_d0 = overall_gain ( nbpoles , pole ) ;
+  double gain_dn = gain_d0 ;
+
+  // The code below applies the cumulative gain for all dimensions while
+  // processing axis 0, and applies no gain for subsequent axes.
+  // Deactivating it may produce slightly more precise results. The condition is
+  // a heuristic: for high degrees, this optimization reduces precision too much.
+  // TODO: the effect of this optimization seems negligible.
+  
+  if ( dim > 1 && pow ( nbpoles , dim ) < 32 )
+  {
+    gain_d0 = pow ( gain_d0 , dim ) ;
+    gain_dn = 1.0 ;
+  }
+
+  // even if degree <= 1, we'll only arrive here if input != output.
+  // So we still have to copy the input data to the output (solve_identity)
+  
+  filter_1d < input_array_type , output_array_type , math_type , dim > ()
+    ( input ,
+      output ,
+      0 ,
+      gain_d0 ,
+      bc[0] ,
+      nbpoles ,
+      pole ,
+      tolerance ,
+      njobs ) ;
+
+  // but if degree <= 1 we're done already, since copying the data again
+  // in dimensions 1... is futile
+
+  if ( nbpoles > 0 )
+  {
+    // so for the remaining dimensions we also call the filter.
+    for ( int d = 1 ; d < dim ; d++ )
+      filter_1d < output_array_type , output_array_type , math_type , dim > ()
+        ( output ,
+          output ,
+          d ,
+          gain_dn ,
+          bc[d] ,
+          nbpoles ,
+          pole ,
+          tolerance ,
+          njobs ) ;
+  }
+}
+
+} ; // namespace vspline
+
+#endif // VSPLINE_FILTER_H
diff --git a/map.h b/map.h
new file mode 100644
index 0000000..1e97b41
--- /dev/null
+++ b/map.h
@@ -0,0 +1,528 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015 - 2017 by Kay F. Jahnke                    */
+/*                                                                      */
+/*    The git repository for this software is at                        */
+/*                                                                      */
+/*    https://bitbucket.org/kfj/vspline                                 */
+/*                                                                      */
+/*    Please direct questions, bug reports, and contributions to        */
+/*                                                                      */
+/*    kfjahnke+vspline at gmail.com                                        */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/*! \file map.h
+
+    \brief code to handle out-of-bounds coordinates.
+    
+    Incoming coordinates may not be inside the range which can be evaluated
+    by a functor. There is no one correct way of dealing with out-of-bounds
+    coordinates, so we provide a few common ways of doing it.
+    
+    If the 'standard' gate types don't suffice, users can easily write their
+    own handling code by providing gate objects or mapper objects derived
+    from vspline::unary_functor and chaining them with their functors.
+    The classes provided here can serve as templates.
+    
+    The basic type handling the operation is 'gate_type', which 'treats'
+    a single value or single simdized type. For nD coordinates, we use a
+    set of these gate_type objects, one for each component; each one may be
+    of a distinct type specific to the axis the component belongs to.
+    
+    Application of the gates is via a 'mapper' object, which contains
+    the gate_types and applies them to the components in turn.
+    
+    The final mapper object is a functor which converts an arbitrary incoming
+    coordinate into a 'treated' coordinate (or, for REJECT mode, may throw an
+    out_of_bounds exception). If the treatment succeeds in producing a suitable
+    in-range coordinate, accessing the inner functor with it is safe.
+    
+    gate_type objects are derived from vspline::unary_functor, so they fit in
+    well with other code in vspline and can easily be wrapped in other
+    unary_functor objects, or used stand-alone. class mapper is also derived
+    from unary_functor. With all functors concerned derived from unary_functor,
+    we can use class vspline::chain (in unary_functor.h) to combine the mapper
+    and the inner functor.
+
+*/
+
+#ifndef VSPLINE_MAP_H
+#define VSPLINE_MAP_H
+
+#include <vspline/unary_functor.h>
+#include <vigra/tinyvector.hxx>
+
+namespace vspline
+{
+/// 'mapping' in vspline means the application of a functor to real coordinates,
+/// either rejecting them (with an out_of_bounds exception) when out-of-bounds
+/// or folding them into the defined range in various ways. After this initial
+/// processing the coordinates are 'split' into their integral part and remainder.
+/// The specific mapping mode to be used is coded with these values:
+
+// typedef enum {
+//   MAP_REJECT ,    ///< throw out_of_bounds for out-of-bounds coordinates
+//   MAP_LIMIT ,     ///< clamps out-of-bounds coordinates to the bounds
+//   MAP_CONSTANT ,  ///< replace out-of-bounds coordinates by constant value
+//   MAP_MIRROR ,    ///< mirror-fold into the bounds
+//   MAP_PERIODIC,   ///< periodic-fold into the bounds
+// } map_code;
+// 
+// const std::string mc_name[] =
+// {
+//   "MAP_REJECT" ,
+//   "MAP_LIMIT" ,
+//   "MAP_CONSTANT" ,
+//   "MAP_MIRROR" ,
+//   "MAP_PERIODIC" ,
+// } ;
+
+// /// unspecialized gate_type. only classes specialized by 'mode'
+// /// actually do any work.
+// 
+// template < typename in_type ,
+//            typename out_type ,
+//            int vsize ,
+//            vspline::map_code mode >
+// struct gate_policy
+// {
+// } ;
+
+/// policy classes for the supported mapping modes
+
+/// gate with REJECT throws vspline::out_of_bounds for invalid coordinates
+
+template < typename _in_type ,
+           typename _out_type ,
+           int _vsize >
+struct reject_policy
+: public vspline::uf_types < _in_type , _out_type , _vsize >
+{
+  typedef _in_type rc_type ;
+  typedef vspline::uf_types < rc_type , rc_type , _vsize > base_type ;
+  using_unary_functor_types(base_type) ;
+
+  const rc_type lower ;
+  const rc_type upper ;
+  
+  reject_policy ( rc_type _lower ,
+                rc_type _upper )
+  : lower ( _lower ) ,
+    upper ( _upper )
+  { } ;
+
+  void eval ( const in_type & c ,
+                    out_type & result ) const
+  {
+    if ( c < lower || c > upper )
+      throw vspline::out_of_bounds() ;
+    result = c ;
+  }
+  
+#ifdef USE_VC
+
+  void eval ( const in_ele_v & c ,
+                    out_ele_v & result ) const
+  {
+    if ( any_of ( ( c < lower ) | ( c > upper ) ) )
+      throw vspline::out_of_bounds() ;
+    result = c ;
+  }
+
+#endif
+
+} ;
+
+/// gate with LIMIT clamps out-of-bounds values
+
+template < typename _in_type ,
+           typename _out_type ,
+           int _vsize >
+struct limit_policy
+: public vspline::uf_types < _in_type , _out_type , _vsize >
+{
+  typedef _in_type rc_type ;
+  typedef vspline::uf_types < rc_type , rc_type , _vsize > base_type ;
+  using_unary_functor_types(base_type) ;
+  
+  const rc_type lower ;
+  const rc_type upper ;
+  
+  limit_policy ( rc_type _lower ,
+                rc_type _upper )
+  : lower ( _lower ) ,
+    upper ( _upper )
+  { } ;
+
+  void eval ( const in_type & c ,
+                    out_type & result ) const
+  {
+    if ( c < lower )
+      result = lower ;
+    else if ( c > upper )
+      result = upper ;
+    else
+      result = c ;
+  }
+  
+#ifdef USE_VC
+
+  void eval ( const in_ele_v & c ,
+                    out_ele_v & result ) const
+  {
+    out_ele_v cc = c ;
+    cc ( cc < lower ) = lower ;
+    cc ( cc > upper ) = upper ;
+    result = cc ;
+  }
+
+#endif
+
+} ;
+
+/// constant gate assigns 'fix' to out-of-bounds coordinates; by default
+/// it will return 0 for out-of-bounds input, but the value can be chosen
+/// by passing a third argument to the constructor.
+
+template < typename _in_type ,
+           typename _out_type ,
+           int _vsize >
+struct constant_policy
+: public vspline::uf_types < _in_type , _out_type , _vsize >
+{
+  typedef _in_type rc_type ;
+  typedef vspline::uf_types < rc_type , rc_type , _vsize > base_type ;
+  using_unary_functor_types(base_type) ;
+  
+  const rc_type lower ;
+  const rc_type upper ;
+  const rc_type fix ;
+  
+  constant_policy ( rc_type _lower ,
+                rc_type _upper ,
+                rc_type _fix = rc_type(0)
+            )
+  : lower ( _lower ) ,
+    upper ( _upper ) ,
+    fix ( _fix )
+  { } ;
+
+  void eval ( const in_type & c ,
+                    out_type & result ) const
+  {
+    if ( c < lower )
+      result = fix ;
+    else if ( c > upper )
+      result = fix ;
+    else
+      result = c ;
+  }
+  
+#ifdef USE_VC
+
+  void eval ( const in_ele_v & c ,
+                    out_ele_v & result ) const
+  {
+    out_ele_v cc = c ;
+    cc ( cc < lower ) = fix ;
+    cc ( cc > upper ) = fix ;
+    result = cc ;
+  }
+
+#endif
+
+} ;
+
+#ifdef USE_VC
+
+template <typename rc_v>
+rc_v v_fmod ( const rc_v& lhs ,
+              const typename rc_v::EntryType rhs )
+{
+  typedef typename vector_traits < int , rc_v::Size > :: ele_v ic_v ;
+
+  ic_v iv ( lhs / rhs ) ;
+  return lhs - rhs * rc_v ( iv ) ;
+}
+
+#endif
+
+/// gate with mirror 'folds' coordinates into the range. It places the
+/// result value so that mirroring it on both lower and upper (which
+/// produces an infinite number of mirror images) will produce one
+/// mirror image coinciding with the input.
+/// When using this gate type with splines with MIRROR boundary conditions,
+/// if the shape of the core for the axis in question is M, _lower would be
+/// passed 0 and _upper M-1.
+/// For splines with REFLECT boundary conditions, we'd pass -0.5 to
+/// _lower and M-0.5 to _upper, since here we mirror 'between bounds'
+/// and the defined range is wider.
+
+template < typename _in_type ,
+           typename _out_type ,
+           int _vsize >
+struct mirror_policy
+: public vspline::uf_types < _in_type , _out_type , _vsize >
+{
+  typedef _in_type rc_type ;
+  typedef vspline::uf_types < rc_type , rc_type , _vsize > base_type ;
+  using_unary_functor_types(base_type) ;
+  
+  const rc_type lower ;
+  const rc_type upper ;
+  
+  mirror_policy ( rc_type _lower ,
+                rc_type _upper )
+  : lower ( _lower ) ,
+    upper ( _upper )
+  { } ;
+
+  void eval ( const in_type & c ,
+                    out_type & result ) const
+  {
+    in_type cc = c ;
+    if ( cc < lower )
+      cc = 2 * lower - cc ;
+    if ( cc > upper )
+    {
+      cc -= lower ;
+      cc = std::fmod ( cc , 2 * ( upper - lower ) ) ;
+      cc += lower ;
+      if ( cc > upper )
+        cc = 2 * upper - cc ;
+    }
+    result = cc ;
+  }
+  
+#ifdef USE_VC
+
+/// vectorized variant of eval(), using v_fmod
+
+  void eval ( const in_ele_v & c ,
+                    out_ele_v & result ) const
+  {
+    out_ele_v cc ;
+    
+    cc = c - lower ;
+    auto w = upper - lower ;
+
+    cc = abs ( cc ) ;               // left mirror, cc is now >= 0
+
+    if ( any_of ( cc > w ) )
+    {
+      cc = v_fmod ( cc , 2 * w ) ;  // map to one full period
+      cc -= w ;                     // center
+      cc = abs ( cc ) ;             // map to half period
+      cc = w - cc ;                 // flip
+    }
+    
+    result = cc + lower ;
+  }
+
+#endif
+
+} ;
+
+/// The periodic mapping also folds the incoming value into the allowed range.
+/// The result lies inside the range and differs from the input value by an
+/// integer multiple of the period, period being upper - lower.
+/// For splines done with PERIODIC boundary conditions, if the shape of
+/// the core for this axis is M, we'd pass 0 to _lower and M to _upper.
+
+template < typename _in_type ,
+           typename _out_type ,
+           int _vsize >
+struct periodic_policy
+: public vspline::uf_types < _in_type , _out_type , _vsize >
+{
+  typedef _in_type rc_type ;
+  typedef vspline::uf_types < rc_type , rc_type , _vsize > base_type ;
+  using_unary_functor_types(base_type) ;
+  
+  const rc_type lower ;
+  const rc_type upper ;
+  
+  periodic_policy ( rc_type _lower ,
+                rc_type _upper )
+  : lower ( _lower ) ,
+    upper ( _upper )
+  { } ;
+
+  void eval ( const in_type & c ,
+                    out_type & result ) const
+  {
+    in_type cc = c - lower ;
+    auto w = upper - lower ;
+    
+    if ( cc < 0 )
+      cc = w + fmod ( cc , w ) ;
+    else if ( cc >= w )
+      cc = fmod ( cc , w ) ;
+    result = cc + lower ;
+  }
+  
+#ifdef USE_VC
+
+  void eval ( const in_ele_v & c ,
+                    out_ele_v & result ) const
+  {
+    out_ele_v cc ;
+    
+    cc = c - lower ;
+    auto w = upper - lower ;
+
+    if ( any_of ( ( cc < 0 ) | ( cc >= w ) ) )
+    {
+      cc = v_fmod ( cc , w ) ;
+      cc ( cc < 0 ) += w ;
+    }
+    
+    result = cc + lower ;
+  }
+
+#endif
+
+} ;
+
+template < typename coordinate_type ,
+           int _vsize ,
+           template < class ,
+                      class ,
+                      int > class gate_policy >
+using gate_type =
+  vspline::unary_functor < coordinate_type ,
+                           coordinate_type ,
+                           _vsize ,
+                           gate_policy > ;
+
+/// finally we define class mapper which is initialized with a set of
+/// gate objects (of arbitrary type) which are applied to each component
+/// of an incoming nD coordinate in turn.
+/// The trickery with the variadic template argument list is necessary,
+/// because we want to be able to combine arbitrary gate types, which
+/// have distinct types to make them as efficient as possible.
+/// The only requirement for a gate type is that it has to provide the
+/// necessary eval() functions. Typically a gate type would inherit from
+/// a 1D vspline::unary_functor, like the types above, since this guarantees
+/// a suitable type, but this is not enforced.
+
+// TODO: I'd like to be able to construct a mapper object from
+// a vspline::bspline (which provides lower_limit() and upper_limit()
+// for each axis) and a set of vspline::map_codes
+
+template < typename _in_type ,
+           typename _out_type ,
+           int _vsize ,
+           class ... gate_types >
+struct mapper_policy
+{
+  enum { dimension = vigra::ExpandElementResult < _in_type > :: size } ;
+  
+  // we hold the 1D gates in a tuple
+  
+  typedef std::tuple < gate_types... > mvec_type ;
+  
+  // mvec holds the 1D gate objects passed to the constructor
+  
+  const mvec_type mvec ;
+  
+  // the constructor receives gate objects
+
+  mapper_policy ( gate_types ... args )
+  : mvec ( args... )
+  { } ;
+  
+  // to handle the application of the 1D gates, we use a recursive
+  // helper type which applies the 1D gate for a specific axis and
+  // then recurses to the next axis until axis 0 is reached.
+  // We also pass 'dimension' as template argument, so we can specialize
+  // for 1D operation (see below)
+
+  template < int level , int dimension , typename nd_coordinate_type >
+  struct _map
+  { 
+    void operator() ( const mvec_type & mvec ,
+                      const nd_coordinate_type & in ,
+                      nd_coordinate_type & out ) const
+    {
+      std::get<level>(mvec).eval ( in[level] , out[level] ) ;
+      _map < level - 1 , dimension , nd_coordinate_type >() ( mvec , in , out ) ;
+    }
+  } ;
+  
+  // at level 0 the recursion ends
+  
+  template < int dimension , typename nd_coordinate_type >
+  struct _map < 0 , dimension , nd_coordinate_type >
+  { 
+    void operator() ( const mvec_type & mvec ,
+                      const nd_coordinate_type & in ,
+                      nd_coordinate_type & out ) const
+    {
+      std::get<0>(mvec).eval ( in[0] , out[0] ) ;
+    }
+  } ;
+  
+  // here's the specialization for 1D operation
+
+  template < typename coordinate_type >
+  struct _map < 0 , 1 , coordinate_type >
+  { 
+    void operator() ( const mvec_type & mvec ,
+                      const coordinate_type & in ,
+                      coordinate_type & out ) const
+    {
+      std::get<0>(mvec).eval ( in , out ) ;
+    }
+  } ;
+
+  // now we define eval for unvectorized and vectorized operation
+  // by simply delegating to struct _map at the top level.
+
+  template < class in_type , class out_type >
+  void eval ( const in_type & in ,
+                    out_type & out ) const
+  {
+    _map < dimension - 1 , dimension , in_type >() ( mvec , in , out ) ;
+  }
+
+} ;
+
+template < typename _coordinate_type ,
+           int _vsize ,
+           class ... gate_types >
+using mapper = vspline::unary_functor < _coordinate_type ,
+                                        _coordinate_type ,
+                                        _vsize ,
+                                        mapper_policy ,
+                                        gate_types ... > ;
+
+} ; // namespace vspline
+
+#endif // #ifndef VSPLINE_MAP_H
diff --git a/multithread.h b/multithread.h
new file mode 100644
index 0000000..eea6679
--- /dev/null
+++ b/multithread.h
@@ -0,0 +1,671 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015 - 2017 by Kay F. Jahnke                    */
+/*                                                                      */
+/*    The git repository for this software is at                        */
+/*                                                                      */
+/*    https://bitbucket.org/kfj/vspline                                 */
+/*                                                                      */
+/*    Please direct questions, bug reports, and contributions to        */
+/*                                                                      */
+/*    kfjahnke+vspline at gmail.com                                        */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/// \file multithread.h
+///
+/// \brief code to distribute the processing of bulk data to several threads
+/// 
+/// The code in this header provides a reasonably general method to perform
+/// processing of manifolds of data with several threads in parallel. In vspline,
+/// there are several areas where potentially large numbers of individual values
+/// have to be processed independently of each other or in a dependence which
+/// can be preserved in partitioning. To process such 'bulk' data effectively,
+/// vspline employs two strategies: multithreading and vectorization.
+/// This file handles the multithreading.
+///
+/// To produce generic code for the purpose, we first introduce a model of what
+/// we intend to do. This model looks at the data as occupying a 'range' having
+/// a defined starting point and end point. We follow the convention of defining
+/// ranges so that the start point is inside and the end point outside the data
+/// set described by the range, just like iterators obtained by begin() and end().
+/// This range is made explicit, even if it is implicit in the data which we want to
+/// submit to multithreading, and there is a type for the purpose: range_type (an
+/// alias template, see below). range_type merely captures the concept of a range,
+/// taking 'limit_type' as its template parameter, so that any type of range can be
+/// accommodated. A range is defined by its lower and upper limit.
+///
+/// Next we define an object holding a set of ranges, modeling a partitioning of
+/// an original/whole range into subranges, which, within the context of this code,
+/// are disjoint and in sequence. This object is modeled as partition_type (again
+/// an alias template), taking a range_type as its template argument.
+///
+/// With these types, we model concrete ranges and partitionings. The most important
+/// one deals with multidimensional shapes, where a range extends from a 'lower'
+/// coordinate to just below a 'higher' coordinate. These two coordinates can be
+/// used directly to call vigra's 'subarray' function.
+///
+/// Next we provide code to partition ranges into sets of subranges.
+///
+/// Finally we can express a generalized multithreading routine. This routine takes
+/// a functor capable of processing a range specification and a parameter pack of
+/// arbitrary further parameters, some of which will usually refer to manifolds
+/// of data for which the given range makes sense. We call this routine with a
+/// partitioning of the original range and the same parameter pack that is to be passed
+/// to the functor. The multithreading routine proceeds to set up 'tasks' as needed,
+/// providing each with the functor as its functional, a subrange from
+/// the partitioning and the parameter pack as arguments. There's also an overload
+/// of multithread() to which we pass a pointer to a partitioning function, a range
+/// over all the data and a desired number of tasks, leaving the partitioning to that
+/// function (like partition_to_tiles() or partition_to_stripes(), see below).
+/// This is the commonly used variant, since it's usually not necessary to obtain
+/// a partitioning other than one of the standard ones. The notable exception here is
+/// partitioning for b-spline prefiltering, where the axis along which we apply the
+/// filter must not be split, and hence the prefiltering code partitions the range
+/// itself and uses the variant of multithread() which takes a partitioning.
+///
+/// The tasks, once prepared, are wrapped by action_wrapper() and handed over to the
+/// thread pool (in thread_pool.h). While my initial code used one thread per task,
+/// this turned out to be inefficient, because it was not granular enough: the
+/// slowest thread became the limiting factor. Now the job at hand is split into
+/// more individual tasks (something like 8 times the number of cores), resulting
+/// in a fair compromise concerning granularity. multithread() waits for all tasks to
+/// terminate and returns when it's certain that the job is complete.
+///
+/// With this method, we ensure that on return of multithread() we can safely access
+/// whatever results we anticipate. While it might be useful to launch the tasks and
+/// return to continue the main thread, picking up the result later when it becomes
+/// ready, I chose to suspend the calling thread until the result arrives. This makes
+/// the logic simpler, and should be what most use cases need: there is often little else
+/// to do but to wait for the result anyway. If asynchronous operation is needed, a thread
+/// can be launched to initiate and collect from the multithreading. It's safe to have
+/// several threads using this multithreading code, since each call to multithread()
+/// uses its own completion counter, mutex and condition variable (see action_wrapper()
+/// and multithread() below).
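The model described above (half-open ranges, a partitioning into consecutive subranges, and a routine that calls a functor once per subrange and returns only when all calls are done) can be sketched without vigra or a thread pool. This is a minimal illustration under stated assumptions: ranges are one-dimensional pairs of std::ptrdiff_t, one std::thread is launched per subrange instead of using a pool, and partition_1d(), run_partitioned() and square() are hypothetical names that are not part of vspline.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// a range is a pair of limits [start, end): start is inside, end is outside
using range_t = std::array < std::ptrdiff_t , 2 > ;

// a partitioning is a sequence of disjoint, consecutive subranges
using partition_t = std::vector < range_t > ;

// split r into at most n roughly equal, consecutive subranges
partition_t partition_1d ( range_t r , int n )
{
  assert ( n > 0 ) ;
  partition_t res ;
  std::ptrdiff_t w = r[1] - r[0] ;
  if ( w <= 0 )
    return res ;                       // nothing to split
  if ( n > w )
    n = int ( w ) ;                    // avoid empty subranges
  std::ptrdiff_t start = r[0] ;
  for ( int i = 0 ; i < n ; i++ )
  {
    // the last subrange always ends precisely at r[1]
    std::ptrdiff_t end = r[0] + ( ( i + 1 ) * w ) / n ;
    res.push_back ( range_t { { start , end } } ) ;
    start = end ;
  }
  return res ;
}

// call 'func' once per subrange, passing the extra arguments along, and
// return only after all calls have finished. For brevity this launches one
// std::thread per subrange; vspline hands its tasks to a thread pool instead.
template < class ... Types >
int run_partitioned ( void (*func) ( range_t , Types ... ) ,
                      const partition_t & partitioning ,
                      Types ... args )
{
  std::vector < std::thread > workers ;
  for ( const range_t & r : partitioning )
    workers.emplace_back ( func , r , args ... ) ;
  for ( std::thread & t : workers )
    t.join() ;
  return int ( partitioning.size() ) ;
}

// example functor: square the elements of *data whose indices lie in r
void square ( range_t r , std::vector < double > * data )
{
  for ( std::ptrdiff_t i = r[0] ; i < r[1] ; i++ )
    ( *data ) [ i ] *= ( *data ) [ i ] ;
}
```

The functor signature mirrors the one multithread() expects: the range comes first, followed by the same parameter pack that is passed to the dispatching routine, so the pack types can be deduced from both places.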
+
+#ifndef VSPLINE_MULTITHREAD_H
+#define VSPLINE_MULTITHREAD_H
+
+#include <assert.h>
+#include <functional>
+#include <vector>
+#include <vigra/tinyvector.hxx>
+#include <vigra/multi_array.hxx>
+#include <thread>
+#include <mutex>
+#include <queue>
+#include <condition_variable>
+#include <vspline/common.h>
+#include <vspline/thread_pool.h>
+
+namespace vspline
+{
+/// number of CPU cores in the system.
+
+const int ncores = std::thread::hardware_concurrency() ;
+
+/// when multithreading, use this number of jobs per default. This is
+/// an attempt at a compromise: too many jobs will produce too much overhead,
+/// too few will not distribute the load well and make the system vulnerable
+/// to 'straggling' threads
+
+const int default_njobs = 8 * ncores ;
+
+/// given limit_type, we define range_type as a TinyVector of two limit_types,
+/// the first denoting the beginning of the range and the second its end, with
+/// end being outside of the range.
+
+template < class limit_type >
+using range_type = vigra::TinyVector < limit_type , 2 > ;
+
+/// given range_type, we define partition_type as a std::vector of range_type.
+/// This data type is used to hold the partitioning of a range into subranges.
+
+template < class range_type >
+using partition_type = std::vector < range_type > ;
+
+/// given a dimension, we define a shape_type as a TinyVector of
+/// vigra::MultiArrayIndex of this dimension.
+/// This is equivalent to vigra's shape type.
+
+// TODO: might instead define as: vigra::MultiArrayShape<dimension>
+
+template < int dimension >
+using shape_type = vigra::TinyVector <  vigra::MultiArrayIndex , dimension > ;
+
+/// given a dimension, we define shape_range_type as a range defined by
+/// two shapes of the given dimension. This definition allows us to directly
+/// pass the two shapes as arguments to a call of subarray() on a MultiArrayView
+/// of the given dimension. Note the subarray semantics: if the range is
+/// [2,2] to [4,4], it refers to elements [2,2], [3,2], [2,3], [3,3].
+
+template < int dimension >
+using shape_range_type = range_type < shape_type < dimension > > ;
+
+template < int dimension >
+using shape_partition_type = partition_type < shape_range_type < dimension > > ;
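To make the subarray semantics concrete, here is a small sketch that lists the coordinates covered by a half-open 2D range, with axis 0 varying fastest, which is the order used in the comment on shape_range_type above. It uses std::array in place of vigra's TinyVector purely to stay free of vigra; covered() is a hypothetical helper, not part of vspline.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using index2 = std::array < std::ptrdiff_t , 2 > ;

// list the coordinates inside the half-open range [lower, upper):
// the lower corner is included, the upper corner is excluded on every axis
std::vector < index2 > covered ( index2 lower , index2 upper )
{
  assert ( lower[0] <= upper[0] && lower[1] <= upper[1] ) ;
  std::vector < index2 > res ;
  for ( std::ptrdiff_t y = lower[1] ; y < upper[1] ; y++ )     // axis 1: slower
    for ( std::ptrdiff_t x = lower[0] ; x < upper[0] ; x++ )   // axis 0: faster
      res.push_back ( index2 { { x , y } } ) ;
  return res ;
}
```

For the range [2,2] to [4,4] this yields [2,2], [3,2], [2,3], [3,3], matching the element list given above.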
+
+// currently unused
+// // iterator_splitter will try to set up n ranges from a range. the partial
+// // ranges are stored in a std::vector. The split may succeed producing n
+// // or less ranges, and if iter_range can't be split at all, a single range
+// // encompassing the whole of iter_range will be returned in the result vector.
+// 
+// template < class _iterator_type >
+// struct iterator_splitter
+// {
+//   typedef _iterator_type iterator_type ;
+//   typedef vigra::TinyVector < iterator_type , 2 > range_type ;
+//   typedef std::vector < range_type > partition_type ;
+// 
+//   static partition_type part ( const range_type & iter_range ,
+//                                int n )
+//   {
+//     std::vector < range_type > res ;
+//     assert ( n > 0 ) ;
+// 
+//     iterator_type start = iter_range [ 0 ] ;
+//     iterator_type end = iter_range [ 1 ] ;
+//     int size = end - start ;
+//     if ( n > size )
+//       n = size ;
+//     
+//     int chunk_size = size / n ; // will be at least 1
+//     
+//     for ( int i = 0 ; i < n - 1 ; i++ )
+//     {
+//       res.push_back ( range_type ( start , start + chunk_size ) ) ;
+//       start += chunk_size ;
+//     }
+//     res.push_back ( range_type ( start , end ) ) ;
+//     return res ;
+//   }
+// } ;
+
+/// shape_splitter will try to split a shape into n ranges by 'chopping' it
+/// along the outermost axis that can be split n-ways. The additional parameter
+/// 'forbid' prevents that particular axis from being split. The split may succeed
+/// producing n or fewer ranges, and if 'shape' can't be split at all, a single range
+/// encompassing the whole of 'shape' will be returned in the result vector. This
+/// object is used for partitioning when one axis has to be preserved intact, like
+/// for b-spline prefiltering, but it's not used by default for all shape splitting,
+/// since the resulting partitioning does not perform as well in certain situations
+/// (see the partitioning into tiles below for a better general-purpose splitter).
+
+// TODO: with some shapes, splitting will result in subranges which aren't optimal
+// for b-spline prefiltering (these are fastest with extents which are a multiple of
+// the simdized data type), so we might add code to preferably use cut locations
+// coinciding with those extents. And with small extents being split, the result
+// becomes very inefficient for filtering.
+
+template < int dim >
+struct shape_splitter
+{
+  typedef shape_type < dim > shape_t ;
+  typedef range_type < shape_t > range_t ;
+  typedef partition_type < range_t > partition_t ;
+  
+  static partition_t part ( const shape_t & shape , ///< shape to be split n-ways
+                            int n = default_njobs , ///< intended number of chunks
+                            int forbid = -1 )       ///< axis which shouldn't be split
+  {
+    partition_t res ;
+
+    // find the outermost dimension that can be split n ways, and its extent
+    int split_dim = -1 ;
+    int max_extent = -1 ;
+    for ( int md = dim - 1 ; md >= 0 ; md-- )
+    {
+      if (    md != forbid
+          && shape[md] > max_extent
+          && shape[md] >= n )
+      {
+        max_extent = shape[md] ;
+        split_dim = md ;
+        break ;
+      }
+    }
+    
+    // if the search did not yet succeed:
+    if ( max_extent == -1 )
+    {
+      // repeat process with relaxed conditions: now the search will also succeed
+      // if there is an axis which can be split less than n ways
+      for ( int md = dim - 1 ; md >= 0 ; md-- )
+      {
+        if (    md != forbid
+            && shape[md] > 1 )
+        {
+          max_extent = shape[md] ;
+          split_dim = md ;
+          break ;
+        }
+      }
+    }
+    
+    if ( split_dim == -1 )
+    {   
+      // we have not found a dimension for splitting. We pass back res with
+      // a range over the whole initial shape as its sole member
+      res.push_back ( range_t ( shape_t() , shape ) ) ;
+    }
+    else
+    {
+      // we can split the shape along split_dim
+      
+      int w = shape [ split_dim ] ;  // extent of the dimension we can split
+      n = std::min ( n , w ) ;       // just in case, if that is smaller than n
+      
+      int * cut = new int [ n ] ;    // where to chop up this dimension
+      
+      for ( int i = 0 ; i < n ; i++ )
+        cut[i] = ( (i+1) * w ) / n ;   // roughly equal chunks; the last cut is always w
+
+      shape_t start , end = shape ;
+
+      for ( int i = 0 ; i < n ; i++ )
+      {
+        end [ split_dim ] = cut [ i ];                  // apply the cut locations
+        res.push_back ( range_t ( start , end ) ) ;
+        start [ split_dim ] = end [ split_dim ] ;
+      }
+      delete[] cut ; // clean up
+    }
+    return res ;
+  }
+} ;
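A reduced, two-dimensional sketch of the axis selection and cutting performed by shape_splitter::part() may help to follow the two search passes. It is an illustration, not a replacement for the template above: it uses std::array instead of vigra types, and split_shape() is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

using shape2 = std::array < long , 2 > ;
using range2 = std::array < shape2 , 2 > ;   // { lower , upper }

// 2D version of the logic in shape_splitter::part(): prefer the outermost
// axis (highest index) whose extent is >= n, else the outermost axis with
// extent > 1; axis 'forbid' is never split.
std::vector < range2 > split_shape ( shape2 shape , int n , int forbid = -1 )
{
  assert ( n > 0 ) ;
  int split_dim = -1 ;
  for ( int md = 1 ; md >= 0 && split_dim == -1 ; md-- )   // first pass
    if ( md != forbid && shape[md] >= n )
      split_dim = md ;
  for ( int md = 1 ; md >= 0 && split_dim == -1 ; md-- )   // relaxed pass
    if ( md != forbid && shape[md] > 1 )
      split_dim = md ;

  std::vector < range2 > res ;
  if ( split_dim == -1 )
  {
    // no axis can be split: return the whole shape as a single range
    res.push_back ( range2 { { shape2 { { 0 , 0 } } , shape } } ) ;
    return res ;
  }
  long w = shape [ split_dim ] ;
  n = int ( std::min < long > ( n , w ) ) ;
  shape2 start { { 0 , 0 } } , end = shape ;
  for ( int i = 0 ; i < n ; i++ )
  {
    end [ split_dim ] = ( ( i + 1 ) * w ) / n ;   // the last cut lands on w
    res.push_back ( range2 { { start , end } } ) ;
    start [ split_dim ] = end [ split_dim ] ;
  }
  return res ;
}
```

With shape (4,10) and n = 3, axis 1 is cut at 3, 6 and 10. With forbid = 1, axis 0 is cut at 1, 2 and 4 instead, so the last chunk is the largest.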
+
+/// partition a shape range into 'stripes'. This uses shape_splitter with
+/// 'forbid' left at the default of -1, resulting in a split along the
+/// outermost dimension that can be split n ways or the next best thing
+/// shape_splitter can come up with. If the intended split is merely to
+/// distribute the work load without locality considerations, this should
+/// be the split to use. When locality is an issue, consider the next variant.
+
+template < int d >
+partition_type < shape_range_type<d> >
+partition_to_stripes ( shape_range_type<d> range , int nparts )
+{
+  if ( range[0].any() )
+  {
+    // the lower limit of the range is not at the origin, so get the shape
+    // of the region between range[0] and range[1], call shape_splitter with
+    // this shape, and add the offset to the lower limit of the original range
+    // to the partial ranges in the result
+    auto shape = range[1] - range[0] ;
+    auto res = shape_splitter < d > :: part ( shape , nparts ) ;
+    for ( auto & r : res )
+    {
+      r[0] += range[0] ;
+      r[1] += range[0] ;
+    }
+    return res ;
+  }
+  // if range[0] is at the origin, we don't have to use an offset
+  return shape_splitter < d > :: part ( range[1] , nparts ) ;
+}
+
+/// alternative partitioning into tiles. In the optimal situation, where
+/// the view isn't rotated or pitched much, the partitioning into bunches
+/// of lines (above) seems to perform slightly better, but with more difficult
+/// transformations (like a 90 degree rotation), performance suffers (by about 20%),
+/// whereas with this tiled partitioning it is roughly the same, presumably due
+/// to identical locality in both cases. So currently I am using this partitioning.
+/// Note that the current implementation ignores the argument 'nparts' when choosing
+/// the tile size and produces tiles of 160x160; 'nparts' is only used to fall back
+/// to partition_to_stripes() if d != 2 or if fewer than 'nparts' tiles result.
+
+// TODO code is a bit clumsy...
+
+// TODO it may be a good idea to have smaller portions towards the end
+// of the partitioning, since they will be processed last. If the last few
+// single-threaded operations are short, there may be fewer situations where
+// a long single-threaded operation has just started when all other tasks
+// are already done, leaving the other cores idle, or at least the problem
+// will not persist for as long.
+
+template < int d >
+partition_type < shape_range_type<d> >
+partition_to_tiles ( shape_range_type<d> range ,
+                     int nparts = default_njobs )
+{
+  // To help with the dilemma that this function is really quite specific
+  // for images, for the time being I delegate to partition_to_stripes()
+  // for dimensions != 2
+
+  if ( d != 2 )
+    return partition_to_stripes ( range , nparts ) ;
+
+  auto shape = range[1] - range[0] ;
+
+// currently disregarding incoming nparts parameter:
+//   int nelements = prod ( shape ) ;
+//   int ntile = nelements / nparts ;
+//   int nedge = pow ( ntile , ( 1.0 / d ) ) ;
+  
+  // TODO fixing this size is system-specific!
+  
+  int nedge = 160 ; // instead: heuristic, fixed size tiles
+
+  auto tiled_shape = shape / nedge ;
+
+  typedef std::vector < int > stopv ;
+  stopv stops [ d ] ;
+  for ( int a = 0 ; a < d ; a++ )
+  {
+    stops[a].push_back ( 0 ) ;
+    for ( int k = 1 ; k < tiled_shape[a] ; k++ )
+      stops[a].push_back ( k * nedge ) ;
+    stops[a].push_back ( shape[a] ) ;
+  }
+  
+  for ( int a = 0 ; a < d ; a++ )
+    tiled_shape[a] = stops[a].size() - 1 ;
+  
+  int k = prod ( tiled_shape ) ;
+  
+  // If this partitioning scheme fails to produce a partitioning with
+  // at least nparts components, fall back to using partition_to_stripes()
+  
+  if ( k < nparts )
+    return partition_to_stripes ( range , nparts ) ;
+  
+  nparts = k ;
+  partition_type < shape_range_type<d> > res ( nparts ) ;
+  
+  for ( int a = 0 ; a < d ; a++ )
+  {
+    int j0 = 1 ;
+    for ( int h = 0 ; h < a ; h++ )
+      j0 *= tiled_shape[h] ;
+    int i = 0 ;
+    int j = 0 ;
+    for ( int k = 0 ; k < nparts ; k++ )
+    {
+      res[k][0][a] = stops[a][i] ;
+      res[k][1][a] = stops[a][i+1] ;
+      ++j ;
+      if ( j == j0 )
+      {
+        j = 0 ;
+        ++i ;
+        if ( i >= tiled_shape[a] )
+          i = 0 ;
+      }
+    }
+  }
+  for ( auto & e : res )
+  {
+    e[0] += range[0] ;
+    e[1] += range[0] ;
+//     std::cout << "tile: " << e[0] << e[1] << std::endl ;
+  }
+  return res ;
+}
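The computation of the per-axis 'stops' and the enumeration of tiles in partition_to_tiles() can be sketched in 2D as follows. This is an illustration under stated assumptions: it uses std::array instead of vigra types, tile_shape() is a hypothetical name, and unlike the function above it neither offsets the tiles by range[0] nor falls back to partition_to_stripes() when too few tiles result.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using shape2 = std::array < long , 2 > ;
using range2 = std::array < shape2 , 2 > ;   // { lower , upper }

// cut every axis at multiples of 'nedge', letting the last tile on each axis
// absorb the remainder, then enumerate the tiles with axis 0 varying fastest
std::vector < range2 > tile_shape ( shape2 shape , long nedge = 160 )
{
  assert ( nedge > 0 ) ;
  std::array < std::vector < long > , 2 > stops ;
  for ( int a = 0 ; a < 2 ; a++ )
  {
    stops[a].push_back ( 0 ) ;
    for ( long k = 1 ; k < shape[a] / nedge ; k++ )
      stops[a].push_back ( k * nedge ) ;
    stops[a].push_back ( shape[a] ) ;   // the final stop is the extent itself
  }
  std::vector < range2 > res ;
  for ( std::size_t j = 0 ; j + 1 < stops[1].size() ; j++ )
    for ( std::size_t i = 0 ; i + 1 < stops[0].size() ; i++ )
      res.push_back ( range2 { { shape2 { { stops[0][i] , stops[1][j] } } ,
                                 shape2 { { stops[0][i+1] , stops[1][j+1] } } } } ) ;
  return res ;
}
```

For a 400x330 shape this yields four tiles. The tiles at the upper edges are 240 and 170 wide, because the last tile on each axis absorbs the remainder rather than forming a small extra tile.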
+
+// /// specialization for 1D shape range. Obviously we can't make tiles
+// /// from 1D data...
+// 
+// template<>
+// partition_type < shape_range_type<1> >
+// partition_to_tiles ( shape_range_type<1> range ,
+//                      int nparts )
+// {
+//   auto size = range[1][0] - range[0][0] ;
+//   auto part_size = size / nparts ;
+//   if ( part_size < 1 )
+//     part_size = size ;
+//   
+//   nparts = int ( size / part_size ) ;
+//   if ( nparts * part_size < size )
+//     nparts++ ;
+// 
+//   partition_type < shape_range_type<1> > res ( nparts ) ;
+//   
+//   auto start = range[0] ;
+//   auto stop = start + part_size ;
+//   for ( auto & e : res )
+//   {
+//     e[0] = start ;
+//     e[1] = stop ;
+//     start = stop ;
+//     stop = start + part_size ;
+//   }
+//   res[nparts-1][1] = size ;
+//   return res ;
+// }
+
+/// action_wrapper wraps a functional into an outer function which
+/// first calls the functional and then checks if this was the last
+/// of a bunch of actions to complete, by incrementing the counter
+/// p_done points to and comparing the result to 'nparts'. If the
+/// test succeeds, the caller is notified via the condition variable
+/// p_pool_cv points to, under the mutex p_pool_mutex points to.
+
+inline void action_wrapper ( std::function < void() > payload ,
+                      int nparts ,
+                      std::mutex * p_pool_mutex ,
+                      std::condition_variable * p_pool_cv ,
+                      int * p_done )
+{
+  // execute the 'payload'
+
+  payload() ;
+
+  // under the mutex p_pool_mutex points to, increase the caller's
+  // 'done' counter and test if it's now equal to 'nparts', the total
+  // number of actions in this bunch. If so, notify the caller.
+  
+  // The notification is issued while the lock is still held. Earlier
+  // versions notified after closing the lock_guard's scope, which is
+  // unsafe: once 'done' equals 'nparts', the waiting thread may find its
+  // predicate satisfied without being notified (it may only start waiting
+  // at this point, or wake spuriously), return from multithread() and
+  // destroy the mutex and condition variable, which live on its stack,
+  // before notify_one() is called on them. While we hold the lock, the
+  // waiter cannot evaluate its predicate, so notify_one() is guaranteed
+  // to operate on a live object.
+
+  std::lock_guard<std::mutex> lk ( * p_pool_mutex ) ;
+  if ( ++ ( * p_done ) == nparts )
+  {
+    // this was the last action originating from the coordinator,
+    // so notify the coordinator that the joint task is now complete
+    p_pool_cv->notify_one() ;
+  }
+}
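The completion signalling shared by action_wrapper() and multithread() (a counter guarded by a mutex, a condition variable, and a predicate wait) can be shown in isolation. This is a minimal sketch, assuming plain std::threads instead of pool threads; run_and_wait() is a hypothetical name. In a thread pool the workers are not joined, which is why notifying while the mutex is still held matters there; the sketch joins its threads only to stay self-contained.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Run 'ntasks' trivial tasks and wait until the last one has signalled
// completion. Returns the final value of the completion counter.
int run_and_wait ( int ntasks )
{
  assert ( ntasks >= 0 ) ;
  int done = 0 ;                    // number of completed tasks
  std::mutex m ;                    // guards 'done'
  std::condition_variable cv ;      // signals completion of the last task
  std::vector < std::thread > workers ;

  for ( int i = 0 ; i < ntasks ; i++ )
  {
    workers.emplace_back ( [&]
    {
      // ... the task's 'payload' would run here ...
      std::lock_guard < std::mutex > lk ( m ) ;
      if ( ++done == ntasks )
        cv.notify_one() ;           // notify while still holding the lock
    } ) ;
  }

  {
    std::unique_lock < std::mutex > lk ( m ) ;
    // the predicate rejects spurious wakeups and also covers the case
    // where all tasks finished before we started waiting
    cv.wait ( lk , [&] { return done == ntasks ; } ) ;
  }

  // pool threads would return to their queue; plain std::threads
  // have to be joined before they are destroyed
  for ( std::thread & t : workers )
    t.join() ;
  return done ;
}
```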
+
+// with this collateral code at hand, we can now implement multithread().
+
+/// multithread uses a thread pool of worker threads to perform
+/// a multithreaded operation. It receives a functor (a single-threaded
+/// function used for all individual tasks), a partitioning, which contains
+/// information on which part of the data each task should process, and
+/// a set of additional parameters to pass on to the functor.
+/// The individual 'payload' tasks are created by binding the functor with
+///
+/// - a range from the partitioning, describing its share of the data
+///
+/// - the remaining parameters
+///
+/// These tasks are bound to a wrapper routine which takes care of
+/// signalling when the last task has completed.
+
+static thread_pool common_thread_pool ; // keep a thread pool only for multithread()
+
+template < class range_type , class ...Types >
+int multithread ( void (*pfunc) ( range_type , Types... ) ,
+                  partition_type < range_type > partitioning ,
+                  Types ...args )
+{
+  // get the number of ranges in the partitioning
+
+  int nparts = partitioning.size() ;
+  
+  // guard against empty or wrong partitioning
+
+  if ( nparts <= 0 )
+  {
+    return 0 ;
+  }
+
+  if ( nparts == 1 )
+  {
+    // if only one part is in the partitioning, we take a shortcut
+    // and execute the function right here:
+    (*pfunc) ( partitioning[0] , args... ) ;
+    return 1 ;
+  }
+
+  // alternatively, 'done' can be coded as std::atomic<int>. I tried
+  // but couldn't detect any performance benefit, even though allegedly
+  // atomics are faster than using mutexes... so I'm leaving the code
+  // as it was, using an int and a mutex.
+  
+  int done = 0 ;                    // number of completed tasks
+  std::mutex pool_mutex ;           // mutex to guard access to done and pool_cv
+  std::condition_variable pool_cv ; // for signalling completion
+  
+  {
+    // under the thread pool's task_mutex, fill tasks into task queue
+    std::lock_guard<std::mutex> lk ( common_thread_pool.task_mutex ) ;
+    for ( int i = 0 ; i < nparts ; i++ )
+    {
+      // first create the 'payload' function
+      
+      std::function < void() > payload
+        = std::bind ( pfunc , partitioning[i] , args... ) ;
+
+      // now bind it to the action wrapper and enqueue it
+
+      std::function < void() > action
+        = std::bind ( action_wrapper ,
+                      payload ,
+                      nparts ,
+                      &pool_mutex ,
+                      &pool_cv ,
+                      &done
+                    ) ;
+
+      common_thread_pool.task_queue.push ( action ) ;
+    }
+  }
+
+  // alert all worker threads
+   
+  common_thread_pool.task_cv.notify_all() ;
+
+  {
+    // now wait for the last task to complete. This is signalled by
+    // action_wrapper by notifying on pool_cv and double-checked
+    // by testing for done == nparts
+
+    std::unique_lock<std::mutex> lk ( pool_mutex ) ;
+    
+    // the predicate done == nparts rejects spurious wakes
+    
+    pool_cv.wait ( lk , [&] { return done == nparts ; } ) ;
+  }
+  
+  // all jobs are done
+
+  return nparts ;
+}
+
+// /// this overload of multithread() takes the desired number of tasks and a
+// /// range covering the 'whole' data. partition() is called with the range,
+// /// resulting in a partitioning which above overload of multithread() can use.
+// 
+// template < class range_type , class ...Types >
+// int multithread ( void (*pfunc) ( range_type , Types... ) ,
+//                   int nparts ,
+//                   range_type range ,
+//                   Types ...args )
+// {
+//   if ( nparts <= 1 )
+//   {
+//     // if only one part is requested, we take a shortcut and execute
+//     // the function right here:
+//     (*pfunc) ( range , args... ) ;
+//     return 1 ;
+//   }
+// 
+//   // partition the range using partition_to_tiles()
+// 
+//   partition_type < range_type > partitioning = partition_to_tiles ( range , nparts ) ;
+//   
+//   return multithread ( pfunc , partitioning , args... ) ;
+// }
+
+/// This variant of multithread() takes a pointer to a function performing
+/// the partitioning of the incoming range. The partitioning function is
+/// invoked on the incoming range (provided nparts is greater than 1) and
+/// the resulting partitioning is used as an argument to the first variant
+/// of multithread().
+
+// TODO It might be better to code this using std::function objects.
+
+// TODO may use move semantics for forwarding instead of relying on the
+// optimizer to figure this out
+
+template < class range_type , class ...Types >
+int multithread ( void (*pfunc) ( range_type , Types... ) ,
+                  partition_type < range_type > (*partition) ( range_type , int ) ,
+                  int nparts ,
+                  range_type range ,
+                  Types ...args )
+{
+  if ( nparts <= 1 )
+  {
+    // if only one part is requested, we take a shortcut and execute
+    // the function right here:
+    (*pfunc) ( range , args... ) ;
+    return 1 ;
+  }
+
+  // partition the range using the function pointed to by 'partition'
+
+  auto partitioning = (*partition) ( range , nparts ) ;
+  
+  // then pass pfunc, the partitioning and the remaining arguments
+  // to the variant of multithread() accepting a partitioning
+  
+  return multithread ( pfunc , partitioning , args... ) ;
+}
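The call shape of the overload taking a partitioning function can be illustrated with a serial stand-in. This is a sketch only: run_with_partitioner(), halves() and add_extent() are hypothetical names, ranges are assumed to be one-dimensional, and the parts are processed in sequence, whereas vspline hands them to its thread pool.

```cpp
#include <array>
#include <cassert>
#include <vector>

using range_t = std::array < long , 2 > ;
using partition_t = std::vector < range_t > ;

// hypothetical partitioning function with the (range, nparts) signature;
// like partition_to_tiles(), it may return a different number of parts
partition_t halves ( range_t r , int /* nparts, ignored */ )
{
  assert ( r[0] <= r[1] ) ;
  long mid = r[0] + ( r[1] - r[0] ) / 2 ;
  return partition_t { range_t { { r[0] , mid } } , range_t { { mid , r[1] } } } ;
}

// serial stand-in: shortcut for nparts <= 1, otherwise partition the range
// with the supplied function and call pfunc on every part
template < class ... Types >
int run_with_partitioner ( void (*pfunc) ( range_t , Types ... ) ,
                           partition_t (*partition) ( range_t , int ) ,
                           int nparts ,
                           range_t range ,
                           Types ... args )
{
  if ( nparts <= 1 )
  {
    pfunc ( range , args ... ) ;
    return 1 ;
  }
  partition_t parts = partition ( range , nparts ) ;
  for ( const range_t & p : parts )
    pfunc ( p , args ... ) ;
  return int ( parts.size() ) ;
}

// example functor: accumulate the extent of each range it is given
void add_extent ( range_t r , long * total )
{
  *total += r[1] - r[0] ;
}
```

As with multithread(), the return value is the number of parts actually processed, which need not equal the requested 'nparts'.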
+
+
+} ; // end of namespace vspline
+
+#endif // #ifndef VSPLINE_MULTITHREAD_H
diff --git a/poles.h b/poles.h
new file mode 100644
index 0000000..59c6b74
--- /dev/null
+++ b/poles.h
@@ -0,0 +1,690 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015 - 2017 by Kay F. Jahnke                    */
+/*                                                                      */
+/*    The git repository for this software is at                        */
+/*                                                                      */
+/*    https://bitbucket.org/kfj/vspline                                 */
+/*                                                                      */
+/*    Please direct questions, bug reports, and contributions to        */
+/*                                                                      */
+/*    kfjahnke+vspline at gmail.com                                        */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/*! \file poles.h
+
+    \brief precalculated prefilter poles and basis function values
+
+    The contents of this file below the comments can be generated
+    using prefilter_poles.cc
+
+    While the precalculated basis function values can be generated in long double
+    precision (with code in basis.h), the filter poles are calculated using
+    gsl and BLAS, which provide only double precision.
+    
+    The values defined here are used in several places in vspline. They are
+    precomputed because calculating them when needed can be (potentially very)
+    expensive, and providing them by definitions evaluated at compile time
+    slows compilation.
+    
+    The set of values provided here is sufficient to calculate the b-spline
+    basis function for all spline degrees for arbitrary arguments (including
+    the spline's derivatives) - see basis.h. The poles are needed for prefiltering.
+*/
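As a plausibility check on the tabulated values, the centered cardinal B-spline can be evaluated with the standard recursion, and for degrees 2 and 3 the prefilter pole follows from the integer samples alone. This sketch is independent of basis.h and prefilter_poles.cc; bspline_basis() and pole_deg_2_or_3() are hypothetical names. It relies on the fact that for these degrees the sampled kernel b(1) z^-1 + b(0) + b(1) z has the pole pair z, 1/z, which are the roots of z^2 + (b(0)/b(1)) z + 1 = 0.

```cpp
#include <cassert>
#include <cmath>

// centered cardinal B-spline of degree n, via the recursion
// beta_n(x) = ( ((n+1)/2 + x) * beta_{n-1}(x + 1/2)
//             + ((n+1)/2 - x) * beta_{n-1}(x - 1/2) ) / n
// with beta_0 = 1 on [-1/2, 1/2) and 0 elsewhere
double bspline_basis ( int n , double x )
{
  assert ( n >= 0 ) ;
  if ( n == 0 )
    return ( x >= -0.5 && x < 0.5 ) ? 1.0 : 0.0 ;
  double h = ( n + 1 ) / 2.0 ;
  return ( ( h + x ) * bspline_basis ( n - 1 , x + 0.5 )
         + ( h - x ) * bspline_basis ( n - 1 , x - 0.5 ) ) / n ;
}

// for degrees 2 and 3 only: return the root of z^2 + q z + 1 = 0
// (q = b(0) / b(1)) that lies inside the unit circle
double pole_deg_2_or_3 ( int n )
{
  assert ( n == 2 || n == 3 ) ;
  double q = bspline_basis ( n , 0.0 ) / bspline_basis ( n , 1.0 ) ;
  return ( - q + std::sqrt ( q * q - 4.0 ) ) / 2.0 ;
}
```

For degree 2, q = 0.75 / 0.125 = 6, giving sqrt(8) - 3; for degree 3, q = 4, giving sqrt(3) - 2. Both agree with Poles_2 and Poles_3 below to double precision. Higher degrees have more than one pole and need a general polynomial root finder, which is what the gsl-based generator is used for.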
+
+#ifndef VSPLINE_POLES_H
+#define VSPLINE_POLES_H
+
+namespace vspline_constants {
+
+const long double K0[] = {
+ 1L ,   // basis(0)
+ } ; 
+const long double K1[] = {
+ 1L ,   // basis(0)
+ 0.5L ,   // basis(0.5)
+ } ; 
+const long double K2[] = {
+ 0.75L ,   // basis(0)
+ 0.5L ,   // basis(0.5)
+ 0.125L ,   // basis(1)
+ } ; 
+const double Poles_2[] = {
+-0.17157287525381015314 ,
+} ;
+const long double K3[] = {
+ 0.66666666666666666668L ,   // basis(0)
+ 0.47916666666666666666L ,   // basis(0.5)
+ 0.16666666666666666667L ,   // basis(1)
+ 0.020833333333333333334L ,   // basis(1.5)
+ } ; 
+const double Poles_3[] = {
+-0.26794919243112280682 ,
+} ;
+const long double K4[] = {
+ 0.59895833333333333332L ,   // basis(0)
+ 0.45833333333333333334L ,   // basis(0.5)
+ 0.19791666666666666667L ,   // basis(1)
+ 0.041666666666666666668L ,   // basis(1.5)
+ 0.0026041666666666666667L ,   // basis(2)
+ } ; 
+const double Poles_4[] = {
+-0.36134122590021944266 ,
+-0.013725429297339164503 ,
+} ;
+const long double K5[] = {
+ 0.55000000000000000001L ,   // basis(0)
+ 0.4380208333333333333L ,   // basis(0.5)
+ 0.21666666666666666667L ,   // basis(1)
+ 0.061718750000000000001L ,   // basis(1.5)
+ 0.0083333333333333333337L ,   // basis(2)
+ 0.00026041666666666666668L ,   // basis(2.5)
+ } ; 
+const double Poles_5[] = {
+-0.43057534709997430378 ,
+-0.043096288203264443428 ,
+} ;
+const long double K6[] = {
+ 0.51102430555555555553L ,   // basis(0)
+ 0.41944444444444444449L ,   // basis(0.5)
+ 0.22879774305555555554L ,   // basis(1)
+ 0.079166666666666666666L ,   // basis(1.5)
+ 0.015668402777777777778L ,   // basis(2)
+ 0.001388888888888888889L ,   // basis(2.5)
+ 2.170138888888888889e-05L ,   // basis(3)
+ } ; 
+const double Poles_6[] = {
+-0.48829458930304570075 ,
+-0.081679271076237264237 ,
+-0.0014141518083258114435 ,
+} ;
+const long double K7[] = {
+ 0.47936507936507936512L ,   // basis(0)
+ 0.4025964161706349206L ,   // basis(0.5)
+ 0.23630952380952380952L ,   // basis(1)
+ 0.094024367559523809525L ,   // basis(1.5)
+ 0.023809523809523809525L ,   // basis(2)
+ 0.0033776661706349206347L ,   // basis(2.5)
+ 0.00019841269841269841271L ,   // basis(3)
+ 1.5500992063492063493e-06L ,   // basis(3.5)
+ } ; 
+const double Poles_7[] = {
+-0.5352804307964414976 ,
+-0.12255461519232610512 ,
+-0.0091486948096082820747 ,
+} ;
+const long double K8[] = {
+ 0.45292096819196428568L ,   // basis(0)
+ 0.38737599206349206352L ,   // basis(0.5)
+ 0.24077768477182539682L ,   // basis(1)
+ 0.10647321428571428571L ,   // basis(1.5)
+ 0.032126968625992063494L ,   // basis(2)
+ 0.0061259920634920634923L ,   // basis(2.5)
+ 0.00063476562499999999998L ,   // basis(3)
+ 2.4801587301587301589e-05L ,   // basis(3.5)
+ 9.6881200396825396832e-08L ,   // basis(4)
+ } ; 
+const double Poles_8[] = {
+-0.57468690924881216109 ,
+-0.16303526929727354955 ,
+-0.023632294694844447475 ,
+-0.00015382131064169135559 ,
+} ;
+const long double K9[] = {
+ 0.43041776895943562614L ,   // basis(0)
+ 0.3736024025676532187L ,   // basis(0.5)
+ 0.24314925044091710759L ,   // basis(1)
+ 0.1168385769744819224L ,   // basis(1.5)
+ 0.040255731922398589063L ,   // basis(2)
+ 0.0094531293058311287482L ,   // basis(2.5)
+ 0.0013833774250440917108L ,   // basis(3)
+ 0.00010588576974481922398L ,   // basis(3.5)
+ 2.7557319223985890654e-06L ,   // basis(4)
+ 5.3822889109347442683e-09L ,   // basis(4.5)
+ } ; 
+const double Poles_9[] = {
+-0.60799738916862233751 ,
+-0.20175052019315406482 ,
+-0.04322260854048156492 ,
+-0.0021213069031808251541 ,
+} ;
+const long double K10[] = {
+ 0.41096264282441854056L ,   // basis(0)
+ 0.36109843474426807762L ,   // basis(0.5)
+ 0.24406615618885719797L ,   // basis(1)
+ 0.12543871252204585538L ,   // basis(1.5)
+ 0.047983348920442019401L ,   // basis(2)
+ 0.013183421516754850087L ,   // basis(2.5)
+ 0.0024532852307408785274L ,   // basis(3)
+ 0.00027915564373897707232L ,   // basis(3.5)
+ 1.5887978636188271604e-05L ,   // basis(4)
+ 2.7557319223985890654e-07L ,   // basis(4.5)
+ 2.6911444554673721342e-10L ,   // basis(5)
+ } ; 
+const double Poles_10[] = {
+-0.63655066396958059904 ,
+-0.23818279837754796624 ,
+-0.065727033228304657109 ,
+-0.0075281946755491966489 ,
+-1.6982762823274620556e-05 ,
+} ;
+const long double K11[] = {
+ 0.39392556517556517558L ,   // basis(0)
+ 0.34970223188744306906L ,   // basis(0.5)
+ 0.24396028739778739779L ,   // basis(1)
+ 0.13256116543210659421L ,   // basis(1.5)
+ 0.055202020202020202023L ,   // basis(2)
+ 0.017163149607531321399L ,   // basis(2.5)
+ 0.0038238786676286676285L ,   // basis(3)
+ 0.00057128626126327135446L ,   // basis(3.5)
+ 5.1006092672759339424e-05L ,   // basis(4)
+ 2.1667994232691498315e-06L ,   // basis(4.5)
+ 2.5052108385441718776e-08L ,   // basis(5)
+ 1.2232474797578964246e-11L ,   // basis(5.5)
+ } ; 
+const double Poles_11[] = {
+-0.66126606890063921451 ,
+-0.27218034929481393913 ,
+-0.089759599793708844118 ,
+-0.016669627366234951449 ,
+-0.00051055753444649576434 ,
+} ;
+const long double K12[] = {
+ 0.37884408454472999147L ,   // basis(0)
+ 0.33927295023649190319L ,   // basis(0.5)
+ 0.24313091801014469471L ,   // basis(1)
+ 0.13845146655042488376L ,   // basis(1.5)
+ 0.061867668009041325489L ,   // basis(2)
+ 0.021268582401394901395L ,   // basis(2.5)
+ 0.0054581869256967252307L ,   // basis(3)
+ 0.00099847474413446635658L ,   // basis(3.5)
+ 0.00012091392059187537162L ,   // basis(4)
+ 8.5239798781465448132e-06L ,   // basis(4.5)
+ 2.7086165069699140876e-07L ,   // basis(5)
+ 2.0876756987868098981e-09L ,   // basis(5.5)
+ 5.0968644989912351028e-13L ,   // basis(6)
+ } ; 
+const double Poles_12[] = {
+-0.68286488419809487915 ,
+-0.30378079328817336746 ,
+-0.11435052002714780894 ,
+-0.028836190198661435652 ,
+-0.0025161662172618224839 ,
+-1.883305645063344802e-06 ,
+} ;
+const long double K13[] = {
+ 0.36537086948545281884L ,   // basis(0)
+ 0.32968987958591001189L ,   // basis(0.5)
+ 0.24178841798633465302L ,   // basis(1)
+ 0.14331501747174208366L ,   // basis(1.5)
+ 0.067974967258821425491L ,   // basis(2)
+ 0.0254044062949849888L ,   // basis(2.5)
+ 0.0073122366959172514726L ,   // basis(3)
+ 0.0015671731081656330546L ,   // basis(3.5)
+ 0.00023762984700484700481L ,   // basis(4)
+ 2.3492285420207986942e-05L ,   // basis(4.5)
+ 1.3133086049752716419e-06L ,   // basis(5)
+ 3.125375747123929632e-08L ,   // basis(5.5)
+ 1.6059043836821614601e-10L ,   // basis(6)
+ 1.960332499612013501e-14L ,   // basis(6.5)
+ } ; 
+const double Poles_13[] = {
+-0.701894251817016257 ,
+-0.33310723293052579841 ,
+-0.13890111319434489401 ,
+-0.043213866740361948915 ,
+-0.0067380314152448743045 ,
+-0.00012510011321441739246 ,
+} ;
+const long double K14[] = {
+ 0.35323915669918929845L ,   // basis(0)
+ 0.32085024502063192543L ,   // basis(0.5)
+ 0.24008299041558734203L ,   // basis(1)
+ 0.14732180094624291054L ,   // basis(1.5)
+ 0.073541032564067060978L ,   // basis(2)
+ 0.029499800232377117299L ,   // basis(2.5)
+ 0.009341081854512256905L ,   // basis(3)
+ 0.002275919650051594496L ,   // basis(3.5)
+ 0.00041109051149372196722L ,   // basis(4)
+ 5.2046374591017448155e-05L ,   // basis(4.5)
+ 4.2229561084936041826e-06L ,   // basis(5)
+ 1.8776463468923786384e-07L ,   // basis(5.5)
+ 3.3486357751247422931e-09L ,   // basis(6)
+ 1.1470745597729724715e-11L ,   // basis(6.5)
+ 7.0011874986143339324e-16L ,   // basis(7)
+ } ; 
+const double Poles_14[] = {
+-0.71878378723766189751 ,
+-0.36031907191881451524 ,
+-0.16303351479903732679 ,
+-0.059089482194828991946 ,
+-0.013246756734847169382 ,
+-0.00086402404095337124838 ,
+-2.0913096775274000322e-07 ,
+} ;
+const long double K15[] = {
+ 0.34224026135534072046L ,   // basis(0)
+ 0.31266660625176080971L ,   // basis(0.5)
+ 0.23812319491070731152L ,   // basis(1)
+ 0.15061194980399698684L ,   // basis(1.5)
+ 0.078595253866748575742L ,   // basis(2)
+ 0.033503802571649835525L ,   // basis(2.5)
+ 0.011502274487496875064L ,   // basis(3)
+ 0.003117493948498863913L ,   // basis(3.5)
+ 0.00064854900635323915745L ,   // basis(4)
+ 9.9440249438946462506e-05L ,   // basis(4.5)
+ 1.057200426826749578e-05L ,   // basis(5)
+ 7.0683979027987963181e-07L ,   // basis(5.5)
+ 2.5045990654456262921e-08L ,   // basis(6)
+ 3.348642542939324287e-10L ,   // basis(6.5)
+ 7.6471637318198164765e-13L ,   // basis(7)
+ 2.3337291662047779775e-17L ,   // basis(7.5)
+ } ; 
+const double Poles_15[] = {
+-0.73387257168597164192 ,
+-0.3855857342780184549 ,
+-0.18652010845105168602 ,
+-0.075907592047656735623 ,
+-0.021752065796541687759 ,
+-0.0028011514820764091618 ,
+-3.0935680451474410063e-05 ,
+} ;
+const long double K16[] = {
+ 0.33220826914249586032L ,   // basis(0)
+ 0.30506442781494322298L ,   // basis(0.5)
+ 0.23598831687663609049L ,   // basis(1)
+ 0.15330093144015230863L ,   // basis(1.5)
+ 0.083172975045518980468L ,   // basis(2)
+ 0.03738103391018481751L ,   // basis(2.5)
+ 0.013757630909488189399L ,   // basis(3)
+ 0.0040808725321077028254L ,   // basis(3.5)
+ 0.00095448286788948239937L ,   // basis(4)
+ 0.00017072700505627712969L ,   // basis(4.5)
+ 2.2348950637818187112e-05L ,   // basis(5)
+ 2.0041660421228046889e-06L ,   // basis(5.5)
+ 1.1074718796168506873e-07L ,   // basis(6)
+ 3.131465753406890973e-09L ,   // basis(6.5)
+ 3.1393546448057462799e-11L ,   // basis(7)
+ 4.7794773323873852978e-14L ,   // basis(7.5)
+ 7.2929036443899311795e-19L ,   // basis(8)
+ } ; 
+const double Poles_16[] = {
+-0.74743238775188380885 ,
+-0.40907360475830745195 ,
+-0.2092287193405746315 ,
+-0.093254718980160661301 ,
+-0.031867706120390963676 ,
+-0.0062584067851372366872 ,
+-0.00030156536330664312833 ,
+-2.3232486364235544612e-08 ,
+} ;
+const long double K17[] = {
+ 0.32300939415699870668L ,   // basis(0)
+ 0.29797995870819162778L ,   // basis(0.5)
+ 0.23373674923065111L ,   // basis(1)
+ 0.15548403615015999844L ,   // basis(1.5)
+ 0.087311640770182303119L ,   // basis(2)
+ 0.041108064309116914771L ,   // basis(2.5)
+ 0.016073921990964784645L ,   // basis(3)
+ 0.0051528238735766806875L ,   // basis(3.5)
+ 0.0013308125721335362832L ,   // basis(4)
+ 0.00027040492583018919549L ,   // basis(4.5)
+ 4.1821549694989869669e-05L ,   // basis(5)
+ 4.695715379871064023e-06L ,   // basis(5.5)
+ 3.5643941839232455477e-07L ,   // basis(6)
+ 1.6314974698479856617e-08L ,   // basis(6.5)
+ 3.6845271901099787809e-10L ,   // basis(7)
+ 2.7700195120810122023e-12L ,   // basis(7.5)
+ 2.8114572543455207634e-15L ,   // basis(8)
+ 2.144971660114685641e-20L ,   // basis(8.5)
+ } ; 
+const double Poles_17[] = {
+-0.75968322407197097501 ,
+-0.43093965318021570932 ,
+-0.23108984359938430919 ,
+-0.11082899331622909911 ,
+-0.043213911456682692347 ,
+-0.011258183689472329655 ,
+-0.0011859331251521279364 ,
+-7.6875625812547303262e-06 ,
+} ;
+const long double K18[] = {
+ 0.31453440085864671822L ,   // basis(0)
+ 0.29135844665108330336L ,   // basis(0.5)
+ 0.2314117793664616011L ,   // basis(1)
+ 0.15724011346206745634L ,   // basis(1.5)
+ 0.091048500593391361557L ,   // basis(2)
+ 0.044670474960158529868L ,   // basis(2.5)
+ 0.018422928690498247476L ,   // basis(3)
+ 0.0063191164101958155308L ,   // basis(3.5)
+ 0.0017772776557432943289L ,   // basis(4)
+ 0.00040219803091097442175L ,   // basis(4.5)
+ 7.1383891069110100448e-05L ,   // basis(5)
+ 9.5907105586580192779e-06L ,   // basis(5.5)
+ 9.271047742986201033e-07L ,   // basis(6)
+ 5.9734083260063868359e-08L ,   // basis(6.5)
+ 2.2685078926749432357e-09L ,   // basis(7)
+ 4.0941846266406646114e-11L ,   // basis(7.5)
+ 2.308349801939754902e-13L ,   // basis(8)
+ 1.5619206968586226463e-16L ,   // basis(8.5)
+ 5.958254611429682336e-22L ,   // basis(9)
+ } ; 
+const double Poles_18[] = {
+-0.77080505126463716437 ,
+-0.45132873338515144823 ,
+-0.25207457469899424707 ,
+-0.12841283679297030296 ,
+-0.055462967138511676257 ,
+-0.017662377684794876992 ,
+-0.0030119307290000858941 ,
+-0.00010633735588702059982 ,
+-2.5812403962584360567e-09 ,
+} ;
+const long double K19[] = {
+ 0.3066931017379824246L ,   // basis(0)
+ 0.28515265744763108603L ,   // basis(0.5)
+ 0.22904564568118377632L ,   // basis(1)
+ 0.1586346253388907509L ,   // basis(1.5)
+ 0.094419295116760105743L ,   // basis(2)
+ 0.048060545425350700269L ,   // basis(2.5)
+ 0.020781149371245016366L ,   // basis(3)
+ 0.0075653834126722674764L ,   // basis(3.5)
+ 0.0022918668891541334257L ,   // basis(4)
+ 0.00056895229089948501399L ,   // basis(4.5)
+ 0.00011341320068077591568L ,   // basis(5)
+ 1.7663033358559161242e-05L ,   // basis(5.5)
+ 2.0693993456206894214e-06L ,   // basis(6)
+ 1.7275247843548983816e-07L ,   // basis(6.5)
+ 9.4683295350905535817e-09L ,   // basis(7)
+ 2.987004917810922453e-10L ,   // basis(7.5)
+ 4.3098159994772440922e-12L ,   // basis(8)
+ 1.8223814805986014024e-14L ,   // basis(8.5)
+ 8.2206352466243297175e-18L ,   // basis(9)
+ 1.5679617398499164042e-23L ,   // basis(9.5)
+ } ; 
+const double Poles_19[] = {
+-0.78094644484628727987 ,
+-0.47037281947078746214 ,
+-0.27218037628176311449 ,
+-0.14585089375766777109 ,
+-0.068345906124943789361 ,
+-0.025265073344845085518 ,
+-0.0059366595910830613492 ,
+-0.00050841019468083302468 ,
+-1.9154786562122251559e-06 ,
+} ;
+const long double K20[] = {
+ 0.29941029032001264032L ,   // basis(0)
+ 0.27932165599364228926L ,   // basis(0.5)
+ 0.22666242185748694763L ,   // basis(1)
+ 0.15972211762658876278L ,   // basis(1.5)
+ 0.0974575566598727568L ,   // basis(2)
+ 0.051275465138013302938L ,   // basis(2.5)
+ 0.023129338338060293147L ,   // basis(3)
+ 0.0088777091023436491261L ,   // basis(3.5)
+ 0.0028712400200206135652L ,   // basis(4)
+ 0.00077261996725682196449L ,   // basis(4.5)
+ 0.00017015073085024172881L ,   // basis(5)
+ 3.0008819646690530456e-05L ,   // basis(5.5)
+ 4.1167033003850903957e-06L ,   // basis(6)
+ 4.2192794922896485481e-07L ,   // basis(6.5)
+ 3.0493046656519177393e-08L ,   // basis(7)
+ 1.4241282646631125569e-09L ,   // basis(7.5)
+ 3.7354418501332067724e-11L ,   // basis(8)
+ 4.3098940955120870235e-13L ,   // basis(8.5)
+ 1.3667861257365780153e-15L ,   // basis(9)
+ 4.1103176233121648586e-19L ,   // basis(9.5)
+ 3.9199043496247910105e-25L ,   // basis(10)
+ } ; 
+const double Poles_20[] = {
+-0.79023111767977516351 ,
+-0.48819126033675236398 ,
+-0.29142160165551617146 ,
+-0.16303353479638585388 ,
+-0.081648115630934034459 ,
+-0.033849479552361630419 ,
+-0.0099730290200507193399 ,
+-0.0014683217571042010263 ,
+-3.7746573197331790075e-05 ,
+-2.8679944881725126467e-10 ,
+} ;
+const long double K21[] = {
+ 0.29262268723143477922L ,   // basis(0)
+ 0.2738298047486301248L ,   // basis(0.5)
+ 0.22428009387883276411L ,   // basis(1)
+ 0.16054821266164454585L ,   // basis(1.5)
+ 0.10019429073492722872L ,   // basis(2)
+ 0.054315966627272970968L ,   // basis(2.5)
+ 0.025451983263662738633L ,   // basis(3)
+ 0.010243000848845290252L ,   // basis(3.5)
+ 0.0035111077726313273026L ,   // basis(4)
+ 0.0010143045932529873796L ,   // basis(4.5)
+ 0.000243612424661332394L ,   // basis(5)
+ 4.7797839244413500002e-05L ,   // basis(5.5)
+ 7.4865177795402407054e-06L ,   // basis(6)
+ 9.0756157943914249454e-07L ,   // basis(6.5)
+ 8.1587909794275973583e-08L ,   // basis(7)
+ 5.1150819066710363876e-09L ,   // basis(7.5)
+ 2.0383683775099098269e-10L ,   // basis(8)
+ 4.4482237420372396468e-12L ,   // basis(8.5)
+ 4.104700189226971567e-14L ,   // basis(9)
+ 9.7627580792412901893e-17L ,   // basis(9.5)
+ 1.9572941063391261232e-20L ,   // basis(10)
+ 9.3331055943447405012e-27L ,   // basis(10.5)
+ } ; 
+const double Poles_21[] = {
+-0.79876288565466957436 ,
+-0.50489153745536197171 ,
+-0.30982319641503575092 ,
+-0.17988466679726275443 ,
+-0.095200812461283090826 ,
+-0.043213918440668783183 ,
+-0.01504549998728420962 ,
+-0.0031720039638856827036 ,
+-0.00021990295763158517806 ,
+-4.7797646894259869337e-07 ,
+} ;
+const long double K22[] = {
+ 0.28627661405538603955L ,   // basis(0)
+ 0.26864594027689889732L ,   // basis(0.5)
+ 0.22191207309687150606L ,   // basis(1)
+ 0.1611512144701082552L ,   // basis(1.5)
+ 0.10265788953426401334L ,   // basis(2)
+ 0.057185290104801063607L ,   // basis(2.5)
+ 0.027736783120003498267L ,   // basis(3)
+ 0.011649203759035082664L ,   // basis(3.5)
+ 0.0042065557982618627852L ,   // basis(4)
+ 0.0012943433274091186101L ,   // basis(4.5)
+ 0.00033552928198532912351L ,   // basis(5)
+ 7.2224788646371748003e-05L ,   // basis(5.5)
+ 1.2671383794748147439e-05L ,   // basis(6)
+ 1.7682350579090077749e-06L ,   // basis(6.5)
+ 1.8993891467043433632e-07L ,   // basis(7)
+ 1.5010206322471487409e-08L ,   // basis(7.5)
+ 8.1770577437810697864e-10L ,   // basis(8)
+ 2.7833247876855379198e-11L ,   // basis(8.5)
+ 5.0557094184087925374e-13L ,   // basis(9)
+ 3.7315643098318982977e-15L ,   // basis(9.5)
+ 6.6564259722400510509e-18L ,   // basis(10)
+ 8.8967913924505732872e-22L ,   // basis(10.5)
+ 2.1211603623510773867e-28L ,   // basis(11)
+ } ; 
+const double Poles_22[] = {
+-0.80662949916286152963 ,
+-0.52057023687190062677 ,
+-0.3274164733138280603 ,
+-0.19635282650762261869 ,
+-0.10887245188483440916 ,
+-0.053181604599218119944 ,
+-0.021035660929842874001 ,
+-0.0057066136460001649564 ,
+-0.00072254796507928529137 ,
+-1.3458154983225084633e-05 ,
+-3.186643260432269507e-11 ,
+} ;
+const long double K23[] = {
+ 0.28032619854980754502L ,   // basis(0)
+ 0.26374269458034057742L ,   // basis(0.5)
+ 0.21956831005031718209L ,   // basis(1)
+ 0.16156340331433543452L ,   // basis(1.5)
+ 0.1048741828768824975L ,   // basis(2)
+ 0.059888404600676471811L ,   // basis(2.5)
+ 0.029974159449075470104L ,   // basis(3)
+ 0.013085403104047330802L ,   // basis(3.5)
+ 0.0049523097091663721335L ,   // basis(4)
+ 0.0016124087669444304968L ,   // basis(4.5)
+ 0.00044731411734139782554L ,   // basis(5)
+ 0.0001044647630135970384L ,   // basis(5.5)
+ 2.0225085344373592521e-05L ,   // basis(6)
+ 3.1828904692399063536e-06L ,   // basis(6.5)
+ 3.9679866129008683199e-07L ,   // basis(7)
+ 3.7855233852927286935e-08L ,   // basis(7.5)
+ 2.6346734890183937921e-09L ,   // basis(8)
+ 1.2488410498396141086e-10L ,   // basis(8.5)
+ 3.6338307165683742375e-12L ,   // basis(9)
+ 5.4959585554808751978e-14L ,   // basis(9.5)
+ 3.2448470402629826029e-16L ,   // basis(10)
+ 4.3411473752750814749e-19L ,   // basis(10.5)
+ 3.868170170630684038e-23L ,   // basis(11)
+ 4.6112181790240812755e-30L ,   // basis(11.5)
+ } ; 
+const double Poles_23[] = {
+-0.81390562354320794558 ,
+-0.53531408371104993726 ,
+-0.34423627688965990901 ,
+-0.21240466055269885404 ,
+-0.12256116098899572098 ,
+-0.063602480154273194346 ,
+-0.027811662038017159748 ,
+-0.0090795953352833073946 ,
+-0.0017112714467820973156 ,
+-9.5733943500721317005e-05 ,
+-1.1936918816067781773e-07 ,
+} ;
+const long double K24[] = {
+ 0.27473197352118810147L ,   // basis(0)
+ 0.25909593388549224613L ,   // basis(0.5)
+ 0.21725612218406020861L ,   // basis(1)
+ 0.16181208211791016533L ,   // basis(1.5)
+ 0.10686656672959712099L ,   // basis(2)
+ 0.062431425854373209442L ,   // basis(2.5)
+ 0.032156816325798337903L ,   // basis(3)
+ 0.014541849599514216045L ,   // basis(3.5)
+ 0.0057429446266243922923L ,   // basis(4)
+ 0.0019676174028389475043L ,   // basis(4.5)
+ 0.00058004996270088237076L ,   // basis(5)
+ 0.00014563543156618789351L ,   // basis(5.5)
+ 3.0746018052888292381e-05L ,   // basis(6)
+ 5.3704036096147168721e-06L ,   // basis(6.5)
+ 7.6016977670631529332e-07L ,   // basis(7)
+ 8.4861949009616751487e-08L ,   // basis(7.5)
+ 7.2045281870976666726e-09L ,   // basis(8)
+ 4.4229185004672962615e-10L ,   // basis(8.5)
+ 1.8261499938887221924e-11L ,   // basis(9)
+ 4.5452628388307088642e-13L ,   // basis(9.5)
+ 5.7253638111923437033e-15L ,   // basis(10)
+ 2.70404290721556569e-17L ,   // basis(10.5)
+ 2.7132171099984410352e-20L ,   // basis(11)
+ 1.6117375710961183492e-24L ,   // basis(11.5)
+ 9.6067045396335026575e-32L ,   // basis(12)
+ } ; 
+const double Poles_24[] = {
+-0.82065517417952760226 ,
+-0.54920097364808984075 ,
+-0.3603190653178175995 ,
+-0.22802014939914075353 ,
+-0.13618849963046011919 ,
+-0.074351497302516889043 ,
+-0.035244126673212937406 ,
+-0.013246375325256078484 ,
+-0.0032976826232791502103 ,
+-0.00035807154412069458092 ,
+-4.8126755630580574097e-06 ,
+-3.5407088073360672255e-12 ,
+} ;
+
+const double* const precomputed_poles[] = {
+  0, 
+  0, 
+  Poles_2, 
+  Poles_3, 
+  Poles_4, 
+  Poles_5, 
+  Poles_6, 
+  Poles_7, 
+  Poles_8, 
+  Poles_9, 
+  Poles_10, 
+  Poles_11, 
+  Poles_12, 
+  Poles_13, 
+  Poles_14, 
+  Poles_15, 
+  Poles_16, 
+  Poles_17, 
+  Poles_18, 
+  Poles_19, 
+  Poles_20, 
+  Poles_21, 
+  Poles_22, 
+  Poles_23, 
+  Poles_24, 
+} ;
+
+const long double* const precomputed_basis_function_values[] = {
+  K0, 
+  K1, 
+  K2, 
+  K3, 
+  K4, 
+  K5, 
+  K6, 
+  K7, 
+  K8, 
+  K9, 
+  K10, 
+  K11, 
+  K12, 
+  K13, 
+  K14, 
+  K15, 
+  K16, 
+  K17, 
+  K18, 
+  K19, 
+  K20, 
+  K21, 
+  K22, 
+  K23, 
+  K24, 
+} ;
+
+} ; // end of namespace vspline_constants
+
+#define VSPLINE_POLES_H
+#endif
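The poles can be derived from the tabulated basis values. For degree n and m = n / 2 (integer division), the prefilter poles are the roots inside the unit circle of sum_{k=-m}^{m} beta_n(k) z^k, where beta_n(k) = K<n>[2k]. For degrees 2 and 3 (m = 1) this reduces to the quadratic beta(1) z^2 + beta(0) z + beta(1) = 0, whose two real roots are reciprocal to each other. Here is a minimal sketch for that case. The function name is hypothetical. Higher degrees would need a general polynomial root finder, which is not shown.

```cpp
#include <cassert>
#include <cmath>

// Prefilter pole for spline degree 2 or 3 (hypothetical helper), computed from
// the basis function values at the integers: b0 = beta(0), b1 = beta(1).
// The roots of b1 z^2 + b0 z + b1 = 0 are real, negative and reciprocal to each
// other; the prefilter uses the one with |z| < 1. It is computed as the
// reciprocal of the larger root, 2 b1 / (-b0 - sqrt(b0^2 - 4 b1^2)), which
// avoids the cancellation in the textbook quadratic formula.
double prefilter_pole ( double b0 , double b1 )
{
  assert ( b1 > 0.0 && b0 > 2.0 * b1 ) ;   // real, distinct roots
  return 2.0 * b1 / ( - b0 - std::sqrt ( b0 * b0 - 4.0 * b1 * b1 ) ) ;
}
```

With b0 = 0.75 and b1 = 0.125 (from K2), this yields sqrt(8) - 3, the value in Poles_2. With 2/3 and 1/6 (from K3), it yields sqrt(3) - 2, the value in Poles_3.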
diff --git a/prefilter.h b/prefilter.h
new file mode 100644
index 0000000..96272b9
--- /dev/null
+++ b/prefilter.h
@@ -0,0 +1,207 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015 - 2017 by Kay F. Jahnke                    */
+/*                                                                      */
+/*    The git repository for this software is at                        */
+/*                                                                      */
+/*    https://bitbucket.org/kfj/vspline                                 */
+/*                                                                      */
+/*    Please direct questions, bug reports, and contributions to        */
+/*                                                                      */
+/*    kfjahnke+vspline at gmail.com                                        */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/*! \file prefilter.h
+
+    \brief Code to create the coefficient array for a b-spline.
+    
+    Note: the bulk of the code was factored out to filter.h, while this text still
+    outlines the complete filtering process.
+    
+    The coefficients can be generated in two ways (that I know of): the first
+    is by solving a set of equations which encode the constraints of the spline.
+    A good example of how this is done can be found in libeinspline. I term it
+    the 'linear algebra approach'. In this implementation, I have chosen what I
+    call the 'DSP approach'. In a nutshell, the DSP approach views the b-spline's
+    reconstruction as a convolution of the coefficients with a specific kernel.
+    This kernel acts as a low-pass filter. To obtain coefficients which reproduce
+    the input signal under this convolution, the input is passed through a high-pass
+    filter with the inverse transfer function of the low-pass. This high-pass has
+    infinite support, but since it factors into recursive first-order filters (one
+    causal/anticausal pair per pole), it can still be computed to machine precision.
+    
+    I recommend [CIT2000] for a formal explanation. At the core of my prefiltering
+    routines there is code from Philippe Thévenaz' code accompanying this paper,
+    with slight modifications translating it to C++ and making it generic.
+    The greater part of this file deals with 'generifying' the process and with
+    employing multithreading and the CPU's vector units to gain speed.
+    
+    This code makes heavy use of vigra, which provides handling of multidimensional
+    arrays and efficient handling of aggregate types, to mention only two of its
+    many qualities. The vectorization is done with Vc, which allowed me to code
+    the horizontal vectorization I use in a generic fashion.
+    
+    In another version of this code I used vigra's BSplineBase class to obtain prefilter
+    poles. This required passing the spline degree/order as a template parameter. Doing it
+    like this allows the poles to be static members of the solver, but at the cost of
+    type proliferation. Here I chose not to follow this path and pass the spline order as a
+    parameter to the spline's constructor, thus reducing the number of solver specializations
+    and allowing automated testing with loops over the degree. This variant is slightly slower.
+
+    In addition to the code following the 'implicit scheme' proposed by Thévenaz, I provide
+    code to use an 'explicit scheme' to obtain the b-spline coefficients. The implicit scheme
+    makes assumptions about the continuation of the signal outside of the window of data which
+    is accessible: that the data continue mirrored, reflected, etc. It then captures these
+    assumptions in formulae which derive suitable initial causal/anticausal coefficients
+    from them. Usually this is done with a certain 'horizon' which takes into account the limited
+    arithmetic precision of the calculations and truncates the initial coefficient calculation
+    at a chosen degree of precision. The same effect can be achieved by simply embedding
+    the knot point data into a frame containing extrapolated knot point data. If the frame is
+    chosen so wide that margin effects don't 'disturb' the core data, the explicit scheme yields
+    an equally (im)precise result. The width of the frame now takes the role of the
+    horizon used in the implicit scheme and has the same effect. While the explicit scheme needs
+    more memory, it has several advantages:
+
+    - there is no need to code specific routines for initial coefficient generation
+    - nor any need to explicitly run this code
+    - the iteration over the input becomes more straightforward
+    - any extrapolation scheme can be used easily
+
+    A disadvantage, apart from the higher memory consumption, is that one cannot give a
+    'precise' solution, which the implicit scheme can do for the cases it can handle. But what
+    is 'precise'? Certainly there is no precision beyond the arithmetic precision offered by
+    the underlying system. So if the horizon is chosen wide enough, the resulting coefficients
+    become the same with all schemes. They are interchangeable.
+
+    In an image-processing context, the extra memory needed would typically be a small
+    single-digit percentage - not really a bother. In my trials, I found the runtime differences
+    between the two approaches negligible and the simplification of the code so attractive that
+    I was tempted to choose the explicit scheme over the implicit one. Yet since the code for the
+    implicit scheme is there already, and some of it is even used in the explicit scheme, I keep
+    both methods in the code base for now.
+
+    [CIT2000] Philippe Thévenaz, Thierry Blu, and Michael Unser, "Interpolation Revisited", IEEE Transactions on Medical Imaging, vol. 19, no. 7, July 2000.
+*/
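The DSP approach, and the difference between the implicit and the explicit scheme, can be illustrated in 1D. This sketch follows the recursive algorithm published along with [CIT2000]: scale the data by the filter's gain, then run one causal and one anticausal first-order recursion per pole. It assumes mirror boundary conditions and uses the cubic pole sqrt(3) - 2 (compare Poles_3). `prefilter_implicit` derives exact initial coefficients from the mirror assumption. `prefilter_explicit` instead embeds the data in a mirrored frame of width h and starts the recursions from zero state, so its error at the core shrinks roughly like |z|^h. All function names are hypothetical, and this is not the code in filter.h.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Exact initial causal coefficient for a mirror-symmetric continuation of the
// signal (period 2N-2), summed over one full period.
double initial_causal ( const std::vector<double> & c , double z )
{
  const std::size_t n = c.size() ;
  const double iz = 1.0 / z ;
  double zn = z ;
  double z2n = std::pow ( z , double ( n - 1 ) ) ;
  double sum = c[0] + z2n * c[n-1] ;
  z2n *= z2n * iz ;
  for ( std::size_t k = 1 ; k + 1 < n ; k++ )
  {
    sum += ( zn + z2n ) * c[k] ;
    zn *= z ;
    z2n *= iz ;
  }
  return sum / ( 1.0 - zn * zn ) ;
}

// Overall gain: product of (1 - z)(1 - 1/z) over all poles.
double prefilter_gain ( const std::vector<double> & poles )
{
  double gain = 1.0 ;
  for ( double z : poles )
    gain *= ( 1.0 - z ) * ( 1.0 - 1.0 / z ) ;
  return gain ;
}

// Implicit scheme: initial coefficients follow from the mirror assumption.
void prefilter_implicit ( std::vector<double> & c , const std::vector<double> & poles )
{
  const std::size_t n = c.size() ;
  assert ( n >= 2 ) ;
  const double gain = prefilter_gain ( poles ) ;
  for ( double & v : c )
    v *= gain ;
  for ( double z : poles )
  {
    c[0] = initial_causal ( c , z ) ;
    for ( std::size_t k = 1 ; k < n ; k++ )        // causal pass
      c[k] += z * c[k-1] ;
    c[n-1] = ( z / ( z * z - 1.0 ) ) * ( z * c[n-2] + c[n-1] ) ;
    for ( std::size_t k = n - 1 ; k-- > 0 ; )      // anticausal pass
      c[k] = z * ( c[k+1] - c[k] ) ;
  }
}

// Explicit scheme: embed the data in a mirrored frame of width h on both sides,
// run the same recursions from zero initial state, then cut out the core.
std::vector<double> prefilter_explicit ( const std::vector<double> & data ,
                                         const std::vector<double> & poles ,
                                         std::size_t h )
{
  const long n = static_cast<long> ( data.size() ) ;
  assert ( n >= 2 ) ;
  const long period = 2 * n - 2 ;
  std::vector<double> f ( data.size() + 2 * h ) ;
  for ( std::size_t i = 0 ; i < f.size() ; i++ )
  {
    long j = static_cast<long> ( i ) - static_cast<long> ( h ) ;
    j = ( j % period + period ) % period ;   // wrap into one period
    if ( j >= n )
      j = period - j ;                       // reflect back into [0, n-1]
    f[i] = data[static_cast<std::size_t> ( j )] ;
  }
  const double gain = prefilter_gain ( poles ) ;
  for ( double & v : f )
    v *= gain ;
  const std::size_t m = f.size() ;
  for ( double z : poles )
  {
    for ( std::size_t k = 1 ; k < m ; k++ )        // causal pass, zero state
      f[k] += z * f[k-1] ;
    f[m-1] *= -z ;                                 // anticausal start, zero state
    for ( std::size_t k = m - 1 ; k-- > 0 ; )
      f[k] = z * ( f[k+1] - f[k] ) ;
  }
  return std::vector<double> ( f.begin() + static_cast<long> ( h ) ,
                               f.begin() + static_cast<long> ( h ) + n ) ;
}
```

For the cubic pole, a frame of about 40 samples is already wide enough for both results to agree to within rounding error. Convolving the coefficients with the cubic kernel (1/6, 2/3, 1/6), with mirrored coefficients at the ends, reproduces the input.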
+
+#ifndef VSPLINE_PREFILTER_H
+#define VSPLINE_PREFILTER_H
+
+#include "common.h"
+#include "filter.h"
+#include "basis.h"
+
+namespace vspline {
+
+using namespace std ;
+using namespace vigra ;
+
+/// With large data sets, and with higher dimensionality, processing separately along each
+/// axis consumes a lot of memory bandwidth. There are ways out of this dilemma by interleaving
+/// the code. Disregarding the calculation of initial causal and anticausal coefficients, the code
+/// to do this would perform the forward filtering step for all axes at the same time and then, later,
+/// the backward filtering step for all axes at the same time. This is possible, since the order
+/// of the filter steps is irrelevant, and the traversal of the data can be arranged so that
+/// values needed for context of the filter are always present (the filters are recursive and only
+/// 'look' one way). I have investigated these variants, but especially the need to calculate
+/// initial causal/anticausal coefficients, and the additional complications arising from
+/// vectorization, have kept me from choosing this path for the current body of code. With the
+/// inclusion of the explicit scheme for prefiltering, dimension-interleaved prefiltering becomes
+/// more feasible, and I anticipate revisiting it.
+///
+/// Here I am using a scheme where I make access to 1D subsets of the data very efficient (if necessary
+/// by buffering lines/stripes of data) and rely on the fact that such simple, fast access plays
+/// well with the compiler's optimizer and pipelining in the CPU. From the trials on my own system
+/// I conclude that this approach does not perform significantly worse than interleaving schemes
+/// and is much easier to formulate and understand. And with fast access to 1D subsets, higher order
+/// splines become less of an issue; the extra arithmetic to prefilter for, say, quintic splines is
+/// done very quickly, since no additional memory access is needed beyond a buffer's worth of data
+/// already present in core memory.
+///
+/// solve is just a thin wrapper around filter_nd in filter.h, injecting the actual number of poles
+/// and the poles themselves.
+///
+/// Note how smoothing comes into play here: if 'smoothing' is not 0.0, it's done simply by
+/// prepending an additional pole with the value 'smoothing' to the filter cascade. This pole
+/// must lie strictly between 0 (no smoothing) and 1 (total blur). While I'm not sure about
+/// the precise mathematics (yet), this does what is intended very efficiently. Why smooth?
+/// If the signal is scaled down when remapping, we'd have aliasing of higher frequencies
+/// into the output, producing artifacts. Pre-smoothing with an adequate factor removes the
+/// higher frequencies (more or less), avoiding the problem.
+///
+/// Using this simple method, pre-smoothing is computationally cheap, but the method used
+/// here isn't equivalent to convolving with a Gaussian, though the effect is quite similar.
+/// I think the method is called exponential smoothing.
+
+// TODO: establish the proper maths for this smoothing method
+
+template < typename input_array_type ,  ///< type of array with knot point data
+           typename output_array_type , ///< type of array for coefficients (may be the same)
+           typename math_type >         ///< type for arithmetic operations in filter
+void solve ( input_array_type & input ,
+             output_array_type & output ,
+             TinyVector<bc_code,input_array_type::actual_dimension> bcv ,
+             int degree ,
+             double tolerance ,
+             double smoothing = 0.0 ,
+             int njobs = default_njobs )
+{
+  if ( smoothing != 0.0 )
+  {
+    assert ( smoothing > 0.0 && smoothing < 1.0 ) ;
+    int npoles = degree / 2 + 1 ;
+    double *pole = new double [ npoles ] ;
+    pole[0] = smoothing ;
+    for ( int i = 1 ; i < npoles ; i++ )
+      pole[i] = vspline_constants::precomputed_poles [ degree ] [ i - 1 ] ;
+    
+    filter_nd < input_array_type , output_array_type , math_type >
+              ( input ,
+                output ,
+                bcv ,
+                npoles ,
+                pole ,
+                tolerance ,
+                njobs ) ;
+                
+    delete[] pole ;
+  }
+  else
+    filter_nd < input_array_type , output_array_type , math_type >
+              ( input ,
+                output ,
+                bcv ,
+                degree / 2 ,
+                vspline_constants::precomputed_poles [ degree ] ,
+                tolerance ,
+                njobs ) ;
+}
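The "one axis at a time" strategy described above can be sketched for a 2D row-major array. Each line along the chosen axis is copied into a contiguous buffer, filtered in 1D, and written back. The names are hypothetical, and multithreading and vectorization are left out. A running sum (a causal recursive filter with pole 1) stands in for the prefilter. Since linear 1D filters applied along different axes commute, the order of the axes does not matter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Apply an in-place 1D filter along one axis of a row-major 2D array
// (hypothetical helper). Each line is buffered so the filter sees unit stride.
template < typename Filter1D >
void filter_along_axis ( std::vector<double> & a ,
                         std::size_t width , std::size_t height ,
                         int axis , Filter1D filter )
{
  assert ( a.size() == width * height ) ;
  const std::size_t lines  = ( axis == 0 ) ? height : width ;
  const std::size_t length = ( axis == 0 ) ? width : height ;
  const std::size_t stride = ( axis == 0 ) ? 1 : width ;
  std::vector<double> buffer ( length ) ;
  for ( std::size_t line = 0 ; line < lines ; line++ )
  {
    const std::size_t start = ( axis == 0 ) ? line * width : line ;
    for ( std::size_t i = 0 ; i < length ; i++ )   // gather line into buffer
      buffer[i] = a[start + i * stride] ;
    filter ( buffer ) ;
    for ( std::size_t i = 0 ; i < length ; i++ )   // scatter result back
      a[start + i * stride] = buffer[i] ;
  }
}

// Separable 2D filtering: one 1D pass per axis.
template < typename Filter1D >
void filter_2d ( std::vector<double> & a , std::size_t width , std::size_t height ,
                 Filter1D filter )
{
  filter_along_axis ( a , width , height , 0 , filter ) ;
  filter_along_axis ( a , width , height , 1 , filter ) ;
}

// Stand-in 1D filter: causal recursion y[k] = x[k] + y[k-1] (running sum).
void running_sum ( std::vector<double> & v )
{
  for ( std::size_t k = 1 ; k < v.size() ; k++ )
    v[k] += v[k-1] ;
}
```

Applied along both axes, the running sum turns the array into a summed-area table, which makes the result easy to check.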
+
+} ; // namespace vspline
+
+#endif // VSPLINE_PREFILTER_H
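The smoothing pole can be pictured as symmetric exponential smoothing. Assume the smoothing pole p is normalized like a prefilter pole, i.e. with the gain (1 - p)(1 - 1/p) used in the recursive scheme of [CIT2000]; this is an assumption about filter_nd, not something stated here. Under that assumption, a causal/anticausal pass with a single pole p in (0, 1) has the transfer function (1 - p)^2 / ((1 - p Z^-1)(1 - p Z)). That is exactly what a forward and a backward exponential moving average compute. Its impulse response is the two-sided exponential (1 - p)/(1 + p) p^|k|, not a Gaussian. Here is a minimal sketch; `exponential_smooth` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Symmetric exponential smoothing with one pole p in (0, 1) (hypothetical
// helper; assumes unit DC gain, as discussed above), written as a forward and
// a backward exponential moving average:
//   causal:      y[k] = (1 - p) x[k] + p y[k-1]
//   anticausal:  s[k] = (1 - p) y[k] + p s[k+1]
// Both passes start in steady state for a constant continuation of the signal,
// so constant signals pass unchanged. Away from the ends the impulse response
// is (1 - p) / (1 + p) * p^|k|.
void exponential_smooth ( std::vector<double> & v , double p )
{
  assert ( p > 0.0 && p < 1.0 ) ;
  if ( v.empty() )
    return ;
  for ( std::size_t k = 1 ; k < v.size() ; k++ )      // causal pass
    v[k] = ( 1.0 - p ) * v[k] + p * v[k-1] ;
  for ( std::size_t k = v.size() - 1 ; k-- > 0 ; )    // anticausal pass
    v[k] = ( 1.0 - p ) * v[k] + p * v[k+1] ;
}
```

With p = 0.5, an impulse is spread to 1/3 at its own position and 1/6 at each neighbor.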
diff --git a/prefilter_poles.cc b/prefilter_poles.cc
new file mode 100644
index 0000000..5f2d11a
--- /dev/null
+++ b/prefilter_poles.cc
@@ -0,0 +1,176 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015, 2016 by Kay F. Jahnke                     */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/*! \file prefilter_poles.cc
+
+    \brief calculates the poles of the b-spline prefilter using gsl and BLAS
+
+    This doesn't have to be done when installing vspline if poles.cc is
+    already present. Degrees up to 24 are about as much as gsl can
+    handle; at such high degrees the evaluation also becomes quite
+    imprecise, especially for floats.
+
+    compile with:
+    g++ -std=c++11 prefilter_poles.cc -oprefilter_poles -lgsl -lblas
+    
+    run
+    
+    ./prefilter_poles > poles.cc
+
+    TODO: could do with some TLC...
+*/
+
+#include <iostream>
+#include <iomanip>
+
+#include <vigra/array_vector.hxx>
+#include <vigra/splines.hxx>
+#include <gsl/gsl_poly.h>
+#include <vspline/basis.h>
+
+using namespace std ;
+using namespace vigra ;
+
+// template < class real_type >
+// real_type bspline_basis ( real_type x , int degree , int derivative = 0 )
+// {
+//   if ( degree == 0 )
+//   {
+//     if ( derivative == 0 )
+//         return x < real_type(0.5) && real_type(-0.5) <= x ?
+//                   real_type(1.0)
+//                 : real_type(0.0);
+//     else
+//         return real_type(0.0);
+//   }
+//   if ( derivative == 0 )
+//   {
+//     real_type n12 = real_type((degree + 1.0) / 2.0);
+//     return (     ( n12 + x )
+//                 * bspline_basis ( x + real_type(0.5) , degree - 1 , 0 )
+//               +   ( n12 - x )
+//                 * bspline_basis ( x - real_type(0.5) , degree - 1 , 0 )
+//             )
+//             / degree;
+//   }
+//   else
+//   {
+//     --derivative;
+//     return   bspline_basis ( x + real_type(0.5) , degree - 1 , derivative )
+//            - bspline_basis ( x - real_type(0.5) , degree - 1 , derivative ) ;
+//   }
+// }
+
+template < class T >
+ArrayVector<double> 
+calculatePrefilterCoefficients(int DEGREE)
+{
+    ArrayVector<double> res;
+    const int r = DEGREE / 2;
+    double a[2*r+1] ;
+    double z[4*r+2] ;
+    cout << "const long double K" << DEGREE << "[] = {" << endl ;
+    // we calculate the basis function values at 0.5 intervals
+    int imax = 2 * r ;
+    if ( DEGREE & 1 )
+      imax++ ;
+    for(int i = 0; i <= imax ; ++i)
+    {
+      long double half_i = i / (long double) 2.0 ;
+      long double v = vspline::gen_bspline_basis<long double> ( half_i , DEGREE , 0 ) ;
+      cout << " " << v << "L ,   // basis(" << half_i << ")" << endl ;
+      if ( ! ( i & 1 ) )
+      {
+        // for even i, we put the value in a[] as well - only even i
+        // correspond to the value of the basis function at integral values
+        // which we need for the poles
+        int ih = i / 2 ;
+        a [ r - ih ] = a [ r + ih ] = v ;
+      }
+    }
+    cout << " } ; " << endl ;
+        
+    if(DEGREE > 1)
+    {
+        ArrayVector<double> roots;
+
+        // we set up the environment gsl needs to find the roots
+        gsl_poly_complex_workspace * w
+          = gsl_poly_complex_workspace_alloc (2*r+1);
+        // now we call gsl's root finder
+        gsl_poly_complex_solve (a, 2*r+1, w, z);
+        // and release its workspace
+        gsl_poly_complex_workspace_free (w);
+
+        // we only look at the real parts of the roots, which gsl stores
+        // interleaved as real/imaginary pairs. We take them back to front:
+        // it doesn't matter to Thevenaz' code which end we start with, but
+        // conventionally Pole[0] is the root with the largest absolute value,
+        // so I stick with that.
+        for(int i = 2 * r - 2 ; i >= 0; i-=2)
+            if(VIGRA_CSTD::fabs(z[i]) < 1.0)
+                res.push_back(z[i]);
+    }
+    return res;
+}
+
+// TODO ugly mishmash of prints and calculations...
+
+void print_poles ( int degree )
+{
+  ArrayVector<double> res = calculatePrefilterCoefficients<double> ( degree ) ;
+  if ( degree > 1 )
+  {
+    cout << "const double Poles_" << degree << "[] = {" << endl ;
+    for ( auto r : res )
+      cout << r << " ," << endl ;
+    cout << "} ;" << endl ;
+  }
+}
+
+int main ( int argc , char * argv[] )
+{
+  cout << setprecision(20) ;
+  
+  for ( int degree = 0 ; degree < 25 ; degree++ )
+    print_poles(degree) ;
+  
+  cout << noshowpos ;
+  cout << "const double* precomputed_poles[] = {" << endl ;
+  cout << "  0, " << endl ;
+  cout << "  0, " << endl ;
+  for ( int i = 2 ; i < 25 ; i++ )
+    cout << "  Poles_" << i << ", " << endl ;
+  cout << "} ;" << endl ;
+  cout << "const long double* precomputed_basis_function_values[] = {" << endl ;
+  for ( int i = 0 ; i < 25 ; i++ )
+    cout << "  K" << i << ", " << endl ;
+  cout << "} ;" << endl ;
+}
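For degree 3 the procedure in calculatePrefilterCoefficients can be followed by hand. Here r = 1, the basis function values at 0 and ±1 fill a[], and the polynomial is quadratic. The sketch below uses the same centered recursion as the commented-out `bspline_basis`, restricted to derivative 0. It replaces gsl's root finder with the quadratic formula. `cubic_pole` is a hypothetical helper, not part of vspline.

```cpp
#include <cassert>
#include <cmath>

// Centered B-spline basis function of the given degree, evaluated at x,
// using the same recursion as the commented-out bspline_basis() above
// (derivative 0 only).
double bspline_basis ( double x , int degree )
{
  assert ( degree >= 0 ) ;
  if ( degree == 0 )
    return ( x < 0.5 && -0.5 <= x ) ? 1.0 : 0.0 ;
  double n12 = ( degree + 1.0 ) / 2.0 ;
  return (   ( n12 + x ) * bspline_basis ( x + 0.5 , degree - 1 )
           + ( n12 - x ) * bspline_basis ( x - 0.5 , degree - 1 ) )
         / degree ;
}

// Hypothetical helper: the cubic prefilter pole. For degree 3, r = 1 and
// a[] = { B(1) , B(0) , B(1) }, so the polynomial is quadratic and its
// roots z and 1/z can be found in closed form instead of with gsl.
double cubic_pole ()
{
  const double a0 = bspline_basis ( 0.0 , 3 ) ;   // 2/3
  const double a1 = bspline_basis ( 1.0 , 3 ) ;   // 1/6
  // root of a1 z^2 + a0 z + a1 = 0 with |z| < 1
  return ( -a0 + std::sqrt ( a0 * a0 - 4.0 * a1 * a1 ) ) / ( 2.0 * a1 ) ;
}
```

Because a[] is symmetric, the polynomial's roots come in reciprocal pairs z and 1/z. This is why the program keeps only roots with absolute value below 1. For the cubic spline this gives z = √3 - 2 ≈ -0.2679.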
diff --git a/remap.h b/remap.h
new file mode 100644
index 0000000..81805b8
--- /dev/null
+++ b/remap.h
@@ -0,0 +1,1339 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015 - 2017 by Kay F. Jahnke                    */
+/*                                                                      */
+/*    The git repository for this software is at                        */
+/*                                                                      */
+/*    https://bitbucket.org/kfj/vspline                                 */
+/*                                                                      */
+/*    Please direct questions, bug reports, and contributions to        */
+/*                                                                      */
+/*    kfjahnke+vspline at gmail.com                                        */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/// \file remap.h
+///
+/// \brief set of generic remap functions
+///
+/// My foremost reason to have efficient B-spline processing is the formulation of
+/// a generic remap function. This is a function which takes an array of real-valued
+/// nD coordinates and an interpolator over a source array. Now each of the real-valued
+/// coordinates is fed into the interpolator in turn, yielding a value, which is placed
+/// in the output array at the same place the coordinate occupies in the coordinate
+/// array. To put it concisely, if we have
+///
+/// - c, the coordinate array (or 'warp' array)
+/// - a, the source array
+/// - i, the interpolator over a
+/// - j, the coordinates in c
+/// - and t, the target
+///
+/// remap defines the operation
+///
+/// t[j] = i(a,c[j]) for all j
+///
+/// Now we widen the concept of remapping to something which is more like a 'transform'
+/// function. Instead of limiting the process to the use of an 'interpolator', we use
+/// an arbitrary unary functor to transform incoming values to outgoing values, where
+/// the type of the incoming and outgoing values is determined by the functor. If the
+/// functor actually is an interpolator, we have a 'true' remap transforming coordinates
+/// into values, but this is merely a special case.
+///
+/// st_remap is the single-threaded implementation; remap itself partitions its work
+/// and feeds several threads, each running one instance of st_remap.
+///
+/// remap takes two template arguments:
+///
+/// - unary_functor_type: functor object yielding values for coordinates
+/// - dim_target:         number of dimensions of output array
+///
+/// Remaps to objects of a different dimensionality are supported. This makes it
+/// possible, for example, to remap from a volume to a 2D image, using a 2D warp
+/// array containing 3D coordinates.
+///
+/// There is also a second set of remap functions in this file, which don't take a
+/// 'warp' array. Instead, for every target location, the location's discrete coordinates
+/// are passed to the unary_functor_type object. This way, transformation-based remaps
+/// can be implemented easily: the user code just has to provide a suitable functor
+/// to yield values for coordinates. This functor will internally take the discrete
+/// incoming coordinates (into the target array) and transform them as required,
+/// producing coordinates suitable for the 'actual' interpolation using a b-spline or some
+/// other object capable of producing values for real coordinates. The routine offering
+/// this service is called index_remap, and only takes one template argument, which is
+/// enough to derive all other types involved:
+///
+/// - unary_functor_type: functor object yielding values for coordinates
+///
+/// This file also has code to evaluate a b-spline at positions in a mesh grid, which can
+/// be used for scaling, and for separable geometric transformations.
+///
+/// The current implementation of the remap functionality uses a straightforward mode of
+/// operation, which factors out the various needed tasks into separate bits of code. The
+/// result data are acquired by 'pulling' them into the target array by repeatedly calling
+/// a functor yielding the results. This functor is a closure containing all logic needed
+/// to produce the result values in scan order of the target array. While the two remap
+/// routines and grid_eval should cover most use cases, it's quite possible to use the
+/// routine fill() itself, passing in a suitable functor.
+///
+/// While the code presented here is quite involved and there are several types and routines
+/// the use(fulness) of which isn't immediately apparent, most use cases will be able to get
+/// by using only remap() or index_remap().
+///
+/// Finally, this file also has a routine to restore the original knot point data from a
+/// bspline object. This is done very efficiently using grid_eval().
+///
+/// Note: Currently, the calls to multithread() are hardwired to use partition_to_tiles()
+/// as their partitioner. partition_to_tiles() falls back to partition_to_stripes() if
+/// its own partitioning scheme fails to produce the desired number of parts or if
+/// the data are 3D or higher. This way, most use cases should receive adequate treatment.
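The operation t[j] = i(a,c[j]) and its generalization to an arbitrary unary functor can be sketched in a few lines. This is a single-threaded 1D illustration that uses std::vector instead of vigra arrays. A hypothetical `linear_interpolator` stands in for a b-spline evaluator. It shows the semantics only, not vspline's remap() interface, and `remap_1d` is likewise a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an interpolator over a source array 'a'.
// Assumes coordinates lie in [0, a.size()-1]; clamps at the last knot.
struct linear_interpolator
{
  const std::vector<double> & a ;
  double operator() ( double x ) const
  {
    assert ( x >= 0.0 && ! a.empty() ) ;
    std::size_t i = static_cast<std::size_t> ( x ) ;
    if ( i + 1 >= a.size() )
      return a.back() ;
    double f = x - i ;   // fractional part
    return ( 1.0 - f ) * a[i] + f * a[i+1] ;
  }
} ;

// Generic 'transform': feed each element of c to the unary functor and
// store the result at the same position in the target t.
template < typename functor_type , typename in_type >
auto remap_1d ( const std::vector<in_type> & c , const functor_type & f )
  -> std::vector<decltype ( f ( c[0] ) )>
{
  std::vector<decltype ( f ( c[0] ) )> t ;
  t.reserve ( c.size() ) ;
  for ( const auto & cj : c )
    t.push_back ( f ( cj ) ) ;
  return t ;
}
```

Passing a std::vector<int> of target indices together with a functor that transforms them corresponds to the index_remap idea, where the functor receives discrete target coordinates and decides for itself what to compute from them.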
+
+#ifndef VSPLINE_REMAP_H
+#define VSPLINE_REMAP_H
+
+#include "multithread.h"
+#include "eval.h"
+
+namespace vspline {
+
+using namespace std ;
+using namespace vigra ;
+
+template < int dimension >
+using bcv_type = vigra::TinyVector < bc_code , dimension > ;
+
+/// struct _fill contains the implementation of the 'engine' used for remap-like
+/// functions. The design logic is this: a remap will ultimately produce an array
+/// of results. This array is filled in standard scan order sequence by repeated
+/// calls to a functor containing all the logic to produce values in the required
+/// order. The functor is like a closure, being set up initially with all parameters
+/// needed for the task at hand (like with a warp array, a transformation, a genuine
+/// generator function etc.). Since the functor controls the path of the calculation
+/// from whatever starting point to the production of the final result, there are no
+/// intermediate containers for intermediate results. Since the remap process is
+/// mainly memory-bound, this strategy helps keep memory use low. The data can
+/// be produced one by one, but the code has vectorized operation as well, which
+/// brings a noticeable performance gain. With vectorized operation, instead of producing
+/// single values, the engine produces bunches of values. This operation is transparent
+/// to the caller, since the data are deposited in normal interleaved fashion. The
+/// extra effort for vectorized operation is in the implementation of the generator
+/// functor and reasonably straightforward. If only the standard remap functions
+/// are used, the user can remain ignorant of the vectorization.
+///
+/// Why is _fill an object? It might be a function if partial specialization of functions
+/// were allowed. Since this is not so, we have to use a class and put the function in its
+/// operator(). Now we can partially specialize the object with the desired effect.
+///
+/// struct _fill's operator() takes an object of class generator_type. This object
+/// has to satisfy a few requirements:
+///
+/// - it has to have an overloaded operator() with two signatures: one taking
+///   a reference to a simdized value holding vsize value_type (functor_type::out_v
+///   in the code below), one taking a reference to a single value_type.
+///   These arguments specify where to deposit the generator's output.
+///
+/// - it has to offer a bindOuter routine producing a subdimensional generator
+///   to supply values for a slice of output
+///
+/// - it has to offer a subrange routine, limiting output to a subarray
+///   of the 'whole' output
+
+//  TODO might write an abstract base class specifying the interface
+
+/// In the current implementation, the hierarchical descent to subdimensional slices
+/// is always taken down to 1D, and the actual calls of the functor occur there.
+/// While the hierarchical access may consume some processing time, mainly to
+/// establish the bounds for the 1D operation (though this may be optimized away),
+/// the operation on 1D data can use optimizations which gain more than the descent
+/// costs. Especially when vectorized code is used, operation on 1D data is very
+/// efficient, since the data can be accessed using load/store or gather/scatter
+/// operations, even when the arrays involved are strided. Taking the hierarchical
+/// descent all the way down is encoded in fill() and its workhorse code; the
+/// generator objects implemented here depend on the descent going all the way
+/// down to 1D.
+///
+/// _fill is used by st_fill. It's an object implementing a hierarchical fill
+/// of the target array. This is done recursively: while the dimension of the
+/// generator and the target is greater than 1, they are sliced and the slices
+/// are processed with the routine one level down.
+
+template < typename generator_type  , // functor object yielding values
+           int dim_out >              // number of dimensions of output array
+struct _fill
+{
+  void operator() ( generator_type & gen ,
+                    MultiArrayView < dim_out , typename generator_type::value_type >
+                      & output )
+  {
+      // we're not yet at the intended lowest level of recursion,
+      // so we slice output and generator and feed the slices to the
+      // next lower recursion level
+      for ( int c = 0 ; c < output.shape ( dim_out - 1 ) ; c++ )
+      {
+        // recursively call _fill for each slice along the highest axis
+        auto sub_output = output.bindOuter ( c ) ;
+        auto sub_gen = gen.bindOuter ( c ) ;
+        _fill < decltype ( sub_gen ) , dim_out - 1 >()
+          ( sub_gen , sub_output ) ;
+      }
+  }
+} ;
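The recursion in _fill, and the reason it has to live in a class rather than a function template, can be shown with a stripped-down analogue. `mini_fill` and `counter` are hypothetical names. The target is a raw buffer described by shape and stride arrays instead of a vigra::MultiArrayView, and the generator is called one value at a time.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature of the _fill pattern: a class template whose
// operator() slices along the outermost axis and recurses, with a partial
// specialization for D == 1 that ends the descent. A function template
// could not be partially specialized this way.
template < typename generator_type , int D >
struct mini_fill
{
  void operator() ( generator_type & gen , double * p ,
                    const std::array<std::size_t,D> & shape ,
                    const std::array<std::size_t,D> & stride )
  {
    std::array<std::size_t,D-1> sub_shape , sub_stride ;
    for ( int d = 0 ; d < D - 1 ; d++ )
    {
      sub_shape[d] = shape[d] ;
      sub_stride[d] = stride[d] ;
    }
    // one slice per index along the highest axis, like bindOuter()
    for ( std::size_t c = 0 ; c < shape[D-1] ; c++ )
      mini_fill < generator_type , D - 1 >()
        ( gen , p + c * stride[D-1] , sub_shape , sub_stride ) ;
  }
} ;

template < typename generator_type >
struct mini_fill < generator_type , 1 >
{
  void operator() ( generator_type & gen , double * p ,
                    const std::array<std::size_t,1> & shape ,
                    const std::array<std::size_t,1> & stride )
  {
    // lowest level: call the generator once per element, in scan order
    for ( std::size_t i = 0 ; i < shape[0] ; i++ )
      gen ( p [ i * stride[0] ] ) ;
  }
} ;

// a stateful generator: yields 0, 1, 2, ... in the order it is called
struct counter
{
  double next = 0.0 ;
  void operator() ( double & target ) { target = next ; next += 1.0 ; }
} ;
```

Because `counter` carries state, the order of calls matters. Slicing along the outermost axis and filling each slice in turn visits the elements in scan order (axis 0 varying fastest), so with natural strides the buffer receives 0, 1, 2, ... in memory order.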
+
+/// specialization of _fill for 1D ends the recursive descent
+
+template < typename generator_type > // functor object yielding values
+struct _fill < generator_type , 1 >
+{
+  void operator() ( generator_type & gen ,
+                    MultiArrayView < 1 , typename generator_type::value_type >
+                      & output )
+  {
+    typedef typename generator_type::value_type value_type ;
+    typedef typename generator_type::functor_type functor_type ;
+    
+    auto target_it = output.begin() ;  
+    int leftover = output.elementCount() ;
+
+#ifdef USE_VC
+
+    // TODO: browsing Vc's code base, I noticed the undocumented functions
+    // simd_for_each and simd_for_each_n, which do simple iterations over
+    // contiguous single-channel memory - the code is in 'algorithms.h'.
+    // what's interesting there is that the code iterates with scalar
+    // values until it has reached an aligned address. then it continues
+    // by passing vectors to the unary functor as long as full vectors
+    // can be found, and finally the remaining values are also passed as
+    // scalars. The effect is that the central loop which is processing
+    // vectors will certainly load from an aligned address, and hence the
+    // load operation can be issued with Vc::Aligned set true. I should
+    // consider using a similar scheme here, rather than starting loads
+    // from the first address and mopping up stragglers.
+
+    enum { vsize = generator_type::vsize } ;
+    enum { dimension = vigra::ExpandElementResult < value_type > :: size } ;
+    enum { advance = dimension * vsize } ;
+    
+    typedef typename vigra::ExpandElementResult < value_type > :: type ele_type ;
+    ele_type * dp = (ele_type*) ( output.data() ) ;
+    
+    typedef typename vector_traits < ele_type , vsize > :: type simdized_type ;
+    typedef typename vector_traits < ele_type , vsize > :: ele_v ele_v ;
+    typedef typename ele_v::IndexType index_type ;
+    
+    int aggregates = leftover / vsize ; // number of full vectors
+    leftover -= aggregates * vsize ;    // remaining leftover single values
+
+    typename functor_type::out_v target_buffer ;
+    
+    if ( output.isUnstrided() )
+    {
+      if ( dimension == 1 )
+      {
+        // best case: unstrided operation on 1D data, we can use
+        // efficient SIMD store operation        
+        for ( int a = 0 ; a < aggregates ; a++ , dp += advance )
+        {
+          gen ( target_buffer ) ;
+          // and store it to the destination. The target is contiguous and
+          // single-channel, so the SIMD vector's store() can write it directly.
+          target_buffer[0] . store ( dp ) ;
+        }
+      }
+      else
+      {
+        // second best: unstrided operation on nD data
+        for ( int a = 0 ; a < aggregates ; a++ , dp += advance )
+        {
+          gen ( target_buffer ) ;
+          // and scatter each channel to the destination: lane k of channel e
+          // goes to dp [ e + k * dimension ], so the values end up interleaved.
+          for ( int e = 0 ; e < dimension ; e++ )
+            target_buffer[e].scatter
+              ( dp + e , index_type::IndexesFromZero() * dimension ) ;
+        }
+      }
+    }
+    else
+    {
+      auto strided_advance = advance * output.stride(0) ;
+      for ( int a = 0 ; a < aggregates ; a++ , dp += strided_advance )
+      {
+        // here we generate to a simdized target type
+        gen ( target_buffer ) ;
+        // and store it to destination using a strided store.
+        for ( int e = 0 ; e < dimension ; e++ )
+          target_buffer[e].scatter
+            ( dp + e ,
+              index_type::IndexesFromZero() * dimension * output.stride(0) ) ;
+      }
+    }        
+    // if there aren't any leftovers, we can return straight away.
+    if ( ! leftover )
+      return ;
+
+    // otherwise, advance target_it to remaining single values
+    target_it += aggregates * vsize ;
+    
+#endif // USE_VC
+
+    // process leftovers. If Vc isn't used, this loop does all the processing
+    while ( leftover-- )
+    {
+      // process leftovers with single-value evaluation
+      gen ( *target_it ) ;
+      ++target_it ;
+    }
+  }
+} ;
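The split into full vectors and leftovers in the 1D specialization can be emulated without Vc. In this hypothetical sketch, a std::array of vsize values plays the part of the SIMD type. The generator offers the two operator() overloads that _fill expects: one for a whole vector of values, one for a single value. `ramp_generator` and `fill_1d` are illustrative names only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical generator with the two overloads _fill relies on.
// std::array stands in for a Vc SIMD type; no real SIMD is involved.
template < std::size_t vsize >
struct ramp_generator
{
  double next = 0.0 ;
  // 'vectorized' overload: fill a whole chunk at once
  void operator() ( std::array<double,vsize> & v )
  {
    for ( auto & e : v ) { e = next ; next += 1.0 ; }
  }
  // scalar overload, used for leftovers
  void operator() ( double & target ) { target = next ; next += 1.0 ; }
} ;

// Scalar emulation of the 1D level of _fill: full 'vectors' first,
// then the stragglers one by one.
template < std::size_t vsize , typename generator_type >
void fill_1d ( generator_type & gen , std::vector<double> & out )
{
  std::size_t aggregates = out.size() / vsize ;          // full vectors
  std::size_t leftover = out.size() - aggregates * vsize ;
  std::array<double,vsize> buffer ;
  double * dp = out.data() ;
  for ( std::size_t a = 0 ; a < aggregates ; a++ , dp += vsize )
  {
    gen ( buffer ) ;                       // produce a chunk
    for ( std::size_t e = 0 ; e < vsize ; e++ )
      dp[e] = buffer[e] ;                  // stands in for a SIMD store
  }
  while ( leftover-- )
    gen ( *dp++ ) ;                        // mop up single values
}
```

The alignment-peeling scheme described in the TODO comment would add a scalar head loop before the vector loop. This sketch, like the current code, starts the vector loop at the first element.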
+
+/// single-threaded fill. This routine receives the range to process and the generator
+/// object capable of providing result values. The generator object is set up to provide
+/// values for the desired subrange and then passed to _fill, which handles the calls to
+/// the generator object and the depositing of the result values into the target array.
+
+template < typename generator_type  , // functor object yielding values
+           int dim_out >              // number of dimensions of output array
+void st_fill ( shape_range_type < dim_out > range ,
+               generator_type * const       p_gen ,
+               MultiArrayView < dim_out , typename generator_type::value_type > * p_output )
+{
+  // pick out output's subarray specified by 'range'
+
+  auto output = p_output->subarray ( range[0] , range[1] ) ;
+  
+  // get a new generator to cover the same range. We need an instance here:
+  // the generator carries state, and since we're in a single thread processing
+  // one chunk of the partitioning, the generator we have here won't be used
+  // by other threads (which would be wrong, since it carries state). It may,
+  // however, be subdivided into yet more generators if _fill decides to slice
+  // it and process the slices.
+  
+  auto gen = p_gen->subrange ( range ) ;
+  
+  // have the results computed and put into the target
+
+  _fill < generator_type , dim_out >() ( gen , output ) ;
+}
+
+/// multithreaded fill. This is the top-level fill routine. It takes a functor capable
+/// of delivering successive result values (in the target array's scan order), and calls
+/// this functor repeatedly until 'output' is full.
+/// This task is distributed to several worker threads by means of multithread(), which
+/// in turn uses st_fill, the single-threaded fill routine.
+
+template < typename generator_type  , // functor object yielding values
+           int dim_target >           // number of dimensions of output array
+void fill ( generator_type & gen ,
+            MultiArrayView < dim_target , typename generator_type::value_type >
+              & output )
+{
+  // set up 'range' to cover a complete array of output's size
+  
+  shape_range_type < dim_target > range ( shape_type < dim_target > () ,
+                                          output.shape() ) ;
+
+  // heuristic: minimum desired number of partitions. partition_to_tiles
+  // only uses this value when it delegates to partition_to_stripes.
+
+  int njobs = vspline::common_thread_pool.get_nthreads() ;
+
+  // call multithread(), specifying the single-threaded fill routine as the functor
+  // to invoke the threads, passing in 'range', which will be partitioned by multithread(),
+  // followed by all the other parameters the single-threaded fill needs, which is
+  // pretty much the set of parameters we've received here, with the subtle difference
+  // that we can't pass anything on by reference and use pointers instead.
+
+  multithread ( & st_fill < generator_type , dim_target > ,
+                vspline::partition_to_tiles < dim_target > ,
+                njobs ,        // desired number of partitions
+                range ,        // 'full' range which is to be partitioned
+                &gen ,         // generator_type object
+                &output ) ;    // target array
+} ;
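The division of labour between fill() and st_fill() can be sketched with std::thread. This stand-in rests on several assumptions. `mt_fill_rows` and `st_fill_rows` are hypothetical names. The partitioning uses simple row stripes rather than partition_to_tiles(). A thread is started per stripe instead of using vspline's common_thread_pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// single-threaded worker: fill rows [row_begin, row_end) of a
// width-wide, row-major target with f(x, y)
void st_fill_rows ( std::vector<double> & out , std::size_t width ,
                    std::size_t row_begin , std::size_t row_end ,
                    const std::function<double(std::size_t,std::size_t)> & f )
{
  for ( std::size_t y = row_begin ; y < row_end ; y++ )
    for ( std::size_t x = 0 ; x < width ; x++ )
      out [ y * width + x ] = f ( x , y ) ;
}

// multithreaded driver: cut the row range into about 'njobs' stripes
// and hand each stripe to its own thread
void mt_fill_rows ( std::vector<double> & out , std::size_t width ,
                    std::size_t height , std::size_t njobs ,
                    const std::function<double(std::size_t,std::size_t)> & f )
{
  assert ( out.size() == width * height && njobs > 0 ) ;
  std::vector<std::thread> workers ;
  std::size_t per_job = ( height + njobs - 1 ) / njobs ;   // rows per stripe
  for ( std::size_t begin = 0 ; begin < height ; begin += per_job )
  {
    std::size_t end = std::min ( height , begin + per_job ) ;
    // stripes cover disjoint rows, so no locking is needed
    workers.emplace_back ( st_fill_rows , std::ref ( out ) , width ,
                           begin , end , std::cref ( f ) ) ;
  }
  for ( auto & w : workers )
    w.join() ;
}
```

Each stripe writes only to its own rows, so no synchronization is needed beyond joining the threads. In the same spirit, st_fill obtains its own generator via subrange() rather than sharing one stateful generator across threads.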
+
+/// Next we code 'generators' for use with fill(). These objects can yield values
+/// to the fill routine, each in its specific way. The first type we define is
+/// warp_generator. This generator yields data from an array, which, in the context
+/// of a remap-like function, will provide the coordinates to feed to the interpolator.
+/// First is warp_generator for dimensions > 1. Here we provide 'subrange' and
+/// 'bindOuter' to be used for the hierarchical descent in _fill. The current
+/// implementation relies on the hierarchical descent going all the way to 1D,
+/// and does not implement operator() until the 1D specialization.
+
+template < int dimension ,
+           typename unary_functor_type ,
+           bool strided_warp >
+struct warp_generator
+{
+  typedef unary_functor_type functor_type ;
+  
+  typedef typename unary_functor_type::out_type value_type ;
+  typedef typename unary_functor_type::in_type nd_rc_type ;
+  
+  typedef MultiArrayView < dimension , nd_rc_type > warp_array_type ;
+  
+  const warp_array_type warp ; // must not use reference here!
+  
+  const unary_functor_type & itp ;
+  
+  const unary_functor_type & get_functor()
+  {
+    return itp ;
+  }
+  
+  warp_generator
+    ( const warp_array_type & _warp ,
+      const unary_functor_type & _itp )
+  : warp ( _warp ) ,
+    itp ( _itp )
+  { } ;
+
+  warp_generator < dimension , unary_functor_type , strided_warp >
+    subrange ( const shape_range_type < dimension > & range ) const
+  {
+    return warp_generator < dimension , unary_functor_type , strided_warp >
+             ( warp.subarray ( range[0] , range[1] ) , itp ) ;
+  }
+  
+  warp_generator < dimension - 1 , unary_functor_type , strided_warp >
+    bindOuter ( const int & c ) const
+  {
+    return warp_generator < dimension - 1 , unary_functor_type , strided_warp >
+             ( warp.bindOuter ( c ) , itp ) ;
+  }  
+} ;
+
+// previous implementation:
+// specialization of warp_generator for dimension 1. Here we have
+// (vectorized) evaluation code and subrange(), but no bindOuter().
+
+// template < typename unary_functor_type ,
+//            bool strided_warp >
+// struct warp_generator < 1 , unary_functor_type , strided_warp >
+// {
+//   typedef unary_functor_type functor_type ;
+//   
+//   typedef typename unary_functor_type::in_type nd_rc_type ;
+//   typedef typename unary_functor_type::out_type value_type ;
+//   
+//   typedef MultiArrayView < 1 , nd_rc_type > warp_array_type ;
+//   
+//   const warp_array_type warp ; // must not use reference here!
+//   const int stride ;
+//   const nd_rc_type * data ;
+//   typename warp_array_type::const_iterator witer ;
+//   
+//   const unary_functor_type & itp ;
+//   
+//   const unary_functor_type & get_functor()
+//   {
+//     return itp ;
+//   }
+//   
+//   warp_generator
+//     ( const warp_array_type & _warp ,
+//       const unary_functor_type & _itp )
+//   : warp ( _warp ) ,
+//     stride ( _warp.stride(0) ) ,
+//     itp ( _itp ) ,
+//     witer ( _warp.begin() ) ,
+//     data ( _warp.data() )
+//   {
+// #ifdef USE_VC
+//     int aggregates = warp.size() / vsize ;
+//     witer += aggregates * vsize ;
+// #endif
+//   } ;
+// 
+//   // in the context of this class, single value evaluation is only ever
+//   // used for mop-up action after all full vectors have been processed.
+//   // If Vc isn't used, this routine does all the work.
+// 
+//   void operator() ( value_type & target )
+//   {
+//     itp.eval ( *witer , target ) ;
+//     ++witer ;
+//   }
+// 
+// #ifdef USE_VC
+// 
+//   enum { vsize = unary_functor_type :: vsize } ;
+// 
+//   // operator() incorporates two variants, which depend on a template argument,
+//   // so the conditional has no run-time effect. The template argument target_type
+//   // determines the evaluation target and picks the appropriate eval variant.
+//   
+//   template < typename target_type >
+//   void operator() ( target_type & target )
+//   {
+//     typename unary_functor_type::in_v buffer ;
+// 
+//     // KFJ 2017-08-18 I've pushed load and store operations out of
+//     // class unary_functor and am now using generic free functions
+//     // in load_store.h. The idea is to reduce the required interface
+//     // of unary_functor so far that other objects can more easily
+//     // be made to take the place of objects derived from unary_functor.
+//     if ( strided_warp )
+//     {
+//       // if the warp array is strided, we use vspline::load
+//       // to assemble a simdized input value with load/gather operations      
+//       load ( data , buffer , stride ) ;
+//       // now we pass the simdized value to the evaluation routine
+//       itp.eval ( buffer , target ) ;
+//       data += vsize * stride ;
+//     }
+//     else
+//     {
+//       // otherwise we can collect them with an unstrided load operation,
+//       // which is more efficient for 1D data, the same for nD.
+//       load ( data , buffer ) ;
+//       itp.eval ( buffer , target ) ;
+//       data += vsize ;
+//     }
+//   }
+// 
+// #endif
+// 
+//   warp_generator < 1 , unary_functor_type , strided_warp >
+//     subrange ( const shape_range_type < 1 > & range ) const
+//   {
+//     return warp_generator < 1 , unary_functor_type , strided_warp >
+//              ( warp.subarray ( range[0] , range[1] ) , itp ) ;
+//   }
+// 
+// } ;
+
+template < typename unary_functor_type ,
+           bool strided_warp >
+struct warp_generator < 1 , unary_functor_type , strided_warp >
+{
+  typedef unary_functor_type functor_type ;
+  
+  typedef typename unary_functor_type::in_type nd_rc_type ;
+  typedef typename unary_functor_type::out_type value_type ;
+  
+  typedef MultiArrayView < 1 , nd_rc_type > warp_array_type ;
+  
+  const warp_array_type warp ; // must not use reference here!
+  
+  typedef typename unary_functor_type::in_ele_type ele_type ;
+  const ele_type * dp ;
+  typename warp_array_type::const_iterator witer ;
+  
+  const unary_functor_type & itp ;
+  
+  const unary_functor_type & get_functor()
+  {
+    return itp ;
+  }
+  
+  warp_generator
+    ( const warp_array_type & _warp ,
+      const unary_functor_type & _itp )
+  : warp ( _warp ) ,
+    itp ( _itp ) ,
+    witer ( _warp.begin() ) ,
+    dp ( (ele_type*) ( _warp.data() ) )
+  {
+#ifdef USE_VC
+    int aggregates = warp.size() / vsize ;
+    witer += aggregates * vsize ;
+#endif
+  } ;
+
+  // in the context of this class, single value evaluation is only ever
+  // used for mop-up action after all full vectors have been processed.
+  // If Vc isn't used, this routine does all the work.
+
+  void operator() ( value_type & target )
+  {
+    itp.eval ( *witer , target ) ;
+    ++witer ;
+  }
+
+#ifdef USE_VC
+
+  enum { vsize = unary_functor_type :: vsize } ;
+  enum { dimension = unary_functor_type::dim_in } ;
+  enum { advance = dimension * vsize } ;
+  
+  typedef typename vector_traits < ele_type , vsize > :: type simdized_type ;
+  typedef typename vector_traits < ele_type , vsize > :: ele_v ele_v ;
+  typedef typename ele_v::IndexType index_type ;
+
+  const index_type indexes = index_type::IndexesFromZero() * dimension ;
+  
+  typedef typename unary_functor_type::out_v target_type ;
+  typedef typename unary_functor_type::in_v source_type ;
+  
+  // initially I implemented a single operator() with conditionals on
+  // strided_warp and dimension, expecting that the compiler would
+  // pick out the right code without performance impact, but this turned
+  // out to be wrong. So now I'm using a dispatch mechanism which picks the
+  // appropriate code, effectively forcing the compiler to do the right
+  // thing. TODO: this teaches me a lesson. I think I have relied on
+  // dead code elimination in several places, so I may have to go through
+  // the inner loops looking for similar situations. The performance
+  // difference was not large but consistently measurable.
+  
+  // dispatch to the operator() variant for strided or unstrided warp.
+  // While the code for both variants is very similar, the differentiation
+  // is important, because the unstrided case can use advance (which is
+  // a compile-time constant) directly, while the strided case has to
+  // multiply by the stride, which is a run-time value.
+
+  inline void operator() ( target_type & target )
+  {
+    operator() ( target ,
+                 std::integral_constant < bool , strided_warp > () ) ;
+  }
+  
+  // variant of operator() for strided warp arrays
+  // here we don't need to dispatch further, since the stride forces
+  // us to use gather operations even for 1D data.
+  
+  inline void operator() ( target_type & target ,
+                           std::true_type )       // strided warp array
+  {
+    source_type buffer ;
+    
+    for ( int e = 0 ; e < dimension ; e++ )
+      buffer[e].gather
+        ( dp + e , indexes * warp.stride(0) ) ; 
+
+    itp.eval ( buffer , target ) ;
+    dp += advance * warp.stride(0) ;
+  }
+  
+  // variant of operator() for unstrided warp arrays.
+  // This variant further dispatches on 1D/nD data. For strided data this
+  // would be futile (they have to use gather anyway), but for unstrided
+  // 1D data it allows a (fast) SIMD load operation.
+  
+  inline void operator() ( target_type & target ,
+                           std::false_type )       // unstrided warp array
+  {
+    typename unary_functor_type::in_v buffer ;
+    
+    load ( buffer , std::integral_constant < bool , dimension == 1 > () ) ;
+
+    itp.eval ( buffer , target ) ;
+    dp += advance ;
+  }
+  
+  // loading 1D data from unstrided memory can use SIMD load instruction:
+  
+  inline
+  void load ( source_type & buffer , std::true_type ) // 1D
+  {
+    buffer[0].load ( (const ele_type* const) dp ) ; 
+  }
+  
+  // nD data have to be gathered instead
+
+  inline
+  void load ( source_type & buffer , std::false_type ) // nD
+  {
+    for ( int e = 0 ; e < dimension ; e++ )
+      buffer[e].gather
+        ( dp + e , indexes ) ; 
+  }
+
+#endif
+
+  warp_generator < 1 , unary_functor_type , strided_warp >
+    subrange ( const shape_range_type < 1 > & range ) const
+  {
+    return warp_generator < 1 , unary_functor_type , strided_warp >
+             ( warp.subarray ( range[0] , range[1] ) , itp ) ;
+  }
+
+} ;
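The compile-time dispatch explained in the comments of this class (turning the strided_warp flag into std::true_type or std::false_type and letting overload resolution pick an operator() variant) can be shown in isolation. This is a minimal sketch with hypothetical names (sum_reader, read) that are not part of vspline. It reads plain ints instead of SIMD vectors, but it shows why the unstrided variant can step by a fixed amount while the strided variant has to multiply by a run-time stride.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical illustration of tag dispatch on a bool template parameter.
// No run-time branch on 'strided' remains: the overload is chosen at
// compile time from the type of the second argument.
template < bool strided >
struct sum_reader
{
  const int * dp ;        // current read position
  std::ptrdiff_t stride ; // only used by the strided variant

  // public entry point: dispatch on the compile-time flag
  int operator() ( int count )
  {
    return read ( count , std::integral_constant < bool , strided > () ) ;
  }

  // strided variant: every step is multiplied by the run-time stride
  int read ( int count , std::true_type )
  {
    int sum = 0 ;
    for ( int i = 0 ; i < count ; i++ )
      sum += dp [ i * stride ] ;
    dp += count * stride ;
    return sum ;
  }

  // unstrided variant: consecutive elements, no multiplication needed
  int read ( int count , std::false_type )
  {
    int sum = 0 ;
    for ( int i = 0 ; i < count ; i++ )
      sum += dp [ i ] ;
    dp += count ;
    return sum ;
  }
} ;
```

With v holding 1 to 8, sum_reader<false>{ v.data() , 1 } returns 10 on a first call with count 4, and sum_reader<true>{ v.data() , 2 } returns 1 + 3 = 4 on a first call with count 2.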
+
+/// implementation of remap() by delegation to the more general fill() routine, passing
+/// in the warp array and the interpolator via a generator object. Calling this routine
+/// remap() doesn't quite do its scope justice; it might be more appropriate to call it
+/// transform(), since it applies a functor to a set of inputs yielding a set of outputs.
+/// This is a generalization of a remap routine: the remap concept looks at the incoming
+/// data as coordinates, at the functor as an interpolator yielding values for coordinates,
+/// and at the output as an array of thusly generated values. In this implementation of
+/// remap, incoming and outgoing data aren't necessarily coordinates or the result of
+/// an interpolation; they can be any pair of types which the functor can handle.
+///
+/// remap takes two template arguments:
+///
+/// - 'unary_functor_type', which is a class satisfying the interface laid down in
+///   unary_functor.h. This is an object which can provide values given coordinates,
+///   like class evaluator, but generalized to allow for arbitrary ways of achieving
+///   its goal.
+///
+/// - 'dim_target' - the number of dimensions of the target array. While the number of
+///   dimensions of the source data is apparent from the unary_functor_type object, the
+///   target array's dimensionality may be different, like when picking 2D slices from
+///   a volume.
+///
+/// remap takes three parameters:
+///
+/// - a reference to a const unary_functor_type object providing the functionality needed
+///   to generate values from coordinates
+///
+/// - a reference to a const MultiArrayView holding coordinates to feed to the unary_functor_type
+///   object. I use the term 'warp array' for this array. It has to have the same shape as
+///   the target array.
+///
+/// - a reference to a MultiArrayView to use as a target. This is where the resulting
+///   data are put.
+
+template < typename unary_functor_type  , // functor object yielding values for coordinates
+           int dim_target >               // number of dimensions of output array
+void remap ( const unary_functor_type & ev ,
+             const MultiArrayView < dim_target ,
+                                    typename unary_functor_type::in_type > & warp ,
+             MultiArrayView < dim_target ,
+                              typename unary_functor_type::out_type > & output
+           )
+{
+  // check shape compatibility
+  
+  if ( output.shape() != warp.shape() )
+  {
+    throw shape_mismatch ( "remap: the shapes of the warp array and the output array do not match" ) ;
+  }
+
+  // we test if the warp array is unstrided in dimension 0. If that is so, even
+  // if it is strided in higher dimensions, via the hierarchical access we will
+  // eventually arrive in dimension 0 and iterate over an unstrided array.
+  // This only matters if Vc is used, because if the warp array is unstrided,
+  // the coordinates can be loaded more efficiently. Note that this method
+  // requires that the hierarchical access goes down all the way to 1D.
+
+  if ( warp.isUnstrided ( 0 ) )
+  {
+    //                                set strided_warp to false vvv
+    typedef warp_generator < dim_target , unary_functor_type , false > gen_t ;  
+    gen_t gen ( warp , ev ) ;  
+    fill < gen_t , dim_target > ( gen , output ) ;
+  }
+  else
+  {
+    // warp array is strided even in dimension 0
+    typedef warp_generator < dim_target , unary_functor_type , true > gen_t ;  
+    gen_t gen ( warp , ev ) ;  
+    fill < gen_t , dim_target > ( gen , output ) ;
+  }
+}
+
+/// we code 'apply' as a special variant of remap where the output
+/// is also used as 'warp', so the effect is to feed the unary functor
+/// each 'output' value in turn, let it process it and store the result
+/// back to the same location. This requires the functor's in_type and
+/// out_type to be the same type.
+
+template < class unary_functor_type , // type satisfying the interface in class unary_functor
+           int dim_target >          // dimension of target array
+void apply ( const unary_functor_type & ev ,
+              MultiArrayView < dim_target ,
+                               typename unary_functor_type::out_type > & output )
+{
+  remap < unary_functor_type , dim_target > ( ev , output , output ) ;
+}
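Stripped of the generator and multithreading machinery, the contract of remap() and apply() is an element-wise transform with a shape check. The sketch below uses hypothetical names (remap_sketch, apply_sketch) and 1D std::vector in place of vigra::MultiArrayView; it is not vspline's implementation, only an illustration of the contract: same shape in and out, the functor applied to each element, and apply() passing one array as both input and output.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical 1D stand-in for remap(): apply 'f' to every element of
// 'warp' and store the result at the same position in 'output'.
template < typename Functor , typename In , typename Out >
void remap_sketch ( const Functor & f ,
                    const std::vector < In > & warp ,
                    std::vector < Out > & output )
{
  // shapes (here: sizes) must agree, like the shape_mismatch check in remap()
  if ( warp.size() != output.size() )
    throw std::invalid_argument ( "remap_sketch: shape mismatch" ) ;
  for ( std::size_t i = 0 ; i < warp.size() ; i++ )
    output [ i ] = f ( warp [ i ] ) ;
}

// Hypothetical stand-in for apply(): input and output are the same array,
// which is why the functor's input and output types have to coincide.
template < typename Functor , typename T >
void apply_sketch ( const Functor & f , std::vector < T > & data )
{
  // element i is read and then overwritten, so aliasing is harmless here
  remap_sketch ( f , data , data ) ;
}
```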
+
+/// This is a variant of remap which directly takes an array of values and remaps it,
+/// internally creating a b-spline of the given degree (cubic by default, with MIRROR
+/// boundary conditions by default) just for the purpose. This is used for
+/// one-shot remaps where the spline isn't reused.
+
+template < typename coordinate_type , // type of coordinates in the warp array
+           typename value_type ,      // type of values to produce
+           int dim_out >              // number of dimensions of warp and output array
+int remap ( const MultiArrayView
+              < vigra::ExpandElementResult < coordinate_type > :: size ,
+                value_type > & input ,
+            const MultiArrayView < dim_out , coordinate_type > & warp ,
+            MultiArrayView < dim_out , value_type > & output ,
+            bcv_type < vigra::ExpandElementResult < coordinate_type > :: size > bcv
+              = bcv_type < vigra::ExpandElementResult < coordinate_type > :: size > ( MIRROR ) ,
+            int degree = 3 )
+{
+  const int dim_in = vigra::ExpandElementResult < coordinate_type > :: size ;
+
+  // check shape compatibility
+  
+  if ( output.shape() != warp.shape() )
+  {
+    throw shape_mismatch ( "remap: the shapes of the warp array and the output array do not match" ) ;
+  }
+
+  // create the bspline object
+  // TODO may want to specify tolerance here instead of using default
+  
+  bspline < value_type , dim_in > bsp ( input.shape() , degree , bcv ) ;
+  
+  // prefilter, taking data in 'input' as knot point data
+  
+  bsp.prefilter ( input ) ;
+
+  // create an evaluator over the bspline
+
+  typedef evaluator < coordinate_type , value_type > evaluator_type ;
+  
+  evaluator_type ev ( bsp ) ;
+  
+  // and call the other remap variant,
+  // passing in the evaluator, the coordinate array and the target array
+  
+  remap < evaluator_type , dim_out > ( ev , warp , output ) ;
+
+  return 0 ;
+}
+
+/// class index_generator provides nD indices as input to its functor which coincide
+/// with the location in the target array for which the functor is called. The data type
+/// of these indices is derived from the functor's input type. Again we presume that
+/// fill() will recurse until level 0, so index_generator's operator() will only be called
+/// at the lowest level of recursion, and we needn't even define it for higher levels.
+
+template < typename unary_functor_type , int level >
+struct index_generator
+{
+  typedef unary_functor_type functor_type ;
+  
+  typedef typename unary_functor_type::out_type value_type ;
+
+  enum { dimension = unary_functor_type::dim_in } ;
+  
+#ifdef USE_VC
+
+  enum { vsize = unary_functor_type :: vsize } ;
+
+#else
+  
+  enum { vsize = 1 } ;
+
+#endif
+  
+  const unary_functor_type & itp ;
+  const shape_range_type < dimension > range ;
+  
+  const unary_functor_type & get_functor()
+  {
+    return itp ;
+  }
+  
+  index_generator
+    ( const unary_functor_type & _itp ,
+      const shape_range_type < dimension > _range )
+  : itp ( _itp ) ,
+    range ( _range )
+  { } ;
+
+  index_generator < unary_functor_type , level >
+    subrange ( const shape_range_type < dimension > range ) const
+  {
+    return index_generator < unary_functor_type , level >
+             ( itp , range ) ;
+  }
+  
+  index_generator < unary_functor_type , level - 1 >
+    bindOuter ( const int & c ) const
+  {
+    auto slice_start = range[0] , slice_end = range[1] ;
+
+    slice_start [ level ] += c ;
+    slice_end [ level ] = slice_start [ level ] + 1 ;
+    
+    return index_generator < unary_functor_type , level - 1 >
+             ( itp , shape_range_type < dimension > ( slice_start , slice_end ) ) ;
+  }  
+} ;
+
+/// specialization of index_generator for level 0. Here, the indices for all higher
+/// dimensions have been fixed by the hierarchical descent, and we only need to concern
+/// ourselves with the index for dimension 0, and supply the operator() implementations.
+/// Note how we derive the concrete type of index from the functor. This way, whatever
+/// the functor takes is provided with no need of type conversion, which would be necessary
+/// if we'd only produce integral indices here.
+
+template < typename unary_functor_type >
+struct index_generator < unary_functor_type , 0 >
+{
+  typedef unary_functor_type functor_type ;
+  
+  typedef typename unary_functor_type::in_type index_type ;
+  typedef typename unary_functor_type::in_ele_type index_ele_type ;
+  typedef typename unary_functor_type::out_type value_type ;
+
+  enum { dimension = unary_functor_type::dim_in } ;
+  
+#ifdef USE_VC
+
+  enum { vsize = unary_functor_type::vsize } ;
+  typedef typename unary_functor_type::in_v index_v ;
+  typedef typename unary_functor_type::out_v out_v ;
+  typedef typename unary_functor_type::in_ele_v index_ele_v ;
+
+  index_v current_v ; // current vectorized index to feed to functor
+
+#else
+  
+  enum { vsize = 1 } ;
+
+#endif
+  
+  index_type current ; // singular index
+
+  const unary_functor_type & itp ;
+  const shape_range_type < dimension > range ;
+  
+  const unary_functor_type & get_functor()
+  {
+    return itp ;
+  }
+  
+  index_generator
+    ( const unary_functor_type & _itp ,
+      const shape_range_type < dimension > _range
+    )
+  : itp ( _itp ) ,
+    range ( _range )
+  {
+    // initially, set the singular index to the beginning of the range
+    current = index_type ( range[0] ) ;
+    
+#ifdef USE_VC
+    
+    // initialize current_v to hold the first simdized index
+    for ( int d = 0 ; d < dimension ; d++ )
+      current_v[d] = index_ele_v ( range[0][d] ) ;
+    current_v[0] += index_ele_v::IndexesFromZero() ;
+    
+    // if Vc is used, the singular index will only be used for mop-up action
+    // after all aggregates have been processed.
+    int size = range[1][0] - range[0][0] ;
+    int aggregates = size / vsize ;
+    current[0] += index_ele_type ( aggregates * vsize ) ; // for mop-up
+
+#endif
+
+  } ;
+
+  /// single-value evaluation. This will be used for all values if Vc isn't used,
+  /// or only for mop-up action after all full vectors are processed.
+
+  void operator() ( value_type & target )
+  {
+    itp.eval ( current , target ) ;
+    current[0] += index_ele_type ( 1 ) ;
+  }
+
+#ifdef USE_VC
+ 
+  /// vectorized evaluation. Hierarchical descent has left us with only the
+  /// level-0 coordinate to increase, making this code very efficient.
+
+  void operator() ( out_v & target )
+  {
+    itp.eval ( current_v , target ) ;
+    current_v[0] += index_ele_v ( vsize ) ;
+  }
+
+#endif
+
+  /// while we are at the lowest level here, we still need the subrange routine
+  /// for cases where the data are 1D in the first place: in this situation,
+  /// we need to split up the range as well.
+
+  index_generator < unary_functor_type , 0 >
+    subrange ( const shape_range_type < dimension > range ) const
+  {
+    return index_generator < unary_functor_type , 0 >
+             ( itp , range ) ;
+  }
+} ;
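Both warp_generator and index_generator split their work into full vectors ("aggregates") followed by single-value mop-up. A scalar sketch of that pattern follows. It assumes a fixed vsize of 4 and uses std::array as a stand-in for a Vc SIMD type; fill_indices is a hypothetical name, not a vspline function.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Assumed vector width for the sketch; vspline takes it from the functor.
constexpr int vsize = 4 ;

// Fill 'target' with f(index): full chunks of vsize first, then the rest.
template < typename F >
void fill_indices ( std::vector < int > & target , F f )
{
  const int size = int ( target.size() ) ;
  const int aggregates = size / vsize ;  // number of full chunks

  // 'vectorized' part: the index vector starts at 0 1 2 3 and
  // advances by vsize after each chunk, like current_v in index_generator
  std::array < int , vsize > current_v ;
  for ( int e = 0 ; e < vsize ; e++ )
    current_v [ e ] = e ;
  for ( int a = 0 ; a < aggregates ; a++ )
  {
    for ( int e = 0 ; e < vsize ; e++ )
      target [ a * vsize + e ] = f ( current_v [ e ] ) ;
    for ( int e = 0 ; e < vsize ; e++ )
      current_v [ e ] += vsize ;
  }

  // mop-up: the singular index starts right after the last full chunk
  for ( int current = aggregates * vsize ; current < size ; current++ )
    target [ current ] = f ( current ) ;
}
```

For a 10-element target, indices 0 to 7 are handled in two aggregates and indices 8 and 9 in the mop-up loop.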
+
+/// index_remap() is very similar to remap(), but while remap() picks coordinates from
+/// an array (the 'warp' array), index_remap feeds the discrete coordinates of each
+/// location in the target array, in turn, to the unary_functor_type object.
+/// Since the data type of the coordinates is derived from the unary functor's input type,
+/// index_generator can produce integral and real indices, as needed.
+///
+/// index_remap takes one template argument:
+///
+/// - 'unary_functor_type', which is a class satisfying the interface laid down in
+///   unary_functor.h. This is an object which can provide values given coordinates,
+///   like class evaluator, but generalized to allow for arbitrary ways of achieving
+///   its goal. The unary functor's in_type determines the number of dimensions of
+///   the indices - since they are indices into the target array, the functor's input
+///   type has to have the same number of dimensions as the target array.
+///
+/// index_remap takes two parameters:
+///
+/// - a reference to a const unary_functor_type object providing the functionality needed
+///   to generate values from coordinates
+///
+/// - a reference to a MultiArrayView to use as a target. This is where the resulting
+///   data are put.
+
+template < class unary_functor_type > // type satisfying the interface in class unary_functor
+void index_remap ( const unary_functor_type & ev ,
+                   MultiArrayView < unary_functor_type::dim_in ,
+                                    typename unary_functor_type::out_type > & output )
+{
+  enum { dim_target = unary_functor_type::dim_in } ;
+  
+  typedef typename unary_functor_type::out_type value_type ;
+  typedef TinyVector < int , dim_target > nd_ic_type ;
+  typedef index_generator < unary_functor_type , dim_target - 1 > gen_t ;
+
+  shape_range_type < dim_target > range ( nd_ic_type() , output.shape() ) ;  
+  gen_t gen ( ev , range ) ;  
+  fill < gen_t , dim_target > ( gen , output ) ;
+}
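The hierarchical descent that index_remap relies on (bindOuter fixing the outermost coordinate until only dimension 0 is left to iterate) can be sketched for a 2D row-major target. Everything below is a hypothetical stand-in (index_gen, bind_outer, fill_row, index_remap_sketch) rather than vspline's fill() and index_generator, and only levels 0 and 1 are modelled.

```cpp
#include <array>
#include <cassert>
#include <vector>

using index2 = std::array < int , 2 > ; // { x , y }

template < typename F , int level > struct index_gen ;

// level 0: the outer coordinate is fixed, only dimension 0 advances
template < typename F >
struct index_gen < F , 0 >
{
  const F & f ;
  index2 current ;

  void fill_row ( double * row , int width )
  {
    for ( int x = 0 ; x < width ; x++ , current [ 0 ]++ )
      row [ x ] = f ( current ) ;
  }
} ;

// level 1: binding the outer (y) coordinate yields a level-0 generator
template < typename F >
struct index_gen < F , 1 >
{
  const F & f ;

  index_gen < F , 0 > bind_outer ( int y ) const
  {
    return index_gen < F , 0 > { f , index2 { 0 , y } } ;
  }
} ;

// feed every discrete target index to 'f', one row at a time
template < typename F >
void index_remap_sketch ( const F & f , std::vector < double > & target ,
                          int width , int height )
{
  index_gen < F , 1 > gen { f } ;
  for ( int y = 0 ; y < height ; y++ )
    gen.bind_outer ( y ).fill_row ( target.data() + y * width , width ) ;
}
```

For a 3 x 2 target and a functor returning 10 * y + x, the two rows come out as 0 1 2 and 10 11 12.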
+
+namespace detail // workhorse code for grid_eval
+{
+// in grid_weight, for every dimension we have a set of ORDER weights
+// for every position in this dimension. in grid_ofs, we have the
+// partial offset for this dimension for every position. these partial
+// offsets are the product of the index for this dimension at the position
+// and the stride for this dimension, so that the sum of the partial
+// offsets for all dimensions yields the offset into the coefficient array
+// to the window of coefficients where the weights are to be applied.
+
+template < typename evaluator_type , int level >
+struct _grid_eval
+{
+  typedef typename evaluator_type::ele_type weight_type ;
+  typedef MultiArrayView < level + 1 , typename evaluator_type::value_type > target_type ;
+  
+  void operator() ( int initial_ofs ,
+                    MultiArrayView < 2 , weight_type > & weight ,
+                    weight_type** const & grid_weight ,
+                    const int & ORDER ,
+                    int ** const & grid_ofs ,
+                    const evaluator_type & itp ,
+                    target_type & result )
+  {
+    for ( int ofs = 0 ; ofs < result.shape ( level ) ; ofs++ )
+    {
+      for ( int e = 0 ; e < ORDER ; e++ )
+        weight [ vigra::Shape2 ( e , level ) ] = grid_weight [ level ] [ ORDER * ofs + e ] ;
+      int cum_ofs = initial_ofs + grid_ofs [ level ] [ ofs ] ;
+      auto region = result.bindAt ( level , ofs ) ;
+      _grid_eval < evaluator_type , level - 1 >()
+        ( cum_ofs , weight , grid_weight , ORDER , grid_ofs , itp , region ) ;
+    }
+  }
+} ;
+
+template < typename evaluator_type >
+struct _grid_eval < evaluator_type , 0 >
+{
+  typedef typename evaluator_type::ele_type weight_type ;
+  typedef MultiArrayView < 1 , typename evaluator_type::value_type > target_type ;
+  
+  void operator() ( int initial_ofs ,
+                    MultiArrayView < 2 , weight_type > & weight ,
+                    weight_type** const & grid_weight ,
+                    const int & ORDER ,
+                    int ** const & grid_ofs ,
+                    const evaluator_type & itp ,
+                    target_type & region )
+  {
+    auto iter = region.begin() ;    
+    int ofs_start = 0 ;
+    
+#ifdef USE_VC
+  
+    // on my system, using clang++, the vectorized code is slightly slower
+    // than the unvectorized code. With g++, the vectorized code is faster
+    // than either clang version, but the unvectorized code is much slower.
+
+    const int vsize = evaluator_type::vsize ;
+    const int channels = evaluator_type::channels ;
+    typedef typename evaluator_type::value_type value_type ;
+    typedef typename evaluator_type::ele_type ele_type ;
+    typedef typename evaluator_type::ic_v ic_v ;
+    typedef typename evaluator_type::ele_v ele_v ;
+    typedef typename evaluator_type::mc_ele_v mc_ele_v ;
+
+    // number of vectorized results
+    int aggregates = region.size() / vsize ;
+    // vectorized weights
+    MultiArray < 2 , ele_v > vweight ( weight.shape() ) ;
+    // vectorized offset
+    ic_v select ;
+    // buffer for target data
+    mc_ele_v vtarget ;
+
+    // initialize the vectorized weights for dimensions > 0
+    for ( int d = 1 ; d < weight.shape(1) ; d++ )
+    {
+      for ( int o = 0 ; o < ORDER ; o++ )
+        vweight [ vigra::Shape2 ( o , d ) ] = weight [ vigra::Shape2 ( o , d ) ] ;
+    }
+
+    // get a pointer to the target array's data (seen as elementary type)
+    ele_type * p_target = (ele_type*) ( region.data() ) ;
+    // and the stride, if any, also in terms of the elementary type, from
+    // one cluster of target data to the next
+    int stride = vsize * channels * region.stride(0) ;
+
+    for ( int a = 0 ; a < aggregates ; a++ )
+    {
+      // gather the individual weights into the vectorized form
+      for ( int o = 0 ; o < ORDER ; o++ )
+      {
+        vweight[ vigra::Shape2 ( o , 0 ) ].gather
+          ( grid_weight [ 0 ] + ORDER * a * vsize ,
+            ORDER * ic_v::IndexesFromZero() + o ) ;
+      }
+      select.load ( grid_ofs [ 0 ] + a * vsize ) ; // get the offsets from grid_ofs
+      select += initial_ofs ; // add cumulated offsets from higher dimensions
+      
+      select *= channels ;    // offsets are in terms of value_type, expand
+
+      // now we can call the vectorized eval routine
+      itp.eval ( select , vweight , vtarget ) ;
+      
+      // finally we scatter the vectorized result to target memory
+      for ( int ch = 0 ; ch < channels ; ch++ )
+        vtarget[ch].scatter ( p_target + ch ,
+                              channels * ic_v::IndexesFromZero() ) ;
+
+      // and set p_target to the next cluster of target values
+      p_target += stride ;
+    }
+    
+    // adapt the iterator into target array
+    iter += aggregates * vsize ;
+    // and the initial offset
+    ofs_start += aggregates * vsize ;
+
+#endif
+    
+    // if Vc wasn't used, we start with ofs = 0 and this loop
+    // does all the processing:
+    for ( int ofs = ofs_start ; ofs < region.shape ( 0 ) ; ofs++ )
+    {
+      for ( int e = 0 ; e < ORDER ; e++ )
+        weight [ vigra::Shape2 ( e , 0 )  ] = grid_weight [ 0 ] [ ORDER * ofs + e ] ;
+      int cum_ofs = initial_ofs + grid_ofs [ 0 ] [ ofs ] ;
+      itp.eval ( cum_ofs , weight , *iter ) ;
+      ++iter ;
+    }
+  }
+} ;
+
+} ; // end of namespace detail
+
+// Here is the single-threaded code for the grid_eval function.
+// The first argument is a shape range, defining the subset of data
+// to process in a single thread. The remaining arguments are forwarded
+// from grid_eval, partly as pointers. The call is effected
+// via 'multithread()', which sets up the partitioning and distribution
+// to threads from a thread pool.
+
+template < typename evaluator_type , // b-spline evaluator type
+           int dim_out >             // dimension of target
+void st_grid_eval ( shape_range_type < dim_out > range ,
+                    typename evaluator_type::rc_type ** const _grid_coordinate ,
+                    const evaluator_type * itp ,
+                    MultiArrayView < dim_out , typename evaluator_type::value_type >
+                      * p_result )
+{
+  typedef typename evaluator_type::ele_type weight_type ;
+  typedef typename evaluator_type::rc_type rc_type ;
+  typedef MultiArrayView < dim_out , typename evaluator_type::value_type > target_type ;
+  
+  const int ORDER = itp->get_order() ;
+  
+  // pick the subarray of the 'whole' target array pertaining to this thread's range
+  auto result = p_result->subarray ( range[0] , range[1] ) ;
+  
+  // pick the subset of coordinates pertaining to this thread's range
+  const rc_type * grid_coordinate [ dim_out ] ;
+  for ( int d = 0 ; d < dim_out ; d++ )
+    grid_coordinate[d] = _grid_coordinate[d] + range[0][d] ;
+
+  // set up storage for precalculated weights and offsets
+
+  weight_type * grid_weight [ dim_out ] ;
+  int * grid_ofs [ dim_out ] ;
+  
+  // get some metrics
+  TinyVector < int , dim_out > shape ( result.shape() ) ;
+  TinyVector < int , dim_out > stride ( itp->get_stride() ) ;
+  
+  // allocate space for the per-axis weights and offsets
+  for ( int d = 0 ; d < dim_out ; d++ )
+  {
+    grid_weight[d] = new weight_type [ ORDER * shape [ d ] ] ;
+    grid_ofs[d] = new int [ shape [ d ] ] ;
+  }
+  
+  int select ;
+  rc_type tune ;
+  
+  // fill in the weights and offsets, using the interpolator's split() to split
+  // the coordinates received in grid_coordinate, the interpolator's obtain_weights
+  // method to produce the weight components, and the strides of the coefficient
+  // array to convert the integral parts of the coordinates into offsets.
+
+  for ( int d = 0 ; d < dim_out ; d++ )
+  {
+    for ( int c = 0 ; c < shape [ d ] ; c++ )
+    {
+      itp->split ( grid_coordinate [ d ] [ c ] , select , tune ) ; 
+      itp->obtain_weights ( grid_weight [ d ] + ORDER * c , d , tune ) ;
+      grid_ofs [ d ] [ c ] = select * stride [ d ] ;
+    }
+  }
+  
+  // allocate storage for a set of singular weights
+  MultiArray < 2 , weight_type > weight ( vigra::Shape2 ( ORDER , dim_out ) ) ;
+  
+  // now call the recursive workhorse routine
+  detail::_grid_eval < evaluator_type , dim_out - 1 >()
+   ( 0 , weight , grid_weight , ORDER , grid_ofs , *itp , result ) ;
+
+  // clean up
+  for ( int d = 0 ; d < dim_out ; d++ )
+  {
+    delete[] grid_weight[d] ;
+    delete[] grid_ofs[d] ;
+  }
+  
+}
+
+// this is the multithreaded version of grid_eval, which sets up the
+// full range over 'result' and calls 'multithread' to do the rest
+
+/// grid_eval evaluates a b-spline object
+/// at points whose coordinates are distributed in a grid, so that for
+/// every axis there is a set of as many coordinates as this axis is long,
+/// which will be used in the grid as the coordinate for this axis at the
+/// corresponding position. The resulting coordinate matrix (which remains
+/// implicit) is like a mesh grid made from the per-axis coordinates.
+///
+/// If we have two dimensions and x coordinates x0, x1 and x2, and y
+/// coordinates y0 and y1, the resulting implicit coordinate matrix is
+///
+/// (x0,y0) (x1,y0) (x2,y0)
+///
+/// (x0,y1) (x1,y1) (x2,y1)
+///
+/// since the offsets and weights needed to perform an interpolation
+/// only depend on the per-axis coordinates, this highly redundant coordinate
+/// array can be processed more efficiently by precalculating the offset component
+/// and weight component for all axes and then simply combining them for every
+/// grid point. Especially for higher-degree and higher-dimensional
+/// splines this saves quite some time, since the generation of weights
+/// is computationally expensive.
+///
+/// grid_eval is useful for generating a scaled representation of the original
+/// data, but when scaling down, aliasing will occur and the data should be
+/// low-pass-filtered adequately before processing. Let me hint here that
+/// low-pass filtering can be achieved by using b-spline reconstruction on
+/// raw data (a 'smoothing spline') - or by prefiltering with exponential
+/// smoothing, which can be activated by passing the 'smoothing' parameter
+/// to the prefiltering routine. Of course any other way of smoothing can
+/// be used just the same, like a Burt filter or Gaussian smoothing.
+///
+/// Note that this code is specific to b-spline evaluators and relies
+/// on evaluator_type offering several b-spline specific methods which
+/// are not present in other interpolators, like split() and
+/// obtain_weights(). Since the weight generation for b-splines can
+/// be done separately for each axis and is a computationally intensive
+/// task, precalculating these per-axis weights makes sense. Coding for
+/// the general case (other interpolators), the only achievement would be
+/// the permutation of the partial coordinates, so little would be gained,
+/// and instead an index_remap where the indices are used to pick up
+/// the coordinates can be written easily: have a unary_functor taking
+/// discrete coordinates, 'loaded' with the per-axis coordinates, and an
+/// eval routine yielding the picked coordinates.
+
+template < typename evaluator_type , // b-spline evaluator
+           int dim_out >             // dimension of target
+void grid_eval ( typename evaluator_type::rc_type ** const grid_coordinate ,
+                 const evaluator_type & itp ,
+                 MultiArrayView < dim_out , typename evaluator_type::value_type >
+                   & result )
+{
+  shape_range_type < dim_out > range ( shape_type < dim_out > () , result.shape() ) ;
+  multithread ( st_grid_eval < evaluator_type , dim_out > ,
+                vspline::partition_to_tiles < dim_out > ,
+                ncores * 8 ,
+                range ,
+                grid_coordinate ,
+                &itp ,
+                &result ) ;
+}
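To make the precalculation argument concrete, here is a sketch of grid evaluation for a 2D cubic b-spline. It is not vspline's grid_eval. It assumes the standard uniform cubic b-spline weights, a row-major coefficient array, and coordinates that keep the 4 x 4 coefficient window inside the array (no boundary handling). split_and_weigh plays the role of split() plus obtain_weights(); its name and its window convention (starting at floor(x) - 1) are assumptions of the sketch.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using weights4 = std::array < double , 4 > ;

// split a coordinate into integral part and remainder ('tune'), fill in
// the four cubic b-spline weights, return the first index of the window
inline int split_and_weigh ( double x , weights4 & w )
{
  const int select = int ( std::floor ( x ) ) ;
  const double t = x - select ;
  w [ 0 ] = ( 1 - t ) * ( 1 - t ) * ( 1 - t ) / 6 ;
  w [ 1 ] = ( 3 * t * t * t - 6 * t * t + 4 ) / 6 ;
  w [ 2 ] = ( -3 * t * t * t + 3 * t * t + 3 * t + 1 ) / 6 ;
  w [ 3 ] = t * t * t / 6 ;
  return select - 1 ;
}

// evaluate at all (xs[i], ys[j]) combinations; weights and partial offsets
// are computed once per axis coordinate instead of once per grid point
std::vector < double > grid_eval_sketch ( const std::vector < double > & coeff ,
                                          int nx ,
                                          const std::vector < double > & xs ,
                                          const std::vector < double > & ys )
{
  std::vector < weights4 > wx ( xs.size() ) , wy ( ys.size() ) ;
  std::vector < int > ox ( xs.size() ) , oy ( ys.size() ) ;
  for ( std::size_t i = 0 ; i < xs.size() ; i++ )
    ox [ i ] = split_and_weigh ( xs [ i ] , wx [ i ] ) ;      // stride 1
  for ( std::size_t j = 0 ; j < ys.size() ; j++ )
    oy [ j ] = split_and_weigh ( ys [ j ] , wy [ j ] ) * nx ; // stride nx

  std::vector < double > result ( xs.size() * ys.size() ) ;
  for ( std::size_t j = 0 ; j < ys.size() ; j++ )
    for ( std::size_t i = 0 ; i < xs.size() ; i++ )
    {
      // the sum of the partial offsets locates the coefficient window
      double sum = 0 ;
      for ( int b = 0 ; b < 4 ; b++ )
        for ( int a = 0 ; a < 4 ; a++ )
          sum += wy [ j ] [ b ] * wx [ i ] [ a ]
                 * coeff [ oy [ j ] + b * nx + ox [ i ] + a ] ;
      result [ j * xs.size() + i ] = sum ;
    }
  return result ;
}
```

Since cubic b-splines reproduce linear functions, coefficients c(x, y) = x + 10 y yield x + 10 y at every grid point up to rounding, which makes a convenient sanity check.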
+
+/// grid_eval allows us to code a function to restore the original knot point
+/// data from a bspline. We simply fill in the discrete coordinates into the
+/// grid coordinate vectors and call grid_eval with them.
+/// Note that this routine can't operate in-place, so you can't overwrite
+/// a bspline object's core with the restored knot point data; you have to
+/// provide a separate target array.
+/// This routine is potentially faster than running an index_remap with
+/// the same target, due to the precalculated weight components.
+
+template < int dimension , typename value_type , typename rc_type = float >
+void restore ( const vspline::bspline < value_type , dimension > & bspl ,
+               vigra::MultiArrayView < dimension , value_type > & target )
+{
+  if ( target.shape() != bspl.core.shape() )
+    throw shape_mismatch
+     ( "restore: spline's core shape and target array shape must match" ) ;
+    
+  typedef vigra::TinyVector < rc_type , dimension > coordinate_type ;
+  typedef vigra::MultiArrayView < dimension , value_type > target_type ;
+  typedef typename vigra::ExpandElementResult < value_type > :: type weight_type ;
+  
+  // set up the coordinate component vectors
+  rc_type * p_ruler [ dimension ] ;
+  for ( int d = 0 ; d < dimension ; d++ )
+  {
+    p_ruler[d] = new rc_type [ target.shape ( d ) ] ;
+    for ( int i = 0 ; i < target.shape ( d ) ; i++ )
+      p_ruler[d][i] = rc_type(i) ;
+  }
+  
+  typedef vspline::evaluator < coordinate_type , value_type > ev_type ;
+  ev_type ev ( bspl ) ;
+  vspline::grid_eval < ev_type , dimension > // target_type , weight_type , rc_type >
+    ( p_ruler , ev , target ) ;
+
+  for ( int d = 0 ; d < dimension ; d++ )
+    delete[] p_ruler[d] ;
+}
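For the cubic case, restore() has a simple closed form worth spelling out: at integral coordinates the tune is zero and the cubic b-spline weights reduce to 1/6, 4/6, 1/6 (the fourth weight is 0), so each restored knot value is a 3-tap weighted sum of neighbouring coefficients. The sketch below (restore_interior is a hypothetical name) applies that filter to the interior of a 1D coefficient array and leaves the end points untouched, because it does not model vspline's boundary conditions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Restore knot point values from cubic b-spline coefficients, interior only.
// At tune == 0 the cubic weights are 1/6, 4/6, 1/6 around each position.
std::vector < double > restore_interior ( const std::vector < double > & coeff )
{
  std::vector < double > knots ( coeff ) ; // end points copied unchanged
  for ( std::size_t i = 1 ; i + 1 < coeff.size() ; i++ )
    knots [ i ] = ( coeff [ i - 1 ] + 4 * coeff [ i ] + coeff [ i + 1 ] ) / 6 ;
  return knots ;
}
```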
+
+} ; // end of namespace vspline
+
+#endif // VSPLINE_REMAP_H
diff --git a/thread_pool.h b/thread_pool.h
new file mode 100644
index 0000000..01b16e9
--- /dev/null
+++ b/thread_pool.h
@@ -0,0 +1,174 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015 - 2017 by Kay F. Jahnke                    */
+/*                                                                      */
+/*    The git repository for this software is at                        */
+/*                                                                      */
+/*    https://bitbucket.org/kfj/vspline                                 */
+/*                                                                      */
+/*    Please direct questions, bug reports, and contributions to        */
+/*                                                                      */
+/*    kfjahnke+vspline at gmail.com                                        */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/// \file thread_pool.h
+///
+/// \brief provides a thread pool for vspline's multithread() routine
+///
+/// class thread_pool aims to provide a simple and straightforward implementation
+/// of a thread pool for multithread() in multithread.h, but the class might find
+/// use elsewhere. The operation is simple:
+///
+/// a set of worker threads is launched; these wait for 'tasks', which come in the
+/// form of std::function<void()>, from a queue. When woken, a worker thread tries
+/// to obtain a task. If it succeeds, the task is executed, and the worker thread
+/// tries to get another task. If none is to be had, it goes to sleep, waiting to
+/// be woken once there are new tasks.
+
+#include <thread>
+#include <mutex>
+#include <queue>
+#include <vector>
+#include <functional>
+#include <condition_variable>
+#include <iostream>
+
+namespace vspline
+{
+  
+class thread_pool
+{
+  // used to switch off the worker threads when the pool is destroyed
+  // (typically at program termination). access under task_mutex.
+
+  bool stay_alive = true ;
+
+  // the thread pool itself is held in this variable. The pool
+  // does not change after construction
+
+  std::vector < std::thread * > pool ;
+  
+public:
+
+  // mutex and condition variable for interaction with the task queue
+  // and stay_alive
+
+  std::mutex task_mutex ;
+  std::condition_variable task_cv ;
+  
+  // queue to hold tasks. access under task_mutex
+
+  std::queue < std::function < void() > > task_queue ;
+
+private:
+  
+  /// code to run a worker thread.
+  /// We use a pool of worker threads. These threads have a very
+  /// simple cycle: they try to obtain a task (std::function<void()>).
+  /// If there is one to be had, it is invoked, otherwise they wait on
+  /// task_cv. At the start of every cycle the flag stay_alive is checked,
+  /// and if it is found to be false, the worker thread ends.
+  
+  void worker_thread()
+  {
+    while ( true )
+    {
+      // under task_mutex, check stay_alive and try to obtain a task
+      std::unique_lock<std::mutex> task_lock ( task_mutex ) ;
+
+      if ( ! stay_alive )
+      {
+        task_lock.unlock() ;
+        break ; // die
+      }
+
+      if ( task_queue.size() )
+      {
+        // there are tasks in the queue, take one
+        auto task = task_queue.front() ;
+        task_queue.pop() ;
+        task_lock.unlock() ;
+        // got a task, perform it, then try for another one
+        task() ;
+      }
+      else
+      {
+        // no luck. wait.
+        task_cv.wait ( task_lock ) ; // a spurious wakeup just starts the next cycle
+      }
+      // start next cycle, either after having completed a job
+      // or after having been woken by an alert
+    }
+  }
+
+public:
+  
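+  thread_pool ( int nthreads = 4 * std::thread::hardware_concurrency() )
+  {
+    // std::thread::hardware_concurrency() may return 0 if the value is not
+    // well defined or not computable; make sure there is at least one worker
+    if ( nthreads < 1 )
+      nthreads = 1 ;
+
+    // to launch a thread with a method, we need to bind it to the object:
+    std::function < void() > wf = std::bind ( &thread_pool::worker_thread , this ) ;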
+    
+    // now we can fill the pool with worker threads
+    for ( int t = 0 ; t < nthreads ; t++ )
+      pool.push_back ( new std::thread ( wf ) ) ;
+  }
+
+  int get_nthreads() const
+  {
+    return pool.size() ;
+  }
+
+  ~thread_pool()
+  {
+    {
+      // under task_mutex, set stay_alive to false
+      
+      std::lock_guard<std::mutex> task_lock ( task_mutex ) ;
+      stay_alive = false ;      
+    }
+
+    // wake all inactive worker threads, then join all worker threads once
+    // they are finished. Tasks still waiting in the queue are not executed.
+
+    task_cv.notify_all() ;
+    
+    for ( auto threadp : pool )
+    {
+      threadp->join() ;
+    }
+    
+    // once all are joined, delete their std::thread object
+
+    for ( auto threadp : pool )
+    {
+      delete threadp ;
+    }
+  }
+} ;
+
+} ; // end of namespace vspline
+
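thread_pool above only runs tasks; it offers no way to wait for them. A caller therefore has to push tasks into task_queue under task_mutex, notify task_cv, and arrange its own completion signal. The sketch below illustrates that protocol under these assumptions: mini_pool is a condensed, hypothetical re-statement of the same design (it holds std::thread by value rather than through pointers), and run_and_wait is a hypothetical helper, not vspline's multithread().

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// hypothetical condensed pool (not vspline's class): workers pop
// std::function<void()> tasks from a queue guarded by task_mutex
// and sleep on task_cv when the queue is empty
class mini_pool
{
  bool stay_alive = true ;              // access under task_mutex
  std::vector < std::thread > workers ; // joined in the destructor

  void work()
  {
    std::unique_lock < std::mutex > lock ( task_mutex ) ;
    while ( stay_alive )
    {
      if ( task_queue.empty() )
      {
        task_cv.wait ( lock ) ;  // spurious wakeups just re-run the loop
        continue ;
      }
      std::function < void() > task = std::move ( task_queue.front() ) ;
      task_queue.pop() ;
      lock.unlock() ;            // run the task without holding the lock
      task() ;
      lock.lock() ;
    }
  }

public:
  std::mutex task_mutex ;
  std::condition_variable task_cv ;
  std::queue < std::function < void() > > task_queue ;

  explicit mini_pool ( int nthreads )
  {
    for ( int t = 0 ; t < nthreads ; t++ )
      workers.emplace_back ( [this] { work() ; } ) ;
  }

  ~mini_pool()
  {
    {
      std::lock_guard < std::mutex > lock ( task_mutex ) ;
      stay_alive = false ;       // queued tasks are discarded
    }
    task_cv.notify_all() ;
    for ( std::thread & w : workers )
      w.join() ;
  }
} ;

// hypothetical helper: queue job(0) ... job(n-1) and block until all have
// finished, using a counter with its own mutex and condition variable
void run_and_wait ( mini_pool & pool , int n ,
                    const std::function < void ( int ) > & job )
{
  std::mutex done_mutex ;
  std::condition_variable done_cv ;
  int remaining = n ;

  {
    std::lock_guard < std::mutex > lock ( pool.task_mutex ) ;
    for ( int i = 0 ; i < n ; i++ )
      pool.task_queue.push ( [&job , &done_mutex , &done_cv , &remaining , i]
      {
        job ( i ) ;
        std::lock_guard < std::mutex > done_lock ( done_mutex ) ;
        if ( --remaining == 0 )
          done_cv.notify_one() ;
      } ) ;
  }
  pool.task_cv.notify_all() ;   // wake idle workers

  std::unique_lock < std::mutex > done_lock ( done_mutex ) ;
  done_cv.wait ( done_lock , [&remaining] { return remaining == 0 ; } ) ;
}
```

The completion counter is deliberately separate from task_mutex, so finishing tasks never contend with workers fetching new ones. Note that the destructor discards queued tasks, so a caller has to wait for its tasks (as run_and_wait does) before the pool goes out of scope.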
diff --git a/unary_functor.h b/unary_functor.h
new file mode 100644
index 0000000..e981f69
--- /dev/null
+++ b/unary_functor.h
@@ -0,0 +1,421 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015 - 2017 by Kay F. Jahnke                    */
+/*                                                                      */
+/*    The git repository for this software is at                        */
+/*                                                                      */
+/*    https://bitbucket.org/kfj/vspline                                 */
+/*                                                                      */
+/*    Please direct questions, bug reports, and contributions to        */
+/*                                                                      */
+/*    kfjahnke+vspline at gmail.com                                        */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/*! \file unary_functor.h
+
+    \brief interface definition for unary functors
+
+    vspline's evaluation and remapping code relies on a unary functor template
+    which is used as the base for vspline::evaluator and also constitutes the
+    type of object accepted by most of the functions in remap.h.
+
+    This template produces functors which are meant to yield a single output
+    for a single input, where both the input and output types may be single
+    types or vigra::TinyVectors, and their elementary types may be vectorized.
+    The functors are required to provide methods named eval() which are capable
+    of performing the required functionality. These eval routines take both
+    their input and output by reference - the input is taken by const &, and the
+    output as plain &. The result type of the eval routines is void. While
+    such unary functors can be hand-coded, the class template 'unary_functor'
+    provides services to create such functors in a uniform way, with a specific
+    system of associated types and some convenience code. Using unary_functor
+    is meant to facilitate the creation of the unary functors used in vspline.
+    
+    Using unary_functor generates objects which can easily be combined into
+    more complex unary functors. A typical use is to 'chain' two unary_functors;
+    see class template 'chain' below, which also provides an example for the use
+    of unary_functor. While this is currently the only explicitly coded combination,
+    providing more should be easy, and the code would be similar to expression
+    template implementations.
+    
+    vspline's unary functor objects are composed of two components:
+    
+    - class template uf_types, to provide a standard set of types
+    
+    - an evaluation policy providing the actual code for the functor
+    
+    the policy defines one or several functions eval() handling the operation
+    at hand. Using the policy approach seems a roundabout way at first, but
+    it makes it quite easy to produce new unary functors. The policy is incorporated
+    into the functor by means of a template template argument. The policy shares
+    the functor's argument and result type and its vector width. These template
+    arguments are always passed through to the evaluation policy, optionally
+    followed by more class type template arguments, which are also passed via
+    the functor's template argument list, resulting in a reasonably terse notation.
+    Current C++ syntax offers no way to pass on arbitrary variadic template arguments;
+    there is a different syntax for type- and non-type arguments. This is annoying
+    but can't be helped. In class evaluator (eval.h) I work around this problem by
+    creating types from int template arguments using std::integral_constant.
+*/
+
+#ifndef VSPLINE_UNARY_FUNCTOR_H
+#define VSPLINE_UNARY_FUNCTOR_H
+
+#include <vspline/common.h>
+
+namespace vspline {
+
+/// struct uf_types provides the types used throughout vspline in unary functors.
+/// These types are pulled in with the macro using_unary_functor_types() below.
+/// I keep these type declarations in a separate class because they may be used
+/// by other classes (via inheritance) - especially by evaluation policies which
+/// have to be explicit about their i/o types (simple policies can avoid referring
+/// to the types by using templated eval routines).
+
+template < typename IN ,  // argument or input type, like coordinate for interpolators
+           typename OUT , // result type, like the type of an interpolation result
+           int _vsize >   // vector width. often derived from OUT with vector_traits
+struct uf_types
+{  
+  // number of dimensions. This may well be different for IN and OUT.
+
+  enum { dim_in = vigra::ExpandElementResult < IN > :: size } ;
+  enum { dim_out = vigra::ExpandElementResult < OUT > :: size } ;
+
+  // number of elements in simdized data
+
+  enum { vsize = _vsize } ;
+
+  // typedefs for incoming (argument) and outgoing (result) type
+  
+  typedef IN in_type ;
+  typedef OUT out_type ;
+  
+  // elementary types of same. we rely on vigra's ExpandElementResult mechanism
+  // to provide these types.
+  
+  typedef typename vigra::ExpandElementResult < IN > :: type in_ele_type ;
+  typedef typename vigra::ExpandElementResult < OUT > :: type out_ele_type ;
+  
+#ifdef USE_VC
+  
+  // for vectorized operation, we need a few extra typedefs
+  // I use a _v suffix to indicate vectorized types.
+
+  /// simdized types of the elementary types of IN and OUT, which are
+  /// used for coordinates, coefficients and results. These are fixed via
+  /// the traits class vector_traits (in common.h). Note how we derive
+  /// these types using vsize from the template argument, not what
+  /// vspline::vector_traits deems appropriate for the elementary type - though
+  /// both numbers will be the same in most cases.
+  
+  typedef typename vector_traits < IN , vsize > :: ele_v in_ele_v ;
+  typedef typename vector_traits < OUT , vsize > :: ele_v out_ele_v ;
+  
+  /// vectorized in_type and out_type. with the current implementation this is
+  /// a vigra::TinyVector of the ele_v types above, which may only have one member.
+  
+  typedef typename vector_traits < IN , vsize > :: type in_v ;
+  typedef typename vector_traits < OUT , vsize > :: type out_v ;
+
+  /// vsize wide vector of ints, used for gather/scatter indexes
+  
+  typedef typename vector_traits < int , vsize > :: ele_v ic_v ;
+
+  /// out_type_of provides the result type of an evaluation given its argument
+  /// type. With this declaration, operator() can be expressed.
+  
+  template < class in_type >
+  using out_type_of = typename
+    std::conditional
+    < std::is_same
+      < in_type ,
+        in_v
+      > :: value ,
+      out_v ,
+      out_type
+    > :: type ;
+
+#else
+
+  /// for non-vectorized code, out_type_of is much simpler:
+    
+  template < class in_type >
+  using out_type_of = out_type ;
+
+#endif
+
+} ;
+
+/// since we want to use the types in class uf_types in several places, there are
+/// macros to pull all the types of class uf_types into a class derived
+/// from it. classes which want to use these macros need to invoke them
+/// with the base class as their argument.
+/// for unvectorized operation we have:
+
+#define using_singular_unary_functor_types(base_type) \
+  using typename base_type::in_ele_type ;             \
+  using typename base_type::out_ele_type ;            \
+  using typename base_type::in_type ;                 \
+  using typename base_type::out_type ;                \
+  enum { dim_in = base_type::dim_in } ;               \
+  enum { dim_out = base_type::dim_out } ;
+
+/// for vectorized operation there are a few more types to pull in:
+
+#define using_simdized_unary_functor_types(base_type) \
+  using typename base_type::in_ele_v ;                \
+  using typename base_type::out_ele_v ;               \
+  using typename base_type::in_v ;                    \
+  using typename base_type::out_v ;                   \
+  using typename base_type::ic_v ;
+  
+/// finally a macro automatically pulling in the proper set of type names
+/// depending on USE_VC. This is the macro used throughout. Here, vsize is also
+/// fixed to the base class' value - or to 1 if USE_VC isn't defined.
+
+#ifdef USE_VC
+#define using_unary_functor_types(base_type)    \
+  enum { vsize = base_type::vsize } ;           \
+  using_singular_unary_functor_types(base_type) \
+  using_simdized_unary_functor_types(base_type) 
+#else  
+#define using_unary_functor_types(base_type)    \
+  enum { vsize = 1 } ;                          \
+  using_singular_unary_functor_types(base_type) 
+#endif
+
+/// struct unary_functor combines the type system above and a specific
+/// evaluation policy, which provides the eval() routines needed for a
+/// specific operation. Contrary to my initial implementation prescribing
+/// an interface by defining two pure virtual functions eval() for
+/// unvectorized and vectorized operation, I now rely entirely on the
+/// evaluation policy. This gives more flexibility, because now a specific
+/// implementation of a unary functor can provide only the eval() variants
+/// which will actually be used by calling code, and the policy can also
+/// use templates for the evaluation code, which is helpful if both the
+/// vectorized and unvectorized code are the same.
+///
+/// unary_functor inherits from uf_types to have the eval type system,
+/// and from the specific evaluation policy to have eval routines.
+/// The specification of unary_functor ensures that all components share
+/// the same argument type, result type and vector width, since there is only
+/// one single point where they are introduced, namely as the first three
+/// template arguments in unary_functor's template argument list.
+  
+// TODO I use the same type system in several places, so it's separate, but it's
+// attractive to pull the types right into unary_functor, avoiding the ugly macros.
+// The problem is that evaluation policies work with types which are isomorphic
+// to the types in uf_types, but usually more aptly named: where uf_types
+// might have in_type, a policy might have coordinate_type. But the types in
+// uf_types are needed to have a handle on their syntactic function, which is
+// referred to when unary functors are combined (by incorporation or chaining)
+
+// TODO consider a traits class instead. for vectorized types, we have vector_traits
+// already, maybe it would be wise to have something more general like 'eval_traits'
+
+// TODO unary_functor receives trailing template arguments after the policy which
+// are passed through to the policy's template instantiation. This is done via
+// the variadic template argument 'class ... trailing_args'. I'd like to be able to
+// also pass non-type arguments (int, enums...), but afaict the syntax 'typename ...'
+// is specific to type arguments, and I don't know of a general syntax allowing
+// arbitrary template arguments. This shortcoming limits the usefulness of the
+// passing through of the policy's template arguments, but it can be worked around
+// by making types out of the non-type template arguments, like by using
+// std::integral_constant
+  
+template < typename argument_type ,            // argument and result type are input and
+           typename result_type ,              // output of eval()
+           int _vsize ,                        // size of vectors to process
+           template < typename at ,            // template template argument passing on
+                      typename rt ,            // argument_type, result_type and _vsize
+                      int vsz ,                // to the evaluation policy
+                      typename ... policy_args // plus 0 or more types to be passed
+                    > class eval_policy ,      // to the policy additionally
+           typename ... trailing_args >        // these additional types are passed in here
+struct unary_functor
+// inherit from uf_types to get the standard evaluator type system
+: public uf_types < argument_type , result_type , _vsize > ,
+  public eval_policy < argument_type ,      // now here's the inheritance from the policy.
+                       result_type ,        // it takes argument_type, result_type and
+                       _vsize ,             // _vsize plus any trailing args passed to
+                       trailing_args ... >  // unary_functor
+{
+  // pull in the standard evaluator type system. While it would be nice to simply
+  // incorporate these typedefs here, I also use them in other classes.
+  // inheriting by itself does not provide the types from uf_types, they have to be
+  // specifically declared with the using_unary_functor_types() macro above, which
+  // has using declarations making the typedefs from uf_types available without
+  // explicit reference to class uf_types.
+  // an alternative would be to not inherit from uf_types and use typedefs in the
+  // macro to introduce the types, effectively using uf_types as a traits class.
+    
+  typedef uf_types < argument_type , result_type , _vsize > base_type ;
+  using_unary_functor_types ( base_type ) ;
+  
+  // introduce a compact name for the evaluation policy:
+  
+  using evp = eval_policy < argument_type , result_type , _vsize ,
+                            trailing_args ... > ;
+                            
+  /// variadic constructor template. this passes any constructor
+  /// arguments to the evaluation_policy. This looks odd at first, but is necessary:
+  /// the evaluation policy may have state which needs to be initialized, and the mere
+  /// type of the evaluation policy does not contain information on its constructor
+  /// arguments. But since class unary_functor is a mere shell and does not have
+  /// state specific to itself, we can be certain that all arguments passed to its
+  /// constructor are 'meant' for its evaluation policy, and just pass them on. We
+  /// use perfect forwarding to preserve the arguments as best as we can.
+  
+  template < class ... ctor_args >
+  unary_functor ( ctor_args && ... args )
+  : evp ( std::forward<ctor_args> ( args ) ... )
+  { } ;
+
+  // note how unary_functor contains no eval() routines. These are the domain of
+  // the evaluation policy (evp in this case).
+  // We rely on evp providing all eval() variants the calling code actually *uses*
+  // rather than enforcing a specific interface, like by using pure virtual functions
+  // as I did in my initial implementation.
+  
+  // to be able to declare operator(), we need to derive the return type
+  // from the incoming type, using out_type_of from the standard eval type system.
+  // alternatively we could use overloading, but then we need separate definitions
+  // for all cases and we can't use the generic form here:
+
+  // TODO: clarify what to do when 1D values come into play. currently, in_v
+  // and out_v are syntactically nD values with one dimension.
+  
+  template < typename I ,
+             typename O
+               = typename base_type::template out_type_of<I>
+           >
+  O operator() ( const I & i ) const
+  {
+    O result ;
+    evp::eval ( i , result ) ;
+    return result ;
+  }
+} ;
+
+/// class chain is a helper class to easily pass one unary functor's result
+/// as argument to another one. We rely on T1 and T2 to provide a few of the
+/// standard types used in unary functors. Typically, T1 and T2 will both be
+/// vspline::unary_functors, but the type requirements could also be fulfilled
+/// manually.
+/// This class can also serve as an example for a policy which can be used
+/// by a vspline::unary_functor. Such policies have to accept three mandatory
+/// template arguments (argument_type, result_type and vsize), plus optionally
+/// more types which are specific to the policy. Here, the types of the
+/// functors to be chained (T1, T2) are specific to the policy and follow
+/// the standard template arguments.
+
+template < typename argument_type ,   // first the three standard template
+           typename result_type ,     // arguments
+           int vsize ,
+           typename T1 ,              // then the two specific arguments
+           typename T2 >              // for chain_policy
+struct chain_policy
+{
+  // chaining is only allowed if a set of conditions is fulfilled:
+  // T1's input and T2's output type must match argument_type and result_type:
+  
+  static_assert ( std::is_same < argument_type ,
+                                 typename T1::in_type > :: value ,
+                  "can only chain unary functors where argument_type == T1::in_type" ) ;
+
+  static_assert ( std::is_same < result_type ,
+                                 typename T2::out_type > :: value ,
+                  "can only chain unary functors where result_type == T2::out_type" ) ;
+
+  // require a common intermediate type. This requirement is currently omitted
+  // - the spec is widened to allow situations where T2::in_type can be constructed from
+  // T1::out_type.
+
+//   static_assert ( std::is_same < typename T1::out_type ,
+//                                  typename T2::in_type > :: value ,
+//                   "can only chain unary functors where T1::out_type == T2::in_type" ) ;
+
+  // we require both functors to share the same vectorization width
+                  
+  static_assert ( T1::vsize == T2::vsize ,
+                  "can only chain unary functors with the same vector width" ) ;
+
+  // hold the two functors as const references; they must outlive the chain
+
+  const T1 & t1 ;
+  const T2 & t2 ;
+  
+  // the constructor initializes these references
+
+  chain_policy ( const T1 & _t1 ,
+                 const T2 & _t2 )
+  : t1 ( _t1 ) ,
+    t2 ( _t2 )
+    { } ;
+
+  // the actual eval needs a bit of trickery to determine the type of
+  // the intermediate object from the type of the first argument. While
+  // it's possible to specialize the code where I == B to omit the
+  // intermediate object, I rely on the optimizer to do so, since removing
+  // unnecessary intermediates is a trivial task.
+
+  template < typename A ,
+             typename B ,
+             typename I = typename T1::template out_type_of<A> >
+  void eval ( const A & c ,
+                    B & result ) const
+  {
+    I cc ;                    // have an intermediate handy
+    t1.eval ( c , cc ) ;      // evaluate first functor into it
+    t2.eval ( cc , result ) ; // feed it as input to second functor
+  }
+
+} ;
+
+/// struct chain specializes unary_functor with the chain policy above.
+/// this is also a handy example for the use of unary_functor, demonstrating how
+/// the type handling is reasonably painless using unary_functor. With the
+/// alias declaration below, given two unary functors A and B, we can
+/// instantiate a 'chain' class as simply as vspline::chain < A , B >.
+
+template < typename T1 ,  // 'chain' objects can be instantiated with two
+           typename T2 >  // template arguments
+using chain = typename
+ vspline::unary_functor < typename T1::in_type ,  // in_type and out_type are
+                          typename T2::out_type , // inferred from T1 and T2
+                          T1::vsize ,             // as is the vector width
+                          chain_policy ,          // this is the policy argument
+                          T1 ,                    // and the trailing types are
+                          T2 > ;                  // passed through to the policy
+
+} ; // end of namespace vspline
+
+#endif // VSPLINE_UNARY_FUNCTOR_H
+
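The policy mechanism described for unary_functor can be tried out in isolation with a stripped-down sketch. It assumes scalar double input and output and leaves out vigra and Vc entirely; mini_functor, affine_policy, power_policy, mini_chain_policy and mini_chain are hypothetical names modelled on unary_functor, chain_policy and chain above. power_policy also shows the std::integral_constant workaround for passing a non-type parameter through the trailing type arguments.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// hypothetical shell modelled on vspline::unary_functor: all eval()
// code comes from the policy, which receives IN, OUT, vsz and any
// trailing type arguments
template < typename IN , typename OUT , int vsz ,
           template < typename , typename , int , typename ... > class policy ,
           typename ... trailing >
struct mini_functor
: public policy < IN , OUT , vsz , trailing ... >
{
  typedef IN in_type ;
  typedef OUT out_type ;
  enum { vsize = vsz } ;
  using evp = policy < IN , OUT , vsz , trailing ... > ;

  // forward all constructor arguments to the policy
  template < class ... ctor_args >
  mini_functor ( ctor_args && ... args )
  : evp ( std::forward < ctor_args > ( args ) ... )
  { }

  OUT operator() ( const IN & in ) const
  {
    OUT out ;
    evp::eval ( in , out ) ;
    return out ;
  }
} ;

// stateful policy: out = a * in + b
template < typename IN , typename OUT , int vsz >
struct affine_policy
{
  IN a , b ;
  affine_policy ( IN _a , IN _b ) : a ( _a ) , b ( _b ) { }
  void eval ( const IN & in , OUT & out ) const { out = OUT ( a * in + b ) ; }
} ;

// the exponent arrives as a type, e.g. std::integral_constant < int , 3 >
template < typename IN , typename OUT , int vsz , typename exponent >
struct power_policy
{
  void eval ( const IN & in , OUT & out ) const
  {
    OUT r = OUT ( 1 ) ;
    for ( int k = 0 ; k < exponent::value ; k++ )
      r = r * OUT ( in ) ;
    out = r ;
  }
} ;

// modelled on chain_policy: feeds T1's result into T2
template < typename IN , typename OUT , int vsz , typename T1 , typename T2 >
struct mini_chain_policy
{
  static_assert ( std::is_same < IN , typename T1::in_type > :: value ,
                  "argument type must be T1's input type" ) ;
  static_assert ( std::is_same < OUT , typename T2::out_type > :: value ,
                  "result type must be T2's output type" ) ;
  static_assert ( int ( T1::vsize ) == int ( T2::vsize ) ,
                  "both functors must have the same vector width" ) ;

  const T1 & t1 ;  // held by reference: t1 and t2 must outlive the chain
  const T2 & t2 ;

  mini_chain_policy ( const T1 & _t1 , const T2 & _t2 ) : t1 ( _t1 ) , t2 ( _t2 ) { }

  void eval ( const IN & in , OUT & out ) const
  {
    typename T1::out_type intermediate ;
    t1.eval ( in , intermediate ) ;
    t2.eval ( intermediate , out ) ;
  }
} ;

template < typename T1 , typename T2 >
using mini_chain = mini_functor < typename T1::in_type , typename T2::out_type ,
                                  T1::vsize , mini_chain_policy , T1 , T2 > ;

typedef mini_functor < double , double , 1 , affine_policy > affine ;
typedef mini_functor < double , double , 1 , power_policy ,
                       std::integral_constant < int , 3 > > cube ;
```

Used as `affine plus_one ( 1.0 , 1.0 ) ; cube c3 ; mini_chain < affine , cube > f ( plus_one , c3 ) ;`, f ( 1.0 ) yields ( 1 + 1 ) cubed, i.e. 8. Since the chain policy holds its functors by const reference, plus_one and c3 must outlive f.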
diff --git a/vspline.doxy b/vspline.doxy
new file mode 100644
index 0000000..3c10794
--- /dev/null
+++ b/vspline.doxy
@@ -0,0 +1,2303 @@
+# Doxyfile 1.8.6
+
+# This file describes the settings to be used by the documentation system
+# doxygen (www.doxygen.org) for a project.
+#
+# All text after a double hash (##) is considered a comment and is placed in
+# front of the TAG it is preceding.
+#
+# All text after a single hash (#) is considered a comment and will be ignored.
+# The format is:
+# TAG = value [value, ...]
+# For lists, items can also be appended using:
+# TAG += value [value, ...]
+# Values that contain spaces should be placed between quotes (\" \").
+
+#---------------------------------------------------------------------------
+# Project related configuration options
+#---------------------------------------------------------------------------
+
+# This tag specifies the encoding used for all characters in the config file
+# that follow. The default is UTF-8 which is also the encoding used for all text
+# before the first occurrence of this tag. Doxygen uses libiconv (or the iconv
+# built into libc) for the transcoding. See http://www.gnu.org/software/libiconv
+# for the list of possible encodings.
+# The default value is: UTF-8.
+
+DOXYFILE_ENCODING      = UTF-8
+
+# The PROJECT_NAME tag is a single word (or a sequence of words surrounded by
+# double-quotes, unless you are using Doxywizard) that should identify the
+# project for which the documentation is generated. This name is used in the
+# title of most generated pages and in a few other places.
+# The default value is: My Project.
+
+PROJECT_NAME           = "vspline"
+
+# The PROJECT_NUMBER tag can be used to enter a project or revision number. This
+# could be handy for archiving the generated documentation or if some version
+# control system is used.
+
+PROJECT_NUMBER         = 17
+
+# Using the PROJECT_BRIEF tag one can provide an optional one line description
# for a project that appears at the top of each page and should give the viewer a
+# quick idea about the purpose of the project. Keep the description short.
+
+PROJECT_BRIEF          = "Generic C++ Code for Uniform B-Splines"
+
# With the PROJECT_LOGO tag one can specify a logo or icon that is included in
+# the documentation. The maximum height of the logo should not exceed 55 pixels
+# and the maximum width should not exceed 200 pixels. Doxygen will copy the logo
+# to the output directory.
+
+PROJECT_LOGO           =
+
+# The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute) path
+# into which the generated documentation will be written. If a relative path is
+# entered, it will be relative to the location where doxygen was started. If
+# left blank the current directory will be used.
+
+OUTPUT_DIRECTORY       = ../kfj.bitbucket.org
+
+# If the CREATE_SUBDIRS tag is set to YES, then doxygen will create 4096 sub-
+# directories (in 2 levels) under the output directory of each output format and
+# will distribute the generated files over these directories. Enabling this
+# option can be useful when feeding doxygen a huge amount of source files, where
# putting all generated files in the same directory would otherwise cause
+# performance problems for the file system.
+# The default value is: NO.
+
+CREATE_SUBDIRS         = NO
+
+# The OUTPUT_LANGUAGE tag is used to specify the language in which all
+# documentation generated by doxygen is written. Doxygen will use this
+# information to generate all constant output in the proper language.
+# Possible values are: Afrikaans, Arabic, Armenian, Brazilian, Catalan, Chinese,
+# Chinese-Traditional, Croatian, Czech, Danish, Dutch, English (United States),
+# Esperanto, Farsi (Persian), Finnish, French, German, Greek, Hungarian,
+# Indonesian, Italian, Japanese, Japanese-en (Japanese with English messages),
+# Korean, Korean-en (Korean with English messages), Latvian, Lithuanian,
+# Macedonian, Norwegian, Persian (Farsi), Polish, Portuguese, Romanian, Russian,
+# Serbian, Serbian-Cyrillic, Slovak, Slovene, Spanish, Swedish, Turkish,
+# Ukrainian and Vietnamese.
+# The default value is: English.
+
+OUTPUT_LANGUAGE        = English
+
+# If the BRIEF_MEMBER_DESC tag is set to YES doxygen will include brief member
+# descriptions after the members that are listed in the file and class
+# documentation (similar to Javadoc). Set to NO to disable this.
+# The default value is: YES.
+
+BRIEF_MEMBER_DESC      = YES
+
+# If the REPEAT_BRIEF tag is set to YES doxygen will prepend the brief
+# description of a member or function before the detailed description.
+#
+# Note: If both HIDE_UNDOC_MEMBERS and BRIEF_MEMBER_DESC are set to NO, the
+# brief descriptions will be completely suppressed.
+# The default value is: YES.
+
+REPEAT_BRIEF           = YES
+
+# This tag implements a quasi-intelligent brief description abbreviator that is
+# used to form the text in various listings. Each string in this list, if found
+# as the leading text of the brief description, will be stripped from the text
+# and the result, after processing the whole list, is used as the annotated
+# text. Otherwise, the brief description is used as-is. If left blank, the
+# following values are used ($name is automatically replaced with the name of
+# the entity): The $name class, The $name widget, The $name file, is, provides,
+# specifies, contains, represents, a, an and the.
+
+ABBREVIATE_BRIEF       =
+
+# If the ALWAYS_DETAILED_SEC and REPEAT_BRIEF tags are both set to YES then
+# doxygen will generate a detailed section even if there is only a brief
+# description.
+# The default value is: NO.
+
+ALWAYS_DETAILED_SEC    = NO
+
+# If the INLINE_INHERITED_MEMB tag is set to YES, doxygen will show all
+# inherited members of a class in the documentation of that class as if those
+# members were ordinary class members. Constructors, destructors and assignment
+# operators of the base classes will not be shown.
+# The default value is: NO.
+
+INLINE_INHERITED_MEMB  = NO
+
+# If the FULL_PATH_NAMES tag is set to YES doxygen will prepend the full path
+# before file names in the file list and in the header files. If set to NO the
+# shortest path that makes the file name unique will be used.
+# The default value is: YES.
+
+FULL_PATH_NAMES        = YES
+
+# The STRIP_FROM_PATH tag can be used to strip a user-defined part of the path.
+# Stripping is only done if one of the specified strings matches the left-hand
+# part of the path. The tag can be used to show relative paths in the file list.
+# If left blank the directory from which doxygen is run is used as the path to
+# strip.
+#
+# Note that you can specify absolute paths here, but also relative paths, which
+# will be relative to the directory where doxygen is started.
+# This tag requires that the tag FULL_PATH_NAMES is set to YES.
+
+STRIP_FROM_PATH        =
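+
+# Illustrative sketch (commented out; the path is a hypothetical example): if
+# the sources lived in /home/user/src/vspline, then with FULL_PATH_NAMES = YES
+# the setting
+#   STRIP_FROM_PATH = /home/user/src/
+# would make the file list show vspline/bspline.h instead of the full path.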
+
+# The STRIP_FROM_INC_PATH tag can be used to strip a user-defined part of the
+# path mentioned in the documentation of a class, which tells the reader which
+# header file to include in order to use a class. If left blank only the name of
+# the header file containing the class definition is used. Otherwise one should
+# specify the list of include paths that are normally passed to the compiler
+# using the -I flag.
+
+STRIP_FROM_INC_PATH    =
+
+# If the SHORT_NAMES tag is set to YES, doxygen will generate much shorter (but
+# less readable) file names. This can be useful if your file system doesn't
+# support long names like on DOS, Mac, or CD-ROM.
+# The default value is: NO.
+
+SHORT_NAMES            = NO
+
+# If the JAVADOC_AUTOBRIEF tag is set to YES then doxygen will interpret the
+# first line (until the first dot) of a Javadoc-style comment as the brief
+# description. If set to NO, Javadoc-style comments will behave just like
+# regular Qt-style comments (thus requiring an explicit @brief command for a
+# brief description).
+# The default value is: NO.
+
+JAVADOC_AUTOBRIEF      = NO
+
+# If the QT_AUTOBRIEF tag is set to YES then doxygen will interpret the first
+# line (until the first dot) of a Qt-style comment as the brief description. If
+# set to NO, the Qt-style will behave just like regular Qt-style comments (thus
+# requiring an explicit \brief command for a brief description).
+# The default value is: NO.
+
+QT_AUTOBRIEF           = NO
+
+# The MULTILINE_CPP_IS_BRIEF tag can be set to YES to make doxygen treat a
+# multi-line C++ special comment block (i.e. a block of //! or /// comments) as
+# a brief description. This used to be the default behavior. The new default is
+# to treat a multi-line C++ comment block as a detailed description. Set this
+# tag to YES if you prefer the old behavior instead.
+#
+# Note that setting this tag to YES also means that Rational Rose comments are
+# not recognized any more.
+# The default value is: NO.
+
+MULTILINE_CPP_IS_BRIEF = YES
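+
+# Illustration (the comment text is a hypothetical example): with the value
+# YES chosen above, a multi-line block such as
+#   /// Returns the value at the given position.
+#   /// The position must lie inside the defined range.
+# is taken as a brief description in its entirety; with NO it would be
+# treated as a detailed description.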
+
+# If the INHERIT_DOCS tag is set to YES then an undocumented member inherits the
+# documentation from any documented member that it re-implements.
+# The default value is: YES.
+
+INHERIT_DOCS           = YES
+
+# If the SEPARATE_MEMBER_PAGES tag is set to YES, then doxygen will produce a
+# new page for each member. If set to NO, the documentation of a member will be
+# part of the file/class/namespace that contains it.
+# The default value is: NO.
+
+SEPARATE_MEMBER_PAGES  = NO
+
+# The TAB_SIZE tag can be used to set the number of spaces in a tab. Doxygen
+# uses this value to replace tabs with spaces in code fragments.
+# Minimum value: 1, maximum value: 16, default value: 4.
+
+TAB_SIZE               = 4
+
+# This tag can be used to specify a number of aliases that act as commands in
+# the documentation. An alias has the form:
+# name=value
+# For example adding
+# "sideeffect=@par Side Effects:\n"
+# will allow you to put the command \sideeffect (or @sideeffect) in the
+# documentation, which will result in a user-defined paragraph with heading
+# "Side Effects:". You can put \n's in the value part of an alias to insert
+# newlines.
+
+ALIASES                =
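+
+# Illustrative sketch (commented out; not used by this project): the alias
+# described above would be declared as
+#   ALIASES = "sideeffect=@par Side Effects:\n"
+# and a comment block could then contain a line such as
+#   \sideeffect Modifies its argument in place.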
+
+# This tag can be used to specify a number of word-keyword mappings (TCL only).
+# A mapping has the form "name=value". For example adding "class=itcl::class"
+# will allow you to use the command class in the itcl::class meaning.
+
+TCL_SUBST              =
+
+# Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C sources
+# only. Doxygen will then generate output that is more tailored for C. For
+# instance, some of the names that are used will be different. The list of all
+# members will be omitted, etc.
+# The default value is: NO.
+
+OPTIMIZE_OUTPUT_FOR_C  = NO
+
+# Set the OPTIMIZE_OUTPUT_JAVA tag to YES if your project consists of Java or
+# Python sources only. Doxygen will then generate output that is more tailored
+# for that language. For instance, namespaces will be presented as packages,
+# qualified scopes will look different, etc.
+# The default value is: NO.
+
+OPTIMIZE_OUTPUT_JAVA   = NO
+
+# Set the OPTIMIZE_FOR_FORTRAN tag to YES if your project consists of Fortran
+# sources. Doxygen will then generate output that is tailored for Fortran.
+# The default value is: NO.
+
+OPTIMIZE_FOR_FORTRAN   = NO
+
+# Set the OPTIMIZE_OUTPUT_VHDL tag to YES if your project consists of VHDL
+# sources. Doxygen will then generate output that is tailored for VHDL.
+# The default value is: NO.
+
+OPTIMIZE_OUTPUT_VHDL   = NO
+
+# Doxygen selects the parser to use depending on the extension of the files it
+# parses. With this tag you can assign which parser to use for a given
+# extension. Doxygen has a built-in mapping, but you can override or extend it
+# using this tag. The format is ext=language, where ext is a file extension, and
+# language is one of the parsers supported by doxygen: IDL, Java, Javascript,
+# C#, C, C++, D, PHP, Objective-C, Python, Fortran, VHDL. For instance to make
+# doxygen treat .inc files as Fortran files (default is PHP), and .f files as C
+# (default is Fortran), use: inc=Fortran f=C.
+#
+# Note: for files without extension you can use no_extension as a placeholder.
+#
+# Note that for custom extensions you also need to set FILE_PATTERNS otherwise
+# the files are not read by doxygen.
+
+EXTENSION_MAPPING      =
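+
+# Illustrative sketch (commented out; the .tpp extension is a hypothetical
+# example): to have template implementation files parsed as C++, one could use
+#   EXTENSION_MAPPING = tpp=C++
+#   FILE_PATTERNS     = *.h *.cc *.tpp
+# Because a non-empty FILE_PATTERNS replaces the default pattern list, *.h and
+# *.cc have to be listed again alongside the new extension.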
+
+# If the MARKDOWN_SUPPORT tag is enabled then doxygen pre-processes all comments
+# according to the Markdown format, which allows for more readable
+# documentation. See http://daringfireball.net/projects/markdown/ for details.
+# The output of markdown processing is further processed by doxygen, so you can
+# mix doxygen, HTML, and XML commands with Markdown formatting. Disable only in
+# case of backward compatibility issues.
+# The default value is: YES.
+
+MARKDOWN_SUPPORT       = YES
+
+# When enabled doxygen tries to link words that correspond to documented
+# classes or namespaces to their corresponding documentation. Such a link can
+# be prevented in individual cases by putting a % sign in front of the word
+# or globally by setting AUTOLINK_SUPPORT to NO.
+# The default value is: YES.
+
+AUTOLINK_SUPPORT       = YES
+
+# If you use STL classes (e.g. std::string, std::vector, etc.) but do not want
+# to include (a tag file for) the STL sources as input, then you should set this
+# tag to YES in order to let doxygen match function declarations and
+# definitions whose arguments contain STL classes (e.g. func(std::string);
+# versus func(std::string) {}). This also makes the inheritance and collaboration
+# diagrams that involve STL classes more complete and accurate.
+# The default value is: NO.
+
+BUILTIN_STL_SUPPORT    = NO
+
+# If you use Microsoft's C++/CLI language, you should set this option to YES to
+# enable parsing support.
+# The default value is: NO.
+
+CPP_CLI_SUPPORT        = NO
+
+# Set the SIP_SUPPORT tag to YES if your project consists of sip (see:
+# http://www.riverbankcomputing.co.uk/software/sip/intro) sources only. Doxygen
+# will parse them like normal C++ but will assume all classes use public instead
+# of private inheritance when no explicit protection keyword is present.
+# The default value is: NO.
+
+SIP_SUPPORT            = NO
+
+# For Microsoft's IDL there are propget and propput attributes to indicate
+# getter and setter methods for a property. Setting this option to YES will make
+# doxygen replace the get and set methods by a property in the documentation.
+# This will only work if the methods are indeed getting or setting a simple
+# type. If this is not the case, or you want to show the methods anyway, you
+# should set this option to NO.
+# The default value is: YES.
+
+IDL_PROPERTY_SUPPORT   = YES
+
+# If member grouping is used in the documentation and the DISTRIBUTE_GROUP_DOC
+# tag is set to YES, then doxygen will reuse the documentation of the first
+# member in the group (if any) for the other members of the group. By default
+# all members of a group must be documented explicitly.
+# The default value is: NO.
+
+DISTRIBUTE_GROUP_DOC   = NO
+
+# Set the SUBGROUPING tag to YES to allow class member groups of the same type
+# (for instance a group of public functions) to be put as a subgroup of that
+# type (e.g. under the Public Functions section). Set it to NO to prevent
+# subgrouping. Alternatively, this can be done per class using the
+# \nosubgrouping command.
+# The default value is: YES.
+
+SUBGROUPING            = YES
+
+# When the INLINE_GROUPED_CLASSES tag is set to YES, classes, structs and unions
+# are shown inside the group in which they are included (e.g. using \ingroup)
+# instead of on a separate page (for HTML and Man pages) or section (for LaTeX
+# and RTF).
+#
+# Note that this feature does not work in combination with
+# SEPARATE_MEMBER_PAGES.
+# The default value is: NO.
+
+INLINE_GROUPED_CLASSES = NO
+
+# When the INLINE_SIMPLE_STRUCTS tag is set to YES, structs, classes, and unions
+# with only public data fields or simple typedef fields will be shown inline in
+# the documentation of the scope in which they are defined (i.e. file,
+# namespace, or group documentation), provided this scope is documented. If set
+# to NO, structs, classes, and unions are shown on a separate page (for HTML and
+# Man pages) or section (for LaTeX and RTF).
+# The default value is: NO.
+
+INLINE_SIMPLE_STRUCTS  = NO
+
+# When TYPEDEF_HIDES_STRUCT tag is enabled, a typedef of a struct, union, or
+# enum is documented as struct, union, or enum with the name of the typedef. So
+# typedef struct TypeS {} TypeT, will appear in the documentation as a struct
+# with name TypeT. When disabled the typedef will appear as a member of a file,
+# namespace, or class. And the struct will be named TypeS. This can typically be
+# useful for C code in case the coding convention dictates that all compound
+# types are typedef'ed and only the typedef is referenced, never the tag name.
+# The default value is: NO.
+
+TYPEDEF_HIDES_STRUCT   = NO
+
+# The size of the symbol lookup cache can be set using LOOKUP_CACHE_SIZE. This
+# cache is used to resolve symbols given their name and scope. Since this can be
+# an expensive process and often the same symbol appears multiple times in the
+# code, doxygen keeps a cache of pre-resolved symbols. If the cache is too small
+# doxygen will become slower. If the cache is too large, memory is wasted. The
+# cache size is given by this formula: 2^(16+LOOKUP_CACHE_SIZE). The valid range
+# is 0..9, the default is 0, corresponding to a cache size of 2^16=65536
+# symbols. At the end of a run doxygen will report the cache usage and suggest
+# the optimal cache size from a speed point of view.
+# Minimum value: 0, maximum value: 9, default value: 0.
+
+LOOKUP_CACHE_SIZE      = 0
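+
+# Worked values of the formula 2^(16+LOOKUP_CACHE_SIZE):
+#   LOOKUP_CACHE_SIZE = 0  ->  2^16 =    65536 symbols (default, used here)
+#   LOOKUP_CACHE_SIZE = 3  ->  2^19 =   524288 symbols
+#   LOOKUP_CACHE_SIZE = 9  ->  2^25 = 33554432 symbols (maximum)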
+
+#---------------------------------------------------------------------------
+# Build related configuration options
+#---------------------------------------------------------------------------
+
+# If the EXTRACT_ALL tag is set to YES doxygen will assume all entities in
+# documentation are documented, even if no documentation was available. Private
+# class members and static file members will be hidden unless the
+# EXTRACT_PRIVATE or EXTRACT_STATIC tag, respectively, is set to YES.
+# Note: This will also disable the warnings about undocumented members that are
+# normally produced when WARNINGS is set to YES.
+# The default value is: NO.
+
+EXTRACT_ALL            = NO
+
+# If the EXTRACT_PRIVATE tag is set to YES all private members of a class will
+# be included in the documentation.
+# The default value is: NO.
+
+EXTRACT_PRIVATE        = NO
+
+# If the EXTRACT_PACKAGE tag is set to YES all members with package or internal
+# scope will be included in the documentation.
+# The default value is: NO.
+
+EXTRACT_PACKAGE        = NO
+
+# If the EXTRACT_STATIC tag is set to YES all static members of a file will be
+# included in the documentation.
+# The default value is: NO.
+
+EXTRACT_STATIC         = NO
+
+# If the EXTRACT_LOCAL_CLASSES tag is set to YES classes (and structs) defined
+# locally in source files will be included in the documentation. If set to NO
+# only classes defined in header files are included. Does not have any effect
+# for Java sources.
+# The default value is: YES.
+
+EXTRACT_LOCAL_CLASSES  = YES
+
+# This flag is only useful for Objective-C code. When set to YES local methods,
+# which are defined in the implementation section but not in the interface are
+# included in the documentation. If set to NO only methods in the interface are
+# included.
+# The default value is: NO.
+
+EXTRACT_LOCAL_METHODS  = NO
+
+# If this flag is set to YES, the members of anonymous namespaces will be
+# extracted and appear in the documentation as a namespace called
+# 'anonymous_namespace{file}', where file will be replaced with the base name of
+# the file that contains the anonymous namespace. By default anonymous
+# namespaces are hidden.
+# The default value is: NO.
+
+EXTRACT_ANON_NSPACES   = NO
+
+# If the HIDE_UNDOC_MEMBERS tag is set to YES, doxygen will hide all
+# undocumented members inside documented classes or files. If set to NO these
+# members will be included in the various overviews, but no documentation
+# section is generated. This option has no effect if EXTRACT_ALL is enabled.
+# The default value is: NO.
+
+HIDE_UNDOC_MEMBERS     = NO
+
+# If the HIDE_UNDOC_CLASSES tag is set to YES, doxygen will hide all
+# undocumented classes that are normally visible in the class hierarchy. If set
+# to NO these classes will be included in the various overviews. This option has
+# no effect if EXTRACT_ALL is enabled.
+# The default value is: NO.
+
+HIDE_UNDOC_CLASSES     = NO
+
+# If the HIDE_FRIEND_COMPOUNDS tag is set to YES, doxygen will hide all friend
+# (class|struct|union) declarations. If set to NO these declarations will be
+# included in the documentation.
+# The default value is: NO.
+
+HIDE_FRIEND_COMPOUNDS  = NO
+
+# If the HIDE_IN_BODY_DOCS tag is set to YES, doxygen will hide any
+# documentation blocks found inside the body of a function. If set to NO these
+# blocks will be appended to the function's detailed documentation block.
+# The default value is: NO.
+
+HIDE_IN_BODY_DOCS      = NO
+
+# The INTERNAL_DOCS tag determines if documentation that is typed after a
+# \internal command is included. If the tag is set to NO then the documentation
+# will be excluded. Set it to YES to include the internal documentation.
+# The default value is: NO.
+
+INTERNAL_DOCS          = NO
+
+# If the CASE_SENSE_NAMES tag is set to NO then doxygen will only generate file
+# names in lower-case letters. If set to YES upper-case letters are also
+# allowed. This is useful if you have classes or files whose names only differ
+# in case and if your file system supports case sensitive file names. Windows
+# and Mac users are advised to set this option to NO.
+# The default value is: system dependent.
+
+CASE_SENSE_NAMES       = YES
+
+# If the HIDE_SCOPE_NAMES tag is set to NO then doxygen will show members with
+# their full class and namespace scopes in the documentation. If set to YES the
+# scope will be hidden.
+# The default value is: NO.
+
+HIDE_SCOPE_NAMES       = NO
+
+# If the SHOW_INCLUDE_FILES tag is set to YES then doxygen will put a list of
+# the files that are included by a file in the documentation of that file.
+# The default value is: YES.
+
+SHOW_INCLUDE_FILES     = YES
+
+# If the SHOW_GROUPED_MEMB_INC tag is set to YES then doxygen will add, for each
+# grouped member, an include statement to the documentation, telling the reader
+# which file to include in order to use the member.
+# The default value is: NO.
+
+SHOW_GROUPED_MEMB_INC  = NO
+
+# If the FORCE_LOCAL_INCLUDES tag is set to YES then doxygen will list include
+# files with double quotes in the documentation rather than with sharp brackets.
+# The default value is: NO.
+
+FORCE_LOCAL_INCLUDES   = NO
+
+# If the INLINE_INFO tag is set to YES then a tag [inline] is inserted in the
+# documentation for inline members.
+# The default value is: YES.
+
+INLINE_INFO            = YES
+
+# If the SORT_MEMBER_DOCS tag is set to YES then doxygen will sort the
+# (detailed) documentation of file and class members alphabetically by member
+# name. If set to NO the members will appear in declaration order.
+# The default value is: YES.
+
+SORT_MEMBER_DOCS       = YES
+
+# If the SORT_BRIEF_DOCS tag is set to YES then doxygen will sort the brief
+# descriptions of file, namespace and class members alphabetically by member
+# name. If set to NO the members will appear in declaration order. Note that
+# this will also influence the order of the classes in the class list.
+# The default value is: NO.
+
+SORT_BRIEF_DOCS        = NO
+
+# If the SORT_MEMBERS_CTORS_1ST tag is set to YES then doxygen will sort the
+# (brief and detailed) documentation of class members so that constructors and
+# destructors are listed first. If set to NO the constructors will appear in the
+# respective orders defined by SORT_BRIEF_DOCS and SORT_MEMBER_DOCS.
+# Note: If SORT_BRIEF_DOCS is set to NO this option is ignored for sorting brief
+# member documentation.
+# Note: If SORT_MEMBER_DOCS is set to NO this option is ignored for sorting
+# detailed member documentation.
+# The default value is: NO.
+
+SORT_MEMBERS_CTORS_1ST = NO
+
+# If the SORT_GROUP_NAMES tag is set to YES then doxygen will sort the hierarchy
+# of group names into alphabetical order. If set to NO the group names will
+# appear in their defined order.
+# The default value is: NO.
+
+SORT_GROUP_NAMES       = NO
+
+# If the SORT_BY_SCOPE_NAME tag is set to YES, the class list will be sorted by
+# fully-qualified names, including namespaces. If set to NO, the class list will
+# be sorted only by class name, not including the namespace part.
+# Note: This option is not very useful if HIDE_SCOPE_NAMES is set to YES.
+# Note: This option applies only to the class list, not to the alphabetical
+# list.
+# The default value is: NO.
+
+SORT_BY_SCOPE_NAME     = NO
+
+# If the STRICT_PROTO_MATCHING option is enabled and doxygen fails to do proper
+# type resolution of all parameters of a function it will reject a match between
+# the prototype and the implementation of a member function even if there is
+# only one candidate or it is obvious which candidate to choose by doing a
+# simple string match. By disabling STRICT_PROTO_MATCHING doxygen will still
+# accept a match between prototype and implementation in such cases.
+# The default value is: NO.
+
+STRICT_PROTO_MATCHING  = NO
+
+# The GENERATE_TODOLIST tag can be used to enable (YES) or disable (NO) the
+# todo list. This list is created by putting \todo commands in the
+# documentation.
+# The default value is: YES.
+
+GENERATE_TODOLIST      = YES
+
+# The GENERATE_TESTLIST tag can be used to enable (YES) or disable (NO) the
+# test list. This list is created by putting \test commands in the
+# documentation.
+# The default value is: YES.
+
+GENERATE_TESTLIST      = YES
+
+# The GENERATE_BUGLIST tag can be used to enable (YES) or disable (NO) the bug
+# list. This list is created by putting \bug commands in the documentation.
+# The default value is: YES.
+
+GENERATE_BUGLIST       = YES
+
+# The GENERATE_DEPRECATEDLIST tag can be used to enable (YES) or disable (NO)
+# the deprecated list. This list is created by putting \deprecated commands in
+# the documentation.
+# The default value is: YES.
+
+GENERATE_DEPRECATEDLIST= YES
+
+# The ENABLED_SECTIONS tag can be used to enable conditional documentation
+# sections, marked by \if <section_label> ... \endif and \cond <section_label>
+# ... \endcond blocks.
+
+ENABLED_SECTIONS       =
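+
+# Illustrative sketch (commented out; the label internal_notes is
+# hypothetical): with
+#   ENABLED_SECTIONS = internal_notes
+# text enclosed in \if internal_notes ... \endif in the sources is included in
+# the output; with the tag left empty, as here, such text is omitted.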
+
+# The MAX_INITIALIZER_LINES tag determines the maximum number of lines that the
+# initial value of a variable or macro / define can have for it to appear in the
+# documentation. If the initializer consists of more lines than specified here
+# it will be hidden. Use a value of 0 to hide initializers completely. The
+# appearance of the value of individual variables and macros / defines can be
+# controlled using the \showinitializer or \hideinitializer commands in the
+# documentation regardless of this setting.
+# Minimum value: 0, maximum value: 10000, default value: 30.
+
+MAX_INITIALIZER_LINES  = 30
+
+# Set the SHOW_USED_FILES tag to NO to disable the list of files generated at
+# the bottom of the documentation of classes and structs. If set to YES the list
+# will mention the files that were used to generate the documentation.
+# The default value is: YES.
+
+SHOW_USED_FILES        = YES
+
+# Set the SHOW_FILES tag to NO to disable the generation of the Files page. This
+# will remove the Files entry from the Quick Index and from the Folder Tree View
+# (if specified).
+# The default value is: YES.
+
+SHOW_FILES             = YES
+
+# Set the SHOW_NAMESPACES tag to NO to disable the generation of the Namespaces
+# page. This will remove the Namespaces entry from the Quick Index and from the
+# Folder Tree View (if specified).
+# The default value is: YES.
+
+SHOW_NAMESPACES        = YES
+
+# The FILE_VERSION_FILTER tag can be used to specify a program or script that
+# doxygen should invoke to get the current version for each file (typically from
+# the version control system). Doxygen will invoke the program by executing (via
+# popen()) the command "command input-file", where command is the value of the
+# FILE_VERSION_FILTER tag, and input-file is the name of an input file provided
+# by doxygen. Whatever the program writes to standard output is used as the file
+# version. For an example see the documentation.
+
+FILE_VERSION_FILTER    =
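+
+# Illustrative sketch (commented out; not used by this project): in a git
+# checkout, the abbreviated hash of the last commit touching each file could
+# serve as its version:
+#   FILE_VERSION_FILTER = "git log -n 1 --format=%h --"
+# Doxygen appends the input file name, so the command actually run is
+#   git log -n 1 --format=%h -- <input-file>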
+
+# The LAYOUT_FILE tag can be used to specify a layout file which will be parsed
+# by doxygen. The layout file controls the global structure of the generated
+# output files in an output format independent way. To create the layout file
+# that represents doxygen's defaults, run doxygen with the -l option. You can
+# optionally specify a file name after the option, if omitted DoxygenLayout.xml
+# will be used as the name of the layout file.
+#
+# Note that if you run doxygen from a directory containing a file called
+# DoxygenLayout.xml, doxygen will parse it automatically even if the LAYOUT_FILE
+# tag is left empty.
+
+LAYOUT_FILE            =
+
+# The CITE_BIB_FILES tag can be used to specify one or more bib files containing
+# the reference definitions. This must be a list of .bib files. The .bib
+# extension is automatically appended if omitted. This requires the bibtex tool
+# to be installed. See also http://en.wikipedia.org/wiki/BibTeX for more info.
+# For LaTeX the style of the bibliography can be controlled using
+# LATEX_BIB_STYLE. To use this feature you need bibtex and perl available in the
+# search path. Do not use file names with spaces, bibtex cannot handle them. See
+# also \cite for information on how to create references.
+
+CITE_BIB_FILES         =
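+
+# Illustrative sketch (commented out; the file refs.bib and the key some_key
+# are hypothetical): with
+#   CITE_BIB_FILES = refs
+# doxygen reads refs.bib (the extension being appended automatically), and a
+# comment may then refer to one of its entries with \cite some_key.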
+
+#---------------------------------------------------------------------------
+# Configuration options related to warning and progress messages
+#---------------------------------------------------------------------------
+
+# The QUIET tag can be used to turn on/off the messages that are generated to
+# standard output by doxygen. If QUIET is set to YES this implies that the
+# messages are off.
+# The default value is: NO.
+
+QUIET                  = NO
+
+# The WARNINGS tag can be used to turn on/off the warning messages that are
+# generated to standard error (stderr) by doxygen. If WARNINGS is set to YES
+# this implies that the warnings are on.
+#
+# Tip: Turn warnings on while writing the documentation.
+# The default value is: YES.
+
+WARNINGS               = YES
+
+# If the WARN_IF_UNDOCUMENTED tag is set to YES, then doxygen will generate
+# warnings for undocumented members. If EXTRACT_ALL is set to YES then this flag
+# will automatically be disabled.
+# The default value is: YES.
+
+WARN_IF_UNDOCUMENTED   = YES
+
+# If the WARN_IF_DOC_ERROR tag is set to YES, doxygen will generate warnings for
+# potential errors in the documentation, such as not documenting some parameters
+# in a documented function, or documenting parameters that don't exist or using
+# markup commands wrongly.
+# The default value is: YES.
+
+WARN_IF_DOC_ERROR      = YES
+
+# This WARN_NO_PARAMDOC option can be enabled to get warnings for functions that
+# are documented, but have no documentation for their parameters or return
+# value. If set to NO doxygen will only warn about wrong or incomplete parameter
+# documentation, but not about the absence of documentation.
+# The default value is: NO.
+
+WARN_NO_PARAMDOC       = NO
+
+# The WARN_FORMAT tag determines the format of the warning messages that doxygen
+# can produce. The string should contain the $file, $line, and $text tags, which
+# will be replaced by the file and line number from which the warning originated
+# and the warning text. Optionally the format may contain $version, which will
+# be replaced by the version of the file (if it could be obtained via
+# FILE_VERSION_FILTER).
+# The default value is: $file:$line: $text.
+
+WARN_FORMAT            = "$file:$line: $text"
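+
+# Illustrative variant (commented out; not used by this project): when
+# FILE_VERSION_FILTER is set, the file version can be appended to each warning:
+#   WARN_FORMAT = "$file:$line: $text ($version)"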
+
+# The WARN_LOGFILE tag can be used to specify a file to which warning and error
+# messages should be written. If left blank the output is written to standard
+# error (stderr).
+
+WARN_LOGFILE           =
+
+#---------------------------------------------------------------------------
+# Configuration options related to the input files
+#---------------------------------------------------------------------------
+
+# The INPUT tag is used to specify the files and/or directories that contain
+# documented source files. You may enter file names like myfile.cpp or
+# directories like /usr/src/myproject. Separate the files or directories with
+# spaces.
+# Note: If this tag is empty the current directory is searched.
+
+INPUT                  = . example
+
+# This tag can be used to specify the character encoding of the source files
+# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
+# libiconv (or the iconv built into libc) for the transcoding. See the libiconv
+# documentation (see: http://www.gnu.org/software/libiconv) for the list of
+# possible encodings.
+# The default value is: UTF-8.
+
+INPUT_ENCODING         = UTF-8
+
+# If the value of the INPUT tag contains directories, you can use the
+# FILE_PATTERNS tag to specify one or more wildcard patterns (like *.cpp and
+# *.h) to filter out the source-files in the directories. If left blank the
+# following patterns are tested: *.c, *.cc, *.cxx, *.cpp, *.c++, *.java, *.ii,
+# *.ixx, *.ipp, *.i++, *.inl, *.idl, *.ddl, *.odl, *.h, *.hh, *.hxx, *.hpp,
+# *.h++, *.cs, *.d, *.php, *.php4, *.php5, *.phtml, *.inc, *.m, *.markdown,
+# *.md, *.mm, *.dox, *.py, *.f90, *.f, *.for, *.tcl, *.vhd, *.vhdl, *.ucf,
+# *.qsf, *.as and *.js.
+
+FILE_PATTERNS          =
+
+# The RECURSIVE tag can be used to specify whether or not subdirectories should
+# be searched for input files as well.
+# The default value is: NO.
+
+RECURSIVE              = YES
+
+# The EXCLUDE tag can be used to specify files and/or directories that should be
+# excluded from the INPUT source files. This way you can easily exclude a
+# subdirectory from a directory tree whose root is specified with the INPUT tag.
+#
+# Note that relative paths are relative to the directory from which doxygen is
+# run.
+
+EXCLUDE                =
+
+# The EXCLUDE_SYMLINKS tag can be used to select whether or not files or
+# directories that are symbolic links (a Unix file system feature) are excluded
+# from the input.
+# The default value is: NO.
+
+EXCLUDE_SYMLINKS       = NO
+
+# If the value of the INPUT tag contains directories, you can use the
+# EXCLUDE_PATTERNS tag to specify one or more wildcard patterns to exclude
+# certain files from those directories.
+#
+# Note that the wildcards are matched against the file with absolute path, so to
+# exclude all test directories for example use the pattern */test/*
+
+EXCLUDE_PATTERNS       =
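+
+# Illustrative sketch (commented out; not used by this project): since INPUT
+# includes the current directory and RECURSIVE is YES, the packaging directory
+# could be kept out of the scan with
+#   EXCLUDE_PATTERNS = */debian/*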
+
+# The EXCLUDE_SYMBOLS tag can be used to specify one or more symbol names
+# (namespaces, classes, functions, etc.) that should be excluded from the
+# output. The symbol name can be a fully qualified name, a word, or if the
+# wildcard * is used, a substring. Examples: ANamespace, AClass,
+# AClass::ANamespace, ANamespace::*Test
+#
+# Note that, unlike EXCLUDE_PATTERNS, these patterns are matched against symbol
+# names, not against file paths.
+
+EXCLUDE_SYMBOLS        =
+
+# The EXAMPLE_PATH tag can be used to specify one or more files or directories
+# that contain example code fragments that are included (see the \include
+# command).
+
+EXAMPLE_PATH           =
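+
+# Illustrative sketch (commented out; not used by this project): pointing the
+# tag at the example directory that is already listed in INPUT,
+#   EXAMPLE_PATH = example
+# would let a comment block embed a complete example file, e.g.
+#   \include gsm.cc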
+
+# If the value of the EXAMPLE_PATH tag contains directories, you can use the
+# EXAMPLE_PATTERNS tag to specify one or more wildcard patterns (like *.cpp and
+# *.h) to filter out the source-files in the directories. If left blank all
+# files are included.
+
+EXAMPLE_PATTERNS       =
+
+# If the EXAMPLE_RECURSIVE tag is set to YES then subdirectories will be
+# searched for input files to be used with the \include or \dontinclude commands
+# irrespective of the value of the RECURSIVE tag.
+# The default value is: NO.
+
+EXAMPLE_RECURSIVE      = NO
+
+# The IMAGE_PATH tag can be used to specify one or more files or directories
+# that contain images that are to be included in the documentation (see the
+# \image command).
+
+IMAGE_PATH             =
+
+# The INPUT_FILTER tag can be used to specify a program that doxygen should
+# invoke to filter for each input file. Doxygen will invoke the filter program
+# by executing (via popen()) the command:
+#
+# <filter> <input-file>
+#
+# where <filter> is the value of the INPUT_FILTER tag, and <input-file> is the
+# name of an input file. Doxygen will then use the output that the filter
+# program writes to standard output. If FILTER_PATTERNS is specified, this tag
+# will be ignored.
+#
+# Note that the filter must not add or remove lines; it is applied before the
+# code is scanned, but not when the output code is generated. If lines are added
+# or removed, the anchors will not be placed correctly.
+
+INPUT_FILTER           =
+
+# The FILTER_PATTERNS tag can be used to specify filters on a per file pattern
+# basis. Doxygen will compare the file name with each pattern and apply the
+# filter if there is a match. The filters are a list of the form: pattern=filter
+# (like *.cpp=my_cpp_filter). See INPUT_FILTER for further information on how
+# filters are used. If the FILTER_PATTERNS tag is empty or if none of the
+# patterns match the file name, INPUT_FILTER is applied.
+
+FILTER_PATTERNS        =
+
+# If the FILTER_SOURCE_FILES tag is set to YES, the input filter (if set using
+# INPUT_FILTER) will also be used to filter the input files that are used for
+# producing the source files to browse (i.e. when SOURCE_BROWSER is set to YES).
+# The default value is: NO.
+
+FILTER_SOURCE_FILES    = NO
+
+# The FILTER_SOURCE_PATTERNS tag can be used to specify source filters per file
+# pattern. A pattern will override the setting for FILTER_PATTERNS (if any) and
+# it is also possible to disable source filtering for a specific pattern using
+# *.ext= (so without naming a filter).
+# This tag requires that the tag FILTER_SOURCE_FILES is set to YES.
+
+FILTER_SOURCE_PATTERNS =
+
+# If the USE_MDFILE_AS_MAINPAGE tag refers to the name of a markdown file that
+# is part of the input, its contents will be placed on the main page
+# (index.html). This can be useful if you have a project on, for instance, GitHub
+# and want to reuse the introduction page also for the doxygen output.
+
+USE_MDFILE_AS_MAINPAGE =
+
+#---------------------------------------------------------------------------
+# Configuration options related to source browsing
+#---------------------------------------------------------------------------
+
+# If the SOURCE_BROWSER tag is set to YES then a list of source files will be
+# generated. Documented entities will be cross-referenced with these sources.
+#
+# Note: To get rid of all source code in the generated output, make sure that
+# VERBATIM_HEADERS is also set to NO.
+# The default value is: NO.
+
+SOURCE_BROWSER         = NO
+
+# Setting the INLINE_SOURCES tag to YES will include the body of functions,
+# classes and enums directly into the documentation.
+# The default value is: NO.
+
+INLINE_SOURCES         = NO
+
+# Setting the STRIP_CODE_COMMENTS tag to YES will instruct doxygen to hide any
+# special comment blocks from generated source code fragments. Normal C, C++ and
+# Fortran comments will always remain visible.
+# The default value is: YES.
+
+STRIP_CODE_COMMENTS    = YES
+
+# If the REFERENCED_BY_RELATION tag is set to YES then for each documented
+# function all documented functions referencing it will be listed.
+# The default value is: NO.
+
+REFERENCED_BY_RELATION = NO
+
+# If the REFERENCES_RELATION tag is set to YES then for each documented function
+# all documented entities called/used by that function will be listed.
+# The default value is: NO.
+
+REFERENCES_RELATION    = NO
+
+# If the REFERENCES_LINK_SOURCE tag is set to YES and SOURCE_BROWSER tag is set
+# to YES, then the hyperlinks from functions in REFERENCES_RELATION and
+# REFERENCED_BY_RELATION lists will link to the source code. Otherwise they will
+# link to the documentation.
+# The default value is: YES.
+
+REFERENCES_LINK_SOURCE = YES
+
+# If SOURCE_TOOLTIPS is enabled (the default) then hovering a hyperlink in the
+# source code will show a tooltip with additional information such as prototype,
+# brief description and links to the definition and documentation. Since this
+# will make the HTML file larger and loading of large files a bit slower, you
+# can opt to disable this feature.
+# The default value is: YES.
+# This tag requires that the tag SOURCE_BROWSER is set to YES.
+
+SOURCE_TOOLTIPS        = YES
+
+# If the USE_HTAGS tag is set to YES then the references to source code will
+# point to the HTML generated by the htags(1) tool instead of doxygen's built-in
+# source browser. The htags tool is part of GNU's global source tagging system
+# (see http://www.gnu.org/software/global/global.html). You will need version
+# 4.8.6 or higher.
+#
+# To use it do the following:
+# - Install the latest version of global
+# - Enable SOURCE_BROWSER and USE_HTAGS in the config file
+# - Make sure the INPUT points to the root of the source tree
+# - Run doxygen as normal
+#
+# Doxygen will invoke htags (and that will in turn invoke gtags), so these
+# tools must be available from the command line (i.e. in the search path).
+#
+# The result: instead of the source browser generated by doxygen, the links to
+# source code will now point to the output of htags.
+# The default value is: NO.
+# This tag requires that the tag SOURCE_BROWSER is set to YES.
+
+USE_HTAGS              = NO
+
+# If the VERBATIM_HEADERS tag is set to YES then doxygen will generate a
+# verbatim copy of the header file for each class for which an include is
+# specified. Set to NO to disable this.
+# See also: Section \class.
+# The default value is: YES.
+
+VERBATIM_HEADERS       = YES
+
+#---------------------------------------------------------------------------
+# Configuration options related to the alphabetical class index
+#---------------------------------------------------------------------------
+
+# If the ALPHABETICAL_INDEX tag is set to YES, an alphabetical index of all
+# compounds will be generated. Enable this if the project contains a lot of
+# classes, structs, unions or interfaces.
+# The default value is: YES.
+
+ALPHABETICAL_INDEX     = YES
+
+# The COLS_IN_ALPHA_INDEX tag can be used to specify the number of columns in
+# which the alphabetical index list will be split.
+# Minimum value: 1, maximum value: 20, default value: 5.
+# This tag requires that the tag ALPHABETICAL_INDEX is set to YES.
+
+COLS_IN_ALPHA_INDEX    = 5
+
+# In case all classes in a project start with a common prefix, all classes will
+# be put under the same header in the alphabetical index. The IGNORE_PREFIX tag
+# can be used to specify a prefix (or a list of prefixes) that should be ignored
+# while generating the index headers.
+# This tag requires that the tag ALPHABETICAL_INDEX is set to YES.
+
+IGNORE_PREFIX          =
+
+#---------------------------------------------------------------------------
+# Configuration options related to the HTML output
+#---------------------------------------------------------------------------
+
+# If the GENERATE_HTML tag is set to YES doxygen will generate HTML output.
+# The default value is: YES.
+
+GENERATE_HTML          = YES
+
+# The HTML_OUTPUT tag is used to specify where the HTML docs will be put. If a
+# relative path is entered the value of OUTPUT_DIRECTORY will be put in front of
+# it.
+# The default directory is: html.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+HTML_OUTPUT            = .
+
+# The HTML_FILE_EXTENSION tag can be used to specify the file extension for each
+# generated HTML page (for example: .htm, .php, .asp).
+# The default value is: .html.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+HTML_FILE_EXTENSION    = .html
+
+# The HTML_HEADER tag can be used to specify a user-defined HTML header file for
+# each generated HTML page. If the tag is left blank doxygen will generate a
+# standard header.
+#
+# To get valid HTML, the header file must include any scripts and style sheets
+# that doxygen needs, which depend on the configuration options used (e.g.
+# the setting GENERATE_TREEVIEW). It is highly recommended to start with a
+# default header using
+# doxygen -w html new_header.html new_footer.html new_stylesheet.css
+# YourConfigFile
+# and then modify the file new_header.html. See also section "Doxygen usage"
+# for information on how to generate the default header that doxygen normally
+# uses.
+# Note: The header is subject to change so you typically have to regenerate the
+# default header when upgrading to a newer version of doxygen. For a description
+# of the possible markers and block names see the documentation.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+HTML_HEADER            =
+
+# The HTML_FOOTER tag can be used to specify a user-defined HTML footer for each
+# generated HTML page. If the tag is left blank doxygen will generate a standard
+# footer. See HTML_HEADER for more information on how to generate a default
+# footer and what special commands can be used inside the footer. See also
+# section "Doxygen usage" for information on how to generate the default footer
+# that doxygen normally uses.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+HTML_FOOTER            =
+
+# The HTML_STYLESHEET tag can be used to specify a user-defined cascading style
+# sheet that is used by each HTML page. It can be used to fine-tune the look of
+# the HTML output. If left blank doxygen will generate a default style sheet.
+# See also section "Doxygen usage" for information on how to generate the style
+# sheet that doxygen normally uses.
+# Note: It is recommended to use HTML_EXTRA_STYLESHEET instead of this tag, as
+# it is more robust and this tag (HTML_STYLESHEET) will in the future become
+# obsolete.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+HTML_STYLESHEET        =
+
+# The HTML_EXTRA_STYLESHEET tag can be used to specify an additional user-
+# defined cascading style sheet that is included after the standard style sheets
+# created by doxygen. Using this option one can overrule certain style aspects.
+# This is preferred over using HTML_STYLESHEET since it does not replace the
+# standard style sheet and is therefore more robust against future updates.
+# Doxygen will copy the style sheet file to the output directory. For an example
+# see the documentation.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+HTML_EXTRA_STYLESHEET  =
+
+# The HTML_EXTRA_FILES tag can be used to specify one or more extra images or
+# other source files which should be copied to the HTML output directory. Note
+# that these files will be copied to the base HTML output directory. Use the
+# $relpath^ marker in the HTML_HEADER and/or HTML_FOOTER files to load these
+# files. In the HTML_STYLESHEET file, use the file name only. Also note that the
+# files will be copied as-is; there are no commands or markers available.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+HTML_EXTRA_FILES       =
+
+# The HTML_COLORSTYLE_HUE tag controls the color of the HTML output. Doxygen
+# will adjust the colors in the stylesheet and background images according to
+# this color. Hue is specified as an angle on a colorwheel, see
+# http://en.wikipedia.org/wiki/Hue for more information. For instance the value
+# 0 represents red, 60 is yellow, 120 is green, 180 is cyan, 240 is blue, 300
+# purple, and 360 is red again.
+# Minimum value: 0, maximum value: 359, default value: 220.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+HTML_COLORSTYLE_HUE    = 220
+
+# The HTML_COLORSTYLE_SAT tag controls the purity (or saturation) of the colors
+# in the HTML output. For a value of 0 the output will use grayscales only. A
+# value of 255 will produce the most vivid colors.
+# Minimum value: 0, maximum value: 255, default value: 100.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+HTML_COLORSTYLE_SAT    = 100
+
+# The HTML_COLORSTYLE_GAMMA tag controls the gamma correction applied to the
+# luminance component of the colors in the HTML output. Values below 100
+# gradually make the output lighter, whereas values above 100 make the output
+# darker. The value divided by 100 is the actual gamma applied, so 80 represents
+# a gamma of 0.8. The value 220 represents a gamma of 2.2, and 100 does not
+# change the gamma.
+# Minimum value: 40, maximum value: 240, default value: 80.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+HTML_COLORSTYLE_GAMMA  = 80
+
+# If the HTML_TIMESTAMP tag is set to YES then the footer of each generated HTML
+# page will contain the date and time when the page was generated. Setting this
+# to NO can help when comparing the output of multiple runs.
+# The default value is: YES.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+HTML_TIMESTAMP         = YES
+
+# If the HTML_DYNAMIC_SECTIONS tag is set to YES then the generated HTML
+# documentation will contain sections that can be hidden and shown after the
+# page has loaded.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+HTML_DYNAMIC_SECTIONS  = NO
+
+# With HTML_INDEX_NUM_ENTRIES one can control the preferred number of entries
+# shown in the various tree structured indices initially; the user can expand
+# and collapse entries dynamically later on. Doxygen will expand the tree to
+# such a level that at most the specified number of entries are visible (unless
+# a fully collapsed tree already exceeds this amount). So setting the number of
+# entries to 1 will produce a fully collapsed tree by default. 0 is a special
+# value representing an infinite number of entries and will result in a fully
+# expanded tree by default.
+# Minimum value: 0, maximum value: 9999, default value: 100.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+HTML_INDEX_NUM_ENTRIES = 100
+
+# If the GENERATE_DOCSET tag is set to YES, additional index files will be
+# generated that can be used as input for Apple's Xcode 3 integrated development
+# environment (see: http://developer.apple.com/tools/xcode/), introduced with
+# OSX 10.5 (Leopard). To create a documentation set, doxygen will generate a
+# Makefile in the HTML output directory. Running make will produce the docset in
+# that directory and running make install will install the docset in
+# ~/Library/Developer/Shared/Documentation/DocSets so that Xcode will find it at
+# startup. See http://developer.apple.com/tools/creatingdocsetswithdoxygen.html
+# for more information.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+GENERATE_DOCSET        = NO
+
+# This tag determines the name of the docset feed. A documentation feed provides
+# an umbrella under which multiple documentation sets from a single provider
+# (such as a company or product suite) can be grouped.
+# The default value is: Doxygen generated docs.
+# This tag requires that the tag GENERATE_DOCSET is set to YES.
+
+DOCSET_FEEDNAME        = "Doxygen generated docs"
+
+# This tag specifies a string that should uniquely identify the documentation
+# set bundle. This should be a reverse domain-name style string, e.g.
+# com.mycompany.MyDocSet. Doxygen will append .docset to the name.
+# The default value is: org.doxygen.Project.
+# This tag requires that the tag GENERATE_DOCSET is set to YES.
+
+DOCSET_BUNDLE_ID       = org.doxygen.Project
+
+# The DOCSET_PUBLISHER_ID tag specifies a string that should uniquely identify
+# the documentation publisher. This should be a reverse domain-name style
+# string, e.g. com.mycompany.MyDocSet.documentation.
+# The default value is: org.doxygen.Publisher.
+# This tag requires that the tag GENERATE_DOCSET is set to YES.
+
+DOCSET_PUBLISHER_ID    = org.doxygen.Publisher
+
+# The DOCSET_PUBLISHER_NAME tag identifies the documentation publisher.
+# The default value is: Publisher.
+# This tag requires that the tag GENERATE_DOCSET is set to YES.
+
+DOCSET_PUBLISHER_NAME  = Publisher
+
+# If the GENERATE_HTMLHELP tag is set to YES then doxygen generates three
+# additional HTML index files: index.hhp, index.hhc, and index.hhk. The
+# index.hhp is a project file that can be read by Microsoft's HTML Help Workshop
+# (see: http://www.microsoft.com/en-us/download/details.aspx?id=21138) on
+# Windows.
+#
+# The HTML Help Workshop contains a compiler that can convert all HTML output
+# generated by doxygen into a single compiled HTML file (.chm). Compiled HTML
+# files are now used as the Windows 98 help format, and will replace the old
+# Windows help format (.hlp) on all Windows platforms in the future. Compressed
+# HTML files also contain an index and a table of contents, and you can search for
+# words in the documentation. The HTML workshop also contains a viewer for
+# compressed HTML files.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+GENERATE_HTMLHELP      = NO
+
+# The CHM_FILE tag can be used to specify the file name of the resulting .chm
+# file. You can add a path in front of the file if the result should not be
+# written to the html output directory.
+# This tag requires that the tag GENERATE_HTMLHELP is set to YES.
+
+CHM_FILE               =
+
+# The HHC_LOCATION tag can be used to specify the location (absolute path
+# including file name) of the HTML help compiler (hhc.exe). If non-empty,
+# doxygen will try to run the HTML help compiler on the generated index.hhp.
+# The file has to be specified with full path.
+# This tag requires that the tag GENERATE_HTMLHELP is set to YES.
+
+HHC_LOCATION           =
+
+# The GENERATE_CHI flag controls whether a separate .chi index file is generated
+# (YES) or the index is included in the master .chm file (NO).
+# The default value is: NO.
+# This tag requires that the tag GENERATE_HTMLHELP is set to YES.
+
+GENERATE_CHI           = NO
+
+# The CHM_INDEX_ENCODING is used to encode HtmlHelp index (hhk), content (hhc)
+# and project file content.
+# This tag requires that the tag GENERATE_HTMLHELP is set to YES.
+
+CHM_INDEX_ENCODING     =
+
+# The BINARY_TOC flag controls whether a binary table of contents is generated
+# (YES) or a normal table of contents (NO) in the .chm file.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_HTMLHELP is set to YES.
+
+BINARY_TOC             = NO
+
+# The TOC_EXPAND flag can be set to YES to add extra items for group members to
+# the table of contents of the HTML help documentation and to the tree view.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_HTMLHELP is set to YES.
+
+TOC_EXPAND             = NO
+
+# If the GENERATE_QHP tag is set to YES and both QHP_NAMESPACE and
+# QHP_VIRTUAL_FOLDER are set, an additional index file will be generated that
+# can be used as input for Qt's qhelpgenerator to generate a Qt Compressed Help
+# (.qch) of the generated HTML documentation.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+GENERATE_QHP           = NO
+
+# If the QHG_LOCATION tag is specified, the QCH_FILE tag can be used to specify
+# the file name of the resulting .qch file. The path specified is relative to
+# the HTML output folder.
+# This tag requires that the tag GENERATE_QHP is set to YES.
+
+QCH_FILE               =
+
+# The QHP_NAMESPACE tag specifies the namespace to use when generating Qt Help
+# Project output. For more information please see Qt Help Project / Namespace
+# (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#namespace).
+# The default value is: org.doxygen.Project.
+# This tag requires that the tag GENERATE_QHP is set to YES.
+
+QHP_NAMESPACE          = org.doxygen.Project
+
+# The QHP_VIRTUAL_FOLDER tag specifies the virtual folder to use when generating
+# Qt Help Project output. For more information please see Qt Help Project /
+# Virtual Folders (see:
+# http://qt-project.org/doc/qt-4.8/qthelpproject.html#virtual-folders).
+# The default value is: doc.
+# This tag requires that the tag GENERATE_QHP is set to YES.
+
+QHP_VIRTUAL_FOLDER     = doc
+
+# If the QHP_CUST_FILTER_NAME tag is set, it specifies the name of a custom
+# filter to add. For more information please see Qt Help Project / Custom
+# Filters (see:
+# http://qt-project.org/doc/qt-4.8/qthelpproject.html#custom-filters).
+# This tag requires that the tag GENERATE_QHP is set to YES.
+
+QHP_CUST_FILTER_NAME   =
+
+# The QHP_CUST_FILTER_ATTRS tag specifies the list of the attributes of the
+# custom filter to add. For more information please see Qt Help Project / Custom
+# Filters (see:
+# http://qt-project.org/doc/qt-4.8/qthelpproject.html#custom-filters).
+# This tag requires that the tag GENERATE_QHP is set to YES.
+
+QHP_CUST_FILTER_ATTRS  =
+
+# The QHP_SECT_FILTER_ATTRS tag specifies the list of the attributes this
+# project's filter section matches. For more information please see Qt Help
+# Project / Filter Attributes (see:
+# http://qt-project.org/doc/qt-4.8/qthelpproject.html#filter-attributes).
+# This tag requires that the tag GENERATE_QHP is set to YES.
+
+QHP_SECT_FILTER_ATTRS  =
+
+# The QHG_LOCATION tag can be used to specify the location of Qt's
+# qhelpgenerator. If non-empty doxygen will try to run qhelpgenerator on the
+# generated .qhp file.
+# This tag requires that the tag GENERATE_QHP is set to YES.
+
+QHG_LOCATION           =
+
+# If the GENERATE_ECLIPSEHELP tag is set to YES, additional index files will be
+# generated; together with the HTML files, they form an Eclipse help plugin. To
+# install this plugin and make it available under the help contents menu in
+# Eclipse, the contents of the directory containing the HTML and XML files need
+# to be copied into the plugins directory of Eclipse. The name of the directory
+# within the plugins directory should be the same as the ECLIPSE_DOC_ID value.
+# After copying, Eclipse needs to be restarted before the help appears.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+GENERATE_ECLIPSEHELP   = NO
+
+# A unique identifier for the Eclipse help plugin. When installing the plugin
+# the directory name containing the HTML and XML files should also have this
+# name. Each documentation set should have its own identifier.
+# The default value is: org.doxygen.Project.
+# This tag requires that the tag GENERATE_ECLIPSEHELP is set to YES.
+
+ECLIPSE_DOC_ID         = org.doxygen.Project
+
+# If you want full control over the layout of the generated HTML pages it might
+# be necessary to disable the index and replace it with your own. The
+# DISABLE_INDEX tag can be used to turn on/off the condensed index (tabs) at top
+# of each HTML page. A value of NO enables the index and the value YES disables
+# it. Since the tabs in the index contain the same information as the navigation
+# tree, you can set this option to YES if you also set GENERATE_TREEVIEW to YES.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+DISABLE_INDEX          = NO
+
+# The GENERATE_TREEVIEW tag is used to specify whether a tree-like index
+# structure should be generated to display hierarchical information. If the tag
+# value is set to YES, a side panel will be generated containing a tree-like
+# index structure (just like the one that is generated for HTML Help). For this
+# to work a browser that supports JavaScript, DHTML, CSS and frames is required
+# (i.e. any modern browser). Windows users are probably better off using the
+# HTML help feature. Via custom stylesheets (see HTML_EXTRA_STYLESHEET) one can
+# further fine-tune the look of the index. As an example, the default style
+# sheet generated by doxygen has an example that shows how to put an image at
+# the root of the tree instead of the PROJECT_NAME. Since the tree basically has
+# the same information as the tab index, you could consider setting
+# DISABLE_INDEX to YES when enabling this option.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+GENERATE_TREEVIEW      = NO
+
+# The ENUM_VALUES_PER_LINE tag can be used to set the number of enum values that
+# doxygen will group on one line in the generated HTML documentation.
+#
+# Note that a value of 0 will completely suppress the enum values from appearing
+# in the overview section.
+# Minimum value: 0, maximum value: 20, default value: 4.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+ENUM_VALUES_PER_LINE   = 4
+
+# If the treeview is enabled (see GENERATE_TREEVIEW) then this tag can be used
+# to set the initial width (in pixels) of the frame in which the tree is shown.
+# Minimum value: 0, maximum value: 1500, default value: 250.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+TREEVIEW_WIDTH         = 250
+
+# When the EXT_LINKS_IN_WINDOW option is set to YES doxygen will open links to
+# external symbols imported via tag files in a separate window.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+EXT_LINKS_IN_WINDOW    = NO
+
+# Use this tag to change the font size of LaTeX formulas included as images in
+# the HTML documentation. When you change the font size after a successful
+# doxygen run you need to manually remove any form_*.png images from the HTML
+# output directory to force them to be regenerated.
+# Minimum value: 8, maximum value: 50, default value: 10.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+FORMULA_FONTSIZE       = 10
+
+# Use the FORMULA_TRANSPARENT tag to determine whether or not the images
+# generated for formulas are transparent PNGs. Transparent PNGs are not
+# supported properly for IE 6.0, but are supported on all modern browsers.
+#
+# Note that when changing this option you need to delete any form_*.png files in
+# the HTML output directory before the changes have effect.
+# The default value is: YES.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+FORMULA_TRANSPARENT    = YES
+
+# Enable the USE_MATHJAX option to render LaTeX formulas using MathJax (see
+# http://www.mathjax.org) which uses client side Javascript for the rendering
+# instead of using prerendered bitmaps. Use this if you do not have LaTeX
+# installed or if you want formulas to look prettier in the HTML output. When
+# enabled you may also need to install MathJax separately and configure the path
+# to it using the MATHJAX_RELPATH option.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+USE_MATHJAX            = NO
+
+# When MathJax is enabled you can set the default output format to be used for
+# the MathJax output. See the MathJax site (see:
+# http://docs.mathjax.org/en/latest/output.html) for more details.
+# Possible values are: HTML-CSS (which is slower, but has the best
+# compatibility), NativeMML (i.e. MathML) and SVG.
+# The default value is: HTML-CSS.
+# This tag requires that the tag USE_MATHJAX is set to YES.
+
+MATHJAX_FORMAT         = HTML-CSS
+
+# When MathJax is enabled you need to specify the location relative to the HTML
+# output directory using the MATHJAX_RELPATH option. The destination directory
+# should contain the MathJax.js script. For instance, if the mathjax directory
+# is located at the same level as the HTML output directory, then
+# MATHJAX_RELPATH should be ../mathjax. The default value points to the MathJax
+# Content Delivery Network so you can quickly see the result without installing
+# MathJax. However, it is strongly recommended to install a local copy of
+# MathJax from http://www.mathjax.org before deployment.
+# The default value is: http://cdn.mathjax.org/mathjax/latest.
+# This tag requires that the tag USE_MATHJAX is set to YES.
+
+MATHJAX_RELPATH        = http://cdn.mathjax.org/mathjax/latest
+
+# The MATHJAX_EXTENSIONS tag can be used to specify one or more MathJax
+# extension names that should be enabled during MathJax rendering. For example
+# MATHJAX_EXTENSIONS = TeX/AMSmath TeX/AMSsymbols
+# This tag requires that the tag USE_MATHJAX is set to YES.
+
+MATHJAX_EXTENSIONS     =
+
+# The MATHJAX_CODEFILE tag can be used to specify a file with javascript pieces
+# of code that will be used on startup of the MathJax code. See the MathJax site
+# (see: http://docs.mathjax.org/en/latest/output.html) for more details. For an
+# example see the documentation.
+# This tag requires that the tag USE_MATHJAX is set to YES.
+
+MATHJAX_CODEFILE       =
+
+# When the SEARCHENGINE tag is enabled doxygen will generate a search box for
+# the HTML output. The underlying search engine uses javascript and DHTML and
+# should work on any modern browser. Note that when using HTML help
+# (GENERATE_HTMLHELP), Qt help (GENERATE_QHP), or docsets (GENERATE_DOCSET)
+# there is already a search function so this one should typically be disabled.
+# For large projects the javascript based search engine can be slow, then
+# enabling SERVER_BASED_SEARCH may provide a better solution. It is possible to
+# search using the keyboard; to jump to the search box use <access key> + S
+# (what the <access key> is depends on the OS and browser, but it is typically
+# <CTRL>, <ALT>/<option>, or both). Inside the search box use the <cursor down
+# key> to jump into the search results window, the results can be navigated
+# using the <cursor keys>. Press <Enter> to select an item or <escape> to cancel
+# the search. The filter options can be selected when the cursor is inside the
+# search box by pressing <Shift>+<cursor down>. Also here use the <cursor keys>
+# to select a filter and <Enter> or <escape> to activate or cancel the filter
+# option.
+# The default value is: YES.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+SEARCHENGINE           = YES
+
+# When the SERVER_BASED_SEARCH tag is enabled the search engine will be
+# implemented using a web server instead of a web client using Javascript. There
+# are two flavours of web server based searching depending on the
+# EXTERNAL_SEARCH setting. When disabled, doxygen will generate a PHP script for
+# searching and an index file used by the script. When EXTERNAL_SEARCH is
+# enabled the indexing and searching needs to be provided by external tools. See
+# the section "External Indexing and Searching" for details.
+# The default value is: NO.
+# This tag requires that the tag SEARCHENGINE is set to YES.
+
+SERVER_BASED_SEARCH    = NO
+
+# When EXTERNAL_SEARCH tag is enabled doxygen will no longer generate the PHP
+# script for searching. Instead the search results are written to an XML file
+# which needs to be processed by an external indexer. Doxygen will invoke an
+# external search engine pointed to by the SEARCHENGINE_URL option to obtain the
+# search results.
+#
+# Doxygen ships with an example indexer (doxyindexer) and search engine
+# (doxysearch.cgi) which are based on the open source search engine library
+# Xapian (see: http://xapian.org/).
+#
+# See the section "External Indexing and Searching" for details.
+# The default value is: NO.
+# This tag requires that the tag SEARCHENGINE is set to YES.
+
+EXTERNAL_SEARCH        = NO
+
+# The SEARCHENGINE_URL should point to a search engine hosted by a web server
+# which will return the search results when EXTERNAL_SEARCH is enabled.
+#
+# Doxygen ships with an example indexer (doxyindexer) and search engine
+# (doxysearch.cgi) which are based on the open source search engine library
+# Xapian (see: http://xapian.org/). See the section "External Indexing and
+# Searching" for details.
+# This tag requires that the tag SEARCHENGINE is set to YES.
+
+SEARCHENGINE_URL       =
+
+# When SERVER_BASED_SEARCH and EXTERNAL_SEARCH are both enabled the unindexed
+# search data is written to a file for indexing by an external tool. With the
+# SEARCHDATA_FILE tag the name of this file can be specified.
+# The default file is: searchdata.xml.
+# This tag requires that the tag SEARCHENGINE is set to YES.
+
+SEARCHDATA_FILE        = searchdata.xml
+
+# When SERVER_BASED_SEARCH and EXTERNAL_SEARCH are both enabled the
+# EXTERNAL_SEARCH_ID tag can be used as an identifier for the project. This is
+# useful in combination with EXTRA_SEARCH_MAPPINGS to search through multiple
+# projects and redirect the results back to the right project.
+# This tag requires that the tag SEARCHENGINE is set to YES.
+
+EXTERNAL_SEARCH_ID     =
+
+# The EXTRA_SEARCH_MAPPINGS tag can be used to enable searching through doxygen
+# projects other than the one defined by this configuration file, but that are
+# all added to the same external search index. Each project needs to have a
+# unique id set via EXTERNAL_SEARCH_ID. The search mapping then maps the id of
+# each project to a relative location where the documentation can be found. The format is:
+# EXTRA_SEARCH_MAPPINGS = tagname1=loc1 tagname2=loc2 ...
+# This tag requires that the tag SEARCHENGINE is set to YES.
+
+EXTRA_SEARCH_MAPPINGS  =
+
+#---------------------------------------------------------------------------
+# Configuration options related to the LaTeX output
+#---------------------------------------------------------------------------
+
+# If the GENERATE_LATEX tag is set to YES doxygen will generate LaTeX output.
+# The default value is: YES.
+
+GENERATE_LATEX         = NO
+
+# The LATEX_OUTPUT tag is used to specify where the LaTeX docs will be put. If a
+# relative path is entered the value of OUTPUT_DIRECTORY will be put in front of
+# it.
+# The default directory is: latex.
+# This tag requires that the tag GENERATE_LATEX is set to YES.
+
+LATEX_OUTPUT           = latex
+
+# The LATEX_CMD_NAME tag can be used to specify the LaTeX command name to be
+# invoked.
+#
+# Note that when enabling USE_PDFLATEX this option is only used for generating
+# bitmaps for formulas in the HTML output, but not in the Makefile that is
+# written to the output directory.
+# The default file is: latex.
+# This tag requires that the tag GENERATE_LATEX is set to YES.
+
+LATEX_CMD_NAME         = latex
+
+# The MAKEINDEX_CMD_NAME tag can be used to specify the command name used to
+# generate the index for LaTeX.
+# The default file is: makeindex.
+# This tag requires that the tag GENERATE_LATEX is set to YES.
+
+MAKEINDEX_CMD_NAME     = makeindex
+
+# If the COMPACT_LATEX tag is set to YES doxygen generates more compact LaTeX
+# documents. This may be useful for small projects and may help to save some
+# trees in general.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_LATEX is set to YES.
+
+COMPACT_LATEX          = NO
+
+# The PAPER_TYPE tag can be used to set the paper type that is used by the
+# printer.
+# Possible values are: a4 (210 x 297 mm), letter (8.5 x 11 inches), legal (8.5 x
+# 14 inches) and executive (7.25 x 10.5 inches).
+# The default value is: a4.
+# This tag requires that the tag GENERATE_LATEX is set to YES.
+
+PAPER_TYPE             = a4
+
+# The EXTRA_PACKAGES tag can be used to specify one or more LaTeX package names
+# that should be included in the LaTeX output. To get the times font for
+# instance you can specify
+# EXTRA_PACKAGES=times
+# If left blank no extra packages will be included.
+# This tag requires that the tag GENERATE_LATEX is set to YES.
+
+EXTRA_PACKAGES         =
+
+# The LATEX_HEADER tag can be used to specify a personal LaTeX header for the
+# generated LaTeX document. The header should contain everything until the first
+# chapter. If it is left blank doxygen will generate a standard header. See
+# section "Doxygen usage" for information on how to let doxygen write the
+# default header to a separate file.
+#
+# Note: Only use a user-defined header if you know what you are doing! The
+# following commands have a special meaning inside the header: $title,
+# $datetime, $date, $doxygenversion, $projectname, $projectnumber. Doxygen will
+# replace them by respectively the title of the page, the current date and time,
+# only the current date, the version number of doxygen, the project name (see
+# PROJECT_NAME), or the project number (see PROJECT_NUMBER).
+# This tag requires that the tag GENERATE_LATEX is set to YES.
+
+LATEX_HEADER           =
+
+# The LATEX_FOOTER tag can be used to specify a personal LaTeX footer for the
+# generated LaTeX document. The footer should contain everything after the last
+# chapter. If it is left blank doxygen will generate a standard footer.
+#
+# Note: Only use a user-defined footer if you know what you are doing!
+# This tag requires that the tag GENERATE_LATEX is set to YES.
+
+LATEX_FOOTER           =
+
+# The LATEX_EXTRA_FILES tag can be used to specify one or more extra images or
+# other source files which should be copied to the LATEX_OUTPUT output
+# directory. Note that the files will be copied as-is; there are no commands or
+# markers available.
+# This tag requires that the tag GENERATE_LATEX is set to YES.
+
+LATEX_EXTRA_FILES      =
+
+# If the PDF_HYPERLINKS tag is set to YES, the LaTeX that is generated is
+# prepared for conversion to PDF (using ps2pdf or pdflatex). The PDF file will
+# contain links (just like the HTML output) instead of page references. This
+# makes the output suitable for online browsing using a PDF viewer.
+# The default value is: YES.
+# This tag requires that the tag GENERATE_LATEX is set to YES.
+
+PDF_HYPERLINKS         = YES
+
+# If the USE_PDFLATEX tag is set to YES, doxygen will use pdflatex to generate
+# the PDF file directly from the LaTeX files. Set this option to YES to get a
+# higher quality PDF documentation.
+# The default value is: YES.
+# This tag requires that the tag GENERATE_LATEX is set to YES.
+
+USE_PDFLATEX           = YES
+
+# If the LATEX_BATCHMODE tag is set to YES, doxygen will add the \batchmode
+# command to the generated LaTeX files. This will instruct LaTeX to keep running
+# if errors occur, instead of asking the user for help. This option is also used
+# when generating formulas in HTML.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_LATEX is set to YES.
+
+LATEX_BATCHMODE        = NO
+
+# If the LATEX_HIDE_INDICES tag is set to YES then doxygen will not include the
+# index chapters (such as File Index, Compound Index, etc.) in the output.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_LATEX is set to YES.
+
+LATEX_HIDE_INDICES     = NO
+
+# If the LATEX_SOURCE_CODE tag is set to YES then doxygen will include source
+# code with syntax highlighting in the LaTeX output.
+#
+# Note that which sources are shown also depends on other settings such as
+# SOURCE_BROWSER.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_LATEX is set to YES.
+
+LATEX_SOURCE_CODE      = NO
+
+# The LATEX_BIB_STYLE tag can be used to specify the style to use for the
+# bibliography, e.g. plainnat, or ieeetr. See
+# http://en.wikipedia.org/wiki/BibTeX and \cite for more info.
+# The default value is: plain.
+# This tag requires that the tag GENERATE_LATEX is set to YES.
+
+LATEX_BIB_STYLE        = plain
+
+#---------------------------------------------------------------------------
+# Configuration options related to the RTF output
+#---------------------------------------------------------------------------
+
+# If the GENERATE_RTF tag is set to YES doxygen will generate RTF output. The
+# RTF output is optimized for Word 97 and may not look too pretty with other RTF
+# readers/editors.
+# The default value is: NO.
+
+GENERATE_RTF           = NO
+
+# The RTF_OUTPUT tag is used to specify where the RTF docs will be put. If a
+# relative path is entered the value of OUTPUT_DIRECTORY will be put in front of
+# it.
+# The default directory is: rtf.
+# This tag requires that the tag GENERATE_RTF is set to YES.
+
+RTF_OUTPUT             = rtf
+
+# If the COMPACT_RTF tag is set to YES doxygen generates more compact RTF
+# documents. This may be useful for small projects and may help to save some
+# trees in general.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_RTF is set to YES.
+
+COMPACT_RTF            = NO
+
+# If the RTF_HYPERLINKS tag is set to YES, the RTF that is generated will
+# contain hyperlink fields. The RTF file will contain links (just like the HTML
+# output) instead of page references. This makes the output suitable for online
+# browsing using Word or some other Word compatible readers that support those
+# fields.
+#
+# Note: WordPad (write) and others do not support links.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_RTF is set to YES.
+
+RTF_HYPERLINKS         = NO
+
+# Load stylesheet definitions from a file. Syntax is similar to doxygen's config
+# file, i.e. a series of assignments. You only have to provide replacements,
+# missing definitions are set to their default value.
+#
+# See also section "Doxygen usage" for information on how to generate the
+# default style sheet that doxygen normally uses.
+# This tag requires that the tag GENERATE_RTF is set to YES.
+
+RTF_STYLESHEET_FILE    =
+
+# Set optional variables used in the generation of an RTF document. Syntax is
+# similar to doxygen's config file. A template extensions file can be generated
+# using doxygen -e rtf extensionFile.
+# This tag requires that the tag GENERATE_RTF is set to YES.
+
+RTF_EXTENSIONS_FILE    =
+
+#---------------------------------------------------------------------------
+# Configuration options related to the man page output
+#---------------------------------------------------------------------------
+
+# If the GENERATE_MAN tag is set to YES doxygen will generate man pages for
+# classes and files.
+# The default value is: NO.
+
+GENERATE_MAN           = NO
+
+# The MAN_OUTPUT tag is used to specify where the man pages will be put. If a
+# relative path is entered the value of OUTPUT_DIRECTORY will be put in front of
+# it. A directory man3 will be created inside the directory specified by
+# MAN_OUTPUT.
+# The default directory is: man.
+# This tag requires that the tag GENERATE_MAN is set to YES.
+
+MAN_OUTPUT             = man
+
+# The MAN_EXTENSION tag determines the extension that is added to the generated
+# man pages. In case the manual section does not start with a number, the number
+# 3 is prepended. The dot (.) at the beginning of the MAN_EXTENSION tag is
+# optional.
+# The default value is: .3.
+# This tag requires that the tag GENERATE_MAN is set to YES.
+
+MAN_EXTENSION          = .3
+
+# If the MAN_LINKS tag is set to YES and doxygen generates man output, then it
+# will generate one additional man file for each entity documented in the real
+# man page(s). These additional files only source the real man page, but without
+# them the man command would be unable to find the correct page.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_MAN is set to YES.
+
+MAN_LINKS              = NO
+
+#---------------------------------------------------------------------------
+# Configuration options related to the XML output
+#---------------------------------------------------------------------------
+
+# If the GENERATE_XML tag is set to YES doxygen will generate an XML file that
+# captures the structure of the code including all documentation.
+# The default value is: NO.
+
+GENERATE_XML           = NO
+
+# The XML_OUTPUT tag is used to specify where the XML pages will be put. If a
+# relative path is entered the value of OUTPUT_DIRECTORY will be put in front of
+# it.
+# The default directory is: xml.
+# This tag requires that the tag GENERATE_XML is set to YES.
+
+XML_OUTPUT             = xml
+
+# The XML_SCHEMA tag can be used to specify an XML schema, which can be used by a
+# validating XML parser to check the syntax of the XML files.
+# This tag requires that the tag GENERATE_XML is set to YES.
+
+XML_SCHEMA             =
+
+# The XML_DTD tag can be used to specify an XML DTD, which can be used by a
+# validating XML parser to check the syntax of the XML files.
+# This tag requires that the tag GENERATE_XML is set to YES.
+
+XML_DTD                =
+
+# If the XML_PROGRAMLISTING tag is set to YES doxygen will dump the program
+# listings (including syntax highlighting and cross-referencing information) to
+# the XML output. Note that enabling this will significantly increase the size
+# of the XML output.
+# The default value is: YES.
+# This tag requires that the tag GENERATE_XML is set to YES.
+
+XML_PROGRAMLISTING     = YES
+
+#---------------------------------------------------------------------------
+# Configuration options related to the DOCBOOK output
+#---------------------------------------------------------------------------
+
+# If the GENERATE_DOCBOOK tag is set to YES doxygen will generate DocBook files
+# that can be used to generate PDF.
+# The default value is: NO.
+
+GENERATE_DOCBOOK       = NO
+
+# The DOCBOOK_OUTPUT tag is used to specify where the DocBook pages will be put.
+# If a relative path is entered the value of OUTPUT_DIRECTORY will be put in
+# front of it.
+# The default directory is: docbook.
+# This tag requires that the tag GENERATE_DOCBOOK is set to YES.
+
+DOCBOOK_OUTPUT         = docbook
+
+#---------------------------------------------------------------------------
+# Configuration options for the AutoGen Definitions output
+#---------------------------------------------------------------------------
+
+# If the GENERATE_AUTOGEN_DEF tag is set to YES doxygen will generate an AutoGen
+# Definitions (see http://autogen.sf.net) file that captures the structure of
+# the code including all documentation. Note that this feature is still
+# experimental and incomplete at the moment.
+# The default value is: NO.
+
+GENERATE_AUTOGEN_DEF   = NO
+
+#---------------------------------------------------------------------------
+# Configuration options related to the Perl module output
+#---------------------------------------------------------------------------
+
+# If the GENERATE_PERLMOD tag is set to YES doxygen will generate a Perl module
+# file that captures the structure of the code including all documentation.
+#
+# Note that this feature is still experimental and incomplete at the moment.
+# The default value is: NO.
+
+GENERATE_PERLMOD       = NO
+
+# If the PERLMOD_LATEX tag is set to YES doxygen will generate the necessary
+# Makefile rules, Perl scripts and LaTeX code to be able to generate PDF and DVI
+# output from the Perl module output.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_PERLMOD is set to YES.
+
+PERLMOD_LATEX          = NO
+
+# If the PERLMOD_PRETTY tag is set to YES the Perl module output will be nicely
+# formatted so it can be parsed by a human reader. This is useful if you want to
+# understand what is going on. On the other hand, if this tag is set to NO the
+# size of the Perl module output will be much smaller and Perl will parse it
+# just the same.
+# The default value is: YES.
+# This tag requires that the tag GENERATE_PERLMOD is set to YES.
+
+PERLMOD_PRETTY         = YES
+
+# The names of the make variables in the generated doxyrules.make file are
+# prefixed with the string contained in PERLMOD_MAKEVAR_PREFIX. This is useful
+# so different doxyrules.make files included by the same Makefile don't
+# overwrite each other's variables.
+# This tag requires that the tag GENERATE_PERLMOD is set to YES.
+
+PERLMOD_MAKEVAR_PREFIX =
+
+#---------------------------------------------------------------------------
+# Configuration options related to the preprocessor
+#---------------------------------------------------------------------------
+
+# If the ENABLE_PREPROCESSING tag is set to YES doxygen will evaluate all
+# C-preprocessor directives found in the sources and include files.
+# The default value is: YES.
+
+ENABLE_PREPROCESSING   = YES
+
+# If the MACRO_EXPANSION tag is set to YES doxygen will expand all macro names
+# in the source code. If set to NO only conditional compilation will be
+# performed. Macro expansion can be done in a controlled way by setting
+# EXPAND_ONLY_PREDEF to YES.
+# The default value is: NO.
+# This tag requires that the tag ENABLE_PREPROCESSING is set to YES.
+
+MACRO_EXPANSION        = NO
+
+# If the EXPAND_ONLY_PREDEF and MACRO_EXPANSION tags are both set to YES then
+# the macro expansion is limited to the macros specified with the PREDEFINED and
+# EXPAND_AS_DEFINED tags.
+# The default value is: NO.
+# This tag requires that the tag ENABLE_PREPROCESSING is set to YES.
+
+EXPAND_ONLY_PREDEF     = NO
+
+# If the SEARCH_INCLUDES tag is set to YES the include files in the
+# INCLUDE_PATH will be searched if a #include is found.
+# The default value is: YES.
+# This tag requires that the tag ENABLE_PREPROCESSING is set to YES.
+
+SEARCH_INCLUDES        = YES
+
+# The INCLUDE_PATH tag can be used to specify one or more directories that
+# contain include files that are not input files but should be processed by the
+# preprocessor.
+# This tag requires that the tag SEARCH_INCLUDES is set to YES.
+
+INCLUDE_PATH           =
+
+# You can use the INCLUDE_FILE_PATTERNS tag to specify one or more wildcard
+# patterns (like *.h and *.hpp) to filter out the header-files in the
+# directories. If left blank, the patterns specified with FILE_PATTERNS will be
+# used.
+# This tag requires that the tag ENABLE_PREPROCESSING is set to YES.
+
+INCLUDE_FILE_PATTERNS  =
+
+# The PREDEFINED tag can be used to specify one or more macro names that are
+# defined before the preprocessor is started (similar to the -D option of e.g.
+# gcc). The argument of the tag is a list of macros of the form: name or
+# name=definition (no spaces). If the definition and the "=" are omitted, "=1"
+# is assumed. To prevent a macro definition from being undefined via #undef or
+# recursively expanded, use the := operator instead of the = operator.
+# This tag requires that the tag ENABLE_PREPROCESSING is set to YES.
+
+PREDEFINED             = USE_VC
+
+# If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then this
+# tag can be used to specify a list of macro names that should be expanded. The
+# macro definition that is found in the sources will be used. Use the PREDEFINED
+# tag if you want to use a different macro definition that overrules the
+# definition found in the source code.
+# This tag requires that the tag ENABLE_PREPROCESSING is set to YES.
+
+EXPAND_AS_DEFINED      =
+
+# If the SKIP_FUNCTION_MACROS tag is set to YES then doxygen's preprocessor will
+# remove all references to function-like macros that are alone on a line, have an
+# all uppercase name, and do not end with a semicolon. Such function macros are
+# typically used for boiler-plate code, and will confuse the parser if not
+# removed.
+# The default value is: YES.
+# This tag requires that the tag ENABLE_PREPROCESSING is set to YES.
+
+SKIP_FUNCTION_MACROS   = YES
+
+#---------------------------------------------------------------------------
+# Configuration options related to external references
+#---------------------------------------------------------------------------
+
+# The TAGFILES tag can be used to specify one or more tag files. For each tag
+# file the location of the external documentation should be added. The format of
+# a tag file without this location is as follows:
+# TAGFILES = file1 file2 ...
+# Adding location for the tag files is done as follows:
+# TAGFILES = file1=loc1 "file2 = loc2" ...
+# where loc1 and loc2 can be relative or absolute paths or URLs. See the
+# section "Linking to external documentation" for more information about the use
+# of tag files.
+# Note: Each tag file must have a unique name (where the name does NOT include
+# the path). If a tag file is not located in the directory in which doxygen is
+# run, you must also specify the path to the tagfile here.
+
+TAGFILES               =
+
+# When a file name is specified after GENERATE_TAGFILE, doxygen will create a
+# tag file that is based on the input files it reads. See section "Linking to
+# external documentation" for more information about the usage of tag files.
+
+GENERATE_TAGFILE       =
+
+# If the ALLEXTERNALS tag is set to YES all external classes will be listed in the
+# class index. If set to NO only the inherited external classes will be listed.
+# The default value is: NO.
+
+ALLEXTERNALS           = NO
+
+# If the EXTERNAL_GROUPS tag is set to YES all external groups will be listed in
+# the modules index. If set to NO, only the current project's groups will be
+# listed.
+# The default value is: YES.
+
+EXTERNAL_GROUPS        = YES
+
+# If the EXTERNAL_PAGES tag is set to YES all external pages will be listed in
+# the related pages index. If set to NO, only the current project's pages will
+# be listed.
+# The default value is: YES.
+
+EXTERNAL_PAGES         = YES
+
+# The PERL_PATH should be the absolute path and name of the perl script
+# interpreter (i.e. the result of 'which perl').
+# The default file (with absolute path) is: /usr/bin/perl.
+
+PERL_PATH              = /usr/bin/perl
+
+#---------------------------------------------------------------------------
+# Configuration options related to the dot tool
+#---------------------------------------------------------------------------
+
+# If the CLASS_DIAGRAMS tag is set to YES doxygen will generate a class diagram
+# (in HTML and LaTeX) for classes with base or super classes. Setting the tag to
+# NO turns the diagrams off. Note that this option also works with HAVE_DOT
+# disabled, but it is recommended to install and use dot, since it yields more
+# powerful graphs.
+# The default value is: YES.
+
+CLASS_DIAGRAMS         = YES
+
+# You can define message sequence charts within doxygen comments using the \msc
+# command. Doxygen will then run the mscgen tool (see:
+# http://www.mcternan.me.uk/mscgen/) to produce the chart and insert it in the
+# documentation. The MSCGEN_PATH tag allows you to specify the directory where
+# the mscgen tool resides. If left empty the tool is assumed to be found in the
+# default search path.
+
+MSCGEN_PATH            =
+
+# You can include diagrams made with dia in doxygen documentation. Doxygen will
+# then run dia to produce the diagram and insert it in the documentation. The
+# DIA_PATH tag allows you to specify the directory where the dia binary resides.
+# If left empty dia is assumed to be found in the default search path.
+
+DIA_PATH               =
+
+# If set to YES, the inheritance and collaboration graphs will hide inheritance
+# and usage relations if the target is undocumented or is not a class.
+# The default value is: YES.
+
+HIDE_UNDOC_RELATIONS   = YES
+
+# If you set the HAVE_DOT tag to YES then doxygen will assume the dot tool is
+# available from the path. This tool is part of Graphviz (see:
+# http://www.graphviz.org/), a graph visualization toolkit from AT&T and Lucent
+# Bell Labs. The other options in this section have no effect if this option is
+# set to NO.
+# The default value is: NO.
+
+HAVE_DOT               = NO
+
+# The DOT_NUM_THREADS specifies the number of dot invocations doxygen is allowed
+# to run in parallel. When set to 0 doxygen will base this on the number of
+# processors available in the system. You can set it explicitly to a value
+# larger than 0 to get control over the balance between CPU load and processing
+# speed.
+# Minimum value: 0, maximum value: 32, default value: 0.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+DOT_NUM_THREADS        = 0
+
+# When you want a different-looking font in the dot files that doxygen
+# generates you can specify the font name using DOT_FONTNAME. You need to make
+# sure dot is able to find the font, which can be done by putting it in a
+# standard location or by setting the DOTFONTPATH environment variable or by
+# setting DOT_FONTPATH to the directory containing the font.
+# The default value is: Helvetica.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+DOT_FONTNAME           = Helvetica
+
+# The DOT_FONTSIZE tag can be used to set the size (in points) of the font of
+# dot graphs.
+# Minimum value: 4, maximum value: 24, default value: 10.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+DOT_FONTSIZE           = 10
+
+# By default doxygen will tell dot to use the default font as specified with
+# DOT_FONTNAME. If you specify a different font using DOT_FONTNAME you can set
+# the path where dot can find it using this tag.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+DOT_FONTPATH           =
+
+# If the CLASS_GRAPH tag is set to YES then doxygen will generate a graph for
+# each documented class showing the direct and indirect inheritance relations.
+# Setting this tag to YES will force the CLASS_DIAGRAMS tag to NO.
+# The default value is: YES.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+CLASS_GRAPH            = YES
+
+# If the COLLABORATION_GRAPH tag is set to YES then doxygen will generate a
+# graph for each documented class showing the direct and indirect implementation
+# dependencies (inheritance, containment, and class reference variables) of the
+# class with other documented classes.
+# The default value is: YES.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+COLLABORATION_GRAPH    = YES
+
+# If the GROUP_GRAPHS tag is set to YES then doxygen will generate a graph for
+# groups, showing the direct groups dependencies.
+# The default value is: YES.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+GROUP_GRAPHS           = YES
+
+# If the UML_LOOK tag is set to YES doxygen will generate inheritance and
+# collaboration diagrams in a style similar to the OMG's Unified Modeling
+# Language.
+# The default value is: NO.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+UML_LOOK               = NO
+
+# If the UML_LOOK tag is enabled, the fields and methods are shown inside the
+# class node. If there are many fields or methods and many nodes the graph may
+# become too big to be useful. The UML_LIMIT_NUM_FIELDS threshold limits the
+# number of items for each type to make the size more manageable. Set this to 0
+# for no limit. Note that the threshold may be exceeded by 50% before the limit
+# is enforced. So when you set the threshold to 10, up to 15 fields may appear,
+# but if the number exceeds 15, the total amount of fields shown is limited to
+# 10.
+# Minimum value: 0, maximum value: 100, default value: 10.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+UML_LIMIT_NUM_FIELDS   = 10
+
+# If the TEMPLATE_RELATIONS tag is set to YES then the inheritance and
+# collaboration graphs will show the relations between templates and their
+# instances.
+# The default value is: NO.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+TEMPLATE_RELATIONS     = NO
+
+# If the INCLUDE_GRAPH, ENABLE_PREPROCESSING and SEARCH_INCLUDES tags are set to
+# YES then doxygen will generate a graph for each documented file showing the
+# direct and indirect include dependencies of the file with other documented
+# files.
+# The default value is: YES.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+INCLUDE_GRAPH          = YES
+
+# If the INCLUDED_BY_GRAPH, ENABLE_PREPROCESSING and SEARCH_INCLUDES tags are
+# set to YES then doxygen will generate a graph for each documented file showing
+# which other documented files directly or indirectly include the file (the
+# reverse of INCLUDE_GRAPH).
+# The default value is: YES.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+INCLUDED_BY_GRAPH      = YES
+
+# If the CALL_GRAPH tag is set to YES then doxygen will generate a call
+# dependency graph for every global function or class method.
+#
+# Note that enabling this option will significantly increase the time of a run.
+# So in most cases it will be better to enable call graphs for selected
+# functions only using the \callgraph command.
+# The default value is: NO.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+CALL_GRAPH             = NO
+
+# If the CALLER_GRAPH tag is set to YES then doxygen will generate a caller
+# dependency graph for every global function or class method.
+#
+# Note that enabling this option will significantly increase the time of a run.
+# So in most cases it will be better to enable caller graphs for selected
+# functions only using the \callergraph command.
+# The default value is: NO.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+CALLER_GRAPH           = NO
+
+# If the GRAPHICAL_HIERARCHY tag is set to YES then doxygen will generate a
+# graphical hierarchy of all classes instead of a textual one.
+# The default value is: YES.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+GRAPHICAL_HIERARCHY    = YES
+
+# If the DIRECTORY_GRAPH tag is set to YES then doxygen will show the
+# dependencies a directory has on other directories in a graphical way. The
+# dependency relations are determined by the #include relations between the
+# files in the directories.
+# The default value is: YES.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+DIRECTORY_GRAPH        = YES
+
+# The DOT_IMAGE_FORMAT tag can be used to set the image format of the images
+# generated by dot.
+# Note: If you choose svg you need to set HTML_FILE_EXTENSION to xhtml in order
+# to make the SVG files visible in IE 9+ (other browsers do not have this
+# requirement).
+# Possible values are: png, jpg, gif and svg.
+# The default value is: png.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+DOT_IMAGE_FORMAT       = png
+
+# If DOT_IMAGE_FORMAT is set to svg, then this option can be set to YES to
+# enable generation of interactive SVG images that allow zooming and panning.
+#
+# Note that this requires a modern browser other than Internet Explorer. Tested
+# and working are Firefox, Chrome, Safari, and Opera.
+# Note: For IE 9+ you need to set HTML_FILE_EXTENSION to xhtml in order to make
+# the SVG files visible. Older versions of IE do not have SVG support.
+# The default value is: NO.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+INTERACTIVE_SVG        = NO
+
+# The DOT_PATH tag can be used to specify the path where the dot tool can be
+# found. If left blank, it is assumed the dot tool can be found in the path.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+DOT_PATH               =
+
+# The DOTFILE_DIRS tag can be used to specify one or more directories that
+# contain dot files that are included in the documentation (see the \dotfile
+# command).
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+DOTFILE_DIRS           =
+
+# The MSCFILE_DIRS tag can be used to specify one or more directories that
+# contain msc files that are included in the documentation (see the \mscfile
+# command).
+
+MSCFILE_DIRS           =
+
+# The DIAFILE_DIRS tag can be used to specify one or more directories that
+# contain dia files that are included in the documentation (see the \diafile
+# command).
+
+DIAFILE_DIRS           =
+
+# The DOT_GRAPH_MAX_NODES tag can be used to set the maximum number of nodes
+# that will be shown in the graph. If the number of nodes in a graph becomes
+# larger than this value, doxygen will truncate the graph, which is visualized
+# by representing a node as a red box. Note that if the number of direct
+# children of the root node in a graph is already larger than
+# DOT_GRAPH_MAX_NODES then the graph will not be shown at all. Also note that
+# the size of a graph can be further restricted by MAX_DOT_GRAPH_DEPTH.
+# Minimum value: 0, maximum value: 10000, default value: 50.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+DOT_GRAPH_MAX_NODES    = 50
+
+# The MAX_DOT_GRAPH_DEPTH tag can be used to set the maximum depth of the graphs
+# generated by dot. A depth value of 3 means that only nodes reachable from the
+# root by following a path via at most 3 edges will be shown. Nodes that lie
+# further from the root node will be omitted. Note that setting this option to 1
+# or 2 may greatly reduce the computation time needed for large code bases. Also
+# note that the size of a graph can be further restricted by
+# DOT_GRAPH_MAX_NODES. Using a depth of 0 means no depth restriction.
+# Minimum value: 0, maximum value: 1000, default value: 0.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+MAX_DOT_GRAPH_DEPTH    = 0
+
+# Set the DOT_TRANSPARENT tag to YES to generate images with a transparent
+# background. This is disabled by default, because dot on Windows does not seem
+# to support this out of the box.
+#
+# Warning: Depending on the platform used, enabling this option may lead to
+# badly anti-aliased labels on the edges of a graph (i.e. they become hard to
+# read).
+# The default value is: NO.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+DOT_TRANSPARENT        = NO
+
+# Set the DOT_MULTI_TARGETS tag to YES to allow dot to generate multiple
+# output files in one run (i.e. multiple -o and -T options on the command line). This
+# makes dot run faster, but since only newer versions of dot (>1.8.10) support
+# this, this feature is disabled by default.
+# The default value is: NO.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+DOT_MULTI_TARGETS      = YES
+
+# If the GENERATE_LEGEND tag is set to YES doxygen will generate a legend page
+# explaining the meaning of the various boxes and arrows in the dot generated
+# graphs.
+# The default value is: YES.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+GENERATE_LEGEND        = YES
+
+# If the DOT_CLEANUP tag is set to YES doxygen will remove the intermediate dot
+# files that are used to generate the various graphs.
+# The default value is: YES.
+# This tag requires that the tag HAVE_DOT is set to YES.
+
+DOT_CLEANUP            = YES
diff --git a/vspline.h b/vspline.h
new file mode 100644
index 0000000..bf89f9c
--- /dev/null
+++ b/vspline.h
@@ -0,0 +1,46 @@
+/************************************************************************/
+/*                                                                      */
+/*    vspline - a set of generic tools for creation and evaluation      */
+/*              of uniform b-splines                                    */
+/*                                                                      */
+/*            Copyright 2015 - 2017 by Kay F. Jahnke                    */
+/*                                                                      */
+/*    The git repository for this software is at                        */
+/*                                                                      */
+/*    https://bitbucket.org/kfj/vspline                                 */
+/*                                                                      */
+/*    Please direct questions, bug reports, and contributions to        */
+/*                                                                      */
+/*    kfjahnke+vspline at gmail.com                                        */
+/*                                                                      */
+/*    Permission is hereby granted, free of charge, to any person       */
+/*    obtaining a copy of this software and associated documentation    */
+/*    files (the "Software"), to deal in the Software without           */
+/*    restriction, including without limitation the rights to use,      */
+/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
+/*    sell copies of the Software, and to permit persons to whom the    */
+/*    Software is furnished to do so, subject to the following          */
+/*    conditions:                                                       */
+/*                                                                      */
+/*    The above copyright notice and this permission notice shall be    */
+/*    included in all copies or substantial portions of the             */
+/*    Software.                                                         */
+/*                                                                      */
+/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
+/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
+/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
+/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
+/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
+/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
+/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
+/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
+/*                                                                      */
+/************************************************************************/
+
+/*! \file vspline.h
+
+    \brief includes all headers from vspline (most of them indirectly)
+
+*/
+
+#include "remap.h"

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/vspline.git



More information about the debian-science-commits mailing list