[vspline] 01/01: brought the sources up to 0.2.0

Kay F. Jahnke kfj-guest at moszumanska.debian.org
Fri Sep 8 10:15:57 UTC 2017


This is an automated email from the git hooks/post-receive script.

kfj-guest pushed a commit to branch master
in repository vspline.

commit 4f1f2d1cfedb69b2d635c865f3b8ee918a72131e
Author: Kay F. Jahnke <kfjahnke at gmail.com>
Date:   Fri Sep 8 12:10:28 2017 +0200

    brought the sources up to 0.2.0
---
 README.rst                   |   8 +-
 brace.h                      |   2 +-
 bspline.h                    |  13 ++
 common.h                     | 174 ++++++--------------------
 debian/changelog             |   4 +-
 debian/control               |   9 +-
 debian/debhelper-build-stamp |   1 -
 debian/files                 |   2 +-
 debian/vspline-dev.examples  |   1 -
 eval.h                       | 288 ++++++++++++++++++++++++-------------------
 example/eval.cc              |  50 ++------
 example/gradient.cc          |   4 +-
 example/gsm.cc               |  10 +-
 example/gsm2.cc              |  54 ++++++--
 example/roundtrip.cc         |  34 +----
 example/slice.cc             |  15 +--
 example/slice2.cc            |  16 +--
 example/slice3.cc            |   1 +
 example/splinus.cc           |  80 +++++++++---
 filter.h                     |   9 +-
 map.h                        |  96 ++++++++-------
 multithread.h                |  36 +-----
 poles.h                      |   4 +-
 prefilter.h                  |   6 +-
 prefilter_poles.cc           |   4 +-
 remap.h                      | 124 ++-----------------
 thread_pool.h                |   2 +-
 unary_functor.h              |  40 ++++--
 vspline.doxy                 |   2 +-
 29 files changed, 474 insertions(+), 615 deletions(-)

diff --git a/README.rst b/README.rst
index 1417fb4..2f1ba3b 100644
--- a/README.rst
+++ b/README.rst
@@ -26,8 +26,8 @@ would have been compromised.
 While some of the code is quite low-level, there are reasonably high-level mechanisms
 to interface with vspline, allowing easy access to it's functionality without requiring
 users to familiarize themselves with the internal workings. High-level approach is
-provided via class 'bspline' defined in bspline.h, and via the remap functions
-defined in remap.h.
+provided via class 'bspline' defined in bspline.h, via 'evaluator' objects from eval.h
+and via the remap functions defined in remap.h.
 
 While I made an attempt to write code which is portable, vspline is only tested with
 g++ and clang++ on Linux. It may work in other environments, but it's unlikely it will
@@ -52,8 +52,8 @@ The second approach to B-splines comes from signal processing, and it's the one
 
 I have made an attempt to generalize the code so that it can handle
 
-- arbitrary real data types and their aggregates [1]_
-- coming in strided memory
+- real data types and their aggregates [1]_
+- coming in strided nD memory
 - a reasonable selection of boundary conditions
 - used in either an implicit or an explicit scheme of extrapolation
 - arbitrary spline orders
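
To illustrate the high-level interface the README describes, here is a minimal
sketch pieced together from the examples touched by this commit (gradient.cc,
gsm2.cc, roundtrip.cc). The bspline constructor arguments mirror gradient.cc;
the leading shape argument and the use of range-for over bspl.core are
assumptions, not authoritative API documentation:

    #include <vspline/vspline.h>
    #include <vigra/multi_array.hxx>

    int main()
    {
      typedef float pixel_type ;                       // single-channel data
      typedef vigra::TinyVector < float , 2 > coordinate_type ;
      typedef vigra::TinyVector < vspline::bc_code , 2 > bcv_type ;

      // set up a cubic b-spline over a 512x512 core, as in gradient.cc
      vspline::bspline < pixel_type , 2 >
        bspl ( vigra::Shape2 ( 512 , 512 ) ,    // shape of the core array
               3 ,                              // cubic b-spline
               bcv_type ( vspline::MIRROR ) ,   // boundary conditions
               vspline::BRACED ,                // implicit scheme, bracing coeffs
               -1 ,                             // horizon (unused w. BRACED)
               0.0 ) ;                          // no tolerance

      // fill the core with knot point data, then prefilter
      for ( auto & e : bspl.core )
        e = 1.0f ;
      bspl.prefilter() ;

      // evaluation is done via an evaluator object constructed from the bspline
      typedef vspline::evaluator < coordinate_type , pixel_type > eval_type ;
      eval_type ev ( bspl ) ;

      // single evaluation with function call syntax
      pixel_type v = ev ( coordinate_type ( 3.5f , 127.25f ) ) ;

      // bulk evaluation over all discrete coordinates of a target array
      vigra::MultiArray < 2 , pixel_type > target ( vigra::Shape2 ( 512 , 512 ) ) ;
      vspline::index_remap < eval_type > ( ev , target ) ;
    }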
diff --git a/brace.h b/brace.h
index 98d251a..30e934c 100644
--- a/brace.h
+++ b/brace.h
@@ -39,7 +39,7 @@
 
 /*! \file brace.h
 
-    \brief This file provides code for 'bracing' the spline coefficient array.
+    \brief This file provides code for 'bracing' the spline's coefficient array.
 
     Inspired by libeinspline, I wrote code to 'brace' the spline coefficients. The concept is
     this: while the IIR filter used to calculate the coefficients has infinite support (though
diff --git a/bspline.h b/bspline.h
index bf29b80..6579b7d 100644
--- a/bspline.h
+++ b/bspline.h
@@ -52,6 +52,11 @@
   bspline objects can be used without any knowledge of their internals,
   e.g. as parameters to the remap functions.
   
+  Note that class bspline does not provide evaluation of the spline. To evaluate
+  the spline, objects of class evaluator (see eval.h) are used, which are constructed
+  from a bspline object with additional parameters, like whether to calculate the
+  spline's value or its derivative(s), or whether to use optimizations for special cases.
+  
   While using 'raw' coefficient arrays with an evaluation scheme which applies
   boundary conditions is feasible and most memory-efficient, it's not so well
   suited for very fast evaluation, since the boundary treatment needs conditionals,
@@ -165,6 +170,14 @@
 
 namespace vspline {
 
+/// This enumeration is used to determine the prefiltering scheme to be used.
+
+typedef enum { UNBRACED , ///< implicit scheme, no bracing applied
+               BRACED ,   ///< implicit scheme, bracing will be applied
+               EXPLICIT , ///< explicit scheme, frame with extrapolated signal, brace
+               MANUAL     ///< like explicit, but don't frame before filtering
+} prefilter_strategy  ;
+
 /// struct bspline is a convenience class which bundles a coefficient array (and it's creation)
 /// with a set of metadata describing the parameters used to create the coefficients and the
 /// resulting data. I have chosen to implement class bspline so that there is only a minimal
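
The note above delegates evaluation to evaluator objects constructed from a
bspline object, optionally with a derivative specification. A small sketch of
that, assuming (as example/eval.cc below suggests) that the derivative spec is
a vigra::TinyVector < int , dimension > and that construction from a prefiltered
bspline works as shown; treat the details as illustrative:

    #include <vspline/vspline.h>

    typedef vigra::TinyVector < float , 2 > coordinate_type ;
    typedef vspline::evaluator < coordinate_type , float > eval_type ;

    // sample the first derivative along axis 0 of a prefiltered 2D spline.
    // in real code the evaluator would be kept around instead of being
    // constructed per call.
    float sample_dx ( const vspline::bspline < float , 2 > & bspl ,
                      const coordinate_type & c )
    {
      vigra::TinyVector < int , 2 > dspec ( 0 ) ; // all-zero derivative spec
      dspec [ 0 ] = 1 ;                           // first derivative along axis 0

      eval_type dx_ev ( bspl , dspec ) ;          // evaluator from bspline + spec

      return dx_ev ( c ) ;                        // function call syntax
    }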
diff --git a/common.h b/common.h
index 0f8b44c..b41b386 100644
--- a/common.h
+++ b/common.h
@@ -46,9 +46,6 @@
     - a traits class fixing the simdized types used for vectorized code
     
     - exceptions used throughout vspline
-    
-    - constants and enums used throughout vspline
-  
 */
 
 #ifndef VSPLINE_COMMON
@@ -62,7 +59,43 @@
 
 #endif
 
-namespace vspline {
+namespace vspline
+{
+
+/// This enumeration is used for codes connected to boundary conditions. There are
+/// two aspects to boundary conditions: During prefiltering, if the implicit scheme is used,
+/// the initial causal and anticausal coefficients have to be calculated in a way specific to
+/// the chosen boundary conditions. Bracing, both before prefiltering when using the explicit
+/// scheme, and after prefiltering when using the implicit scheme, also needs these codes to
+/// pick the appropriate extrapolation code to extend the knot point data/coefficients beyond
+/// the core array.
+
+typedef enum { 
+  MIRROR ,    ///< mirror on the bounds, so that f(-x) == f(x)
+  PERIODIC,   ///< periodic boundary conditions
+  REFLECT ,   ///< reflect, so  that f(-1) == f(0) (mirror between bounds)
+  NATURAL,    ///< natural boundary conditions, f(-x) + f(x) == 2 * f(0)
+  CONSTANT ,  ///< clamp. used for framing, with explicit prefilter scheme
+  ZEROPAD ,   ///< used for boundary condition, bracing
+  IDENTITY ,  ///< used as solver argument, mostly internal use
+  GUESS ,     ///< used with EXPLICIT scheme to keep margin errors low
+  SPHERICAL , ///< use for spherical panoramas, y axis
+} bc_code;
+
+/// bc_name is for diagnostic output of bc codes
+
+const std::string bc_name[] =
+{
+  "MIRROR" ,
+  "PERIODIC",
+  "REFLECT" ,
+  "NATURAL",
+  "CONSTANT" ,
+  "ZEROPAD" ,
+  "IDENTITY" ,
+  "GUESS" ,
+  "SPHERICAL" ,
+} ;
 
 #ifdef USE_VC
 
@@ -179,139 +212,6 @@ struct numeric_overflow
   : std::invalid_argument ( msg ) { }  ;
 } ;
 
-/// This enumeration is used for codes connected to boundary conditions. There are
-/// two aspects to boundary conditions: During prefiltering, if the implicit scheme is used,
-/// the initial causal and anticausal coefficients have to be calculated in a way specific to
-/// the chosen boundary conditions. Bracing, both before prefiltering when using the explicit
-/// scheme, and after prefiltering when using the implicit scheme, also needs these codes to
-/// pick the appropriate extrapolation code to extend the knot point data/coefficients beyond
-/// the core array.
-
-typedef enum { 
-  MIRROR ,    ///< mirror on the bounds, so that f(-x) == f(x)
-  PERIODIC,   ///< periodic boundary conditions
-  REFLECT ,   ///< reflect, so  that f(-1) == f(0) (mirror between bounds)
-  NATURAL,    ///< natural boundary conditions, f(-x) + f(x) == 2 * f(0)
-  CONSTANT ,  ///< clamp. used for framing, with explicit prefilter scheme
-  ZEROPAD ,   ///< used for boundary condition, bracing
-  IDENTITY ,  ///< used as solver argument, mostly internal use
-  GUESS ,     ///< used with EXPLICIT scheme to keep margin errors low
-  SPHERICAL , ///< use for spherical panoramas, y axis
-} bc_code;
-
-/// This enumeration is used by the convenience class 'bspline' to determine the prefiltering
-/// scheme to be used.
-
-typedef enum { UNBRACED , ///< implicit scheme, no bracing applied
-               BRACED ,   ///< implicit scheme, bracing will be applied
-               EXPLICIT , ///< explicit scheme, frame with extrapolated signal, brace
-               MANUAL     ///< like explicit, but don't frame before filtering
-} prefilter_strategy  ;
-
-/// bc_name is for diagnostic output of bc codes
-
-const std::string bc_name[] =
-{
-  "MIRROR" ,
-  "PERIODIC",
-  "REFLECT" ,
-  "NATURAL",
-  "CONSTANT" ,
-  "ZEROPAD" ,
-  "IDENTITY" ,
-  "GUESS" ,
-  "SPHERICAL" ,
-} ;
-
 } ; // end of namespace vspline
 
-#ifdef USE_VC
-
-// by defining the relevant traits for Vc::Vectors and Vc::Simdarrays,
-// vspline is able to use vigra arithmetics with these types.
-// This is intended mainly to convert legacy code iterating over
-// TinyVectors of Vc SIMD types by code which directly applies
-// arithmetic operations to the TinyVectors themselves.
-// This is great. consider:
-//
-//  typedef Vc::SimdArray < float , 6 > simd_type ;
-//  typedef vigra::TinyVector < simd_type , 3 > nd_simd_type ;
-//  simd_type a { -1 , 1 , 2 , 3 , 4 , 5 } ;
-//  simd_type b { 2 , 3 , 4 , 5 , 1 , 2 } ;
-//  nd_simd_type aaa { a , a , a } ;
-//  nd_simd_type bbb = { b , b , b } ;
-//  auto ccc = aaa + sqrt ( bbb ) ;
-
-namespace vigra
-{
-  template < typename real_type , int N >
-  struct NumericTraits < Vc::SimdArray < real_type , N > >
-  {
-      typedef Vc::SimdArray < real_type , N > Type;
-      typedef Type Promote;
-      typedef Type UnsignedPromote;
-      typedef Type RealPromote;
-      typedef std::complex<RealPromote> ComplexPromote;
-      typedef Type ValueType;
-      
-      typedef VigraFalseType isIntegral;
-      typedef VigraFalseType isScalar;
-      typedef VigraFalseType isSigned;
-      typedef VigraFalseType isOrdered;
-      typedef VigraFalseType isComplex;
-      
-      static Type zero() { return Type::Zero() ; }
-      static Type one() { return Type::One() ; }
-      static Type nonZero() { return Type::One() ; }
-      
-      static Promote toPromote(Type v) { return v; }
-      static RealPromote toRealPromote(Type v) { return v; }
-      static Type fromPromote(Promote v) { return v; }
-      static Type fromRealPromote(RealPromote v) { return v; }
-  };
-  
-  template < typename real_type >
-  struct NumericTraits < Vc::Vector < real_type > >
-  {
-      typedef Vc::Vector < real_type > Type;
-      typedef Type Promote;
-      typedef Type UnsignedPromote;
-      typedef Type RealPromote;
-      typedef std::complex<RealPromote> ComplexPromote;
-      typedef Type ValueType;
-      
-      typedef VigraFalseType isIntegral;
-      typedef VigraFalseType isScalar;
-      typedef VigraFalseType isSigned;
-      typedef VigraFalseType isOrdered;
-      typedef VigraFalseType isComplex;
-      
-      static Type zero() { return Type::Zero() ; }
-      static Type one() { return Type::One() ; }
-      static Type nonZero() { return Type::One() ; }
-      
-      static Promote toPromote(Type v) { return v; }
-      static RealPromote toRealPromote(Type v) { return v; }
-      static Type fromPromote(Promote v) { return v; }
-      static Type fromRealPromote(RealPromote v) { return v; }
-  };
-  
-  // note that for now, we limit definition of PromoteTraits
-  // to homogeneous operations. I tried to define the promotion
-  // traits for float and double but failed, because I could not
-  // figure out what vigra means by typeToSize() used in
-  // metaprogramming.h
-
-  template < typename real_type , int N >
-  struct PromoteTraits < Vc::SimdArray < real_type , N > ,
-                         Vc::SimdArray < real_type , N > >
-  {
-      typedef typename Vc::SimdArray < real_type , N > Promote;
-      typedef typename Vc::SimdArray < real_type , N > toPromote;
-  };
-
-} ;
-
-#endif // USE_VC
-
 #endif // VSPLINE_COMMON
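
Since bc_code and bc_name now sit at the top of common.h, here is a tiny sketch
of how they fit together: a per-axis set of boundary condition codes (as the
examples below use) plus bc_name for diagnostic output. The PERIODIC/SPHERICAL
pairing merely follows the comment on SPHERICAL above:

    #include <iostream>
    #include <vspline/vspline.h>

    int main()
    {
      // per-axis boundary conditions, e.g. for a full spherical panorama:
      // periodic along the x axis, SPHERICAL along the y axis
      vigra::TinyVector < vspline::bc_code , 2 >
        bcv ( vspline::PERIODIC , vspline::SPHERICAL ) ;

      // bc_name translates the codes for diagnostic output
      for ( int d = 0 ; d < 2 ; d++ )
        std::cout << "axis " << d << ": "
                  << vspline::bc_name [ bcv [ d ] ] << std::endl ;
    }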
diff --git a/debian/changelog b/debian/changelog
index c78bb37..1ae2c67 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,5 +1,5 @@
-vspline (0.1.2-1) UNRELEASED; urgency=low
+vspline (0:0.2.0-1) UNRELEASED; urgency=low
 
   * intended initial upload to alioth
   
- -- Kay F. Jahnke <kfjahnke at gmail.com>  Wed, 28 Jun 2017 12:00:00 +0200
+ -- Kay F. Jahnke <kfjahnke at gmail.com>  Fri, 08 Sep 2017 11:30:00 +0200
diff --git a/debian/control b/debian/control
index 54fe8af..32b9830 100644
--- a/debian/control
+++ b/debian/control
@@ -2,17 +2,18 @@ Source: vspline
 Maintainer: Debian Science Maintainers <debian-science-maintainers at lists.alioth.debian.org>
 Uploaders: Kay F. Jahnke <kfjahnke at gmail.com>
 Section: math
-Priority: extra
+Priority: optional
 Build-Depends: debhelper (>= 10)
 Build-Depends-Indep: libvigraimpex-dev
-Standards-Version: 3.9.8
-Vcs-Browser: https://anonscm.debian.org/git/debian-science/packages/vspline.git
+Standards-Version: 4.1.0
+Vcs-Browser: https://anonscm.debian.org/cgit/debian-science/packages/vspline.git
 Vcs-Git: https://anonscm.debian.org/git/debian-science/packages/vspline.git
 Homepage: https://bitbucket.org/kfj/vspline
 
 Package: vspline-dev
 Architecture: all
-Depends: libvigraimpex-dev
+Depends: libvigraimpex-dev,
+         ${misc:Depends}
 Suggests: clang,
           vc-dev
 Description: header-only C++ template library for uniform b-spline processing
diff --git a/debian/debhelper-build-stamp b/debian/debhelper-build-stamp
deleted file mode 100644
index 9508a41..0000000
--- a/debian/debhelper-build-stamp
+++ /dev/null
@@ -1 +0,0 @@
-vspline-dev
diff --git a/debian/files b/debian/files
index 0aa7637..ca3aa74 100644
--- a/debian/files
+++ b/debian/files
@@ -1 +1 @@
-vspline-dev_0.1.2-1_all.deb math extra
+vspline-dev_0.2.0-1_all.deb math optional
diff --git a/debian/vspline-dev.examples b/debian/vspline-dev.examples
index d3ba779..06442d9 100644
--- a/debian/vspline-dev.examples
+++ b/debian/vspline-dev.examples
@@ -5,7 +5,6 @@ example/gradient.cc
 example/gsm2.cc
 example/gsm.cc
 example/impulse_response.cc
-example/pano_extract.cc
 example/roundtrip.cc
 example/slice2.cc
 example/slice3.cc
diff --git a/eval.h b/eval.h
index 9b9a57a..911a5c5 100644
--- a/eval.h
+++ b/eval.h
@@ -75,11 +75,11 @@
     It means that calling the functor will not have any effect on the functor itself - it
     can't change once it has been constructed. This has several nice effects: it can
     potentially be optimized very well, it is thread-safe, and it will play well with
-    functioanl programming concepts - and it's conceptionally appealing.
+    functional programming concepts - and it's conceptually appealing.
     
     Code using class evaluator will probably use it at some core place where it is
     part of some processing pipeline. An example would be an image processing program:
-    one might have some outer loop generating arguments (typically SIMDIZED types)
+    one might have some outer loop generating arguments (typically simdized types)
     which are processed one after the other to yield a result. The processing will
     typically have several stages, like coordinate generation and transformations,
     then use class evaluator to pick an interpolated intermediate result, which is
@@ -433,7 +433,7 @@ struct average_weight_functor
 /// being bspline objects created with UNBRACED or MANUAL strategy). While the most
 /// general constructor will accept a MultiArrayView to coefficients (including the necessary
 /// 'brace'), this will rarely be used, and an evaluator will be constructed from a bspline
-/// object. In the most trivial case there are only two thing which need to be done:
+/// object. In the most trivial case there are only two things which need to be done:
 ///
 /// The specific type of evaluator has to be established by providing the relevant template
 /// arguments. Here, we need two types: the 'coordinate type' and the 'value type'.
@@ -446,9 +446,8 @@ struct average_weight_functor
 /// by vigra's ExpandElementResult mechanism should also work. When processing colour images,
 /// your value type would typically be a vigra::TinyVector < float , 3 >.
 ///
-/// Additionally, the bool template argument 'even_spline_order' has to be set. This is due to
-/// the fact that even splines (degree 0, 2, 4...) and odd splines (1, 3, 5...) are computed
-/// differently, and making the distinction at run-time would be possible, but less efficient.
+/// Additionally, the width of SIMD vectors to be used has to be established. This is not
+/// mandatory - if omitted, a default value is picked.
 ///
 /// With the evaluator's type established, an evaluator of this type can be constructed by
 /// passing a vspline::bspline object to the constructor. Naturally, the bspline object has
@@ -469,23 +468,32 @@ struct average_weight_functor
 /// code in the subclasses _eval and _v_eval takes care of performing the necessary operations
 /// recursively over the dimensions of the spline.
 ///
-/// The code in class evaluator begins with a sizeable constructor which sets up as much
+/// A vspline::evaluator is technically a vspline::unary_functor. This way, it can be directly
+/// used by constructs like vspline::chain and has a consistent interface which allows code
+/// using evaluators to query its specifics. The actual evaluation capabilities (eval() routines)
+/// are incorporated into the vspline::unary_functor as a policy - class evaluator_policy
+/// below. The final type, vspline::evaluator, is fixed by a using declaration right at the
+/// end of this file.
+///
+/// The code in evaluator_policy begins with a sizeable constructor which sets up as much
 /// as possible to support the evaluation code to do it's job as fast as possible. Next follows
 /// the unvectorized code, finally the vectorized code.
 ///
-/// There is a variety of overloads of class evaluator's eval() method available, because
-/// class evaluator inherits from vspline::unary_functor.
+/// There is a variety of overloads of class evaluator_policy's eval() method available, which
+/// become methods of vspline::evaluator via the policy incorporation. Some of these overloads
+/// go beyond the 'usual' eval routines, like evaluation with known weights, sidestepping the
+/// weight generation from coordinates.
 ///
 /// The evaluation strategy is to have all dependencies of the evaluation except for the actual
 /// coordinates taken care of by the constructor - and immutable for the evaluator's lifetime.
 /// The resulting object has no state which is modified after construction, making it thread-safe.
-/// It also constitutes a 'pure' function in a functional-programming sense, because it has
+/// It also constitutes a 'pure' functor in a functional-programming sense, because it has
 /// no mutable state and no side-effects, as can be seen by the fact that the eval methods
 /// are all marked const.
 ///
 /// The eval() overloads form a hierarchy, as evaluation progresses from accepting unsplit real
 /// coordinates to split coordinates and finally offsets and weights. This allows calling code to
-/// handle parts of the delegation hierarchy itself, only using class evaluator at a specific level.
+/// handle parts of the delegation hierarchy itself, only using an evaluator at a specific level.
 ///
 /// By providing the evaluation in this way, it becomes easy for calling code to integrate
 /// the evaluation into more complex functors. Consider, for example, code
@@ -503,7 +511,12 @@ struct average_weight_functor
 /// code which isn't needed for nearest neighbour interpolation (like the application of weights,
 /// which is futile under the circumstances, the weight always being 1.0).
 /// specialize can also be set to 1 for explicit n-linear interpolation. Any other value will
-/// result in the general-purpose code being used.
+/// result in the general-purpose code being used. Note that the specialization is passed in
+/// as a type, so instead of accepting a straight 'int' template argument,
+/// std::integral_constant<int,SPEC> is required. This is for technical reasons:
+/// vspline::unary_functor routes trailing template arguments to its evaluation policy,
+/// which is coded via a variadic template argument list. Such a list cannot have a mixture
+/// of type and non-type template arguments, so I've gone for a type template argument list.
 ///
 /// Note that, contrary to my initial implementation, all forms of coordinate mapping were
 /// removed from class evaluator. The 'mapping' which is left is, more aptly, called
@@ -511,15 +524,10 @@ struct average_weight_functor
 /// Folding arbitrary coordinates into the spline's defined range now has to be done
 /// externally, typically by wrapping class evaluator together with some coordinate
 /// modification code into a combined vspline::unary_functor. map.h provides code for
-/// common mappings, see there.
-///
-/// class evaluator inherits from class unary_functor, which implements the broader concept.
-/// class evaluator has some methods to help with special cases, but apart from that it is
-/// a standard vspline::unary_functor. This inheritance gives us the full set of class
-/// unary_functor's methods, among them the convenient overloads of operator() which
-/// allow us to invoke class evaluator's evaluation with function call syntax.
+/// common mappings, see there. Arbitrary manipulation of incoming and outgoing data
+/// can be realized by using vspline::chain, defined in unary_functor.h.
 ///
-/// Note how the number of vector elements is fixed here by picking the number of ele_type
+/// Note how the default number of vector elements is fixed by picking the number of ele_type
 /// which vspline::vector_traits considers appropriate. There should rarely be a need to
 /// choose a different number of vector elements: evaluation will often be the most
 /// computationally intensive part of a processing chain, and therefore this choice is
@@ -531,6 +539,9 @@ struct average_weight_functor
 // of uf_types altogether, and this is the only other place where
 // it's used.
 
+// TODO: evaluator_policy uses a great many typedefs which is a bit over the top, even though
+// it was helpful in the development process.
+
 template < typename _coordinate_type , // nD real coordinate
            typename _value_type ,      // type of coefficient/result
 #ifdef USE_VC
@@ -567,7 +578,7 @@ public:
   // that using a const bool set at construction time performed just as well.
   // Since this makes using class evaluator easier, I have chosen to go this way.
 
-  const bool even_spline_order ;   // flag containing the 'evenness' of the spline
+  const bool even_spline_degree ;   // flag containing the 'evenness' of the spline
 
   enum { dimension = dim_in }  ;
   enum { level = dim_in - 1 }  ;
@@ -598,14 +609,8 @@ public:
 
   typedef typename MultiArrayView < 1 , ic_type > :: const_iterator offset_iterator ;
   
-  typedef vigra::MultiCoordinateIterator<dimension>                nd_offset_iterator ;
-  
-  typedef MultiArrayView < dimension + 1 , ele_type >              component_view_type ;
-  
-  typedef typename component_view_type::difference_type            expanded_shape_type ;
+  typedef vigra::MultiCoordinateIterator<dimension>                 nd_offset_iterator ;
   
-  typedef vigra::TinyVector < ele_type* , channels >               component_base_type ;
-
   typedef weight_functor_base < ele_type , rc_type , vsize > weight_functor_base_type ;
 
   /// in the context of b-spline calculation, this is the weight generating
@@ -632,14 +637,16 @@ public:
   
 private:
   
-  nd_weight_functor fweight ;       ///< set of pointers to weight functors, one per dimension
-  const view_type & coefficients ;   ///< b-spline coefficient array
-  const shape_type expanded_stride ;                 ///< strides in terms of expanded value_type
-  MultiArray < 1 , ic_type > offsets ;               ///< offsets in terms of value_type
-  MultiArray < 1 , ic_type > component_offsets ;     ///< offsets in terms of ele_type, for vc op
-  component_base_type component_base ;
-  component_view_type component_view ;
-  bspline_weight_functor_type wfd0 ; ///< default weight functor: underived bspline
+  nd_weight_functor fweight ;        ///< set of pointers to weight functors, one per dimension
+  const value_type * const cf_base ; ///< coefficient base address in terms of value_type
+  const ele_type * const cf_ebase ;  ///< coefficient base address in terms of ele_type
+  const shape_type cf_shape ;           ///< shape in terms of value_type
+  const shape_type cf_eshape ;          ///< shape in terms of ele_type
+  const shape_type cf_stride ;          ///< strides in terms of value_type
+  const shape_type cf_estride ;         ///< strides in terms of expanded value_type
+  MultiArray < 1 , ic_type > offsets ;  ///< offsets in terms of value_type
+  MultiArray < 1 , ic_type > eoffsets ; ///< offsets in terms of ele_type, for vc op
+  bspline_weight_functor_type wfd0 ;    ///< default weight functor: underived bspline
   const int spline_degree ;
   const int spline_order ;
   const int window_size ;
@@ -658,7 +665,7 @@ public:
 
   const shape_type & get_stride() const
   {
-    return coefficients.stride() ;
+    return cf_stride ;
   }
 
   /// this constructor is the most flexible variant and will ultimately be called by all other
@@ -667,16 +674,23 @@ public:
   /// - a vigra::MultiArrayView to (braced) coefficients
   /// - the degree of the spline (3 == cubic spline)
   /// - specification of the desired derivative of the spline, defaults to 0 (plain evaluation).
+  
+  // TODO now that evaluator_policy doesn't hold a MultiArrayView anymore, it's easy to
+  // provide a constructor which directly takes address, shape and stride of the coefficient
+  // array to approach an interface which does not use vigra types.
 
   evaluator_policy ( const view_type & _coefficients ,
               int _spline_degree ,
               derivative_spec_type _derivative )
-  : coefficients ( _coefficients ) ,
+  : cf_base ( _coefficients.data() ) ,
+    cf_ebase ( (ele_type*)(_coefficients.data()) ) ,
+    cf_shape ( _coefficients.shape() ) ,
+    cf_eshape ( channels * _coefficients.shape() ) ,
+    cf_stride ( _coefficients.stride() ) ,
+    cf_estride ( channels * _coefficients.stride() ) ,
     spline_degree ( _spline_degree ) ,
-    even_spline_order ( ! ( _spline_degree & 1 ) ) ,
+    even_spline_degree ( ! ( _spline_degree & 1 ) ) ,
     spline_order ( _spline_degree + 1 ) ,
-    component_view ( _coefficients.expandElements ( 0 ) ) ,
-    expanded_stride ( channels * _coefficients.stride() ) ,
     wfd0 ( _spline_degree , 0 ) ,
     window_size ( std::pow ( _spline_degree + 1 , int(dimension) ) )
   {
@@ -709,16 +723,16 @@ public:
     // also makes it easy to vectorize the code.
     
     offsets = MultiArray < 1 , ptrdiff_t > ( window_size ) ;
-    component_offsets = MultiArray < 1 , ptrdiff_t > ( window_size ) ;
+    eoffsets = MultiArray < 1 , ptrdiff_t > ( window_size ) ;
     
     // we fill the offset array in a simple fashion: we do a traversal of the window
     // and calculate the pointer difference for every element reached. We use the
     // same loop to code the corresponding offsets to elementary values (ele_type)
   
-    auto sample = coefficients.subarray ( shape_type() , shape_type(spline_order) ) ;
+    auto sample = _coefficients.subarray ( shape_type() , shape_type(spline_order) ) ;
     auto base = sample.data() ;
     auto target = offsets.begin() ;
-    auto component_target = component_offsets.begin() ;
+    auto component_target = eoffsets.begin() ;
 
     for ( auto &e : sample )
     {
@@ -727,17 +741,6 @@ public:
       ++target ;
       ++component_target ;
     }
-
-    // set up a set of base adresses for the component channels. This is needed
-    // for the vectorization, as the vector units can only work on elementary types
-    // (ele_type) and not on aggregates, like pixels.
-    
-    expanded_shape_type eshp ;
-    for ( int i = 0 ; i < channels ; i++ )
-    {
-      eshp[0] = i ;
-      component_base[i] = &(component_view[eshp]) ;
-    }
   } ;
 
   /// simplified constructors from a bspline object
@@ -747,11 +750,10 @@ public:
   /// individually chosen for each axis.
   
   evaluator_policy ( const bspline < value_type , dimension > & bspl ,
-              derivative_spec_type _derivative = derivative_spec_type ( 0 )
-            )
+                     derivative_spec_type _derivative = derivative_spec_type ( 0 ) )
   : evaluator_policy ( bspl.coeffs ,
-                bspl.spline_degree ,
-                _derivative )
+                       bspl.spline_degree ,
+                      _derivative )
   {
     if ( bspl.spline_degree > 1 && ! bspl.braced )
       throw not_supported ( "for spline degree > 1: evaluation needs braced coefficients" ) ;
@@ -797,7 +799,7 @@ public:
   /// split function. This function is used to split incoming real coordinates
   /// into an integral and a remainder part, which are used at the core of the
   /// evaluation. selection of even or odd splitting is done via the const bool
-  /// flag 'even_spline_order' My initial implementation had this flag as a
+  /// flag 'even_spline_degree'. My initial implementation had this flag as a
   /// template argument, but this way it's more flexible and there seems to
   /// be no runtime penalty. This method delegates to the free function templates
   /// even_split and odd_split, respectively, which are defined above class evaluator.
@@ -805,7 +807,7 @@ public:
   template < class IT , class RT >
   void split ( const RT& input , IT& select , RT& tune ) const
   {
-    if ( even_spline_order )
+    if ( even_spline_degree )
       even_split < IT , RT , vsize > ( input , select , tune ) ;
     else
       odd_split < IT , RT , vsize > ( input , select , tune ) ;
@@ -833,7 +835,7 @@ public:
   template < int level , class dtype >
   struct _eval
   {
-    dtype operator() ( const dtype* & pdata ,
+    dtype operator() ( const dtype* const & pdata ,
                        offset_iterator & ofs ,
                        const MultiArrayView < 2 , ele_type > & weight
                      ) const
@@ -857,7 +859,7 @@ public:
   template < class dtype >
   struct _eval < 0 , dtype >
   {
-    dtype operator() ( const dtype* & pdata ,
+    dtype operator() ( const dtype* const & pdata ,
                        offset_iterator & ofs ,
                        const MultiArrayView < 2 , ele_type > & weight
                      ) const
@@ -879,7 +881,7 @@ public:
   template < int level , class dtype >
   struct _eval_linear
   {
-    dtype operator() ( const dtype* & pdata ,
+    dtype operator() ( const dtype* const & pdata ,
                        offset_iterator & ofs ,
                        const nd_rc_type& tune
                      ) const
@@ -897,7 +899,7 @@ public:
   template < class dtype >
   struct _eval_linear < 0 , dtype >
   {
-    dtype operator() ( const dtype* & pdata ,
+    dtype operator() ( const dtype* const & pdata ,
                        offset_iterator & ofs ,
                        const nd_rc_type& tune
                      ) const
@@ -912,6 +914,17 @@ public:
     
   // next are evaluation routines. there are quite a few, since I've coded for operation
   // from different starting points and both for vectorized and nonvectorized operation.
+  // Note how my code is, as ever, bottom-up, so that the highest-level access is below
+  // the code it delegates to.
+
+  // The code for vectorized and non-vectorized evaluation is often quite similar, in fact
+  // so much so that I did an experimental implementation of a set of generic evaluators
+  // which used the same code for vectorized and unverctorized operation. This implementation
+  // had to use generic memory access functions which resulted in nD values being loaded
+  // and stored in one go, rather than doing this channel-wise. I think that this made the
+  // code slower (at least when compiled with clang++) and the necessary collateral code
+  // to access memory was much more than what was saved in the evaluation code. So I
+  // abandoned this approach for the time being.
   
   /// evaluation variant which takes an offset and a set of weights. This is the final delegate,
   /// calling the recursive _eval method. Having the weights passed in via a const MultiArrayView &
@@ -924,7 +937,7 @@ public:
               const MultiArrayView < 2 , ele_type > & weight ,
               value_type & result ) const
   {
-    const value_type * base = coefficients.data() + select ;
+    const value_type * const base = cf_base + select ;
     // offsets reflects the positions inside the subarray:
     offset_iterator ofs = offsets.begin() ;
     result = _eval<level,value_type>() ( base , ofs , weight ) ;
@@ -940,7 +953,7 @@ public:
               const MultiArrayView < 2 , ele_type > & weight ,
               value_type & result ) const
   {
-    eval ( sum ( select * coefficients.stride() ) , weight , result ) ;
+    eval ( sum ( select * cf_stride ) , weight , result ) ;
   }
 
   /// evaluation variant taking the components of a split coordinate, first the shape_type
@@ -956,33 +969,44 @@ public:
   
   void eval ( const nd_ic_type& select ,
               const nd_rc_type& tune ,
-              value_type & result ) const
+              value_type & result ,
+              std::integral_constant < int , 0 >
+            ) const
   {
-    if ( specialize::value == 0 )
-    {
-      // nearest neighbour. simply pick the coefficient
-      result = coefficients [ select ] ;
-    }
-    else if ( specialize::value == 1 )
-    {
-      // linear interpolation. use specialized _eval_linear object
-      const value_type * base =   coefficients.data()
-                                + sum ( select * coefficients.stride() ) ;
-      offset_iterator ofs = offsets.begin() ;  // offsets reflects the positions inside the subarray
-      result = _eval_linear<level,value_type>() ( base , ofs , tune ) ;
-    }
-    else
-    {
-      // general case. this works for degree 0 and 1 as well, but is less efficient
+    // nearest neighbour. simply pick the coefficient
+    result = cf_base [ sum ( select * cf_stride ) ] ;
+  }
+  
+  void eval ( const nd_ic_type& select ,
+              const nd_rc_type& tune ,
+              value_type & result ,
+              std::integral_constant < int , 1 >
+            ) const
+  {
+    // linear interpolation. use specialized _eval_linear object
+    const value_type * base =   cf_base
+                              + sum ( select * cf_stride ) ;
+    // offsets reflects the positions inside the subarray
+    offset_iterator ofs = offsets.begin() ;
+    result = _eval_linear<level,value_type>() ( base , ofs , tune ) ;
+  }
+  
+  template < int arbitrary_spline_degree >
+  void eval ( const nd_ic_type& select ,
+              const nd_rc_type& tune ,
+              value_type & result ,
+              std::integral_constant < int , arbitrary_spline_degree >
+            ) const
+  {
+    // general case. this works for degree 0 and 1 as well, but is less efficient
 
-      MultiArray < 2 , ele_type > weight ( Shape2 ( spline_order , dimension ) ) ;
-      
-      // now we call obtain_weights, which will fill in 'weight' with weights for the
-      // given set of fractional coordinate parts in 'tune'
+    MultiArray < 2 , ele_type > weight ( Shape2 ( spline_order , dimension ) ) ;
+    
+    // now we call obtain_weights, which will fill in 'weight' with weights for the
+    // given set of fractional coordinate parts in 'tune'
 
-      obtain_weights ( weight , tune ) ;
-      eval ( select , weight , result ) ;   // delegate
-    }
+    obtain_weights ( weight , tune ) ;
+    eval ( select , weight , result ) ;   // delegate
   }
 
   /// this eval variant take an unsplit coordinate and splits it into an integral
@@ -1002,7 +1026,7 @@ public:
     // we now have the split coordinate in 'select'
     // and 'tune', and we delegate to the next level:
 
-    eval ( select , tune , result ) ;
+    eval ( select , tune , result , specialize() ) ;
   }
   
   /// variant taking a plain rc_type for a coordinate. only for 1D splines!
@@ -1127,7 +1151,7 @@ public:
                    value_type & result ) const
   {
     // get a pointer to the coefficient window's beginning
-    const value_type * base = coefficients.data() + select ;
+    const value_type * base = cf_base + select ;
     // prepare space for the multiplied-out weights
     ele_type flat_weight [ window_size ] ;
     // we need an instance of a pointer to these weights here, passing it in
@@ -1141,7 +1165,7 @@ public:
 
 #ifdef USE_VC
 
-  /// vectorized version of _eval. This works just about the same way as _eval, with the
+  /// vectorized evaluation code. This works the same way as the unvectorized code, with the
   /// only difference being the innner loop over the channels, which is necessary because
   /// in the vector code we can't code for vectors of, say, pixels, but only for vectors of
   /// elementary types, like float.
@@ -1153,7 +1177,7 @@ public:
   /// note that the vectorized routine couldn't function like this if it were to
   /// evaluate unbraced splines: it relies on the bracing and can't do without it, because
   /// it works with a fixed sequence of offsets, whereas the evaluation of an unbraced spline
-  /// would use a different offset sequence for values affected by the boundary condition.
+  /// could use a different offset sequence for values affected by the boundary condition.
 
   typedef typename base_type::out_ele_v ele_v ;
   typedef typename base_type::out_v mc_ele_v ;
@@ -1163,17 +1187,17 @@ public:
   template < class dtype , int level >
   struct _v_eval
   {
-    dtype operator() ( const component_base_type& base , ///< base adresses of components
+    dtype operator() ( const ele_type * const & ebase ,  ///< base address of coefficients (ele_type)
                        const ic_v& origin ,              ///< offsets to evaluation window origins
                        offset_iterator & ofs ,           ///< offsets to coefficients inside this window
                        const MultiArrayView < 2 , ele_v > & weight ) const ///< weights to apply
     {
-      dtype sum = dtype() ;    ///< to accumulate the result
-      dtype subsum ; ///< to pick up the result of the recursive call
+      dtype sum = dtype() ; ///< to accumulate the result
+      dtype subsum ;        ///< to pick up the result of the recursive call
 
       for ( int i = 0 ; i < weight.shape ( 0 ) ; i++ )
       {
-        subsum = _v_eval < dtype , level - 1 >() ( base , origin , ofs , weight );
+        subsum = _v_eval < dtype , level - 1 >() ( ebase , origin , ofs , weight );
         for ( int ch = 0 ; ch < channels ; ch++ )
         {
           sum[ch] += weight [ Shape2 ( i , level ) ] * subsum[ch] ;
@@ -1188,7 +1212,7 @@ public:
   template < class dtype >
   struct _v_eval < dtype , 0 >
   {
-    dtype operator() ( const component_base_type& base , ///< base adresses of components
+    dtype operator() ( const ele_type * const & ebase ,  ///< base address of coefficients (ele_type)
                        const ic_v& origin ,              ///< offsets to evaluation window origins
                        offset_iterator & ofs ,           ///< offsets to coefficients in this window
                        const MultiArrayView < 2 , ele_v > & weight ) const ///< weights to apply
@@ -1199,7 +1223,7 @@ public:
       {
         for ( int ch = 0 ; ch < channels ; ch++ )
         {
-          sum[ch] += weight [ Shape2 ( i , 0 ) ] * ele_v ( base[ch] , origin + *ofs ) ;
+          sum[ch] += weight [ Shape2 ( i , 0 ) ] * ele_v ( ebase + ch , origin + *ofs ) ;
         }
         ++ofs ;
       }
@@ -1214,7 +1238,7 @@ public:
   template < class dtype , int level >
   struct _v_eval_linear
   {
-    dtype operator() ( const component_base_type& base , // base adresses of components
+    dtype operator() ( const ele_type * const & ebase ,  ///< base address of coefficients (ele_type)
                        const ic_v& origin ,        // offsets to evaluation window origins
                        offset_iterator & ofs ,     // offsets to coefficients inside this window
                        const in_v& tune ) const       // weights to apply
@@ -1222,13 +1246,13 @@ public:
       dtype sum ;    ///< to accumulate the result
       dtype subsum ;
 
-      sum = _v_eval_linear < dtype , level - 1 >() ( base , origin , ofs , tune ) ;
+      sum = _v_eval_linear < dtype , level - 1 >() ( ebase , origin , ofs , tune ) ;
       for ( int ch = 0 ; ch < channels ; ch++ )
       {
         sum[ch] *= ( rc_type ( 1.0 ) - tune [ level ] ) ;
       }
       
-      subsum = _v_eval_linear < dtype , level - 1 >() ( base , origin , ofs , tune );
+      subsum = _v_eval_linear < dtype , level - 1 >() ( ebase , origin , ofs , tune );
       for ( int ch = 0 ; ch < channels ; ch++ )
       {
         sum[ch] += ( tune [ level ] ) * subsum[ch] ;
@@ -1243,7 +1267,7 @@ public:
   template < class dtype >
   struct _v_eval_linear < dtype , 0 >
   {
-    dtype operator() ( const component_base_type& base , // base adresses of components
+    dtype operator() ( const ele_type * const & ebase ,  ///< base address of coefficients (ele_type)
                        const ic_v& origin ,              // offsets to evaluation window origins
                        offset_iterator & ofs ,           // offsets to coefficients in this window
                        const in_v& tune ) const       // weights to apply
@@ -1257,10 +1281,10 @@ public:
       for ( int ch = 0 ; ch < channels ; ch++ )
       {
         sum[ch] = ( rc_type ( 1.0 ) - tune [ 0 ] )
-                  * ele_v ( base[ch] , origin + o1 ) ;
+                  * ele_v ( ebase + ch , origin + o1 ) ;
                   
         sum[ch] += tune [ 0 ]
-                   * ele_v ( base[ch] , origin + o2 ) ; 
+                   * ele_v ( ebase + ch , origin + o2 ) ; 
       }
       
       
@@ -1283,11 +1307,11 @@ public:
     // we need an instance of this iterator because it's passed into _v_eval by reference
     // and manipulated by the code there:
     
-    offset_iterator ofs = component_offsets.begin() ;
+    offset_iterator ofs = eoffsets.begin() ;
     
     // now we can call the recursive _v_eval routine yielding the result
     
-    result = _v_eval < out_v , level >() ( component_base , select , ofs , weight ) ;
+    result = _v_eval < out_v , level >() ( cf_ebase , select , ofs , weight ) ;
   }
 
   /// cdeval starts from a set of offsets to coefficient windows, so here
@@ -1309,30 +1333,30 @@ public:
   /// for nearest-neighbour interpolation, which doesn't delegate further, since the
   /// result can be obtained directly by gathering from the coefficients:
   
-  void cdeval ( const ic_v& select ,  // offsets to coefficient windows
-                const in_v& tune ,    // fractional parts of the coordinates
-                out_v & result ,      // target
-                std::integral_constant < int , 0 > ) const
+  void eval ( const ic_v& select ,  // offsets to coefficient windows
+              const in_v& tune ,    // fractional parts of the coordinates
+              out_v & result ,      // target
+              std::integral_constant < int , 0 > ) const
   {
     // nearest neighbour. here, no weights are needed and we can instantly
     // gather the data from the coefficient array.
 
     for ( int ch = 0 ; ch < channels ; ch++ )
-      result[ch].gather ( component_base[ch] , select ) ;
+      result[ch].gather ( cf_ebase + ch , select ) ;
   }
   
   /// linear interpolation. this uses a specialized _v_eval object which
   /// directly processes 'tune' instead of creating an array of weights first.
 
-  void cdeval ( const ic_v& select ,  // offsets to coefficient windows
-                const in_v& tune ,    // fractional parts of the coordinates
-                out_v & result ,      // target
-                std::integral_constant < int , 1 > ) const
+  void eval ( const ic_v& select ,  // offsets to coefficient windows
+              const in_v& tune ,    // fractional parts of the coordinates
+              out_v & result ,      // target
+              std::integral_constant < int , 1 > ) const
   {
-    offset_iterator ofs = component_offsets.begin() ;
+    offset_iterator ofs = eoffsets.begin() ;
   
     result = _v_eval_linear < out_v , level >()
-              ( component_base , select , ofs , tune ) ;
+              ( cf_ebase , select , ofs , tune ) ;
    }
   
   /// finally, the general uniform b-spline evaluation.
@@ -1340,11 +1364,11 @@ public:
   /// in the use of this general-purpose code, which can also handle degree 0 and 1
   /// splines, albeit less efficiently.
 
-  template < int anything >
-  void cdeval ( const ic_v& select ,  // offsets to coefficient windows
-                const in_v& tune ,    // fractional parts of the coordinates
-                out_v & result ,      // target
-                std::integral_constant < int , anything > ) const
+  template < int arbitrary_spline_degree >
+  void eval ( const ic_v& select ,  // offsets to coefficient windows
+              const in_v& tune ,    // fractional parts of the coordinates
+              out_v & result ,      // target
+              std::integral_constant < int , arbitrary_spline_degree > ) const
   {
     MultiArray < 2 , ele_v > weight ( Shape2 ( spline_order , dimension ) ) ;
 
@@ -1373,12 +1397,12 @@ public:
               out_v & result ) const
   {
     // condense the nD index into an offset
-    ic_v origin = select[0] * ic_type ( expanded_stride [ 0 ] ) ;
+    ic_v origin = select[0] * ic_type ( cf_estride [ 0 ] ) ;
     for ( int d = 1 ; d < dimension ; d++ )
-      origin += select[d] * ic_type ( expanded_stride [ d ] ) ;
+      origin += select[d] * ic_type ( cf_estride [ d ] ) ;
     
     // pass on to overload taking the offset, dispatching on 'specialize'
-    cdeval ( origin , tune , result , specialize() ) ;
+    eval ( origin , tune , result , specialize() ) ;
   }
 
   /// This variant of eval() works directly on vector data (of unsplit coordinates)
@@ -1409,6 +1433,9 @@ public:
     eval ( select , tune , result ) ;
   }
 
+  /// To make the evaluator more versatile, we provide overloads taking a simdized elementary
+  /// type as well, where unary_functor would expect TinyVectors < T , 1 >.
+  
   void eval ( const in_v & input ,        // number of dimensions * coordinate vectors
               out_ele_v & result )  const // single value vector
   {
@@ -1457,12 +1484,13 @@ public:
 } ;
 
 /// the definition for vspline::evaluator incorporates the policy class
-/// into a vspline::unary_functor:
+/// into a vspline::unary_functor. This is quite a mouthful, so we use a
+/// using declaration:
 
 template < typename _coordinate_type , // nD real coordinate
            typename _value_type ,      // type of coefficient/result
 #ifdef USE_VC
-             // nr. of vector elements
+           // nr. of vector elements with sensible default
            int _vsize = vspline::vector_traits < _value_type > :: size ,
 #else
            int _vsize = 1 ,
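
To tie the documentation in eval.h together, a short sketch of fixing evaluator
types as described: coordinate type, value type, and optionally the SIMD vector
width, which otherwise defaults to what vspline::vector_traits picks. The
degree-0/1 specialization is passed as a std::integral_constant < int , N > type
argument as explained above; its position in the template argument list is not
visible in this diff, so it is left out here:

    #include <vspline/vspline.h>

    // colour pixels addressed by 2D real coordinates
    typedef vigra::TinyVector < float , 2 > coordinate_type ;
    typedef vigra::TinyVector < float , 3 > pixel_type ;

    // default vector width, taken from vspline::vector_traits < pixel_type >
    typedef vspline::evaluator < coordinate_type , pixel_type > eval_type ;

    // the same evaluator type with an explicitly chosen vector width of 8
    typedef vspline::evaluator < coordinate_type , pixel_type , 8 > eval8_type ;

    // both are vspline::unary_functors; given an object 'ev' of either type,
    // evaluation uses function call syntax or the eval ( in , out ) interface:
    //
    //   pixel_type px = ev ( coordinate_type ( 1.5f , 2.5f ) ) ;
    //   ev.eval ( coordinate_type ( 1.5f , 2.5f ) , px ) ;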
diff --git a/example/eval.cc b/example/eval.cc
index ca6b11b..963faa4 100644
--- a/example/eval.cc
+++ b/example/eval.cc
@@ -36,6 +36,11 @@
 /// The output shows how the coordinate is split into integral and real
 /// part by the mapping used for the specified boundary condition
 /// and the result of evaluating the spline at this point.
+/// Note how the coordinate is not checked for validity. When accessing
+/// the spline outside the defined range, the coefficient array will
+/// receive an out-of-bounds access, which may result in a seg fault.
+/// To see how out-of-range incoming coordinates can be handled to avoid
+/// out-of-bounds access to the coefficients, see use_map.cc.
 ///
 /// compile: clang++ -std=c++11 -o eval -pthread eval.cc
 
@@ -85,45 +90,6 @@ int main ( int argc , char * argv[] )
   // put the BC code into a TinyVector
   TinyVector < bc_code , 1 > bcv ( bc ) ;
 
-//   int mci = -1 ;
-//   map_code mc ;
-//   
-//   while ( mci < 1 || mci > 6 )
-//   {
-//     cout << "choose mapping mode" << endl ;
-//     cout << "1) MAP_MIRROR" << endl ;
-//     cout << "2) MAP_PERIODIC" << endl ;
-//     cout << "3) MAP_REFLECT" << endl ;
-//     cout << "4) MAP_LIMIT" << endl ;
-//     cout << "5) MAP_REJECT" << endl ;
-//     cout << "6) MAP_RAW" << endl ;
-//     cin >> mci ;
-//   }
-//   
-//   switch ( mci )
-//   {
-//     case 1 :
-//       mc = MAP_MIRROR ;
-//       break ;
-//     case 2 :
-//       mc = MAP_PERIODIC ;
-//       break ;
-//     case 3 :
-//       mc = MAP_REFLECT ;
-//       break ;
-//     case 4 :
-//       mc = MAP_LIMIT ;
-//       break ;
-//     case 5 :
-//       mc = MAP_REJECT ;
-//       break ;
-//     case 6 :
-//       mc = MAP_RAW ;
-//       break ;
-//   }
-//   // put the mapping code into a TinyVector
-//   TinyVector < map_code , 1 > mcv ( mc ) ;
-
   TinyVector < int , 1 > deriv_spec ( 0 ) ;
   // obtain knot point values
 
@@ -157,8 +123,8 @@ int main ( int argc , char * argv[] )
 
   // fix the type for the evaluator and create it
   typedef evaluator < double , double > eval_type ;
-  eval_type ev ( bsp , deriv_spec ) ; // , mcv ) ;
-//   auto map = ev.get_mapping() ;
+  eval_type ev ( bsp , deriv_spec ) ;
+
   int ic ;
   double rc ;
 
@@ -169,7 +135,7 @@ int main ( int argc , char * argv[] )
     cin >> v ;
     // evaluate it
     double res = ev ( v ) ;
-    // apply the mapping to the coordinate to output that as well
+    // apply the split to the coordinate to output that as well
     ev.split ( v , ic , rc ) ;
 
     cout << v << " -> ( " << ic << " , " << rc << " ) -> " << res << endl ;
diff --git a/example/gradient.cc b/example/gradient.cc
index e7ef513..2e0c6c7 100644
--- a/example/gradient.cc
+++ b/example/gradient.cc
@@ -66,8 +66,8 @@ int main ( int argc , char * argv[] )
                      3 ,                             // cubic b-spline
                      bcv_type ( vspline::NATURAL ) , // natural boundary conditions
                      vspline::BRACED ,               // implicit scheme, bracing coeffs
-                     -1 ,                            // default, not using EXPLICIT
-                     0.0 ) ;                         // tolerance 0.0 for this test!
+                     -1 ,                            // horizon (unused w. BRACED)
+                     0.0 ) ;                         // no tolerance
 
   // get a view to the bspline's core, to fill it with data
 
diff --git a/example/gsm.cc b/example/gsm.cc
index b237130..07802f7 100644
--- a/example/gsm.cc
+++ b/example/gsm.cc
@@ -39,7 +39,7 @@
 /// compile with:
 /// clang++ -std=c++11 -march=native -o gsm -O3 -pthread -DUSE_VC gsm.cc -lvigraimpex -lVc
 ///
-/// invoke passing an image file. the result will be written to 'gsm.tif'
+/// invoke passing a colour image file. the result will be written to 'gsm.tif'
 
 #include <vspline/vspline.h>
 
@@ -70,6 +70,12 @@ typedef vspline::evaluator < coordinate_type , // incoming coordinate's type
 
 int main ( int argc , char * argv[] )
 {
+  if ( argc < 2 )
+  {
+    std::cerr << "pass a colour image file as argument" << std::endl ;
+    exit( -1 ) ;
+  }
+  
   vigra::ImageImportInfo imageInfo ( argv[1] ) ;
 
   // we want a b-spline with natural boundary conditions
@@ -110,7 +116,7 @@ int main ( int argc , char * argv[] )
   for ( auto it = start ; it < end ; ++it )
   {
     // we fetch the discrete coordinate from the coupled iterator
-    // and instantiate a coordinate_type from it. Note that we can't pass
+    // and create a coordinate_type from it. Note that we can't pass
     // the discrete coordinate directly to the evaluator's operator()
     // because this fails to be disambiguated.
     
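The comment above refers to an idiom used in the examples: the discrete
coordinate delivered by the iterator is converted to the evaluator's real-valued
coordinate type before calling operator(), which is what disambiguates the call.
A minimal sketch of just that conversion, with the loop scaffolding left out and
the types assumed to match gsm.cc:

    #include <vspline/vspline.h>

    typedef vigra::TinyVector < float , 2 > coordinate_type ;
    typedef vigra::TinyVector < float , 3 > pixel_type ;
    typedef vspline::evaluator < coordinate_type , pixel_type > ev_type ;

    pixel_type at_discrete_coordinate ( const ev_type & ev ,
                                        const vigra::TinyVector < int , 2 > & dc )
    {
      coordinate_type c ( dc ) ;   // int coordinate -> float coordinate
      return ev ( c ) ;            // now operator() resolves unambiguously
    }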
diff --git a/example/gsm2.cc b/example/gsm2.cc
index 7f43107..543dcae 100644
--- a/example/gsm2.cc
+++ b/example/gsm2.cc
@@ -81,6 +81,9 @@ typedef vspline::evaluator < coordinate_type , // incoming coordinate's type
 /// - if the vector code is identical to the unvectorized code, implement
 ///   eval() with a template                      
 
+// we start out by coding the evaluation policy for the functor.
+// this class does the actual computations:
+
 template < typename coordinate_type ,
            typename pixel_type ,
            int vsize >
@@ -117,20 +120,28 @@ struct gsm_policy
     auto dx = xev ( c ) ; // get the gradient in x direction
     auto dy = yev ( c ) ; // get the gradient in y direction
     
-    // TODO: really, we'd like to write:
+    // really, we'd like to write:
     // result = dx * dx + dy * dy ;
-    // but fail due to problems with type inference, so we need to be
-    // a bit more explicit:
+    // but fail due to problems with type inference: for vectorized
+    // types, both dx and dy are vigra::TinyVectors of two simdized
+    // values, and multiplying two such objects is not defined.
+    // It's possible to activate code performing such operations by
+    // defining the relevant traits in namespace vigra, but for this
+    // example we'll stick with using multiply-assigns, which are defined.
     
     dx *= dx ;            // square the gradients
     dy *= dy ;
-    dx += dy ;
+    dx += dy ;            // add them up
     
     result = dx ;         // assign to result
   } 
   
 } ;
 
+// here we create the vspline::unary_functor incorporating the evaluation
+// policy above. the resulting type, ev_gsm, is the functor we want to
+// use on the source image to produce the gradient squared magnitude
+
 typedef vspline::unary_functor < coordinate_type ,
                                  pixel_type ,
                                  VSIZE ,
@@ -138,6 +149,11 @@ typedef vspline::unary_functor < coordinate_type ,
                                  
 int main ( int argc , char * argv[] )
 {
+  if ( argc < 2 )
+  {
+    std::cerr << "pass a colour image file as argument" << std::endl ;
+    exit( -1 ) ;
+  }
   // get the image file name
   
   vigra::ImageImportInfo imageInfo ( argv[1] ) ;
@@ -155,10 +171,11 @@ int main ( int argc , char * argv[] )
   
   // load the image data into the b-spline's core. This is a common idiom:
   // the spline's 'core' is a MultiArrayView to that part of the spline's
-  // data container which is meant to hold the input data. This saves loading
-  // the image to some memory first and then transferring the data into
-  // the spline. Since the core is a vigra::MultiarrayView, we can pass it
-  // to importImage as the desired target for loading the image from disk.
+  // data container which precisely corresponds with the input data.
+  // This saves loading the image to some memory first and then transferring
+  // the data into the spline. Since the core is a vigra::MultiarrayView,
+  // we can pass it to importImage as the desired target for loading the
+  // image from disk.
                    
   vigra::importImage ( imageInfo , bspl.core ) ;
   
@@ -166,7 +183,7 @@ int main ( int argc , char * argv[] )
 
   bspl.prefilter() ;
   
-  // now we can construct the gsm evaluator
+  // now we construct the gsm evaluator
   
   ev_gsm ev ( bspl ) ;
   
@@ -175,13 +192,28 @@ int main ( int argc , char * argv[] )
   target_type target ( imageInfo.shape() ) ;
 
   // now we obtain the result by performing an index_remap. index_remap
-  // successively passes discrete indices into the target to the evaluator
+  // successively passes the discrete coordinates of the target to the evaluator
   // it's invoked with, storing the result of the evaluator's evaluation
   // at the self-same coordinates. This is done multithreaded and vectorized
   // automatically, so it's very convenient, if the evaluator is at hand.
   // So here we have invested moderately more coding effort in the evaluator
   // and are rewarded with being able to use the evaluator with vspline's
-  // high-level code for a very fast implementation of our gsm problem.
+  // high-level code for a fast implementation of our gsm problem.
+  // The difference is quite noticeable. On my system, processing a full HD
+  // image, I get:
+  
+  //   $ time ./gsm image.jpg
+  //
+  // real    0m0.838s
+  // user    0m0.824s
+  // sys     0m0.012s
+  //
+  //   $ time ./gsm2 image.jpg
+  // 
+  // real    0m0.339s
+  // user    0m0.460s
+  // sys     0m0.016s
+
   
   vspline::index_remap < ev_gsm > ( ev , target ) ;
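
For reference, this is what index_remap amounts to conceptually, single-threaded and
unvectorized - a sketch reusing gsm2.cc's names (ev, target, coordinate_type), not
vspline's actual implementation:

    // walk all discrete coordinates of the 2D target and store the result
    // of evaluating the functor at that coordinate in the same place
    for ( int y = 0 ; y < target.shape()[1] ; y++ )
    {
      for ( int x = 0 ; x < target.shape()[0] ; x++ )
      {
        coordinate_type crd ( x , y ) ;
        ev.eval ( crd , target [ vigra::Shape2 ( x , y ) ] ) ;
      }
    }

vspline::index_remap performs the same operation, but partitions the target,
processes the parts in several threads and feeds the functor vectorized batches
of coordinates where possible - which is where the speed-up quoted above comes from.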
 
diff --git a/example/roundtrip.cc b/example/roundtrip.cc
index 137b969..3ffc083 100644
--- a/example/roundtrip.cc
+++ b/example/roundtrip.cc
@@ -145,7 +145,7 @@ void run_test ( view_type & data ,
        << " ms" << endl ;
 #endif
   
-  // do it again, data above are useless after 10 times filtering
+  // do it again - the data above are useless after having been filtered TIMES times
   bsp.core = data ;
   bsp.prefilter() ;
 
@@ -158,38 +158,9 @@ void run_test ( view_type & data ,
   // set the evaluator type
   typedef vspline::evaluator<coordinate_type,pixel_type> eval_type ;
 
-  // create the evaluator. We create an evaluator using a mapping which corresponds
-  // to the boundary conditions we're using, instead of the default 'REJECT' mapping.
-  // For natural boundary conditions, there is no corresponding mapping and we use
-  // MAP_LIMIT instead.
-  
-// KFJ 2017-08-10 no more mapping in class evaluator!
-//   vspline::map_code mc ;
-//   switch ( bc )
-//   {
-//     case vspline::MIRROR:
-//       mc = vspline::MAP_MIRROR ;
-//       break ;
-//     case vspline::NATURAL:
-//       mc = vspline::MAP_LIMIT ;
-//       break ;
-//     case vspline::REFLECT:
-//       mc = vspline::MAP_REFLECT ;
-//       break ;
-//     case vspline::PERIODIC:
-//       mc = vspline::MAP_PERIODIC ;
-//       break ;
-//     default:
-//       mc = vspline::MAP_REJECT ;
-//       break ;
-//   }
-  
   // create the evaluator for the b-spline, using plain evaluation (no derivatives)
-  // and the same mapping mode for both axes:
   
   eval_type ev ( bsp ) ;           // spline
-//                  { 0 , 0 } ,     // (no) derivatives
-//                  { mc , mc } ) ; // mapping code as per switch above
 
   // type for coordinate array
   typedef vigra::MultiArray<2, coordinate_type> coordinate_array ;
@@ -235,7 +206,7 @@ void run_test ( view_type & data ,
   
 #ifdef PRINT_ELAPSED
   end = std::chrono::system_clock::now();
-  cout << "avg " << TIMES << " x remap from unsplit coordinates:... "
+  cout << "avg " << TIMES << " x remap with ready-made bspline:.... "
        << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() / float(TIMES)
        << " ms" << endl ;
 #endif
@@ -387,7 +358,6 @@ int main ( int argc , char * argv[] )
   process_image<double,float> ( argv[1] ) ;
   
   cout << "reached end" << std::endl ;
-  // oops... we hang here, failing to terminate
   
   exit ( 0 ) ;
 }
diff --git a/example/slice.cc b/example/slice.cc
index 4502330..f523ba0 100644
--- a/example/slice.cc
+++ b/example/slice.cc
@@ -32,7 +32,7 @@
 /// slice.cc
 ///
 /// build a 3D volume from samples of the RGB colour space
-/// build a spline over it and extract a 2D slice
+/// build a spline over it and extract a 2D slice, using vspline::remap()
 ///
 /// compile with:
 /// clang++ -std=c++11 -march=native -o slice -O3 -pthread -DUSE_VC=1 slice.cc -lvigraimpex -lVc
@@ -49,14 +49,14 @@ int main ( int argc , char * argv[] )
   // pixel_type is the result type, an RGB float pixel
   typedef vigra::TinyVector < float , 3 > pixel_type ;
   
-  // voxel_type is the source data type
+  // voxel_type is the source data type - the same as pixel_type
   typedef vigra::TinyVector < float , 3 > voxel_type ;
   
-  // coordinate_type has a 3D coordinate
-  typedef vigra::TinyVector < float , 3 > coordinate_type ;
+  // coordinate_3d has a 3D coordinate
+  typedef vigra::TinyVector < float , 3 > coordinate_3d ;
   
   // warp_type is a 2D array of coordinates
-  typedef vigra::MultiArray < 2 , coordinate_type > warp_type ;
+  typedef vigra::MultiArray < 2 , coordinate_3d > warp_type ;
   
   // target_type is a 2D array of pixels  
   typedef vigra::MultiArray < 2 , pixel_type > target_type ;
@@ -87,7 +87,7 @@ int main ( int argc , char * argv[] )
   space.prefilter() ;
   
   // get an evaluator for the b-spline
-  typedef vspline::evaluator < coordinate_type , voxel_type > ev_type ;
+  typedef vspline::evaluator < coordinate_3d , voxel_type > ev_type ;
   ev_type ev ( space ) ;
   
   // now make a warp array with 1920X1080 3D coordinates
@@ -101,7 +101,7 @@ int main ( int argc , char * argv[] )
   {
     for ( int x = 0 ; x < 1920 ; x++ )
     {
-      coordinate_type & c ( warp [ vigra::Shape2 ( x , y ) ] ) ;
+      coordinate_3d & c ( warp [ vigra::Shape2 ( x , y ) ] ) ;
       c[0] = float ( x ) / 192.0 ;
       c[1] = 10.0 - c[0] ;
       c[2] = float ( y ) / 108.0 ;
@@ -123,5 +123,6 @@ int main ( int argc , char * argv[] )
                       .setCompression("100")
                       .setForcedRangeMapping ( 0 , 255 , 0 , 255 ) ) ;
   
+  std::cout << "result was written to slice.tif" << std::endl ;
   exit ( 0 ) ;
 }
diff --git a/example/slice2.cc b/example/slice2.cc
index 7a46432..b7c1a16 100644
--- a/example/slice2.cc
+++ b/example/slice2.cc
@@ -32,7 +32,7 @@
 /// slice.cc
 ///
 /// build a 3D volume from samples of the RGB colour space
-/// build a spline over it and extract a 2D slice
+/// build a spline over it and extract a 2D slice, using vspline::remap()
 ///
 /// while the result is just about the same as the one we get from slice.cc,
 /// here our volume contains double-precision voxels. Since the evaluator over
@@ -86,16 +86,6 @@ struct downcast_policy
 {
 public:
 
-  // we want to access facilites of the base class (vspline::uf_types<...>)
-  // so we use a typedef for the base class.
-
-  typedef vspline::uf_types
-    < coordinate_type , pixel_type , ev_type::vsize > base_type ;
-
-  // pull in standard evaluation type system with this macro:
-
-  using_unary_functor_types ( base_type ) ;
-  
   // we keep a const reference to the wrappee
   const ev_type & inner ;
   
@@ -104,8 +94,9 @@ public:
   : inner ( _inner )
   { } ;
   
+  // here we define the method template doing the type conversion
   template < class IN ,
-             class OUT = typename base_type::template out_type_of<IN> >
+             class OUT >
   void eval ( const IN & c ,
                     OUT & result ) const
   {
@@ -188,5 +179,6 @@ int main ( int argc , char * argv[] )
                       .setCompression("100")
                       .setForcedRangeMapping ( 0 , 255 , 0 , 255 ) ) ;
   
+  std::cout << "result was written to slice.tif" << std::endl ;
   exit ( 0 ) ;
 }
diff --git a/example/slice3.cc b/example/slice3.cc
index 8080756..48c792d 100644
--- a/example/slice3.cc
+++ b/example/slice3.cc
@@ -143,5 +143,6 @@ int main ( int argc , char * argv[] )
                       .setCompression("100")
                       .setForcedRangeMapping ( 0 , 255 , 0 , 255 ) ) ;
   
+  std::cout << "result was written to slice.tif" << std::endl ;
   exit ( 0 ) ;
 }
diff --git a/example/splinus.cc b/example/splinus.cc
index b84abf2..aca4c45 100644
--- a/example/splinus.cc
+++ b/example/splinus.cc
@@ -42,11 +42,19 @@
 /// degree, the closer the match with the sine. So apart from serving as a
 /// very simple demonstration of using a 1D periodic b-spline, it teaches us
 /// that a periodic b-spline can approximate a sine.
+/// To show off, we use long double as the spline's data type - but we can
+/// only do so without using Vc, since long doubles can't be vectorized.
 ///
 /// compile with: clang++ -pthread -O3 -std=c++11 splinus.cc -o splinus
 
+// if USE_VC was defined for compiling this example, we undefine it, since
+// long double can't be vectorized.
+
+#undef USE_VC
+
 #include <assert.h>
 #include <vspline/vspline.h>
+#include <vspline/map.h>
 
 int main ( int argc , char * argv[] )
 {
@@ -58,17 +66,20 @@ int main ( int argc , char * argv[] )
   
   // create the bspline object
   
-  vspline::bspline < double ,   // spline's data type
-                     1 >        // one dimension
+  vspline::bspline < long double ,  // spline's data type
+                     1 >            // one dimension
     bsp ( 2 ,                   // two values
           degree ,              // degree as per command line
-          vspline::PERIODIC ) ; // periodic boundary conditions
+          vspline::PERIODIC ,   // periodic boundary conditions
+          vspline::BRACED ,     // implicit scheme, bracing coeffs
+          -1 ,                  // horizon (unused w. BRACED)
+          0.0 ) ;               // no tolerance
           
   // the bspline object's 'core' is a MultiArrayView to the knot point
   // data, which we set one by one for this simple example:
   
-  bsp.core[0] = 1.0 ;
-  bsp.core[1] = -1.0 ;
+  bsp.core[0] = 1.0L ;
+  bsp.core[1] = -1.0L ;
   
   // now we prefilter the data
   
@@ -77,23 +88,58 @@ int main ( int argc , char * argv[] )
   // the spline is now ready for use, we create an evaluator
   // to obtain interpolated values
   
-  vspline::evaluator < double ,       // evaluator taking double coordinates
-                       double         // and producing double results
-                       > ev ( bsp ) ; // from the bspline object we just made
+  typedef vspline::evaluator < long double , long double > ev_type ;
+  
+  ev_type ev ( bsp ) ; // from the bspline object we just made
+
+  // we want to map the incoming coordinates into the defined range.
+  // Here we go all the way, using a vspline::mapper object, which
+  // is overkill for the simple 1D operation, but it's instructive.
+
+  // First define the 'gate' type - this type 'treats' a single value
+
+  typedef vspline::periodic_gate_type < long double > pg_type ;
+
+  // next we define the mapper type, which is n-dimensional, but since
+  // we only use one dimension, it's a mere shell around the gate type
+
+  typedef vspline::mapper < long double ,  // coordinate type
+                            1 ,            // vector size
+                            pg_type        // the single gate type used here
+                          > mapper_type ;
+  
+  // now we create the mapper, passing the periodic gate with the desired start
+  // and end point to the mapper's constructor
+
+  mapper_type m ( pg_type ( 0.0L , 2.0L ) ) ;
+  
+  // finally we chain the mapper and the evaluator to create a functor
+  // which first passes the coordinate through the 'gate', mapping it to
+  // [ 0 : 2 ], and then evaluates at the mapped coordinate:
 
+  auto periodic_ev = vspline::chain < mapper_type , ev_type > ( m , ev ) ;
+  
   while ( true )
   {
     std::cout << " > " ;
-    double x ;
-    std::cin >> x ;                // get an angle
-    double xs = x * M_PI / 180.0 ; // sin() uses radians 
-    double xx = x / 180.0 - .5 ;   // 'splinus' has period 1 and is shifted .5
+    long double x ;
+    std::cin >> x ;                       // get an angle
+    long double xs = x * M_PI / 180.0L ;  // sin() uses radians 
+    long double xx = x / 180.0L - .5L ;   // 'splinus' has period 2 and is shifted .5
     
-    // finally we can produce both results. Note how we can use ev, the evaluator,
-    // like an ordinary function.
+    // finally we can produce both results. Note how we can use periodic_ev,
+    // the combination of gate and evaluator, like an ordinary function.
+
+    std::cout << "sin(" << xs << ") = "
+              << sin(xs)
+              << " splinus(" << xx << ") = "
+              << periodic_ev ( xx )
+              << " difference: "
+              << sin(xs) - periodic_ev(xx)
+              << std::endl ;
+
+    // for this simple example we might as well have used the gate directly:
 
-    std::cout << "sin(" << x << ") = " << sin(xs)
-              << " splin(" << x << ") = " << ev(xx)
-              << " difference: " << sin(xs) - ev(xx) << std::endl ;
+    assert ( m ( xx ) == pg_type ( 0.0L , 2.0L ) ( xx ) ) ;
   }
 }
diff --git a/filter.h b/filter.h
index f85064b..be65460 100644
--- a/filter.h
+++ b/filter.h
@@ -54,7 +54,8 @@
     but templated to the array types used, so the dimensionality is not a run time parameter.
     
     Note the code organization is bottom-up, so the highest-level code comes last.
-    Most code using filter.h will only call the final routine, filter_nd.
+    Most code using filter.h will only call the final routine, filter_nd - and user code,
+    working with vspline::bspline, will not directly call code in this file at all.
     
     While the initial purpose for the code in this file was, of course, b-spline prefiltering,
     the generalized version I present here can be used for arbitrary filters. There is probably
@@ -757,10 +758,11 @@ filter ( int _M ,               ///< number of input/output elements (DataLength
 // n-dimensional arrays. We use the following strategy:
 // - perform the prefiltering collinear to each axis separately
 // - when processing a specific axis, split the array(s) into chunks and use one job per chunk
-// - perform a traverse on each chunk, copying out subsets collinear to the processing axis
+// - perform a traversal on each chunk, copying out subsets collinear to the processing axis
 //   to a buffer
 // - perform the filter on the buffer
 // - copy the filtered data to the target
+
 // The code is organized bottom-up, with the highest-level routines furthest down, saving
 // on forward declarations. The section of code immediately following doesn't use vectorization,
 // the vector code follows.
@@ -769,6 +771,7 @@ filter ( int _M ,               ///< number of input/output elements (DataLength
 /// starting at source and depositing compactly at target. scatter performs the reverse
 /// operation. source_type and target_type can be different; on assignment source_type is
 /// simply cast to target_type.
+///
 /// index_type is passed in as a template argument, allowing for wider types than int,
 /// so these routines can also operate on very large areas of memory.
 
@@ -1005,7 +1008,7 @@ gather ( const source_type * source ,
   {
     while ( count-- )
     {
-// while Vc hasn't yet implemented gathering using intrinsices (from AVX2)
+// while Vc hadn't yet implemented gathering using intrinsics (from AVX2)
 // I played with using them directly to see if I could get better performance.
 // So far it looks as if the prefiltering code doesn't benefit.
 //       __m256i ix = _mm256_loadu_si256 ( (const __m256i *)&(indexes) ) ;
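
Stripped of vectorization and of the actual template machinery, such a gather/scatter
pair boils down to little more than the following sketch - names and signatures here
are illustrative only, they are not copied from filter.h:

    // copy 'count' elements spaced 'stride' apart, starting at 'source',
    // to a compact buffer at 'target', casting on assignment
    template < typename source_type , typename target_type , typename index_type >
    void gather ( const source_type * source ,
                  target_type * target ,
                  index_type stride ,
                  index_type count )
    {
      for ( index_type i = 0 ; i < count ; i++ )
        target [ i ] = target_type ( source [ i * stride ] ) ;
    }

    // the reverse operation: spread a compact buffer back out into strided memory
    template < typename source_type , typename target_type , typename index_type >
    void scatter ( const source_type * source ,
                   target_type * target ,
                   index_type stride ,
                   index_type count )
    {
      for ( index_type i = 0 ; i < count ; i++ )
        target [ i * stride ] = target_type ( source [ i ] ) ;
    }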
diff --git a/map.h b/map.h
index 1e97b41..eece9f7 100644
--- a/map.h
+++ b/map.h
@@ -80,43 +80,7 @@
 
 namespace vspline
 {
-/// 'mapping' in vspline means the application of a functor to real coordinates,
-/// either rejecting them (with an out_of_bounds exception) when out-of-bounds
-/// or folding them into the defined range in various ways. After this initial
-/// processing the coordinates are 'split' into their integral part and remainder.
-/// The specific mapping mode to be used is coded with these values:
-
-// typedef enum {
-//   MAP_REJECT ,    ///< throw out_of_bounds for out-of-bounds coordinates
-//   MAP_LIMIT ,     ///< clamps out-of-bounds coordinates to the bounds
-//   MAP_CONSTANT ,  ///< replace out-of-bounds coordinates by constant value
-//   MAP_MIRROR ,    ///< mirror-fold into the bounds
-//   MAP_PERIODIC,   ///< periodic-fold into the bounds
-// } map_code;
-// 
-// const std::string mc_name[] =
-// {
-//   "MAP_REJECT" ,
-//   "MAP_LIMIT" ,
-//   "MAP_CONSTANT" ,
-//   "MAP_MIRROR" ,
-//   "MAP_PERIODIC" ,
-// } ;
-
-// /// unspecialized gate_type. only classes specialized by 'mode'
-// /// actually do any work.
-// 
-// template < typename in_type ,
-//            typename out_type ,
-//            int vsize ,
-//            vspline::map_code mode >
-// struct gate_policy
-// {
-// } ;
-
-/// specializations for the supported mapping modes
-
-/// gate with REJECT throws vspline::out_of_bounds for invalid coordinates
+/// gate with reject_policy throws vspline::out_of_bounds for invalid coordinates
 
 template < typename _in_type ,
            typename _out_type ,
@@ -159,7 +123,7 @@ struct reject_policy
 
 } ;
 
-/// gate with LIMIT clamps out-of-bounds values
+/// gate with limit_policy clamps out-of-bounds values
 
 template < typename _in_type ,
            typename _out_type ,
@@ -273,7 +237,7 @@ rc_v v_fmod ( const rc_v& lhs ,
 
 #endif
 
-/// gate with mirror 'folds' coordinates into the range. It places the
+/// gate with mirror_policy 'folds' coordinates into the range. It places the
 /// result value so that mirroring it on both lower and upper (which
 /// produces an infinite number of mirror images) will produce one
 /// mirror image coinciding with the input.
@@ -410,6 +374,9 @@ struct periodic_policy
 
 } ;
 
+/// using declaration for the gate type. This declaration incorporates
+/// a mapping policy into a vspline::unary_functor:
+
 template < typename coordinate_type ,
            int _vsize ,
            template < class ,
@@ -421,20 +388,60 @@ using gate_type =
                            _vsize ,
                             gate_policy > ;
 
+template < typename coordinate_type ,
+           int _vsize = 1 >
+using reject_gate_type =
+vspline::gate_type < coordinate_type ,        // coordinate type
+                     _vsize ,                 // vector size
+                     vspline::reject_policy // mapping policy
+                   > ;
+
+template < typename coordinate_type ,
+           int _vsize = 1 >
+using limit_gate_type =
+vspline::gate_type < coordinate_type ,        // coordinate type
+                     _vsize ,                 // vector size
+                     vspline::limit_policy // mapping policy
+                   > ;
+
+template < typename coordinate_type ,
+           int _vsize = 1 >
+using constant_gate_type =
+vspline::gate_type < coordinate_type ,        // coordinate type
+                     _vsize ,                 // vector size
+                     vspline::constant_policy // mapping policy
+                   > ;
+
+template < typename coordinate_type ,
+           int _vsize = 1 >
+using mirror_gate_type =
+vspline::gate_type < coordinate_type ,        // coordinate type
+                     _vsize ,                 // vector size
+                     vspline::mirror_policy // mapping policy
+                   > ;
+
+template < typename coordinate_type ,
+           int _vsize = 1 >
+using periodic_gate_type =
+vspline::gate_type < coordinate_type ,        // coordinate type
+                     _vsize ,                 // vector size
+                     vspline::periodic_policy // mapping policy
+                   > ;
+
 /// finally we define class mapper which is initialized with a set of
 /// gate objects (of arbitrary type) which are applied to each component
 /// of an incoming nD coordinate in turn.
 /// The trickery with the variadic template argument list is necessary,
 /// because we want to be able to combine arbitrary gate types, which
-/// have distinct types to make them as efficient as possible.
+/// have distinct types to make the mapper as efficient as possible.
 /// the only requirement for a gate type is that it has to provide the
-/// necessary eval() functions. Typically a gate type would inherit from
-/// a 1D vspline::unary_functor, like the types above, since this guarantees
+/// necessary eval() functions. Typically a gate type would be a 1D
+/// vspline::unary_functor, like the types above, since this guarantees
 /// a suitable type, but this is not enforced.
 
 // TODO: I'd like to be able to construct a mapper object from
 // a vspline::bspline (which provides lower_limit() and upper_limit()
-// for each axis) and a set of vspline::map_codes
+// for each axis)
 
 template < typename _in_type ,
            typename _out_type ,
@@ -514,6 +521,9 @@ struct mapper_policy
 
 } ;
 
+/// this using declaration incorporates the types of the gates
+/// into the mapper, which is a vspline::unary_functor.
+
 template < typename _coordinate_type ,
            int _vsize ,
            class ... gate_types >
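
splinus.cc above shows a 1D mapper built from a single periodic gate. For more than
one dimension, one gate is listed per axis - roughly as in the following sketch,
which assumes that each gate type is constructed from a lower and an upper limit
(as the periodic gate in splinus.cc is) and that the mapper's constructor takes one
gate per axis; it's an illustration, not a verbatim copy of working code:

    // 2D coordinate type for a spline over a 100x100 area
    typedef vigra::TinyVector < float , 2 > crd_type ;

    // one gate type per axis; the types need not be the same
    typedef vspline::limit_gate_type < float > x_gate_type ;    // clamps axis 0
    typedef vspline::periodic_gate_type < float > y_gate_type ; // folds axis 1

    // the mapper combines the gates into a single vspline::unary_functor
    typedef vspline::mapper < crd_type , 1 ,
                              x_gate_type , y_gate_type > mapper_type ;

    mapper_type m ( x_gate_type ( 0.0f , 99.0f ) ,
                    y_gate_type ( 0.0f , 99.0f ) ) ;

    // m can now be used like any other unary functor, or chained with an
    // evaluator as demonstrated in splinus.cc:
    //
    //   crd_type in ( 123.4f , -7.0f ) , out ;
    //   m.eval ( in , out ) ; // out[0] clamped to [0:99], out[1] folded into [0:99]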
diff --git a/multithread.h b/multithread.h
index eea6679..e42866c 100644
--- a/multithread.h
+++ b/multithread.h
@@ -79,15 +79,8 @@
 /// partitioning of the original range and the same parameter pack that is to be passed
 /// to the functor. The multithreading routine proceeds to set up 'tasks' as needed,
 /// providing each with the functor as it's functional, a subrange from
-/// the partitioning and the parameter pack as arguments. There's also an overload
-/// to multithread() where we pass in a range over all the data and a desired number of
-/// tasks, leaving the partitioning to code inside multithread(), resulting in some
-/// default partitioning provided by a suitable overload of function partition().
-/// This is the commonly used variant, since it's usually not necessary to obtain
-/// a partitioning other than the default one. The notable exception here is partitioning
-/// for b-spline prefiltering, where the axis along which we apply the filter must
-/// not be split, and hence the prefiltering code partitions the range itself and
-/// uses the variant of multithread() which takes a partitioning.
+/// the partitioning and the parameter pack as arguments. The routine to be used
+/// to partition the 'whole' range is passed in.
 ///
 /// The tasks, once prepared, are handed over to a 'joint_task' object which handles
 /// the interaction with the thread pool (in thread_pool.h). While my initial code
@@ -604,31 +597,6 @@ int multithread ( void (*pfunc) ( range_type , Types... ) ,
   return nparts ;
 }
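
The underlying pattern - partition a range, then run the functional once per part in
a separate task - is the familiar one. In plain C++, ignoring the thread pool and
vspline's range and partitioning types, it amounts to something like this sketch
(an illustration of the idea, not vspline's code):

    #include <thread>
    #include <utility>
    #include <vector>

    typedef std::pair < long , long > range_t ;   // [ begin , end )

    template < class ... Types >
    void run_over_parts ( void (*pfunc) ( range_t , Types... ) ,
                          const std::vector < range_t > & partitioning ,
                          Types ... args )
    {
      std::vector < std::thread > crew ;
      for ( auto const & part : partitioning )     // one task per subrange
        crew.push_back ( std::thread ( pfunc , part , args... ) ) ;
      for ( auto & t : crew )                      // wait for all of them
        t.join() ;
    }

vspline's multithread() follows this pattern, but hands the tasks to the thread pool
from thread_pool.h instead of launching fresh threads for every call.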
 
-// /// this overload of multithread() takes the desired number of tasks and a
-// /// range covering the 'whole' data. partition() is called with the range,
-// /// resulting in a partitioning which above overload of multithread() can use.
-// 
-// template < class range_type , class ...Types >
-// int multithread ( void (*pfunc) ( range_type , Types... ) ,
-//                   int nparts ,
-//                   range_type range ,
-//                   Types ...args )
-// {
-//   if ( nparts <= 1 )
-//   {
-//     // if only one part is requested, we take a shortcut and execute
-//     // the function right here:
-//     (*pfunc) ( range , args... ) ;
-//     return 1 ;
-//   }
-// 
-//   // partition the range using partition_to_tiles()
-// 
-//   partition_type < range_type > partitioning = partition_to_tiles ( range , nparts ) ;
-//   
-//   return multithread ( pfunc , partitioning , args... ) ;
-// }
-
 /// This variant of multithread() takes a pointer to a function performing
 /// the partitioning of the incoming range. The partitioning function is
 /// invoked on the incoming range (provided nparts is greater than 1) and
diff --git a/poles.h b/poles.h
index 59c6b74..ac7f4b5 100644
--- a/poles.h
+++ b/poles.h
@@ -628,7 +628,7 @@ const double Poles_24[] = {
 -3.5407088073360672255e-12 ,
 } ;
 
-const double* precomputed_poles[] = {
+const double* const precomputed_poles[] = {
   0, 
   0, 
   Poles_2, 
@@ -656,7 +656,7 @@ const double* precomputed_poles[] = {
   Poles_24, 
 } ;
 
-const long double* precomputed_basis_function_values[] = {
+const long double* const precomputed_basis_function_values[] = {
   K0, 
   K1, 
   K2, 
diff --git a/prefilter.h b/prefilter.h
index 96272b9..cba1457 100644
--- a/prefilter.h
+++ b/prefilter.h
@@ -49,7 +49,7 @@
     A good example of how this is done can be found in libeinspline. I term it
     the 'linear algebra approach'. In this implementation, I have chosen what I
     call the 'DSP approach'. In a nutshell, the DSP approach looks at the b-spline's
-    reconstruction by convolving the coefficients with a specific kernel. This
+    reconstruction as a convolution of the coefficients with a specific kernel. This
     kernel acts as a low-pass filter. To counteract the effect of this filter and
     obtain the input signal from the convolution of the coefficients, a high-pass
     filter with the inverse transfer function to the low-pass is used. This high-pass
@@ -105,6 +105,10 @@
     I was tempted to choose the explicit scheme over the implicit. Yet since the code for the
     implicit scheme is there already and some of it is even used in the explicit scheme I keep
     both methods in the code base for now.
+    
+    Note that using the explicit scheme also makes it possible, if necessary, to widen the
+    shape of the complete coefficient array (including the 'frame') so that it becomes
+    vector-friendly. Currently, this is not done.
 
     [CIT2000] Interpolation Revisited by Philippe Thévenaz, Member,IEEE, Thierry Blu, Member, IEEE, and Michael Unser, Fellow, IEEE in IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 19, NO. 7, JULY 2000,
 */
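
To recap what the 'DSP approach' boils down to for a single filter pole z: it is the
well-known causal/anticausal recursion pair from the b-spline literature. The sketch
below shows just that core; vspline's actual code generalizes it to several poles,
applies the per-pole gain and computes suitable initial values at the array's ends,
which is where the boundary conditions come in:

    // one pole, in-place, gain and boundary handling omitted
    void filter_1d ( double * c , int n , double z )
    {
      for ( int k = 1 ; k < n ; k++ )        // causal (forward) pass
        c [ k ] += z * c [ k - 1 ] ;
      for ( int k = n - 2 ; k >= 0 ; k-- )   // anticausal (backward) pass
        c [ k ] = z * ( c [ k + 1 ] - c [ k ] ) ;
    }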
diff --git a/prefilter_poles.cc b/prefilter_poles.cc
index 5f2d11a..769038c 100644
--- a/prefilter_poles.cc
+++ b/prefilter_poles.cc
@@ -163,13 +163,13 @@ int main ( int argc , char * argv[] )
     print_poles(degree) ;
   
   cout << noshowpos ;
-  cout << "const double* precomputed_poles[] = {" << endl ;
+  cout << "const double* const precomputed_poles[] = {" << endl ;
   cout << "  0, " << endl ;
   cout << "  0, " << endl ;
   for ( int i = 2 ; i < 25 ; i++ )
     cout << "  Poles_" << i << ", " << endl ;
   cout << "} ;" << endl ;
-  cout << "const long double* precomputed_basis_function_values[] = {" << endl ;
+  cout << "const long double* const precomputed_basis_function_values[] = {" << endl ;
   for ( int i = 0 ; i < 25 ; i++ )
     cout << "  K" << i << ", " << endl ;
   cout << "} ;" << endl ;
diff --git a/remap.h b/remap.h
index 81805b8..8d12972 100644
--- a/remap.h
+++ b/remap.h
@@ -42,7 +42,7 @@
 /// \brief set of generic remap functions
 ///
 /// My foremost reason to have efficient B-spline processing is the formulation of
-/// a generic remap function. This is a function which takes an array of real-valued
+/// generic remap functions. remap() is a function which takes an array of real-valued
 /// nD coordinates and an interpolator over a source array. Now each of the real-valued
 /// coordinates is fed into the interpolator in turn, yielding a value, which is placed
 /// in the output array at the same place the coordinate occupies in the coordinate
@@ -51,7 +51,7 @@
 /// - c, the coordinate array (or 'warp' array)
 /// - a, the source array
 /// - i, the interpolator over a
-/// - j, the coordinates in c
+/// - j, a coordinate into c and t
 /// - and t, the target
 ///
 /// remap defines the operation
@@ -60,7 +60,7 @@
 ///
 /// Now we widen the concept of remapping to something which is more like a 'transform'
 /// function. Instead of limiting the process to the use of an 'interpolator', we use
-/// an arbitrary unary functor to transform incoming values to outgoing values, where
+/// an arbitrary unary functor transforming incoming values to outgoing values, where
 /// the type of the incoming and outgoing values is determined by the functor. If the
 /// functor actually is an interpolator, we have a 'true' remap transforming coordinates
 /// into values, but this is merely a special case.
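
Written out for a 2D warp/target pair and a functor ev, the operation is simply this
(single-threaded and unvectorized - a conceptual sketch, not vspline's code):

    for ( int y = 0 ; y < target.shape()[1] ; y++ )
      for ( int x = 0 ; x < target.shape()[0] ; x++ )
        ev.eval ( warp [ vigra::Shape2 ( x , y ) ] ,     // j-th coordinate from c
                  target [ vigra::Shape2 ( x , y ) ] ) ; // yields t [ j ]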
@@ -79,9 +79,9 @@
 ///
 /// There is also a second set of remap functions in this file, which don't take a
 /// 'warp' array. Instead, for every target location, the location's discrete coordinates
-/// are passed to the unary_functor_type object. This way, transformation-based remaps
+/// are passed to the unary_functor type object. This way, transformation-based remaps
 /// can be implemented easily: the user code just has to provide a suitable functor
-/// to yield values for coordinates. This interpolator will internally take the discrete
+/// to yield values for coordinates. This functor will internally take the discrete
 /// incoming coordinates (into the target array) and transform them as required, internally
 /// producing coordinates suitable for the 'actual' interpolation using a b-spline or some
 /// other object capable of producing values for real coordinates. The routine offering
@@ -139,11 +139,11 @@ using bcv_type = vigra::TinyVector < bc_code , dimension > ;
 /// mainly memory-bound, this strategy helps keeping memory use low. The data can
 /// be produced one by one, but the code has vectorized operation as well, which
 /// brings noticeable performance gain. With vectorized operation, instead of producing
-/// single values, the engine produces bunches of values. This operation is transparent
-/// to the caller, since the data are deposited in normal interleaved fashion. The
-/// extra effort for vectorized operation is in the implementation of the generator
-/// functor and reasonably straightforward. If only the standard remap functions
-/// are used, the user can remain ignorant of the vectorization.
+/// single values, the engine produces vectors of values. This operation is transparent
+/// to the caller, since the data are picked up and deposited in normal interleaved
+/// fashion. The extra effort for vectorized operation is in the implementation of the
+/// generator functor and reasonably straightforward. If only the standard remap
+/// functions are used, the user can remain ignorant of the vectorization.
 ///
 /// Why is _fill an object? It might be a function if partial specialization of functions
 /// were allowed. Since this is not so, we have to use a class and put the function in it's
@@ -381,6 +381,8 @@ void fill ( generator_type & gen ,
 /// to the fill routine, each in it's specific way. The first type we define is
 /// warp_generator. This generator yields data from an array, which, in the context
 /// of a remap-like function, will provide the coordinates to feed to the interpolator.
+/// Seen from the generalized context, it provides the arguments which the functor
+/// uses to produce result values.
 /// First is warp_generator for dimensions > 1. Here we provide 'subrange' and
 /// 'bindOuter' to be used for the hierarchical descent in _fill. The current
 /// implementation relies on the hierarchical descent going all the way to 1D,
@@ -429,106 +431,6 @@ struct warp_generator
   }  
 } ;
 
-// previous implementation:
-// specialization of warp_generator for dimension 1. Here we have
-// (vectorized) evaluation code and subrange(), but no bindOuter().
-
-// template < typename unary_functor_type ,
-//            bool strided_warp >
-// struct warp_generator < 1 , unary_functor_type , strided_warp >
-// {
-//   typedef unary_functor_type functor_type ;
-//   
-//   typedef typename unary_functor_type::in_type nd_rc_type ;
-//   typedef typename unary_functor_type::out_type value_type ;
-//   
-//   typedef MultiArrayView < 1 , nd_rc_type > warp_array_type ;
-//   
-//   const warp_array_type warp ; // must not use reference here!
-//   const int stride ;
-//   const nd_rc_type * data ;
-//   typename warp_array_type::const_iterator witer ;
-//   
-//   const unary_functor_type & itp ;
-//   
-//   const unary_functor_type & get_functor()
-//   {
-//     return itp ;
-//   }
-//   
-//   warp_generator
-//     ( const warp_array_type & _warp ,
-//       const unary_functor_type & _itp )
-//   : warp ( _warp ) ,
-//     stride ( _warp.stride(0) ) ,
-//     itp ( _itp ) ,
-//     witer ( _warp.begin() ) ,
-//     data ( _warp.data() )
-//   {
-// #ifdef USE_VC
-//     int aggregates = warp.size() / vsize ;
-//     witer += aggregates * vsize ;
-// #endif
-//   } ;
-// 
-//   // in the context of this class, single value evaluation is only ever
-//   // used for mop-up action after all full vectors have been processed.
-//   // If Vc isn't used, this routine does all the work.
-// 
-//   void operator() ( value_type & target )
-//   {
-//     itp.eval ( *witer , target ) ;
-//     ++witer ;
-//   }
-// 
-// #ifdef USE_VC
-// 
-//   enum { vsize = unary_functor_type :: vsize } ;
-// 
-//   // operator() incorporates two variants, which depend on a template argument,
-//   // so the conditional has no run-time effect. The template argument target_type
-//   // determines the evaluation target and picks the appropriate eval variant.
-//   
-//   template < typename target_type >
-//   void operator() ( target_type & target )
-//   {
-//     typename unary_functor_type::in_v buffer ;
-// 
-//     // KFJ 2017-08-18 I've pushed load and store operations out of
-//     // class unary_functor and am now using generic free functions
-//     // in load_store.h. The idea is to reduce the required interface
-//     // of unary_functor so far that other objects can more easily
-//     // be made to take the place of objects derived from unary_functor.
-//     if ( strided_warp )
-//     {
-//       // if the warp array is strided, we use vspline::load
-//       // to assemble a simdized input value with load/gather operations      
-//       load ( data , buffer , stride ) ;
-//       // now we pass the simdized value to the evaluation routine
-//       itp.eval ( buffer , target ) ;
-//       data += vsize * stride ;
-//     }
-//     else
-//     {
-//       // otherwise we can collect them with an unstrided load operation,
-//       // which is more efficient for 1D data, the same for nD.
-//       load ( data , buffer ) ;
-//       itp.eval ( buffer , target ) ;
-//       data += vsize ;
-//     }
-//   }
-// 
-// #endif
-// 
-//   warp_generator < 1 , unary_functor_type , strided_warp >
-//     subrange ( const shape_range_type < 1 > & range ) const
-//   {
-//     return warp_generator < 1 , unary_functor_type , strided_warp >
-//              ( warp.subarray ( range[0] , range[1] ) , itp ) ;
-//   }
-// 
-// } ;
-
 template < typename unary_functor_type ,
            bool strided_warp >
 struct warp_generator < 1 , unary_functor_type , strided_warp >
@@ -756,7 +658,7 @@ void remap ( const unary_functor_type & ev ,
 /// back to the same location.
 
 template < class unary_functor_type , // type satisfying the interface in class unary_functor
-           int dim_target >          // dimension of target array
+           int dim_target >           // dimension of target array
 void apply ( const unary_functor_type & ev ,
               MultiArrayView < dim_target ,
                                typename unary_functor_type::out_type > & output )
diff --git a/thread_pool.h b/thread_pool.h
index 01b16e9..e4677e5 100644
--- a/thread_pool.h
+++ b/thread_pool.h
@@ -45,7 +45,7 @@
 /// of a thread pool for multithread() in multithread.h, but the class might find
 /// use elsewhere. The operation is simple:
 ///
-/// a set of worker threads is launched who wait for 'tasks', which come in the shape
+/// a set of worker threads is launched; these wait for 'tasks', which come in the shape
 /// of std::function<void()>, from a queue. When woken, a worker thread tries to obtain
 /// a task. If it succeeds, the task is executed, and the worker thread tries to get
 /// another task. If none is to be had, it goes to sleep, waiting to be woken once
diff --git a/unary_functor.h b/unary_functor.h
index e981f69..52b2479 100644
--- a/unary_functor.h
+++ b/unary_functor.h
@@ -61,14 +61,14 @@
     more complex unary functors, a typical use would be to 'chain' two unary_functors,
     see class template 'chain' below, which also provides an example for the use
     of unary_functor. While this is currently the only explicitly coded combination,
-    providing more should be easy and the code would be similar to the expression
+    providing more should be easy and the code would be similar to other expression
     template implementations.
     
     vspline's unary functor objects are composed of two components:
     
     - class template uf_types, to provide a standard set of types
     
-    - an evalation policy providing the actual code for the functor
+    - an 'evaluation policy' providing the actual code for the functor
     
     the policy defines one or several functions eval() handling the operation
     at hand. Using the policy approach seems a roundabout way at first, but
@@ -82,6 +82,9 @@
     there is a different syntax for type- and non-type arguments. This is annoying
     but can't be helped. In class evaluator (eval.h) I work around this problem by
     creating types from int template arguments using std::integral_constant.
+    
+    Class vspline::evaluator is itself coded as a vspline::unary_functor and can
+    serve as another example for the use of the code in this file.
 */
 
 #ifndef VSPLINE_UNARY_FUNCTOR_H
@@ -98,12 +101,16 @@ namespace vspline {
 /// have to be explicit about their i/o types (simple policies can avoid refering
 /// to the types by using templated eval routines)
 
+// TODO surround the _vsize template argument with #ifdef USE_VC, to make it disappear
+// or become 1 in unvectorized code
+
 template < typename IN ,  // argument or input type, like coordinate for interpolators
            typename OUT , // result type, like the type of an interpolation result
            int _vsize >   // vector width. often derived from OUT with vector_traits
 struct uf_types
 {  
   // number of dimensions. This may well be different for IN and OUT.
+  // note how we rely on vigra's ExpandElementResult mechanism to inspect IN and OUT.
 
   enum { dim_in = vigra::ExpandElementResult < IN > :: size } ;
   enum { dim_out = vigra::ExpandElementResult < OUT > :: size } ;
@@ -112,7 +119,10 @@ struct uf_types
 
   enum { vsize = _vsize } ;
 
-  // typedefs for incoming (argument) and outgoing (result) type
+  // typedefs for incoming (argument) and outgoing (result) type. These two types
+  // are non-vectorized types, like vigra::TinyVector < float , 2 >. Since such types
+  // consist of elements of a single elementary type, the corresponding vectorized
+  // type can easily be determined automatically.
   
   typedef IN in_type ;
   typedef OUT out_type ;
@@ -126,7 +136,7 @@ struct uf_types
 #ifdef USE_VC
   
   // for vectorized operation, we need a few extra typedefs
-  // I use a _v suffix to indicate vectorized types.
+  // I use a _v suffix instead of the _type suffix above to indicate vectorized types.
 
   /// a simdized type of the elementary type of result_type,
   /// which is used for coefficients and results. this is fixed via
@@ -140,6 +150,14 @@ struct uf_types
   
   /// vectorized in_type and out_type. with the current implementation this is
   /// a vigra::TinyVector of the ele_v types above, which may only have one member.
+  /// one-member TinyVectors are annoying in places - to operate on them, they have
+  /// to be 'unpacked' to their single element, or the access has to be done with
+  /// a cast. On the other hand, having both single and plural data packaged in the
+  /// same container type makes handling them generically easier. Anyway, since the
+  /// dimension of incoming and outgoing values is a compile-time constant,
+  /// time-critical code can dispatch to specialized code - notably SIMD load/store
+  /// operations in preference to SIMD gather/scatter for singular data interacting
+  /// with unstrided memory.
   
   typedef typename vector_traits < IN , vsize > :: type in_v ;
   typedef typename vector_traits < OUT , vsize > :: type out_v ;
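
To make the type system tangible: for IN = vigra::TinyVector < float , 2 > (a 2D
coordinate), OUT = float and vsize = 8, the types come out along these lines - with
the simdized elementary type written as Vc::SimdArray for the sake of the example;
the precise type is whatever vector_traits yields:

    // in_type  = vigra::TinyVector < float , 2 >
    // out_type = float
    // in_v     = vigra::TinyVector < Vc::SimdArray < float , 8 > , 2 >
    // out_v    = vigra::TinyVector < Vc::SimdArray < float , 8 > , 1 >  // one member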
@@ -291,11 +309,11 @@ struct unary_functor
   /// arguments. But since class unary_functor is a mere shell and does not have
   /// state specific to itself, we can be certain that all arguments passed to it's
   /// constructor are 'meant' for it's evaluation policy, and just pass them on. We
-  /// use perfect forwarding to preserve the arguments as best as we can.
+//   /// use perfect forwarding to preserve the arguments as best as we can.
   
-  template < class ... ctor_args >
-  unary_functor ( ctor_args && ... args )
-  : evp ( std::forward<ctor_args> ( args ) ... )
+  template < class ... ctor_arg_types >
+  unary_functor ( ctor_arg_types && ... ctor_args )
+  : evp ( std::forward<ctor_arg_types> ( ctor_args ) ... )
   { } ;
 
   // note how unary_functor contains no eval() routines. These are the domain of
@@ -393,7 +411,7 @@ struct chain_policy
                     B & result ) const
   {
     I cc ;                    // have an intermediate handy
-    t1.eval ( c , cc ) ;      // evaluate first functor ro it
+    t1.eval ( c , cc ) ;      // evaluate first functor into it
     t2.eval ( cc , result ) ; // feed it as input to second functor
   }
 
@@ -405,8 +423,8 @@ struct chain_policy
 /// alias declaration below, given two unary functors A and B, we can
 /// instantiate a 'chain' class as simply as vspline::chain < A , B >.
 
-template < typename T1 ,  // 'chain' objects can be instantiated with two
-           typename T2 >  // template arguments
+template < typename T1 ,  // 'chain' objects must be instantiated with two
+           typename T2 >  // template arguments, specifying the functors to use
 using chain = typename
  vspline::unary_functor < typename T1::in_type ,  // in_type and out_type are
                           typename T2::out_type , // inferred from T1 and T2
diff --git a/vspline.doxy b/vspline.doxy
index 3c10794..ce8abe9 100644
--- a/vspline.doxy
+++ b/vspline.doxy
@@ -38,7 +38,7 @@ PROJECT_NAME           = "vspline"
 # could be handy for archiving the generated documentation or if some version
 # control system is used.
 
-PROJECT_NUMBER         = 17
+PROJECT_NUMBER         = 0.2.0
 
 # Using the PROJECT_BRIEF tag one can provide an optional one line description
 # for a project that appears at the top of each page and should give viewer a

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/vspline.git


