[vspline] 55/72: polishing, mainly documentation

Kay F. Jahnke kfj-guest at moszumanska.debian.org
Sun Jul 2 09:02:42 UTC 2017


This is an automated email from the git hooks/post-receive script.

kfj-guest pushed a commit to branch master
in repository vspline.

commit 290494e054abaa5934a96d1e1e85fc8335257573
Author: Kay F. Jahnke <kfjahnke at gmail.com>
Date:   Thu May 4 11:45:19 2017 +0200

    polishing, mainly documentation
---
 README.rst          |   2 +-
 brace.h             |  22 +++++-------
 bspline.h           |  78 ++++++++++++++++++----------------------
 common.h            |  98 +++++++++++++++++++++-----------------------------
 doxy.h              |  12 +++----
 eval.h              |  19 +++++-----
 example/channels.cc |   2 +-
 example/examples.sh |   2 +-
 example/gsm.cc      |   3 +-
 example/gsm2.cc     |   4 +--
 example/slice.cc    |   2 +-
 example/slice2.cc   |   4 +--
 example/slice3.cc   |   4 +--
 filter.h            | 100 +++++++++++++++++-----------------------------------
 map.h               |  57 +-----------------------------
 mapping.h           |  92 ++++-------------------------------------------
 multithread.h       |   4 +++
 prefilter.h         |  16 ++-------
 remap.h             |   2 +-
 19 files changed, 159 insertions(+), 364 deletions(-)

diff --git a/README.rst b/README.rst
index af28cff..c453a26 100644
--- a/README.rst
+++ b/README.rst
@@ -97,7 +97,7 @@ by running
 
 doxygen vspline.doxy
 
-in vspline's base folder. A reasonable up-to-date version of the documentation will be
+in vspline's base folder. A reasonably up-to-date version of the documentation will be
 provided online when I get around to it.
 
 TODO: construction site! link here to static HTML once it's online
diff --git a/brace.h b/brace.h
index aab113a..a6e4cc5 100644
--- a/brace.h
+++ b/brace.h
@@ -73,15 +73,14 @@
     Using boundary conditions like these, both the extrapolated signal and the
     coefficients share the same periodicity and mirroring.
     
-    There are two ways of
-    arriving at a braced coeffcient array: We can start from the extrapolated signal,
-    pick a section large enough to make margin effects vanish (due to limited arithmetic
-    precision), prefilter it and pick out a subsection containing the 'core' coefficients
-    and their support. Alternatively, we can work only on the core coefficients, calculate
-    suitable initial causal and anticausal coeffcients (where the calculation considers
-    the extrapolated signal, which remains implicit), apply the filter and *then* surround
-    the core coefficient array with more coeffcients (the brace) following the same
-    extrapolation pattern as we imposed on the signal, but now on the coefficients
+    There are two ways of arriving at a braced coefficient array: We can start from the
+    extrapolated signal, pick a section large enough to make margin effects vanish
+    (due to limited arithmetic precision), prefilter it and pick out a subsection containing
+    the 'core' coefficients and their support. Alternatively, we can work only on the core
+    coefficients, calculate suitable initial causal and anticausal coefficients (where the
+    calculation considers the extrapolated signal, which remains implicit), apply the filter
+    and *then* surround the core coefficient array with more coefficients (the brace) following
+    the same extrapolation pattern as we imposed on the signal, but now on the coefficients
     rather than on the initial knot point values.
   
     The bracing can be performed without any solver-related maths by simply copying
@@ -128,7 +127,7 @@
     When using the higher-level access methods (via bspline objects), using the explicit or
     implicit scheme becomes a matter of passing in the right flag, so at this level, a deep
     understanding of the extrapolation mechanism isn't needed at all. I use the implicit scheme
-    as the default, because it needs slightly less memory and CPU cycles.
+    as the default.
 
     Since the bracing mainly requires copying data or trivial maths we can do the operations
     on higher-dimensional objects, like slices of a volume. To efficiently code these operations
@@ -177,9 +176,6 @@ struct bracer
 
   static int left_brace_size ( int spline_degree , bc_code bc )
   {
-//     if ( bc == REFLECT || bc == SPHERICAL )
-//       return ( spline_degree + 1 ) / 2 ;
-//     else
       return spline_degree / 2 ;
   }
 
diff --git a/bspline.h b/bspline.h
index cf013e4..e01b528 100644
--- a/bspline.h
+++ b/bspline.h
@@ -38,6 +38,7 @@
 /************************************************************************/
 
 /*! \file bspline.h
+
     \brief defines class bspline
 
   class bspline is the most convenient access to vspline's functionality.
@@ -50,8 +51,21 @@
   time, pretty much everything *can* be parametrized even at this level.
   bspline objects can be used without any knowledge of their internals,
   e.g. as parameters to the remap functions.
-
-  class bspline handles several views to the coefficients it operates on, these
+  
+  While 'raw' coefficient arrays with an evaluation scheme which applies
+  boundary conditions are feasible and most memory-efficient, they're not so well
+  suited for very fast evaluation, since the boundary treatment needs conditionals,
+  and 'breaks' uniform access, which is especially detrimental when using
+  vectorization. So vspline uses coefficient arrays with a few extra coefficients
+  'framing' or 'bracing' the 'core' coefficients. Since evaluation of the spline
+  looks at a certain small section of coefficients (the evaluator's 'support'),
+  the bracing is chosen so that this lookup will always succeed without having to
+  consider boundary conditions: the brace is set up to make the boundary conditions
+  explicit, and the evaluation can proceed blindly without bounds checking. With
+  large coefficient arrays, the little extra space needed for the brace becomes
+  negligible, but the code for evaluation becomes much faster and simpler.
+
+  So class bspline handles several views to the coefficients it operates on, these
   are realized as vigra::MultiArrayViews, and they all share the same storage:
 
   - the 'core', which is a view to an array of data precisely the same shape as
@@ -63,15 +77,15 @@
 
   - 'container', which contains the two views above plus an additional frame
     of coefficients used for the 'explicit' scheme of extrapolation before
-    prefiltering. 
+    prefiltering, or as extra 'headroom' if 'shifting' the spline is intended. 
 
   Using class bspline, there is a choice of 'strategy'. The simplest strategy is
-  'UNBRACED'. With this strategy, after putting the knot point data into the bspline's 'core'
-  area and calling prefilter(), the core area will contain the b-spline coefficients.
-  The resulting b-spline object can't be evaluated with the code in eval.h, this
-  mode of operation is intended for users who want to do their own processing of the
-  coefficients and don't need the code in eval.h. prefiltering is done using an
-  implicit scheme as far as the border conditions are concerned.
+  'UNBRACED'. With this strategy, after putting the knot point data into the bspline's
+  'core' area and calling prefilter(), the core area will contain the b-spline
+  coefficients. The resulting b-spline object can't be evaluated with the code in eval.h;
+  this mode of operation is intended for users who want to do their own processing of the
+  coefficients and don't need the code in eval.h. Prefiltering is done using an implicit
+  scheme as far as the boundary conditions are concerned.
   
   The 'standard' strategy is 'BRACED'. Here, after prefiltering, the view 'coeffs'
   in the bspline object will contain the b-spline coefficients, surrounded by a 'brace'
@@ -84,8 +98,8 @@
   initial coefficient calculation. If the 'frame' of extrapolated data is large enough,
   the result is just the same. The inner part of the frame is taken as the brace, so no
   bracing needs to be performed explicitly. The resulting b-spline object will work with
-  vspline's evaluation code. Note that the explicit scheme uses 'IGNORE' boundary conditions
-  on the (framed) array, which is equivalent to zero-padding.
+  vspline's evaluation code. Note that the explicit scheme uses 'GUESS' boundary conditions
+  on the (framed) array, which tries to minimize margin effects further.
 
   Also note that the additional memory needed for the 'frame' will be held throughout the bspline
  object's life; the only way to 'shrink' the coefficient array to the size of the braced or core
@@ -127,11 +141,11 @@
   While there is no explicit code to create a 'smoothing spline' - a b-spline evaluating
   the source data without prefiltering them - this can be achieved simply by creating a b-spline
   object with spline degree 0 and 'shifting' it to the desired degree for evaluation. Note that
-  you'll need the EXPLICIT strategy for the purpose, because otherwise the spline won't have
-  enough 'headroom' for shifting.
+  you'll need the EXPLICIT strategy for the purpose, or specify extra 'headroom' at
+  construction time, because otherwise the spline won't have enough room for shifting.
   
   If stronger smoothing is needed, this can be achieved with the code in filter.h, passing in
-  appropriate pole values. a single-pole filter with a positive pole in [ 0 , 1 [ will do the
+  appropriate pole values. A single-pole filter with a positive pole in ] 0 , 1 [ will do the
   trick - the larger the pole, the stronger the smoothing. Note that smoothing with large pole
   values will need a large 'horizon' as well to handle the margins properly.
 
@@ -157,7 +171,7 @@ namespace vspline {
/// set of template arguments, namely the spline's data type (like pixels etc.) and its dimension.
 /// All other parameters relevant to the spline's creation are passed in at construction time.
 /// This way, if explicit specialization becomes necessary (like, to interface to code which
-/// can't use templates) the number of specialzations remains manageable. This design decision
+/// can't use templates) the number of specializations remains manageable. This design decision
 /// pertains specifically to the spline's degree, which can also be implemented as a template
 /// argument, allowing for some optimization by making some members static. Yet going down this
 /// path requires explicit specialization for every spline degree used and the performance gain
@@ -288,7 +302,7 @@ public:
   /// the bspline object is allocated externally and passed into the bspline object.
 
   static shape_type container_size ( shape_type core_shape ,  ///< shape of knot point data
-            int spline_degree = 3 ,                  ///< spline degree with reasonable default
+            int spline_degree = 3 ,                ///< spline degree with reasonable default
             bcv_type bcv = bcv_type ( MIRROR ) ,   ///< boundary conditions and common default
             prefilter_strategy strategy = BRACED , ///< default strategy is the 'implicit' scheme
             int horizon = sizeof(real_type) * 3 )  ///< width of frame for explicit scheme (heuristic)
@@ -343,8 +357,6 @@ public:
   // widen class bspline's scope to accept input of other types and/or use a different
   // math_type.
 
-  // TODO: write copy constructor, operator=
-  
   bspline ( shape_type _core_shape ,                ///< shape of knot point data
             int _spline_degree = 3 ,                ///< spline degree with reasonable default
             bcv_type _bcv = bcv_type ( MIRROR ) ,   ///< boundary conditions and common default
@@ -386,9 +398,6 @@ public:
     else
       horizon = _horizon ; // whatever the user specifies
 
-//     if ( _strategy == BRACED || _strategy == EXPLICIT )
-//       headroom ++ ;
-
     // first, calculate all the various shapes and sizes used internally
     setup_metrics ( headroom ) ;
 
@@ -455,15 +464,6 @@ public:
   
   /// get a bspline object for a single channel of the data. This is also lightweight
   /// and requires the viewed data to remain present as long as the channel view is used.
-  /// This is useful because, when working with multichannel data, often they will come
-  /// interleaved in memory, which assures good locality of access and therefore efficient
-  /// prefiltering and evaluation. If the channels require separate treatment, the
-  /// channel-specific operations are best performed at the same location with the
-  /// channel-specific operations in sequence, since this accesses memory in the same
-  /// vicinity - rather than processing each channel by itself, which traverses the
-  /// memory as many times as there are channels. Of course, if only one specific channel
-  /// needs to be operated upon, getting a view to the channel first and constructing
-  /// a b-spline over it is more efficient.
   /// The channel view inherits all metrics from its parent; only the MultiArrayViews
   /// to the data are different.
   
@@ -504,8 +504,7 @@ public:
   /// after prefilter() terminates, so it's safe to pass in some MultiArrayView
   /// which is destroyed after the call to prefilter().
 
-  void prefilter ( int njobs = default_njobs ,    ///< intended number of jobs to use
-                   view_type data = view_type() ) ///< view to knot point data to use instead of 'core'
+  void prefilter ( view_type data = view_type() ) ///< view to knot point data to use instead of 'core'
   {
     if ( data.hasData() )
     {
@@ -557,8 +556,7 @@ public:
                 bcv ,
                 spline_degree ,
                 tolerance ,
-                smoothing ,
-                njobs
+                smoothing
               ) ;
         break ;
       case BRACED:
@@ -570,8 +568,7 @@ public:
                 bcv ,
                 spline_degree ,
                 tolerance ,
-                smoothing ,
-                njobs
+                smoothing
               ) ;
         // using the more general code here now, since the frame may be larger
         // than strictly necessary for the given spline degree due to a request
@@ -599,8 +596,7 @@ public:
                 explicit_bcv ,
                 spline_degree ,
                 tolerance ,
-                smoothing ,
-                njobs
+                smoothing
               ) ;
         break ;
       case MANUAL:
@@ -616,8 +612,7 @@ public:
                 explicit_bcv ,
                 spline_degree ,
                 tolerance ,
-                smoothing ,
-                njobs
+                smoothing
               ) ;
         break ;
     }
@@ -664,9 +659,6 @@ public:
   /// with the wider kernel of the higher-degree spline's reconstruction filter.
   /// So if a spline is set up with degree 0 and shifted to degree 5, it has to be
   /// constructed with an additional headroom of 3 (see the constructor).
-  /// This is a quick-shot solution to the problem of scaled-down interpolated
-  /// results; it may be better in some situations to shift the spline up and
-  /// evaluate than to apply smoothing to the source data.
   
   // TODO consider moving the concept of shifting to class evaluator
 
diff --git a/common.h b/common.h
index 899bf34..f086123 100644
--- a/common.h
+++ b/common.h
@@ -38,13 +38,17 @@
 /************************************************************************/
 
 /*! \file common.h
+
     \brief definitions common to all files in this project, utility code
     
-  Currenly there isn't much code here - the only thing used throughout is the definition
-  of the boundary condition codes/mapping codes, which are lumped together for now in one
-  enum. I have also started putting some utility code here.
+    This file contains
+    
+    - a traits class fixing the simdized types used for vectorized code
+    
+    - exceptions used throughout vspline
+    
+    - constants and enums used throughout vspline
   
-  TODO The exceptions need some work.
 */
 
 #ifndef VSPLINE_COMMON
@@ -58,47 +62,37 @@
 
 #endif
 
-#include <thread>
-
 namespace vspline {
 
-/// number of CPU cores in the system; per default, multithreading routines
-/// will perform their task with as many threads.
-
-const int ncores = std::thread::hardware_concurrency() ;
-
 #ifdef USE_VC
 
-// initially, I used Vc::SimdArray in many places, but I figured that having a central
-// place to fix a simdized type would be a good idea, so here's a traits class. With this
-// central definition of the simdized type, it's also easy to make more of an effort to get
-// that simdized type which is most suited for the situation, so I introduced a conditional
-// which picks plain Vc::Vector<T> if vsize is the right size, and SimdArray only if it is not.
-
-/// traits class vector_traits picks the appropriate simdized type for a given
-/// elementary type T and a vector length of _vsize.
-/// if vsize isn't given, it is taken to be a Vc::Vector<T>'s Size.
-/// the definition used here assures that whenever possible, actual Vc::Vector<T>
-/// is used as the simdized type, and SimdArray<T,_vsize> is only used if it can't be
-/// avoided. Why so? I have the suspicion that the code generated for SimdArrays
-/// may be less efficient than the code for Vc::Vectors.
-
-/// struct vector_traits is the central location to fix vecorization throughout vspline.
-/// Nothe how we use a default vector width twice the size of a standard Vc vector for
-/// the given T. In my tests, this made the code faster. TODO test more, and on other systems
-
 /// using definition for the 'elementary type' of a type via vigra's
-/// ExpandElementResult mechanism. The elementary type is used to determine
-/// the default vector size, which is used as default for the vector size
-/// template argument _vsize. This default should only be used once for a
-/// given domain of code sharing simdized types, and the resulting vector
-/// size used explicitly for other T - if, for example, most vectorized maths
-/// is done in float, one would use #define VSIZE vector_traits<float>::size,
-/// and for other T, code vector_traits<T,VSIZE>
+/// ExpandElementResult mechanism.
 
 template < class T >
 using ET = typename vigra::ExpandElementResult < T > :: type ;
 
+/// struct vector_traits is used throughout vspline to determine simdized
+/// data types. While inside the vspline code base vsize, the number of
+/// elementary type members in a simdized type, is kept variable in the form
+/// of the template argument _vsize, code using vspline would typically have
+/// a program-wide definition deriving the number of elements from the
+/// type commonly used for arithmetics, like
+///
+/// #define VSIZE (vspline::vector_traits<float>::size)
+///
+/// If other elementary types are used, their simdized types are fixed to use
+/// the same number of elements by using the program-wide number of elements:
+///
+/// typedef vspline::vector_traits<double,VSIZE>::ele_v double_v_type ;
+///
+/// typedef vigra::TinyVector < int , 3 > triplet_type ;
+///
+/// typedef vspline::vector_traits<triplet_type,VSIZE>::type triplet_v_type ;
+///
+/// Using a default _vsize of twice the elementary type's Vc::Vector's size
+/// works best on my system, but may not be optimal everywhere.
+
 template < class T ,
            int _vsize = 2 * Vc::Vector < ET<T> > :: Size >
 struct vector_traits
@@ -116,6 +110,8 @@ struct vector_traits
 
 #endif
 
+// TODO The exceptions need some work. My use of exceptions is a bit sketchy...
+
 /// for interfaces which need specific implementations we use:
 
 struct not_implemented
@@ -162,31 +158,13 @@ struct out_of_bounds
 {
 } ;
 
-/// if out-of-bounds coordinates are encountered and the mapping used doesn't replace
-/// them with values inside the bounds (like by mirroring etc.), the mapping code will
-/// produce these two values for the integral and remainder part. Setting the integral
-/// part to 0 insures that even if the failure to map out-of-bounds values is not acted
-/// upon, the code won't access memory outside the coefficient array. Using 1.0 for the
-/// remainder part is (with the widened brace on the right margin) a value which can't
-/// occur with coordinates inside the permitted range, but it's a value which will produce
-/// an in-range value when processed by the evaluation code. Finally, rv_out_of_bounds
-/// is the value assigned to an evaluation of a coordinate tagged with ic_out_of_bounds,
-/// rc_out_of_bounds. It's assigned if the evaluator object has template argument 'use_tag'
-/// set to true; if 'use_tag' is false, the tag values will be processed as if they were
-/// proper results, which may produce unwanted artifacts but is slightly faster.
-
-const long ic_out_of_bounds = 0 ;
-const double rc_out_of_bounds = 1.0 ;
-const double rv_out_of_bounds = 0.0 ;
-
 /// This enumeration is used for codes connected to boundary conditions. There are
-/// three aspects to boundary conditions: During prefiltering, if the implicit scheme is used,
+/// two aspects to boundary conditions: During prefiltering, if the implicit scheme is used,
 /// the initial causal and anticausal coefficients have to be calculated in a way specific to
 /// the chosen boundary conditions. Bracing, both before prefiltering when using the explicit
 /// scheme, and after prefiltering when using the implicit scheme, also needs these codes to
 /// pick the appropriate extrapolation code to extend the knot point data/coefficients beyond
-/// the core array. Finally, mapping of coordinates into the defined range uses some of the
-/// same codes.
+/// the core array.
 
 typedef enum { 
   MIRROR ,    ///< mirror on the bounds
@@ -199,9 +177,14 @@ typedef enum {
   IDENTITY ,  ///< used as solver argument, mostly internal use
   GUESS ,     ///< used with EXPLICIT scheme to keep margin errors low
   SPHERICAL , ///< use for spherical panoramas y axis
-//   SAFE_BEYOND , ///< assume the input extends beyond [0...M]
 } bc_code;
 
+/// 'mapping' in vspline means the application of a functor to real coordinates,
+/// either rejecting them (with an out_of_bounds exception) when out-of-bounds
+/// or folding them into the defined range in various ways. After this initial
+/// processing the coordinates are 'split' into their integral part and remainder.
+/// The specific mapping mode to be used is coded with these values:
+
 typedef enum {
   MAP_MIRROR ,    ///< mirror on the bounds
   MAP_PERIODIC,   ///< periodic mapping
@@ -236,7 +219,6 @@ const std::string bc_name[] =
   "IDENTITY" ,
   "GUESS" ,
   "SPHERICAL" ,
-//   "SAFE_BEYOND" , // currently unused
 } ;
 
 const std::string mc_name[] =
diff --git a/doxy.h b/doxy.h
index 62badcd..1fde37a 100644
--- a/doxy.h
+++ b/doxy.h
@@ -104,7 +104,7 @@ On the evaluation side I provide
  
  On my previous system I had to add -fabi-version=6 to avoid certain issues with Vc.
  
- Please note that an executable using Vc produced on your system may likely not work on a machine with another CPU. It's best to compile on the intended target. Alternatively, the target architecture can be passed explicitly to gcc, please refer to gcc's manual. 'Not work' in this context means that it may as well crash due to an illegal instruction or wrong alignment.
+ Please note that an executable using Vc produced on your system will likely not work on a machine with a different CPU. It's best to compile on the intended target. Alternatively, the target architecture can be passed explicitly to the compiler (-march...). 'Not work' in this context means that it may as well crash due to an illegal instruction or wrong alignment.
  
  If you can't use Vc, the code can be made to compile without Vc by omitting -D USE_VC and other flags relevant for Vc:
  
@@ -254,21 +254,21 @@ vspline::remap < coordinate_type , float > ( a , coordinate_array , target_array
  
  vspline::remap < ev_type , 2 > ( ev , coordinate_array , target_array ) ; 
  
- While this routine is also called remap, it has wider scope: 'ev_type' can be any functor providing a suitable interface for providing a value of the type held in 'target_array' for a value held in 'coordinate_array'. Here, you'd typically use an object derived from class vspline::unary_functor, and a vspline::evaluator is in fact derived from this base class. A unary_functor's input and output can be just about any data type, you're not limited to things which can be thought of as 'coo [...]
+ While this routine is also called remap, it has wider scope: 'ev_type' can be any functor providing a suitable interface for providing a value of the type held in 'target_array' for a value held in 'coordinate_array'. Here, you'd typically use an object derived from class vspline::unary_functor, and a vspline::evaluator is in fact derived from this base class. A unary_functor's input and output can be any data type suitable for processing with vspline (elementary real types and their un [...]
  
  This form of remap might be named 'transform' and is similar to vigra's point operator code, but uses vspline's automatic multithreading and vectorization to make it very efficient. There's a variation of it where the 'coordinate array' and the 'target array' are the same, effectively performing an in-place transformation, which is useful for things like coordinate transformations or colour space manipulations. This variation is called 'apply'.
  
- Finally, there is a variation on remap called 'index_remap'. This one doesn't take a 'coordinate array', but instead feeds the unary_functor with discrete coordinates of the target location that is being filled in. This variant is helpful when a remap uses a coordinate transformation before evaluating; here, the functor starts out receiving the discrete target coordinates, performs the coordinate transform and then feeds the transformed coordinate to some evaluation routine providing th [...]
+ There is a variation on remap called 'index_remap'. This one doesn't take a 'coordinate array', but instead feeds the unary_functor with discrete coordinates of the target location that is being filled in. This variant is helpful when a remap uses a coordinate transformation before evaluating; here, the functor starts out receiving the discrete target coordinates, performs the coordinate transform and then feeds the transformed coordinate to some evaluation routine providing the final r [...]
  
 Class vspline::unary_functor is coded to make it easy to implement functors for image processing pipelines. For more complex operations, you'd code a functor representing your processing pipeline - often by delegating to 'inner' objects also derived from vspline::unary_functor - and finally use remap or index_remap to bulk-process your data with this functor. This is about as efficient as it gets, since the data are only accessed once, and vspline's remapping code does the tedious work o [...]
  
  And that's about it - vspline aims to provide all possible variants of b-splines, code to create and evaluate them and to do so for arrays of coordinates. So if you dig deeper into the code base, you'll find that you can stray off the default path, but there should rarely be any need not to use the high-level object 'bspline' or the remap functions.
  
- While one might argue that the remap routines I present shouldn't be lumped together with the 'proper' b-spline code, I feel that only by tightly coupling it with the b-spline code I can make it really fast. And only by processing several values at once (by multithreading and vectorization) the hardware can be exploited fully. But you're free to omit the remap code, the headers build on top of each other, and remap.h is pretty much at the top.
+ While one might argue that the remap routines I present shouldn't be lumped together with the 'proper' b-spline code, I feel that only by tightly coupling them with the b-spline code I can make them really fast. And only by processing several values at once (by multithreading and vectorization) the hardware can be exploited fully. But you're free to omit the remap code, the headers build on top of each other, and remap.h is pretty much at the top.
  
 \section speed_sec Speed
 
- While performance will vary widely from system to system and between different compiles, I'll quote some measurements from my own system. I include benchmarking code (roundtrip.cc in the examples folder). Here are some measurements done with "roundtrip", working on a full HD (1920*1080) RGB image, using single precision floats internally - the figures are averages of 32 runs:
+ While performance will vary from system to system and between different compiles, I'll quote some measurements from my own system. I include benchmarking code (roundtrip.cc in the examples folder). Here are some measurements done with "roundtrip", working on a full HD (1920*1080) RGB image, using single precision floats internally - the figures are averages of 32 runs:
 
 ~~~~~~~~~~~~~~~~~~~~~
 testing bc code MIRROR spline degree 3
@@ -306,7 +306,7 @@ Using double precision arithmetics, vectorization doesn't help so much, and pref
  
  To use vectorized evaluation efficiently, incoming data have to be presented to the evaluation code in vectorized form, but usually they will come from interleaved  memory. After the evaluation is complete, they have to be stored again to interleaved memory. The deinterleaving and interleaving operations take time and the best strategy is to load once from interleaved memory, perform all necessary operations on vector data and finally store once. The sequence of operations performed on  [...]
 
- Using all these techniques together makes vspline fast. The target I was roughly aiming at was to achieve frame rates of ca. 50 fps in RGB and full HD, producing the images via remap from a precalculated warp array. On my system, I have almost reached that goal - my remap times are around 25 msec (for a cubic spline), and with memory access etc. I come up to frame rates over half of what I was aiming at. My main tesing ground is pv, my panorama viewer. Here I can often take the spline d [...]
+ Using all these techniques together makes vspline fast. The target I was roughly aiming at was to achieve frame rates of ca. 50 fps in RGB and full HD, producing the images via remap from a precalculated warp array. On my system, I have almost reached that goal - my remap times are around 25 msec (for a cubic spline), and with memory access etc. I come up to frame rates over half of what I was aiming at. My main testing ground is pv, my panorama viewer. Here I can often take the spline d [...]
  
  On the other hand, even without using vectorization, the code is certainly fast enough for casual use and may suffice for some production scenarios. This way, vigra becomes the only dependency, and the same binary will work on a wide range of hardware.
  
diff --git a/eval.h b/eval.h
index cdd9d06..a5e4a1e 100644
--- a/eval.h
+++ b/eval.h
@@ -78,17 +78,19 @@
    functional programming concepts - and it's conceptually appealing.
     
     Code using class evaluator will probably use it at some core place where it is
-    part of some processing chain. An example would be an image processing program:
+    part of some processing pipeline. An example would be an image processing program:
     one might have some outer loop generating arguments (typically SIMDIZED types)
     which are processed one after the other to yield a result. The processing will
     typically have several stages, like coordinate generation and transformations,
     then use class evaluator to pick an interpolated intermediate result, which is
     further processed by, say, colour or data type manipulations before finally
-    being stored in some target container. The whole processing chain can be
+    being stored in some target container. The whole processing pipeline can be
     coded to become a single functor, with one of class evaluator's eval-type
     routines embedded somewhere in the middle, and all that's left is code to
     efficiently handle the source and destination to provide arguments to the
-    processing chain - like the code in remap.h.
+    pipeline - like the code in remap.h. And since the code in remap.h already
+    provides the data feeding and storing, the only coding needed is for the pipeline,
+    which is where the 'interesting' stuff happens.
 */
 
 #ifndef VSPLINE_EVAL_H
@@ -162,8 +164,7 @@ MultiArray < 2 , target_type > calculate_weight_matrix ( int degree , int deriva
 ///
 /// So I coded the general case, which can use any weight generation function. Coding this
 /// introduces a certain degree of complexity, which I feel is justified for the flexibility
-/// gained. And it may turn out useful if we employ gaussians as basis functions to
-/// approximate high-order b-splines. The complexity is mainly due to the fact that, while
+/// gained. The complexity is mainly due to the fact that, while
 /// we can write a simple (templated) function to generate weights (as above), we can't pass
 /// such a template as an object to a function. Instead we use an abstract base class for
 /// the weight functor and inherit from it for specific weight generation methods.
@@ -172,10 +173,10 @@ MultiArray < 2 , target_type > calculate_weight_matrix ( int degree , int deriva
 /// along the different axes, but this introduced too much extra complexity for my taste and
 /// took me too far away from simply providing efficient code for b-splines, so I abandoned
 /// the attempts. Therefore the weight functors for a specific spline all have to have a common
-/// spline_order and generate spline_order weights. The only way to force lesser order weight functors into
-/// this scheme if it has to be done is to set some of the weights to zero. Weight functors
-/// of higher spline_order than the spline can't be accomodated, if that should be necessary, the spline_order
-/// of the entire spline has to be raised.
+/// spline_order and generate spline_order weights. The only way to force lesser order weight
+/// functors into this scheme is to set some of the weights to zero. Weight functors
+/// of higher spline_order than the spline can't be accommodated; if that should be necessary,
+/// the spline_order of the entire spline has to be raised.
 ///
 /// Note the use of 'delta' in the functions below. This is due to the fact that these functors
 /// are called with the fractional part of a real valued coordinate.
diff --git a/example/channels.cc b/example/channels.cc
index 382266b..f81f777 100644
--- a/example/channels.cc
+++ b/example/channels.cc
@@ -97,7 +97,7 @@ int main ( int argc , char * argv[] )
   }
   
   // prefilter the b-spline
-  space.prefilter ( true ) ;
+  space.prefilter() ;
   
   // now make a warp array with 1920X1080 3D coordinates
   warp_type warp ( vigra::Shape2 ( 1920 , 1080 ) ) ;
diff --git a/example/examples.sh b/example/examples.sh
index 6376677..fa7919a 100755
--- a/example/examples.sh
+++ b/example/examples.sh
@@ -6,5 +6,5 @@ for f in $(ls *.cc)
 do
   body=$(basename $f .cc)
   echo compiling $body
-  clang++ -DUSE_VC -O3 -std=c++11 -march=native -pthread -o$body $f -lVc -lvigraimpex
+  clang++ -O3 -std=c++11 -march=native -pthread -o$body $f -lVc -lvigraimpex
 done
diff --git a/example/gsm.cc b/example/gsm.cc
index 79a1c84..e7e3a16 100644
--- a/example/gsm.cc
+++ b/example/gsm.cc
@@ -33,7 +33,8 @@
 ///
 /// implementation of gsm.cc, performing the calculation of the
 /// gradient squared magnitude in a loop using two evaluators for
-/// the two derivatives.
+/// the two derivatives, adding the squared magnitudes and writing
+/// the result to an image file
 ///
 /// compile with:
 /// clang++ -std=c++11 -march=native -o gsm -O3 -pthread -DUSE_VC gsm.cc -lvigraimpex -lVc
diff --git a/example/gsm2.cc b/example/gsm2.cc
index dcf5718..a135b14 100644
--- a/example/gsm2.cc
+++ b/example/gsm2.cc
@@ -71,8 +71,8 @@ typedef vspline::evaluator < coordinate_type , // incoming coordinate's type
                              VSIZE             // vector size
                            > ev_type ;
 
-/// we build a vspline::unary_functor which calculates the gradient squared
-/// magnitude. The code here isn't generic, for a general-purpose gsm evaluator
+/// we build a vspline::unary_functor which calculates the sum of gradient squared
+/// magnitudes. The code here isn't generic, for a general-purpose gsm evaluator
 /// one would need a bit more of a coding effort, but we want to demonstrate
 /// the principle here.
 /// Note how the 'compound evaluator' we construct follows a pattern of
diff --git a/example/slice.cc b/example/slice.cc
index 6047db0..4502330 100644
--- a/example/slice.cc
+++ b/example/slice.cc
@@ -84,7 +84,7 @@ int main ( int argc , char * argv[] )
   }
   
   // prefilter the b-spline
-  space.prefilter ( true ) ;
+  space.prefilter() ;
   
   // get an evaluator for the b-spline
   typedef vspline::evaluator < coordinate_type , voxel_type > ev_type ;
diff --git a/example/slice2.cc b/example/slice2.cc
index a089881..d64ad9e 100644
--- a/example/slice2.cc
+++ b/example/slice2.cc
@@ -196,8 +196,8 @@ int main ( int argc , char * argv[] )
     }
   }
   
-  // prefilter the b-spline (using Vc)
-  space.prefilter ( true ) ;
+  // prefilter the b-spline
+  space.prefilter() ;
   
   // get an evaluator for the b-spline
   ev_type ev ( space ) ;
diff --git a/example/slice3.cc b/example/slice3.cc
index 3c2528e..8080756 100644
--- a/example/slice3.cc
+++ b/example/slice3.cc
@@ -94,8 +94,8 @@ int main ( int argc , char * argv[] )
     }
   }
   
-  // prefilter the b-spline (using Vc)
-  space.prefilter ( true ) ;
+  // prefilter the b-spline
+  space.prefilter() ;
   
   // get an evaluator for the b-spline
   ev_type ev ( space ) ;
diff --git a/filter.h b/filter.h
index 5f45b89..4e2bb70 100644
--- a/filter.h
+++ b/filter.h
@@ -82,8 +82,8 @@
 namespace vspline {
 
 /// overall_gain is a helper routine:
-/// Simply executing the filtering code by itself will attenuate the signal.
-/// Here we calculate the gain which, pre-applied to the signal, will cancel this effect.
+/// Simply executing the filtering code by itself will attenuate the signal. Here
+/// we calculate the gain which, pre-applied to the signal, will cancel this effect.
 /// While this code was initially part of the filter's constructor, I took it out
 /// to gain some flexibility by passing in the gain as a parameter.
 
@@ -112,17 +112,17 @@ double overall_gain ( const int & nbpoles , const double * const pole )
 /// 
 /// the result will be deposited via out_iter, which may be an iterator over
 /// the same data in_iter iterates over, in which case operation is in-place.
-/// in_iter should be a const iterator, it's never used for writing data.
+/// in_iter can be a const iterator, it's never used for writing data.
 ///
-/// class filter needs three template arguments, one for the type of iterator over the incoming
-/// data, one for the type of iterator to the resultant coefficients, and one for the real type
-/// used in arithmetic operations. The iterators' types will usually be the same, but formulating
-/// the code with two separate types makes it more versatile.
-/// The third (optional) template argument, will usually be the elementary type
-/// of the iterator's value_type. When the value_types are vigra aggregates (TinyVectors etc.)
-/// vigra's ExpandElementResult mechanism will provide, but at times we may wish to be explicit
-/// here, e.g. when iterating over simdized types.
-  
+/// class filter needs three template arguments, one for the type of iterator over the
+/// incoming data, one for the type of iterator to the resultant coefficients, and one
+/// for the real type used in arithmetic operations. The iterators' types will usually
+/// be the same, but formulating the code with two separate types makes it more
+/// versatile. The third (optional) template argument will usually be the elementary
+/// type of the iterator's value_type. When the value_types are vigra aggregates
+/// (TinyVectors etc.) vigra's ExpandElementResult mechanism will provide, but at times
+/// we may wish to be explicit here, e.g. when iterating over simdized types.
+///   
 template < typename in_iter ,   // iterator over the knot point values
            typename out_iter ,  // iterator over the coefficient array
            typename real_type > // type for single real value for calculations
@@ -199,8 +199,7 @@ private:
 
 /// The code for mirrored BCs is adapted from P. Thevenaz' code, the other routines are my
 /// own doing, with aid from a digest of spline formulae I received from P. Thevenaz and which
-/// were helpful to verify the code against a trusted source. Anyway, it's all unit tested now
-/// and runs just fine.
+/// were helpful to verify the code against a trusted source.
 ///
 /// note how, in the routines to find the initial causal coefficient, there are two different
 /// cases: first the 'accelerated loop', which is used when the theoretically infinite sum of
@@ -429,7 +428,8 @@ value_type icc_periodic ( IT c , int k )
  return Sum ;
 }
 
-/// TODO doublecheck this routine!
+// TODO doublecheck this routine!
+
 value_type iacc_periodic ( out_iter c , int k )
 {
   value_type z = value_type ( pole[k] ) ;
@@ -461,43 +461,12 @@ value_type iacc_periodic ( out_iter c , int k )
   return Sum ;
 }
 
-// /// icc_safe_beyond and iacc_safe_beyond are for situations where c[0] ... c[M-1]
-// /// merely define a section of a larger array of data, so that it's safe to access
-// /// horizon[k] values beyond either end of c[0] ... c[M-1].
-// /// currently unused, because it doesn't work with buffering
-// 
-// template < class IT >
-// value_type icc_safe_beyond ( IT c , int k )
-// {
-//   value_type p = value_type ( pole[k] ) ;
-//   int z = horizon[k] ;
-//   value_type icc = c [ - z ] ;
-//   
-//   for ( int i = - z + 1 ; i <= 0 ; i++ )
-//   {
-//     icc += p * icc + c[i] ;
-//   }
-//   
-//   return icc ;
-// }
-// 
-// value_type iacc_safe_beyond ( out_iter c , int k )
-// {
-//   value_type p = value_type ( pole[k] ) ;
-//   int z = horizon[k] ;
-//   value_type iacc = c [ M - 1 + z ] ;
-// 
-//   for ( int i = z - 1 ; i >= 0 ; i-- )
-//     iacc = p * ( iacc - c [ M - 1 + i ] ) ; 
-//   return iacc ;
-// }
-
-// guess the initial coefficient. This tries to minimize the effect
-// of starting out with a hard discontinuity as it occurs with zero-padding,
-// while at the same time requiring little arithmetic effort
-
-// for the forward filter, we guess an extrapolation of the signal to the left
-// repeating c[0] indefinitely, which is cheap to compute:
+/// guess the initial coefficient. This tries to minimize the effect
+/// of starting out with a hard discontinuity as it occurs with zero-padding,
+/// while at the same time requiring little arithmetic effort.
+///
+/// for the forward filter, we guess an extrapolation of the signal to the left
+/// repeating c[0] indefinitely, which is cheap to compute:
 
 template < class IT >
 value_type icc_guess ( IT c , int k )
@@ -688,7 +657,7 @@ void solve_identity ( in_iter c , out_iter x )
 /// saved, at the cost of slightly reduced accuracy for large spline degrees. For high
 /// spline degrees and higher dimensions, it's advisable to not use this mechanism and
 /// pass in apply_gain = 1 for all dimensions; the calling code in filter.h decides this
-/// by using a heuristic method.
+/// with a heuristic.
 /// The number of poles and a pointer to the poles themselves are passed in with the
 /// parameters _nbpoles and _pole, respectively.
 /// Finally, the last parameter, tolerance, gives a measure of the acceptable error.
@@ -784,13 +753,6 @@ filter ( int _M ,               ///< number of input/output elements (DataLength
     _p_icc2 = & filter_type::icc_guess<out_iter> ;
     _p_iacc = & filter_type::iacc_guess ;
   }
-// /// currently unused, because it doesn't work with buffering
-//   else if ( bc == SAFE_BEYOND )
-//   {
-//     _p_icc1 = & filter_type::icc_safe_beyond<in_iter> ;
-//     _p_icc2 = & filter_type::icc_safe_beyond<out_iter> ;
-//     _p_iacc = & filter_type::iacc_safe_beyond ;
-//   }
   else
   {
     std::cout << "boundary condition " << bc << " not supported by vspline::filter" << std::endl ;
@@ -803,7 +765,7 @@ filter ( int _M ,               ///< number of input/output elements (DataLength
 // Now that we have generic code for 1D filtering, we want to apply this code to
 // n-dimensional arrays. We use the following strategy:
 // - perform the prefiltering collinear to each axis separately
-// - when processing a specific axis, split the array(s) into chunks and use one thread per chunk
+// - when processing a specific axis, split the array(s) into chunks and use one job per chunk
 // - perform a traverse on each chunk, copying out 1D subsets collinear to the processing axis
 //   to a buffer
 // - perform the filter on the buffer
@@ -910,7 +872,7 @@ void nonaggregating_filter ( range_type < typename source_view_type::difference_
     // we already know that both arrays have the same shape. If the strides are also the same,
     // both arrays have the same structure in memory.
     // If both arrays have the same structure, we can save ourselves the index calculations
-    // for the second array, since the indexes would come out the same. target_base_adress
+    // for the second array, since the indices would come out the same. target_base_adress
     // may be the same as source_base_adress, in which case the operation is in-place, but
     // we can't derive any performance benefit from the fact.
 
@@ -1026,7 +988,7 @@ bool sequential ( const IT & indexes )
 /// memory as well, if the indexes were contiguous, but surprisingly, this was
 /// slower. I like the concise expression with this code - instead of having
 /// variants for load/store vs. gather/scatter and masked/unmasked operation,
-/// the modus operandi is determined by the indexes and mask passed, which is
+/// the modus operandi is determined by the indices and mask passed, which is
 /// relatively cheap as it occurs only once, while the inner loop can just
 /// rip away.
 
@@ -1138,7 +1100,7 @@ scatter ( const typename source_type::value_type * source ,
   }
 }
 
-/// aggregating_filter (below) keeps a buffer of vector-aligned memory, which it fills
+/// aggregating_filter keeps a buffer of vector-aligned memory, which it fills
 /// with 1D subarrays of the source array which are collinear to the processing axis.
 /// The buffer is then submitted to vectorized forward-backward recursive filtering
 /// and finally stored back to the corresponding memory area in target, which may
@@ -1164,8 +1126,6 @@ void ele_aggregating_filter ( source_view_type &source ,
                               double tolerance
                             )
 {
-//   typedef typename source_type::value_type ele_type ;    // elementary type in source
-
   // for prefiltering, using Vc:Vectors seems faster than using SimdArrays of twice the size,
   // which are used as simdized type in evaluation
 
@@ -1183,7 +1143,7 @@ void ele_aggregating_filter ( source_view_type &source ,
   
   int count = source.shape ( axis ) ;                    // number of vectors we'll process
 
-  // I initially tried to use Vc::Memory, but the semanics of the iterator obtained
+  // I initially tried to use Vc::Memory, but the semantics of the iterator obtained
   // by buffer.begin() didn't work for me.
   // anyway, a MultiArray with the proper allocator works just fine, and the dereferencing
   // of the iterator needed in the solver works without further ado. 
@@ -1416,6 +1376,7 @@ void aggregating_filter ( range_type < typename source_type::difference_type > r
 
 /// Now we have the routines which perform the buffering and filtering for a chunk of data,
 /// We add code for multithreading. This is done by using utility code from multithread.h.
+///
 /// filter_1d, which is the routine processing nD arrays along a specific axis, might as well
 /// be a function. But functions can't be partially specialized (at least not with my compiler)
 /// so I use a functor instead, which, as a class, can be partially specialized. We want a
@@ -1499,12 +1460,15 @@ public:
 }
 } ;
 
-/// now here's the specialization for 1D arrays. It may come as a surprise that it looks
+/// now here's the specialization for *1D arrays*. It may come as a surprise that it looks
 /// nothing like the nD routine. This is due to the fact that we follow a specific strategy:
 /// We 'fold up' the 1D array into a 'fake 2D' array, process this 2D array with the nD code
 /// which is very efficient, and 'mend' the stripes along the margins of the fake 2D array
 /// which contain wrong results due to the fact that some boundary condition appropriate
 /// for the 2D case was applied.
+/// With this 'cheat' we can handle 1D arrays with full multithreading and vectorization,
+/// while the 'orthodox' approach would have to process the data in linear order with
+/// a single thread. Cleaning up the 'dirty' margins is cheap for large arrays.
 
 template < typename input_array_type ,  ///< type of array with knot point data
            typename output_array_type , ///< type of array for coefficients (may be the same)
diff --git a/map.h b/map.h
index b5047e6..eb9efd5 100644
--- a/map.h
+++ b/map.h
@@ -68,7 +68,7 @@
     This code more or less duplicates the capabilities of the 'mmap' object
     class evaluator uses: this object can be constructed with the desired mapping
     modes at run-time and is therefore more flexible, and at times its code is less
-    involved, since it is less general. But using mapping ocde fixed at compile-time,
+    involved, since it is less general. But using mapping code fixed at compile-time,
     as is done in this body of code, is potentially faster.
     
     On my system I noticed little difference when using clang++, but compiled with
@@ -525,61 +525,6 @@ struct gate_type < _rc_type , vspline::MAP_FUNCTION , _vsize >
   
 } ;
 
-// /// struct mapper1 applies a gate to it's incoming coordinate,
-// /// then splits it according to it's splitting mode
-// 
-// template < typename _gate_type ,
-//            typename _ic_type ,
-//            vspline::bc_code mode >
-// struct mapper1
-// {
-// } ;
-// 
-// /// simple raw mapping (should be one of two, for odd and even)
-// 
-// template < typename _gate_type , typename _ic_type >
-// struct mapper1 < _gate_type , _ic_type , vspline::RAW >
-// {
-//   typedef _gate_type gate_type ;
-//   typedef _ic_type ic_type ;
-// 
-//   typedef typename gate_type::out_ele_type rc_type ;
-//   typedef typename gate_type::out_ele_v rc_v ;
-//   typedef typename vspline::vector_traits
-//                     < _ic_type , gate_type::vsize > :: type ic_v ;
-//   
-//   const gate_type & gate ;
-//   
-//   mapper1 ( const gate_type & _gate )
-//   : gate ( _gate )
-//   { } ;
-// 
-//   // we need two overloads for map(), one for single values, one for SIMD types
-// 
-//   virtual void map ( const rc_type& in ,
-//                      ic_type & out_i ,
-//                      rc_type & out_r ) const
-//   {
-//     rc_type gated ;
-//     rc_type fi ;
-//     gate.eval ( in , gated ) ;
-//     out_r = std::modf ( gated , &fi ) ;
-//     out_i = fi ;
-//   }
-// 
-//   virtual void map ( const rc_v& in ,
-//                      ic_v & out_i ,
-//                      rc_v & out_r ) const
-//   {
-//     rc_v gated ;
-//     rc_v fi ;
-//     gate.eval ( in , gated ) ;
-//     out_r = vspline::v_modf ( gated , &fi ) ;
-//     out_i = fi ;
-//   }
-// 
-// } ;
-
 /// finally we define class mapper which is initialized with a set of
 /// gate objects (of arbitrary type) which are applied to each component
 /// of an incoming nD coordinate in turn.
diff --git a/mapping.h b/mapping.h
index fb0ad3f..0596fb6 100644
--- a/mapping.h
+++ b/mapping.h
@@ -78,7 +78,7 @@
     some user-provided coordinate treatment can be used, otherwise some default
     or simply expressed run-time selectable parameter can be used to set up coordinate
     treatment which is almost as efficient. In fact my trials suggest that when using
-    clang++, using nd_mapping object is pretty much as fast as using handcrafted
+    clang++, using nd_mapping objects is pretty much as fast as using handcrafted
     code - only with g++ I notice a significant but slight (~10%) speed difference.
     
     Another nice feature of using nd_mapping objects is that they determine the
@@ -120,41 +120,6 @@ using namespace vigra ;
 
 #ifdef USE_VC
 
-// /// since Vc doesn't offer a vectorized modf function, we have to code it. We make the effort not
-// /// to simply iterate over the vector's components but write 'proper' vector code to make this
-// /// operation as efficient as possible.
-// 
-// template <typename rc_v>
-// rc_v v_modf ( const rc_v& source ,
-//               rc_v * const iptr )
-// {
-//   typedef typename rc_v::mask_type mask_type ;
-//   typedef typename rc_v::EntryType single_type ;
-//   
-//   mask_type negative ( source < single_type(0) ) ; // Vc::isnegative ( source ) ;
-//   rc_v help ( source ) ;
-//   
-//   // we treat the case that any of the incoming vector components is negative separately
-//   // to avoid the corresponding code altogether in the - hopefully - most common case
-//   // where none of the incoming values are in fact negative.
-// 
-//   if ( any_of ( negative ) )
-//   {
-//     help(negative) = -source ;
-//     (*iptr) = floor ( help ) ;
-//     help -= (*iptr) ;
-//     (*iptr)(negative) = -(*iptr) ;
-//     help(negative) = -help ;
-//     return help ;
-//   }
-//   else
-//   {
-//     // for all positive components, the operation is trivial: 
-//     (*iptr) = floor ( source ) ;
-//     return ( help - *iptr ) ;
-//   }
-// }
-
 /// vectorized fmod function
 
 template <typename rc_v>
@@ -214,9 +179,6 @@ public:
     rc_t fl_i = std::floor ( v ) ;
     fv = v - fl_i ;
     iv = ic_t ( fl_i ) ;
-//     rc_t fl_i ;
-//     fv = modf ( v , &fl_i ) ; // split v into integral and remainder from [0...1[
-//     iv = ic_t ( fl_i ) ;      // set integer part from float representing it
   }
 
 #ifdef USE_VC
@@ -231,9 +193,6 @@ public:
     rc_v fl_i = floor ( v ) ;
     fv = v - fl_i ;
     iv = ic_v ( fl_i )  ;
-//     rc_v fl_i ;
-//     fv = v_modf ( v , &fl_i ) ; // split v into integral and remainder from [0...1[
-//     iv = ic_v ( fl_i )  ;       // set integer part from float representing it
   }
 
 #endif
@@ -254,10 +213,6 @@ public:
     rc_t fl_i = std::round ( v ) ;
     fv = v - fl_i ;
     iv = ic_t ( fl_i ) ;
-// previously I used:
-//     rc_t fl_i ;
-//     fv = modf ( v + rc_t ( 0.5 ) , &fl_i ) - rc_t ( 0.5 ) ;
-//     iv = ic_t ( fl_i ) ;
   }
 
 #ifdef USE_VC
@@ -272,12 +227,6 @@ public:
     rc_v fl_i = round ( v ) ;
     fv = v - fl_i ;
     iv = ic_v ( fl_i ) ;
-// previously I used:
-//     rc_v fl_i ;
-//     v += rc_t ( 0.5 ) ;
-//     fv = v_modf ( v , &fl_i ) ;
-//     fv -= rc_t ( 0.5 ) ;
-//     iv = ic_v ( fl_i )  ;
   }
 
 #endif
@@ -495,11 +444,10 @@ public:
 /// in-bounds in the first place.
 ///
 /// the next mapping is for coefficients/data mirrored at the ends of the defined
-/// range. This is probably the most commonly used type and also the type which
-/// P. Thevenaz recommends. Like most mappings defined in this body of code, it comes
-/// in two variants: one for odd splines and one for even ones. The handling is
-/// different for both cases: for an odd spline, the delta is in [0:1], for even
-/// splines it's in [-0.5:+0.5].
+/// range. This is probably the most commonly used type. Like most mappings defined
+/// in this body of code, it comes in two variants: one for odd splines and one for
+/// even ones. The handling is different for both cases: for an odd spline, the delta
+/// is in [0:1], for even splines it's in [-0.5:+0.5].
 ///
 /// This is mirroring 'on the bounds':
 ///
@@ -638,7 +586,7 @@ public:
 /// .5, 1.5, 2.5 ...
 /// Which fits nicely with an even spline, which has a wider defined range due to
 /// the smaller support. In that case we could possibly even save ourselves the last coefficient.
-/// The first period finishes one unit spacing beyond the location of the last knot
+/// The first period finishes one unit spacing beyond the location of the last given knot
 /// point, so if the spline is constructed over N values, the mapping has to be constructed
 /// with parameter M = N + 1. The bracing applied with the periodic bracer makes sure that
 /// coordinates up to M can be processed.
@@ -1103,34 +1051,6 @@ struct nd_mapping
 #endif // USE_VC
 } ;
 
-// void mapping_test()
-// {
-//   float in[] = { -1.0 , -0.5 , 0.0 , 0.5 , 8.5 , 9.0 , 9.4999 , 9.5 , 9.501 , 10.0 , 10.5 , 11.0 , 11.5 , 12.0 } ;
-//   const int innum = sizeof(in) / sizeof(float) ;
-//   bc_code test_bc[] = { MIRROR , PERIODIC , REFLECT , NATURAL } ;
-//   const int vsize = 8 ;
-//   for ( auto bc : test_bc )
-//   {
-//     mapping < int , float , vsize > m ( bc , 3 , 10 ) ;
-//     for ( auto v : in )
-//     {
-//       int i ;
-//       float f ;
-//       m.map ( v , i , f ) ;
-//       std::cout << "in: " << v << " -> i " << i << " f " << f << std::endl ;
-// #ifdef USE_VC
-//       typedef typename vector_traits < int , vsize > :: ele_v ic_v ;
-//       typedef typename vector_traits < float , vsize > :: ele_v rc_v ;
-//       rc_v vv ( v ) ;
-//       ic_v iv ;
-//       rc_v fv ;
-//       m.map_v ( vv , iv , fv ) ;
-//       std::cout << "in: " << vv << " -> i " << iv << " f " << fv << std::endl ;
-// #endif      
-//     }
-//   }
-// }
-
 } ; // end of namespace vspline
 
 #endif // VSPLINE_MAPPING_H
diff --git a/multithread.h b/multithread.h
index 72b2713..8967446 100644
--- a/multithread.h
+++ b/multithread.h
@@ -122,6 +122,10 @@
 
 namespace vspline
 {
+/// number of CPU cores in the system.
+
+const int ncores = std::thread::hardware_concurrency() ;
+
 /// when multithreading, use this number of jobs per default. This is
 /// an attempt at a compromise: too many jobs will produce too much overhead,
 /// too few will not distribute the load well and make the system vulnerable
diff --git a/prefilter.h b/prefilter.h
index 2cdc7ca..18fb2ea 100644
--- a/prefilter.h
+++ b/prefilter.h
@@ -41,6 +41,9 @@
 
     \brief Code to create the coefficient array for a b-spline.
     
+    Note: the bulk of the code was factored out to filter.h, while this text still
+    outlines the complete filtering process.
+    
     The coefficients can be generated in two ways (that I know of): the first
     is by solving a set of equations which encode the constraints of the spline.
     A good example of how this is done can be found in libeinspline. I term it
@@ -64,11 +67,6 @@
     many qualities. The vectorization is done with Vc, which allowed me to code
     the horizontal vectorization I use in a generic fashion.
     
-    For now, this file offers two implementations in one: the unvectorized version,
-    available as solve_vigra(), and the vectorized version, solve_vc(). Note that
-    the vectorized code is more constrained in what numeric types it can process
-    (namely, only float, double and their aggregates).
-    
     In another version of this code I used vigra's BSPlineBase class to obtain prefilter
     poles. This required passing the spline degree/order as a template parameter. Doing it
     like this allows to make the Poles static members of the solver, but at the cost of
@@ -114,14 +112,6 @@
 #ifndef VSPLINE_PREFILTER_H
 #define VSPLINE_PREFILTER_H
 
-#include <thread>
-#include <math.h>
-#include <complex>
-#include <cmath>
-#include <iostream>
-#include <array>
-#include <assert.h>
-
 #include "common.h"
 #include "filter.h"
 #include "basis.h"
diff --git a/remap.h b/remap.h
index 81640e2..74df84b 100644
--- a/remap.h
+++ b/remap.h
@@ -589,7 +589,7 @@ int remap ( const MultiArrayView
   
   // prefilter, taking data in 'input' as knot point data
   
-  bsp.prefilter ( ncores * 8 , input ) ;
+  bsp.prefilter ( input ) ;
 
   // create an evaluator over the bspline
 

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/vspline.git


