[vspline] 45/72: work on documentation, fixed use_vc bug while writing documentation (front page for doxygen docu) I noticed that calls to high-level routines with use_vc set to a specific value did not always produce correct results. This was due to the flag's value not being carried all the way through to where it should have been evaluated. this should be fixed now, and roundtrip behaves as expected.

Sun Jul 2 09:02:41 UTC 2017

This is an automated email from the git hooks/post-receive script.

kfj-guest pushed a commit to branch master
in repository vspline.

commit aaff60bfda1e785e1e9ae2c6b2d051d41e46a21c
Author: Kay F. Jahnke <kfjahnke at gmail.com>
Date:   Thu Apr 20 09:04:45 2017 +0200

    work on documentation, fixed use_vc bug
    while writing documentation (front page for doxygen docu) I noticed that calls to high-level
    routines with use_vc set to a specific value did not always produce correct results. This was
    due to the flag's value not being carried all the way through to where it should have been
    evaluated. this should be fixed now, and roundtrip behaves as expected.
    
    Nevertheless I feel the use_vc flag is an intrusion, and I may throw it out altogether,
    removing the possibility to run code compiled with USE_VC with the vector code switched off.
---
 doxy.h               | 179 +++++++++++++++++++++++++++++++++------------------
 example/roundtrip.cc |  12 ++--
 remap.h              |  84 +++++++++++++++---------
 vspline.doxy         |   2 +-
 4 files changed, 177 insertions(+), 100 deletions(-)

diff --git a/doxy.h b/doxy.h
index abb5c9a..5ac8291 100644
--- a/doxy.h
+++ b/doxy.h
@@ -45,7 +45,7 @@
 
  vspline is a header-only generic C++ library for the creation and processing of uniform B-splines. It aims to be as comprehensive as feasibly possible, yet at the same time producing code which performs well, so that it can be used in production.
  
- vspline was developed on a Linux system using gcc. It has not been tested with other systems or compilers, and as of this writing I am aware that the code probably isn't portable. My code uses elements from the C++11 standard (mainly the auto keyword and range-based for loops).
+ vspline was developed on a Linux system using clang++ and g++. It has not been tested with other systems or compilers, and as of this writing I am aware that the code probably isn't portable. My code uses elements from the C++11 standard (mainly the auto keyword and range-based for loops).
  
  vspline relies heavily on two other libraries:
  
@@ -81,7 +81,7 @@ On the evaluation side I provide
  
  - evaluation of nD arrays of coordinates (generalized remap function)
  
- - transformation functor based remap function
+ - remap-like functions processing nD arrays of data
  
  \section install_sec Installation
  
@@ -91,11 +91,16 @@ On the evaluation side I provide
  
  While your distro's packages may be sufficient to get vspline up and running, you may need newer versions of VIGRA and Vc. At the time of this writing the latest versions commonly available were Vc 1.2.0 and VIGRA 1.11.0; I compiled Vc and VIGRA from source, using up-to-date pulls from their respective repositories.
  
- To compile software using vspline, I use this g++ call:
+ update: ubuntu 17.04 has vigra and Vc packages which are sufficiently up-to-date.
  
- g++ -D USE_VC -pthread -O3 -march=native --std=c++11 your_code.cc -lVc -lvigraimpex
+ To compile software using vspline, I use clang++:
  
- where the -lvigraimpex can be omitted if vigraimpex (VIGRA's image import/export library) is not used.
+~~~~~~~~~~~~~~
+ clang++ -D USE_VC -pthread -O3 -march=native --std=c++11 your_code.cc -lVc -lvigraimpex
+~~~~~~~~~~~~~~
+ 
+ where the -lvigraimpex can be omitted if vigraimpex (VIGRA's image import/export library) is not used, and linking libVc.a in statically is a good option; on my system
+ the resulting code is faster.
  
  On my previous system I had to add -fabi-version=6 to avoid certain issues with Vc.
  
@@ -103,20 +108,25 @@ On the evaluation side I provide
  
  If you can't use Vc, the code can be made to compile without Vc by omitting -D USE_VC and other flags relevant for Vc:
  
- g++ -pthread -O3 --std=c++11 your_code.cc -lvigraimpex
+~~~~~~~~~~~~~~
+ clang++ -pthread -O3 --std=c++11 your_code.cc -lvigraimpex
+~~~~~~~~~~~~~~
+ 
+ IF you don't want to use clang++, g++ will also work.
  
  All access to Vc in the code is inside #ifdef USE_VC .... #endif statements, so not defining USE_VC will effectively prevent it's use.
  
  For simplicity's sake, even if the code isn't compiled to use Vc, the higher level code may still accept the common use_vc flag in the call signatures, but it's value wont have an effect. When the code is compiled to use Vc, the unvectorized code is still built and available by calling the relevant routines with use_vc set to false. The documentation is built to contain text for vectorized operation, if this is unwanted, change the doxy file.
  
  \section license_sec License
- 
+
  vspline is free software, licensed under this license:
  
+~~~~~~~~~~~~
     vspline - a set of generic tools for creation and evaluation
               of uniform b-splines
 
-            Copyright 2015, 2016 by Kay F. Jahnke
+            Copyright 2015 - 2017 by Kay F. Jahnke
 
     Permission is hereby granted, free of charge, to any person
     obtaining a copy of this software and associated documentation
@@ -139,108 +149,151 @@ On the evaluation side I provide
     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
     OTHER DEALINGS IN THE SOFTWARE.
+~~~~~~~~~~~~
 
  \section quickstart_sec Quickstart
  
- TODO slightly out of data, bspline constructs differently now
- 
  If you stick with the high-level code, using class bspline or the remap function, most of the parametrization is easy. Here are a few examples what you can do.
  
- Let's suppose you have data in a 2D vigra MultiArray 'a'. vspline can handle float and double values, and also their 'aggregates', meaning data types like pixels or vigra's TinyVector. But for now, let's assume you have plain float data. Creating the bspline object is easy:
+ Let's suppose you have data in a 2D vigra MultiArray 'a'. vspline can handle real data like float and double, and also their 'aggregates', meaning data types like pixels or vigra's TinyVector. But for now, let's assume you have plain float data. Creating the bspline object is easy:
+ 
+~~~~~~~~~~~~~~
+#include <vspline/vspline.h>
+
+...
+
+typedef vspline::bspline < float , 2 > spline_type ; // fix the type of the spline
  
- typedef vspline::bspline < float , 2 > spline_type ; // fix the type of the spline
+spline_type bspl ( a.shape() ) ; // create bspline object 'bspl' suitable for your data
  
- spline_type bspl ( a.shape() ) ; // create bspline object 'bspl' suitable for your data
+bspl.core = a ;         // copy the source data into the bspline object's 'core' area
  
- bspl.core = a ;         // copy the source data into the bspline object's 'core' area
+bspl.prefilter() ; // run prefilter() to convert original data to b-spline coefficients
+~~~~~~~~~~~~~~
  
- bspl.prefilter() ; // run prefilter() to convert original data to b-spline coefficients
+ The memory needed to hold the coefficients is allocated when the bspline object is constructed.
  
- Now obviously many things have been done by default here: The default spline degree was used - it's 3, for a cubic spline. Also, boundary treatment mode 'MIRROR' was used per default. Further default parameters cause the spline to be 'braced' so that it can be evaluated with vspline's evaluation routines, Vc (if compiled in) was used for prefiltering, and the code used as many threads as your system has physical cores. You could have passed different values for all the parameters I have [...]
+ Obviously many things have been done by default here: The default spline degree was used - it's 3, for a cubic spline. Also, boundary treatment mode 'MIRROR' was used per default. Further default parameters cause the spline to be 'braced' so that it can be evaluated with vspline's evaluation routines, Vc (if compiled in) was used for prefiltering, and the process is automatically partitioned and run in parallel by a thread pool. The only mandatory template arguments are the value type a [...]
  
- while the sequence of operations indicated here looks a bit verbose (why not create the bspline object by a call like bspl(a) ?), in 'real' code you can use bspl.core straight away as the space to contain your data - you might get the data from a file or by some other process or do something like this  where the bspline object provides the array and you interface it via a view to it's 'core':
+ While the sequence of operations indicated here looks a bit verbose (why not create the bspline object by a call like bspl(a) ?), in 'real' code you would use bspl.core straight away as the space to contain your data - you might get the data from a file or by some other process or do something like this  where the bspline object provides the array and you interface it via a view to it's 'core':
    
- vspline::bspline < double , 1 > bsp ( 10001 , degree , vspline::MIRROR ) ;
+~~~~~~~~~~~~~~
+vspline::bspline < double , 1 > bsp ( 10001 , degree , vspline::MIRROR ) ;
  
- auto v1 = bsp.core ; // get a view to the bspline's 'core'
+auto v1 = bsp.core ; // get a view to the bspline's 'core'
  
- v1 [ 5000 ] = 1.0 ; // assign some value (the core is zero-initialized)
+for ( auto & r : v1 ) r = ... ; // assign some values
  
- bsp.prefilter() ; // perform the prefiltering
-
- Next you may want to evaluate the spline at some pair of coordinates x, y.
- To do so, you can use this idiom:
+bsp.prefilter() ; // perform the prefiltering
+~~~~~~~~~~~~~~
  
- typedef evaluator_type < 2 , float , float > eval_type ; // get the appropriate evaluator type
+ This is a common idiom, because it reflects a common mode of operation where you don't need the original, unfiltered data any more after creating the spline, so the prefiltering is done in-place overwriting the original data. If you do need the original data later, you can also use a third idiom:
  
- eval_type ev ( bspl ) ;                                  // create the evaluator
+~~~~~~~~~~~~~~
+vigra::MultiArrayView < 3 , double > my_data ( vigra::Shape3 ( 5 , 6 , 7 ) ) ;
  
- Now, assuming you have float x and y:
+vspline::bspline < double , 3 > bsp ( my_data.shape() ) ;
  
- float result = ev ( { x , y } ) ; // evaluate at (x,y)
+bsp.prefilter ( my_data ) ;
+~~~~~~~~~~~~~~
  
- the braces are needed because the evaluator expects a 'multi_coordinate' which is a vigra::TinyVector of as many real values as the spline has dimensions. You will encounter parameters of this type throughout: vspline's code is dimension-agnostic, and parameters passed in often have to have as many components as there are dimensions in the concrete object you are handling.
+ Here, the bspline object is first created with the appropriate 'core' size, and prefilter() is called with an array matching the bspline's core. This results in my_data being read into the bspline object during the first pass of the prefiltering process.
  
- Again, some things have happened by default. The evaluator was constructed with a bspline object, making sure that the evaluator is compatible. vspline can also calculate the spline's derivatives. The default is plain evaluation, but you can pass a request for production of derivatives to the evaluator's constructor. Let's assume you want the first derivative along axis 0 (the x axis):
+ There are more ways of setting up a bspline object, please refer to class bspline's constructor. Of course you are also free to directly use vspline's lower-level routines to create a set of coefficients. The lowest level of filtering routine is simply a forward-backward recursive filter with a set of arbitrary poles. This code is in filter.h.
+ 
+ Next you may want to evaluate the spline from the first example at some pair of coordinates x, y. Evaluation of the spline can be done using vspline's 'evaluator' objects. Using the highest level of access, these objects are set up with a bspline object and, after being set up, provide methods to evaluate the spline at given cordinates. Technically, evaluator objects are functors which don't have mutable state (all state is created at creation time and constant afterwards), so they are  [...]
+
+~~~~~~~~~~~~~~
+// for a 2D spline, we want 2D coordinates
+ 
+typedef vigra::TinyVector < float ,2 > coordinate_type ;
+ 
+// get the appropriate evaluator type
+ 
+typedef evaluator_type < coordinate_type , double > eval_type ;
+ 
+// create the evaluator
  
- eval_type eval_dx ( bsp , { 1 , 0 } ) ; // ask for an evaluator producing dx
+eval_type ev ( bspl ) ;
  
- float dx = eval_dx ( { x , y } ) ;      // use the evaluator
+// Now, assuming you have float x and y: 
  
+double result = ev ( coordinate_type ( x , y ) ) ; // evaluate at (x,y)
+~~~~~~~~~~~~~~
+
+ Again, some things have happened by default. The evaluator was constructed with a bspline object, making sure that the evaluator is compatible. vspline can also calculate the spline's derivatives. The default is plain evaluation, but you can pass a request for production of derivatives to the evaluator's constructor. Let's assume you want the first derivative along axis 0 (the x axis):
+
+~~~~~~~~~~~~~
+eval_type eval_dx ( bsp , { 1 , 0 } ) ; // ask for an evaluator producing dx
+ 
+float dx = eval_dx ( { x , y } ) ;      // use the evaluator
+~~~~~~~~~~~~~
+
  For every constellation of derivatives you'll have to create a distinct evaluator.
- This is not an expensive operation - the same coefficients are used in all cases, only
- the weight functors used internally differ. Calculating the spline's derivatives is even
- slightly faster than plain evaluation, since there are less multiplications to perform.
+ This is not an expensive operation (unless you use very high spline degrees) - the same coefficients are used in all cases, only the weight functors used internally differ. Calculating the spline's derivatives is even slightly faster than plain evaluation, since there are less multiplications to perform.
  
- What about the remap functions? The little introduction demonstrated how you can evaluate the spline at a single location. Most of the time, though, you'll require evaluation at many coordinates. This is what remap does. Instead of a single multi_coordinate, you pass a whole vigra::MultiArrayView full of multi_coordinates to it - and another MultiArrayView of the same dimension and shape to accept the results of evaluating the spline at every multi_coordinate in the first array. Here's  [...]
+ What about the remap functions? The little introduction demonstrated how you can evaluate the spline at a single location. Most of the time, though, you'll require evaluation at many coordinates. This is what remap functions do. Instead of a single coordinate, you pass a whole vigra::MultiArrayView full of coordinates to it - and another MultiArrayView of the same dimension and shape to accept the results of evaluating the spline at every coordinate in the first array. Here's a simple e [...]
+
+~~~~~~~~~~~~
+// create a 1D array containing (2D) coordinates into 'a'
  
- // create a 1D array containing (2D) coordinates into 'a'
- vigra::MultiArray < 1 , vigra::TinyVector < float , 2 > > coordinate_array ( 3 ) ;
+vigra::MultiArray < 1 , coordinate_type > coordinate_array ( 3 ) ;
  
- ... // fill in the coordinates
+... // fill in the coordinates
 
- // create an array to accomodate the result of the remap operation
- vigra::MultiArray < 1 , float > target_array ( 3 ) ;
+// create an array to accomodate the result of the remap operation
  
- // perform the remap
- remap < float , float , 2 , 1 > ( a , coordinate_array , target_array ) ;
+vigra::MultiArray < 1 , float > target_array ( 3 ) ;
  
+// perform the remap
+ 
+vspline::remap < coordinate_type , float > ( a , coordinate_array , target_array ) ;
+~~~~~~~~~~~~
+
  now the three resultant values are in the target array.
  
- And that's about it - vspline aims to provide all possible variants of b-splines, code to create and evaluate them and to do so for arrays of coordinates. So if you dig deeper into the code base, you'll find that you can stray off the default path, but there should rarely be any need not to use the high-level object 'bspline' or the remap function, or it's relative, which doesn't construct the spline internally but takes a bspline as a parameter. The transformation-based remap function  [...]
+ This is an 'ad-hoc' remap, passing source data as an array. You can also set up a bspline object and perform a remap using an evaluator for this bspline object:
  
- While one might argue that the remap routine I present shouldn't be lumped together with the 'proper' b-spline code, I feel that only by tightly coupling it with the b-spline code I can make it really fast. And only by processing several coordinates at once (by multithreading and vectorization) the hardware can be exploited fully. I might even make remap a method of the b-spline evaluator - yet another variant of operator(),processing whole arrays of coordinates instead of merely aggreg [...]
+ vspline::remap < ev_type , 2 > ( ev , coordinate_array , target_array ) ; 
+ 
+ While this routine is also called remap, it has wider scope: 'ev_type' can be any functor providing a suitable interface for providing a value of the type held in 'target_array' for a value held in 'coordinate_array'. Here, you'd typically use an object derived from class vspline::unary_functor, and a vspline::evaluator is in fact derived from this base class. A unary_functor's input and output can be just about any data type, you're not limited to things which can be thought of as 'coo [...]
+ 
+ This form of remap might be named 'transform' and is similar to vigra's point operator code, but uses vspline's automatic multithreading and vectorization to make it very efficient. There's a variation of it where the 'coordinate array' and the 'target array' are the same, effectively performing an in-place transformation, which is useful for things like coordinate transformations or colour space manipulations. This variation is called 'apply'.
+ 
+ Finally, there is a variation on remap called 'index_remap'. This one doesn't take a 'coordinate array', but instead feeds the unary_functor with discrete coordinates of the target location that is being filled in. This variant is helpful when a remap uses a coordinate transformation before evaluating; here, the functor starts out receiving the discrete target coordinates, performs the coordinate transform and then feeds the transformed coordinate to some evaluation routine providing th [...]
+ 
+ Class vspline::unary_functor is coded to make it easy to implement functors for image processing pipelines. For more complex operations, you'd code a funtor representing your processing pipeline - often by delegating to 'inner' objects also derived from vspline::unary_functor - and finally use remap or index_remap to bulk-process your data with this functor. This is about as efficient as it gets, since the data are only accessed once, and vspline's remapping code does the tedious work o [...]
+ 
+ And that's about it - vspline aims to provide all possible variants of b-splines, code to create and evaluate them and to do so for arrays of coordinates. So if you dig deeper into the code base, you'll find that you can stray off the default path, but there should rarely be any need not to use the high-level object 'bspline' or the remap functions.
+ 
+ While one might argue that the remap routines I present shouldn't be lumped together with the 'proper' b-spline code, I feel that only by tightly coupling it with the b-spline code I can make it really fast. And only by processing several values at once (by multithreading and vectorization) the hardware can be exploited fully. But you're free to omit the remap code, the headers build on top of each other, and remap.h is pretty much at the top.
  
 \section speed_sec Speed
 
- While performance will vary widely from system to system and between different compiles, I'll quote some measurements from my own system. I include benchmarking code (roundtrip.cc in the examples folder). Here are some measurements done with "roundtrip", working on a full HD (1920*1080) RGB image, using single precision floats internally - the figures are averages of ten runs:
+ While performance will vary widely from system to system and between different compiles, I'll quote some measurements from my own system. I include benchmarking code (roundtrip.cc in the examples folder). Here are some measurements done with "roundtrip", working on a full HD (1920*1080) RGB image, using single precision floats internally - the figures are averages of 32 runs:
 
 ~~~~~~~~~~~~~~~~~~~~~
 testing bc code MIRROR spline degree 3
-avg 10 x prefilter:........................ 15.200000 ms
-avg 10 x remap1 from pre-split coordinates: 70.500000 ms
-avg 10 x remap1 from unsplit coordinates:.. 75.000000 ms
-avg 10 x remap with internal spline:....... 110.300000 ms
-avg 10 x remap with functor & internal bspl 111.500000 ms
-avg 10 x remap with functor & external bspl 70.900000 ms
+avg 32 x prefilter:........................ 13.093750 ms
+avg 32 x remap from unsplit coordinates:... 59.218750 ms
+avg 32 x remap with internal spline:....... 75.125000 ms
+avg 32 x index_remap ...................... 57.781250 ms
 
 testing bc code MIRROR spline degree 3 using Vc
-avg 10 x prefilter:........................ 13.000000 ms
-avg 10 x remap1 from pre-split coordinates: 24.900000 ms
-avg 10 x remap1 from unsplit coordinates:.. 31.600000 ms
-avg 10 x remap with internal spline:....... 51.900000 ms
-avg 10 x remap with functor & internal bspl 51.800000 ms
-avg 10 x remap with functor & external bspl 31.400000 ms
+avg 32 x prefilter:........................ 9.562500 ms
+avg 32 x remap from unsplit coordinates:... 22.406250 ms
+avg 32 x remap with internal spline:....... 35.687500 ms
+avg 32 x index_remap ...................... 21.656250 ms
 ~~~~~~~~~~~~~~~~~~~~~
 
 As can be seen from these test results, using Vc on my system speeds evaluation up a good deal. When it comes to prefiltering, a lot of time is spent buffering data to make them available for fast vector processing. The time spent on actual calculations is much less. Therefore prefiltering for higer-degree splines doesn't take much more time (when using Vc):
 
 ~~~~~~~~~~~~~~~~~~~~~
 testing bc code MIRROR spline degree 5 using Vc
-avg 10 x prefilter:........................ 14.000000 ms
+avg 32 x prefilter:........................ 10.687500 ms
 
 testing bc code MIRROR spline degree 7 using Vc
-avg 10 x prefilter:........................ 15.300000 ms
+avg 32 x prefilter:........................ 13.656250 ms
 ~~~~~~~~~~~~~~~~~~~~~
 
 Using double precision arithmetics, vectorization doesn't help so much, and prefiltering is actually slower on my system when using Vc. Doing a complete roundtrip run on your system should give you an idea about which mode of operation best suits your needs.
@@ -249,11 +302,13 @@ Using double precision arithmetics, vectorization doesn't help so much, and pref
  
  You can probably do everything vspline does with other software - there are several freely available implementations of b-spline interpolation and remap routines. What I wanted to create was an implementation which was as general as possible and at the same time as fast as possible, and, on top of that, comprehensive.
 
- These demands are not easy to satisfy at the same time, but I feel that my design comes  close. While generality is achieved by generic programming, speed needs exploitation of hardware features, and merely relying on the compiler is not enough. The largest speedup I saw was simply multithreading the code. This may seem like a trivial observation, but my design is influenced by it: in order to efficiently multithread, the problem has to be partitioned so that it can be processed by inde [...]
+ These demands are not easy to satisfy at the same time, but I feel that my design comes  close. While generality is achieved by generic programming, speed needs exploitation of hardware features, and merely relying on the compiler is not enough. The largest speedup I saw was from multithreading the code. This may seem like a trivial observation, but my design is influenced by it: in order to efficiently multithread, the problem has to be partitioned so that it can be processed by indepe [...]
  
  Another speedup method is data-parallel processing. This is often thought to be the domain of GPUs, but modern CPUs also offer it in the form of vector units. I chose implementing data-parallel processing in the CPU, as it offers tight integration with unvectorized CPU code. It's almost familiar terrain, and the way from writing conventional CPU code to vector unit code is not too far, when using tools like Vc, which abstract the hardware away. Using horizontal vectorization does requir [...]
+ 
+ To use vectorized evaluation efficiently, incoming data have to be presented to the evaluation code in vectorized form, but usually they will come from interleaved  memory. After the evaluation is complete, they have to be stored again to interleaved memory. The deinterleaving and interleaving operations take time and the best strategy is to load once from interleaved memory, perform all necessary operations on vector data and finally store once. The sequence of operations performed on  [...]
 
- Using both techniques together makes vspline fast. The target I was roughly aiming at was to achieve frame rates of ca. 50 fps in float RGB and full HD, producing the images via remap from a precalculated warp array. On my system, I have almost reached that goal - my remap times are around 25 msec (for a cubic spline), and with memory access etc. I come up to frame rates over half of what I was aiming at. The idea is to exploit the CPU fully, leaving the GPU, if present, free to do some [...]
+ Using all these techniques together makes vspline fast. The target I was roughly aiming at was to achieve frame rates of ca. 50 fps in RGB and full HD, producing the images via remap from a precalculated warp array. On my system, I have almost reached that goal - my remap times are around 25 msec (for a cubic spline), and with memory access etc. I come up to frame rates over half of what I was aiming at. My main tesing ground is pv, my panorama viewer. Here I can often take the spline d [...]
  
  On the other hand, even without using vectorization, the code is certainly fast enough for casual use and may suffice for some production scenarios. This way, vigra becomes the only dependency, and the same binary will work on a wide range of hardware.
  
diff --git a/example/roundtrip.cc b/example/roundtrip.cc
index c7dda33..066b864 100644
--- a/example/roundtrip.cc
+++ b/example/roundtrip.cc
@@ -90,10 +90,10 @@ void check_diff ( const view_type& a , const view_type& b )
 
 template < class view_type , typename real_type , typename rc_type >
 void run_test ( view_type & data ,
-                 vspline::bc_code bc ,
-                 int DEGREE ,
-                 bool use_vc ,
-                 int TIMES = 16 )
+                vspline::bc_code bc ,
+                int DEGREE ,
+                bool use_vc ,
+                int TIMES = 32 )
 {
   typedef typename view_type::value_type pixel_type ;
   typedef typename view_type::difference_type Shape;
@@ -235,7 +235,7 @@ void run_test ( view_type & data ,
   check_diff<view_type> ( target , data ) ;
 
   // fourth test: perform an index_remap directly using te b-spline evaluator
-  // as the index_remap's interpolator. This is, yet again, the same, because
+  // as the index_remap's functor. This is, yet again, the same, because
   // it evaluates at all discrete positions, but now without the warp array:
   // the index_remap feeds the evaluator with the discrete coordinates.
 
@@ -249,7 +249,7 @@ void run_test ( view_type & data ,
 
 #ifdef PRINT_ELAPSED
   end = std::chrono::system_clock::now();
-  cout << "avg " << TIMES << " x index_remap with external bspl... "
+  cout << "avg " << TIMES << " x index_remap ..................... "
        << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() / float(TIMES)
        << " ms" << endl ;
 #endif
diff --git a/remap.h b/remap.h
index ce40338..e1f7a4c 100644
--- a/remap.h
+++ b/remap.h
@@ -212,7 +212,7 @@ struct _fill < generator_type , 1 >
     auto target_it = output.begin() ;  
     int leftover = output.elementCount() ;
 
-  #ifdef USE_VC
+#ifdef USE_VC
 
     if ( use_vc )
     {
@@ -357,6 +357,8 @@ struct warp_generator
   const warp_array_type warp ; // must not use reference here!
   
   const unary_functor_type & itp ;
+  
+  const bool use_vc ;
 
   const unary_functor_type & get_functor()
   {
@@ -365,23 +367,26 @@ struct warp_generator
   
   warp_generator
     ( const warp_array_type & _warp ,
-      const unary_functor_type & _itp )
+      const unary_functor_type & _itp ,
+      bool _use_vc = true
+    )
   : warp ( _warp ) ,
-    itp ( _itp )
+    itp ( _itp ) ,
+    use_vc ( _use_vc )
   { } ;
 
   warp_generator < dimension , unary_functor_type , strided_warp >
     subrange ( const shape_range_type < dimension > & range ) const
   {
     return warp_generator < dimension , unary_functor_type , strided_warp >
-             ( warp.subarray ( range[0] , range[1] ) , itp ) ;
+             ( warp.subarray ( range[0] , range[1] ) , itp , use_vc ) ;
   }
   
   warp_generator < dimension - 1 , unary_functor_type , strided_warp >
     bindOuter ( const int & c ) const
   {
     return warp_generator < dimension - 1 , unary_functor_type , strided_warp >
-             ( warp.bindOuter ( c ) , itp ) ;
+             ( warp.bindOuter ( c ) , itp , use_vc ) ;
   }  
 } ;
 
@@ -406,6 +411,8 @@ struct warp_generator < 1 , unary_functor_type , strided_warp >
   
   const unary_functor_type & itp ;
   
+  const bool use_vc ;
+  
   const unary_functor_type & get_functor()
   {
     return itp ;
@@ -413,16 +420,22 @@ struct warp_generator < 1 , unary_functor_type , strided_warp >
   
   warp_generator
     ( const warp_array_type & _warp ,
-      const unary_functor_type & _itp )
+      const unary_functor_type & _itp ,
+      bool _use_vc = true
+    )
   : warp ( _warp ) ,
     stride ( _warp.stride(0) ) ,
     itp ( _itp ) ,
     witer ( _warp.begin() ) ,
-    data ( _warp.data() )
+    data ( _warp.data() ) ,
+    use_vc ( _use_vc )
   {
 #ifdef USE_VC
-    int aggregates = warp.size() / vsize ;
-    witer += aggregates * vsize ;
+    if ( use_vc )
+    {
+      int aggregates = warp.size() / vsize ;
+      witer += aggregates * vsize ;
+    }
 #endif
   } ;
 
@@ -472,7 +485,7 @@ struct warp_generator < 1 , unary_functor_type , strided_warp >
     subrange ( const shape_range_type < 1 > & range ) const
   {
     return warp_generator < 1 , unary_functor_type , strided_warp >
-             ( warp.subarray ( range[0] , range[1] ) , itp ) ;
+             ( warp.subarray ( range[0] , range[1] ) , itp , use_vc ) ;
   }
 
 } ;
@@ -542,14 +555,14 @@ void remap ( const unary_functor_type & ev ,
   {
     //                                set strided_warp to false vvv
     typedef warp_generator < dim_target , unary_functor_type , false > gen_t ;  
-    gen_t gen ( warp , ev ) ;  
+    gen_t gen ( warp , ev , use_vc ) ;  
     fill < gen_t , dim_target > ( gen , output , use_vc ) ;
   }
   else
   {
     // warp array is strided even in dimension 0
     typedef warp_generator < dim_target , unary_functor_type , true > gen_t ;  
-    gen_t gen ( warp , ev ) ;  
+    gen_t gen ( warp , ev , use_vc ) ;  
     fill < gen_t , dim_target > ( gen , output , use_vc ) ;
   }
 }
@@ -649,6 +662,7 @@ struct index_generator
   
   const unary_functor_type & itp ;
   const shape_range_type < dimension > range ;
+  const bool use_vc ;
   
   const unary_functor_type & get_functor()
   {
@@ -657,17 +671,19 @@ struct index_generator
   
   index_generator
     ( const unary_functor_type & _itp ,
-      const shape_range_type < dimension > _range
+      const shape_range_type < dimension > _range ,
+      bool _use_vc = true
     )
   : itp ( _itp ) ,
-    range ( _range )
+    range ( _range ) ,
+    use_vc ( _use_vc )
   { } ;
 
   index_generator < unary_functor_type , level >
     subrange ( const shape_range_type < dimension > range ) const
   {
     return index_generator < unary_functor_type , level >
-             ( itp , range ) ;
+             ( itp , range , use_vc ) ;
   }
   
   index_generator < unary_functor_type , level - 1 >
@@ -679,7 +695,7 @@ struct index_generator
     slice_end [ level ] = slice_start [ level ] + 1 ;
     
     return index_generator < unary_functor_type , level - 1 >
-             ( itp , shape_range_type < dimension > ( slice_start , slice_end ) ) ;
+             ( itp , shape_range_type < dimension > ( slice_start , slice_end ) , use_vc ) ;
   }  
 } ;
 
@@ -719,7 +735,8 @@ struct index_generator < unary_functor_type , 0 >
 
   const unary_functor_type & itp ;
   const shape_range_type < dimension > range ;
-
+  const bool use_vc ;
+  
   const unary_functor_type & get_functor()
   {
     return itp ;
@@ -727,26 +744,31 @@ struct index_generator < unary_functor_type , 0 >
   
   index_generator
     ( const unary_functor_type & _itp ,
-      const shape_range_type < dimension > _range
+      const shape_range_type < dimension > _range ,
+      bool _use_vc = true
     )
   : itp ( _itp ) ,
-    range ( _range )
+    range ( _range ) ,
+    use_vc ( _use_vc )
   {
     // initially, set the singular index to the beginning of the range
     current = index_type ( range[0] ) ;
     
 #ifdef USE_VC
     
-    // initialize current_v to hold the first simdized index
-    for ( int d = 0 ; d < dimension ; d++ )
-      current_v[d] = index_ele_v ( range[0][d] ) ;
-    current_v[0] += index_ele_v::IndexesFromZero() ;
-    
-    // if vc is used, the singular index will only be used for mop-up action
-    // after all aggregates have been processed.
-    int size = range[1][0] - range[0][0] ;
-    int aggregates = size / vsize ;
-    current[0] += index_ele_type ( aggregates * vsize ) ; // for mop-up
+    if ( use_vc )
+    {
+      // initialize current_v to hold the first simdized index
+      for ( int d = 0 ; d < dimension ; d++ )
+        current_v[d] = index_ele_v ( range[0][d] ) ;
+      current_v[0] += index_ele_v::IndexesFromZero() ;
+      
+      // if vc is used, the singular index will only be used for mop-up action
+      // after all aggregates have been processed.
+      int size = range[1][0] - range[0][0] ;
+      int aggregates = size / vsize ;
+      current[0] += index_ele_type ( aggregates * vsize ) ; // for mop-up
+    }
 
 #endif
 
@@ -783,7 +805,7 @@ struct index_generator < unary_functor_type , 0 >
     subrange ( const shape_range_type < dimension > range ) const
   {
     return index_generator < unary_functor_type , 0 >
-             ( itp , range ) ;
+             ( itp , range , use_vc ) ;
   }
 } ;
 
@@ -827,7 +849,7 @@ void index_remap ( const unary_functor_type & ev ,
   typedef index_generator < unary_functor_type , dim_target - 1 > gen_t ;
 
   shape_range_type < dim_target > range ( nd_ic_type() , output.shape() ) ;  
-  gen_t gen ( ev , range ) ;  
+  gen_t gen ( ev , range , use_vc ) ;  
   fill < gen_t , dim_target > ( gen , output , use_vc ) ;
 }
 
diff --git a/vspline.doxy b/vspline.doxy
index 60b89fd..7412ac1 100644
--- a/vspline.doxy
+++ b/vspline.doxy
@@ -58,7 +58,7 @@ PROJECT_LOGO           =
 # entered, it will be relative to the location where doxygen was started. If
 # left blank the current directory will be used.
 
-OUTPUT_DIRECTORY       = doc
+OUTPUT_DIRECTORY       = /home/kfj/vspline_doc
 
 # If the CREATE_SUBDIRS tag is set to YES, then doxygen will create 4096 sub-
 # directories (in 2 levels) under the output directory of each output format and

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/vspline.git