[vspline] 63/72: changed gather/scatter with extrusion code and its use
Kay F. Jahnke
kfj-guest at moszumanska.debian.org
Sun Jul 2 09:02:43 UTC 2017
This is an automated email from the git hooks/post-receive script.
kfj-guest pushed a commit to branch master
in repository vspline.
commit 669e7680d2158da0262e1f3fd0e4e5fa67551706
Author: Kay F. Jahnke <kfjahnke at gmail.com>
Date: Mon May 15 09:30:54 2017 +0200
changed gather/scatter with extrusion code and its use
Given a simdized type ST, I noticed recently that, apart from using
ST::IndexType (which uses int) for gather/scatter operations, I can
use any type offering operator[] and holding the required number of
indexes. Initially I filed this under nice-to-have, but since the code
I'm working on produces indexes initially as ptrdiff_t
(which is long on my system), I experimentally coded passing
vigra::TinyVector<ptrdiff_t,N> as index type, vigra::TinyVector
being a fixed size container type. To my surprise, the resulting
code was faster than the code using ST::IndexType, so it looks like
a win-win situation: I don't need to safeguard against index overflow
(as unlikely as that may be) and I get faster code.
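To make the idea concrete, here is a minimal, self-contained sketch (not part of the commit; it assumes Vc 1.x and vigra are installed, and the array size, the offsets and all names apart from the Vc/vigra types are made up for illustration). It gathers into a Vc::SimdArray using a vigra::TinyVector of ptrdiff_t as index container and scatters back with the same indices, relying only on the container's operator[]:

    #include <cstddef>
    #include <Vc/Vc>
    #include <vigra/tinyvector.hxx>

    int main()
    {
      typedef Vc::Vector < float > simd_f ;
      enum { vsize = simd_f::Size } ;
      typedef Vc::SimdArray < float , vsize > simdized_type ;

      // the index container only needs operator[] and vsize entries;
      // here it holds ptrdiff_t instead of Vc's int-based IndexType

      typedef vigra::TinyVector < std::ptrdiff_t , vsize > wide_indexes_type ;

      float data [ 1024 ] ;
      for ( int i = 0 ; i < 1024 ; i++ )
        data [ i ] = float ( i ) ;

      wide_indexes_type indexes ;
      for ( int e = 0 ; e < vsize ; e++ )
        indexes [ e ] = e * 16 ;               // arbitrary strided offsets

      simdized_type x ( data , indexes ) ;     // gather via the wide indices
      x += 1.0f ;
      x.scatter ( data , indexes ) ;           // scatter back with the same indices

      return 0 ;
    }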
---
common.h | 10 ++
filter.h | 387 ++++++++++++++++++++++++++++++++++++++++++---------------------
2 files changed, 269 insertions(+), 128 deletions(-)
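The filter.h changes below also contain a (currently commented-out) round-trip test which decides whether the ptrdiff_t indices fit into Vc's narrower int-based IndexType. As a hedged sketch of that test, under the same assumptions as above (the names fits, wide_indexes_type and narrow_indexes_type are mine, for illustration only):

    #include <cstddef>
    #include <Vc/Vc>
    #include <vigra/tinyvector.hxx>

    typedef Vc::Vector < float > simd_f ;
    enum { vsize = simd_f::Size } ;
    typedef simd_f::IndexType narrow_indexes_type ;                  // Vc's int-based indices
    typedef vigra::TinyVector < std::ptrdiff_t , vsize > wide_indexes_type ;

    // assign every wide index to the narrow type and compare back; if any value
    // does not survive the round trip, the narrow type must not be used and the
    // caller falls back to passing the wide container to gather/scatter.

    bool fits ( const wide_indexes_type & wide_indexes ,
                narrow_indexes_type & narrow_indexes )
    {
      for ( int e = 0 ; e < vsize ; e++ )
      {
        narrow_indexes [ e ] = wide_indexes [ e ] ;   // may silently truncate
        if ( narrow_indexes [ e ] != wide_indexes [ e ] )
          return false ;
      }
      return true ;
    }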
diff --git a/common.h b/common.h
index f086123..6d65172 100644
--- a/common.h
+++ b/common.h
@@ -158,6 +158,16 @@ struct out_of_bounds
{
} ;
+/// exception which is thrown when assigning an rvalue which is larger than
+/// what the lvalue can hold
+
+struct numeric_overflow
+: std::invalid_argument
+{
+ numeric_overflow ( const char * msg )
+ : std::invalid_argument ( msg ) { } ;
+} ;
+
/// This enumeration is used for codes connected to boundary conditions. There are
/// two aspects to boundary conditions: During prefiltering, if the implicit scheme is used,
/// the initial causal and anticausal coefficients have to be calculated in a way specific to
diff --git a/filter.h b/filter.h
index 4dbd05d..bd467e0 100644
--- a/filter.h
+++ b/filter.h
@@ -58,18 +58,10 @@
While the initial purpose for the code in this file was, of course, b-spline prefiltering,
the generalized version I present here can be used for arbitrary filters. There is probably
- one filter which is most useful in the context of vspline: passing a single positive pole
- in the range of ] 0 , 1 [ smoothes the signal very efficiently.
+ one other filter which is most useful in the context of vspline: passing a single positive
+ pole in the range of ] 0 , 1 [ smoothes the signal very efficiently.
*/
-// TODO: with buffering used throughout, we might make the code more general:
-// input could be of any type convertible to a type the 1D filter can handle,
-// and would be cast to the type the buffer contains when the buffer is filled.
-// after completion of the filter, the data might be submitted to yet another cast
-// on copying them out to the target. With such a scheme, operation on, for example,
-// integral data, would become possible without need for converting the data first to
-// an nD array of real values.
-
// include common.h for the border condition codes
#include <vector>
@@ -121,7 +113,7 @@ double overall_gain ( const int & nbpoles , const double * const pole )
/// type of the iterator's value_type. When the value_types are vigra aggregates
/// (TinyVectors etc.) vigra's ExpandElementResult mechanism will provide, but at times
/// we may wish to be explicit here, e.g. when iterating over simdized types.
-///
+
template < typename in_iter , // iterator over the knot point values
typename out_iter , // iterator over the coefficient array
typename real_type > // type for single real value for calculations
@@ -777,12 +769,16 @@ filter ( int _M , ///< number of input/output elements (DataLength
/// starting at source and depositing compactly at target. scatter performs the reverse
/// operation. source_type and target_type can be different; on assignment source_type is
/// simply cast to target_type.
+/// index_type is passed in as a template argument, allowing for wider types than int,
+/// so these routines can also operate on very large areas of memory.
-template < typename source_type , typename target_type = source_type >
+template < typename source_type ,
+ typename target_type = source_type ,
+ typename index_type = int >
void gather ( const source_type* source ,
target_type* target ,
- const int & stride ,
- int count
+ const index_type & stride ,
+ index_type count
)
{
while ( count-- )
@@ -793,11 +789,13 @@ void gather ( const source_type* source ,
}
}
-template < typename source_type , typename target_type = source_type >
+template < typename source_type ,
+ typename target_type = source_type ,
+ typename index_type = int >
void scatter ( const source_type* source ,
target_type* target ,
- const int & stride ,
- int count
+ const index_type & stride ,
+ index_type count
)
{
while ( count-- )
@@ -815,6 +813,8 @@ void scatter ( const source_type* source ,
/// especially with higher-order filters. On my system, I found I broke even, even with only
/// one pole, so there is no special treatment here for low-order filtering (TODO confirm)
/// note the use of range_type<T>, which is from multithread.h
+/// we derive the index type for the call to the monadic gather/scatter routines
+/// automatically, so here it comes out as vigra's difference_type_1
template < class source_view_type ,
class target_view_type ,
@@ -841,7 +841,7 @@ void nonaggregating_filter ( range_type < typename source_view_type::difference_
const auto source = p_original_source->subarray ( range[0] , range[1] ) ;
auto target = p_original_target->subarray ( range[0] , range[1] ) ;
- int count = source.shape ( axis ) ;
+ auto count = source.shape ( axis ) ;
/// we use a buffer of count value_types
@@ -854,13 +854,9 @@ void nonaggregating_filter ( range_type < typename source_view_type::difference_
typedef filter < iter_type , iter_type , math_type > filter_type ;
filter_type solver ( count , gain , bc , nbpoles , pole , tolerance ) ;
- // offset into the source slice which will be used for gather/scatter
-
- int source_index ;
-
// next slice is this far away:
- int source_stride = source.stride ( axis ) ;
+ auto source_stride = source.stride ( axis ) ;
auto source_base_adress = source.data() ;
auto buffer_base_adress = buffer.data() ;
@@ -896,7 +892,7 @@ void nonaggregating_filter ( range_type < typename source_view_type::difference_
// copy from the array to the buffer with a monadic gather, casting to
// math_type in the process
- source_index = &(*source_sliter) - source_base_adress ;
+ auto source_index = &(*source_sliter) - source_base_adress ;
gather < source_type , math_type > ( source_base_adress + source_index ,
buffer_base_adress ,
@@ -931,14 +927,13 @@ void nonaggregating_filter ( range_type < typename source_view_type::difference_
auto source_sliter_end = source_slice.end() ;
auto target_slice = target.bindAt ( axis , 0 ) ;
- int target_stride = target.stride ( axis ) ;
+ auto target_stride = target.stride ( axis ) ;
auto target_sliter = target_slice.begin() ;
- int target_index ;
while ( source_sliter < source_sliter_end )
{
- source_index = &(*source_sliter) - source_base_adress ;
- target_index = &(*target_sliter) - target_base_adress ;
+ auto source_index = &(*source_sliter) - source_base_adress ;
+ auto target_index = &(*target_sliter) - target_base_adress ;
gather < source_type , math_type > ( source_base_adress + source_index ,
buffer_base_adress ,
@@ -971,8 +966,6 @@ void nonaggregating_filter ( range_type < typename source_view_type::difference_
/// and deposits in target, which is compact.
/// The scatter routine scatters from source, which points to compact memory,
/// and deposits in target, which points to strided memory.
-/// The routine is templated by the vector type used for the single gather/scatter
-/// operations, it can be a Vc::Vector or a Vc::SimdArray.
/// Initially I coded using load/store operations to access the 'non-compact'
/// memory as well, if the indexes were contiguous, but surprisingly, this was
/// slower. I like the concise expression with this code - instead of having
@@ -980,28 +973,41 @@ void nonaggregating_filter ( range_type < typename source_view_type::difference_
/// the modus operandi is determined by the indices and mask passed, which is
/// relatively cheap as it occurs only once, while the inner loop can just
/// rip away.
-
-template < typename source_type , typename target_type = source_type >
+/// By default, the type used for gather/scatter indices (gs_indexes_type)
+/// will be what Vc deems appropriate. This comes out as a SIMD type composed
+/// of int, and ought to result in the fastest code on the machine level.
+/// But since the only *requirement* on gather/scatter indices is that they
+/// offer a subscript operator (and hold enough indices), other types can be
+/// used as gs_indexes_type as well. Below I make the distinction and pass in
+/// a TinyVector of ptrdiff_t if int isn't sufficiently large to hold the
+/// intended indices. On my system, this is actually faster.
+
+template < typename source_type , // (singular) source type
+ typename target_type , // (simdized) target type
+ typename index_type , // (singular) index type for stride, count
+ typename gs_indexes_type > // type for gather/scatter indices
void
-gather ( const typename source_type::value_type * source ,
- typename target_type::value_type * target ,
- const typename source_type::IndexType & indexes ,
- const typename source_type::Mask & mask ,
- const int & stride ,
- int count
+gather ( const source_type * source ,
+ target_type * target ,
+ const gs_indexes_type & indexes ,
+ const typename target_type::Mask & mask ,
+ const index_type & stride ,
+ index_type count
)
{
- register source_type x ;
+ // fix the type into which to gather source data
+ enum { vsize = target_type::Size } ;
+ typedef typename Vc::SimdArray < source_type , vsize > simdized_source_type ;
+
// if the mask is all-true, load the data with an unmasked gather operation
if ( mask.isFull() )
{
while ( count-- )
{
- x.gather ( source , indexes ) ;
- auto tx = target_type ( x ) ;
- tx.store ( target , Vc::Aligned ) ;
+ simdized_source_type x ( source , indexes ) ;
+ * target = target_type ( x ) ;
source += stride ;
- target += target_type::Size ;
+ ++ target ;
}
}
else
@@ -1009,35 +1015,39 @@ gather ( const typename source_type::value_type * source ,
// if there is a partially filled mask, perform a masked gather operation
while ( count-- )
{
- x.gather ( source , indexes , mask ) ;
+ simdized_source_type x ( source , indexes , mask ) ;
+ * target = target_type ( x ) ;
source += stride ;
- auto tx = target_type ( x ) ;
- tx.store ( target , Vc::Aligned ) ;
- target += target_type::Size ;
+ ++ target ;
}
}
}
-template < typename source_type , typename target_type = source_type >
+template < typename source_type , // (simdized) source type
+ typename target_type , // (singular) target type
+ typename index_type , // (singular) index type for stride, count
+ typename gs_indexes_type > // type for gather/scatter indices
void
-scatter ( const typename source_type::value_type * source ,
- typename target_type::value_type * target ,
- const typename target_type::IndexType & indexes ,
- const typename target_type::Mask & mask ,
- const int & stride ,
- int count
+scatter ( const source_type * source ,
+ target_type * target ,
+ const gs_indexes_type & indexes ,
+ const typename source_type::Mask & mask ,
+ const index_type & stride ,
+ index_type count
)
{
- register source_type x ;
- // if the mask is all-true, store the data with an unmasked scatter operation
+ // fix the type from which to scatter target data
+ enum { vsize = source_type::Size } ;
+ typedef typename Vc::SimdArray < target_type , vsize > simdized_target_type ;
+
+ // if the mask is full, deposit with an unmasked scatter
if ( mask.isFull() )
{
while ( count-- )
{
- x.load ( source , Vc::Aligned ) ;
- auto tx = target_type ( x ) ;
- tx.scatter ( target , indexes ) ;
- source += source_type::Size ;
+ simdized_target_type x ( *source ) ;
+ x.scatter ( target , indexes ) ;
+ ++ source ;
target += stride ;
}
}
@@ -1046,10 +1056,9 @@ scatter ( const typename source_type::value_type * source ,
// if there is a partially filled mask, perform a masked scatter operation
while ( count-- )
{
- x.load ( source , Vc::Aligned ) ;
- source += source_type::Size ;
- auto tx = target_type ( x ) ;
- tx.scatter ( target , indexes , mask ) ;
+ simdized_target_type x ( *source ) ;
+ x.scatter ( target , indexes , mask ) ;
+ ++ source ;
target += stride ;
}
}
@@ -1065,8 +1074,20 @@ scatter ( const typename source_type::value_type * source ,
/// comparison. Depending on data type, array size and spline degree, sometimes the
/// nonvectorized code is faster. But as both grow, buffering soon comes out on top.
/// ele_aggregating_filter is a subroutine processing arrays of elementary value_type.
-/// It's used by aggregating_filter, either directly if the arrays' value_type is already
-/// an elementary type, or after element-expanding the array(s).
+/// It's used by aggregating_filter, after element-expanding the array(s).
+/// With this vectorized routine and the gather/scatter index type Vc uses by default,
+/// numeric overflow could occur: that index type is composed of int, while it is
+/// assigned a ptrdiff_t, which it may not be able to represent. The overflow can
+/// happen when a gather/scatter spans a very large memory area. The gather/scatter
+/// indices will be set up so that the first index is always 0 (by using the address
+/// of the first element accessed, not the array base address), but even though this
+/// makes the overflow less likely, it can still happen. In this case the code falls back
+/// to using a vigra::TinyVector < ptrdiff_t > as gather/scatter index type, which
+/// may cause Vc to use less performant code for the gather/scatter operations but
+/// is safe.
+/// TODO: On my system, surprisingly, the variant passing TinyVector<ptrdiff_t> to
+/// the gather/scatter routine performs slightly faster than the presumably optimal
+/// variant.
template < typename source_view_type ,
typename target_view_type ,
@@ -1085,7 +1106,8 @@ void ele_aggregating_filter ( source_view_type &source ,
// which are used as simdized type in evaluation
typedef typename Vc::Vector < math_type > simdized_math_type ;
- const int vsize ( simdized_math_type::Size ) ; // number of math_type in a simdized_type
+ // number of math_type in a simdized_math_type
+ const int vsize ( simdized_math_type::Size ) ;
typedef typename source_view_type::value_type source_type ;
typedef typename Vc::SimdArray < source_type , vsize > simdized_source_type ;
@@ -1093,18 +1115,24 @@ void ele_aggregating_filter ( source_view_type &source ,
typedef typename target_view_type::value_type target_type ;
typedef typename Vc::SimdArray < target_type , vsize > simdized_target_type ;
- typedef typename simdized_math_type::IndexType index_type ; // indexes for gather/scatter
- typedef typename simdized_math_type::MaskType mask_type ; // and mask for masked operation
+ // indexes for gather/scatter. first the 'optimal' type, which Vc produces as
+ // the IndexType for simdized_math_type. Next a wider type composed of std::ptrdiff_t,
+ // to be used initially when calculating the indices, and optionally later for the
+ // actual gather/scatter operations if gs_indexes_type isn't wide enough.
+
+ typedef typename simdized_math_type::IndexType gs_indexes_type ;
+ typedef vigra::TinyVector < std::ptrdiff_t , vsize > comb_type ;
+
+ // mask type for masked operation
+ typedef typename simdized_math_type::MaskType mask_type ;
- int count = source.shape ( axis ) ; // number of vectors we'll process
+ auto count = source.shape ( axis ) ; // number of vectors we'll process
// I initially tried to use Vc::Memory, but the semantics of the iterator obtained
// by buffer.begin() didn't work for me.
// anyway, a MultiArray with the proper allocator works just fine, and the dereferencing
// of the iterator needed in the solver works without further ado.
- // Vc::Memory < simdized_type > buffer ( count * vsize ) ; // doesn't work for me
-
vigra::MultiArray < 1 , simdized_math_type , Vc::Allocator<simdized_math_type> >
buffer ( count ) ;
@@ -1115,7 +1143,7 @@ void ele_aggregating_filter ( source_view_type &source ,
// set of offsets into the source slice which will be used for gather/scatter
- index_type source_indexes ;
+ comb_type source_indexes ;
// while we don't hit the last odd few 1D subarrays the mask is all-true
@@ -1123,7 +1151,7 @@ void ele_aggregating_filter ( source_view_type &source ,
// next slice is this far away:
- int source_stride = source.stride ( axis ) ;
+ auto source_stride = source.stride ( axis ) ;
// we want to use the extended gather/scatter (with 'extrusion'), so we need the
// source and target pointers. Casting buffer's data pointer to math_type is safe,
@@ -1131,9 +1159,12 @@ void ele_aggregating_filter ( source_view_type &source ,
// in disguise.
auto source_base_adress = source.data() ;
- auto buffer_base_adress = (math_type*) (buffer.data()) ;
+ auto buffer_base_adress = buffer.data() ;
auto target_base_adress = target.data() ;
+ gs_indexes_type source_gs_indexes ;
+ gs_indexes_type target_gs_indexes ;
+
// we create a solver object capable of handling the iterator producing the successive
// simdized_types from the buffer. While the unvectorized code can omit passing the third
// template argument (the elementary type used inside the solver) we pass it here, as we
@@ -1166,16 +1197,23 @@ void ele_aggregating_filter ( source_view_type &source ,
auto source_sliter = permuted_slice.begin() ;
auto source_sliter_end = permuted_slice.end() ;
-
+
while ( source_sliter < source_sliter_end )
{
- // try loading vsize successive offsets into an index_type
+ // try loading vsize successive offsets into a comb_type
int e ;
+ // we base the operation at the address of the first element accessed, so that
+ // the first entry in source_indexes will come out 0.
+
+ auto first_source_adress = &(*source_sliter) ;
+ auto offset = first_source_adress - source_base_adress ;
+ auto first_target_adress = target_base_adress + offset ;
+
for ( e = 0 ; e < vsize && source_sliter < source_sliter_end ; ++e , ++source_sliter )
- source_indexes[e] = &(*source_sliter) - source_base_adress ;
+ source_indexes[e] = &(*source_sliter) - first_source_adress ;
if ( e < vsize )
@@ -1183,39 +1221,91 @@ void ele_aggregating_filter ( source_view_type &source ,
// mask was all-true before, so now we limit it to the first e fields:
mask = ( simdized_math_type::IndexesFromZero() < e ) ;
+
+// The next bit of code, which is commented out, tried to use an 'optimal' type
+// for gather/scatter indices by picking simdized_math_type::IndexType. But it
+// turned out the resulting code was slower on my system and requires testing for
+// overflow. So I leave the code in but I'm not using it:
- // perform extended gather with extrusion parameters to transport the unfiltered data
- // to the buffer
-
- gather < simdized_source_type , simdized_math_type >
- ( source_base_adress ,
- buffer_base_adress ,
- source_indexes ,
- mask ,
- source_stride ,
- count ) ;
-
- // finally (puh): apply the prefilter, using the solver in-place, iterating over
- // the vectors in buffer with maximum efficiency.
-
- solver.solve ( buffer.begin() ) ;
-
- // and perform extended scatter with extrusion parameters to write the filtered data
- // to the destination
-
- scatter < simdized_math_type , simdized_target_type >
- ( buffer_base_adress ,
- target_base_adress ,
- source_indexes ,
- mask ,
- source_stride ,
- count ) ;
+// // next we assign the indices (which are ptrdiff_t) to the intended type
+// // for gather/scatter indices - which is what Vc deems appropriate. This should
+// // be the optimal choice in terms of performance. Yet we can't be certain that
+// // the ptrdiff_t values actually fit into this type, which is usually composed of
+// // int only. So we test if the assigned value compares equal to the assignee.
+// // If the test fails for any of the indices, we switch to code using a
+// // vigra::TinyVector < ptrdiff_t > for the indices, which is permissible, since
+// // TinyVector offers operator[], but may be less efficient. On my system I could
+// // not detect a run-time penalty, but I leave the test and code differentiation
+// // in nevertheless.
+//
+// bool fits = true ;
+// for ( e = 0 ; e < vsize ; e++ )
+// {
+// source_gs_indexes[e] = source_indexes[e] ;
+// if ( source_gs_indexes[e] != source_indexes[e] )
+// fits = false ;
+// }
+//
+// if ( fits )
+// {
+// // perform extended gather with extrusion parameters to transport the unfiltered data
+// // to the buffer, passing in source_gs_indexes for best performance.
+//
+// gather
+// ( first_source_adress ,
+// buffer_base_adress ,
+// source_gs_indexes ,
+// mask ,
+// source_stride ,
+// count ) ;
+//
+// // finally (puh): apply the prefilter, using the solver in-place, iterating over
+// // the vectors in buffer with maximum efficiency.
+//
+// solver.solve ( buffer.begin() ) ;
+//
+// // and perform extended scatter with extrusion parameters to write the filtered data
+// // to the destination
+//
+// scatter
+// ( buffer_base_adress ,
+// first_target_adress ,
+// source_gs_indexes ,
+// mask ,
+// source_stride ,
+// count ) ;
+// }
+// else
+ {
+ // Since the indices did not fit into the optimal type for gather/scatter
+ // indices, we pass in a wider type, which may reduce performance, but is
+ // necessary under the circumstances. But this should rarely happen:
+ // it would mean a gather/scatter spanning several GB.
+
+ gather
+ ( first_source_adress ,
+ buffer_base_adress ,
+ source_indexes ,
+ mask ,
+ source_stride ,
+ count ) ;
+
+ solver.solve ( buffer.begin() ) ;
+
+ scatter
+ ( buffer_base_adress ,
+ first_target_adress ,
+ source_indexes ,
+ mask ,
+ source_stride ,
+ count ) ;
+ }
}
}
else
{
- // pretty much the same as the previouse operation, with the distinction that
- // copying the filtered data from the buffer to the target now needs it's own set of
+ // pretty much the same as the if(...) case, with the distinction that copying
+ // the filtered data from the buffer to the target now needs its own set of
// indexes etc., since all these may be different.
// TODO we might permute source_slice's strides to ascending and apply the same
@@ -1226,35 +1316,77 @@ void ele_aggregating_filter ( source_view_type &source ,
auto source_sliter_end = source_slice.end() ;
auto target_slice = target.bindAt ( axis , 0 ) ;
- int target_stride = target.stride ( axis ) ;
+ auto target_stride = target.stride ( axis ) ;
auto target_sliter = target_slice.begin() ;
- index_type target_indexes ;
+ comb_type target_indexes ;
while ( source_sliter < source_sliter_end )
{
int e ;
- for ( e = 0 ; e < vsize && source_sliter < source_sliter_end ; ++e , ++source_sliter , ++target_sliter )
+ auto first_source_adress = &(*source_sliter) ;
+ auto first_target_adress = &(*target_sliter) ;
+
+ for ( e = 0 ;
+ e < vsize && source_sliter < source_sliter_end ;
+ ++e , ++source_sliter , ++target_sliter )
{
- source_indexes[e] = &(*source_sliter) - source_base_adress ;
- target_indexes[e] = &(*target_sliter) - target_base_adress ;
+ source_indexes[e] = &(*source_sliter) - first_source_adress ;
+ target_indexes[e] = &(*target_sliter) - first_target_adress ;
}
if ( e < vsize )
mask = ( simdized_math_type::IndexesFromZero() < e ) ;
- gather < simdized_source_type , simdized_math_type >
- ( source_base_adress ,
- buffer_base_adress ,
- source_indexes ,
- mask ,
- source_stride ,
- count ) ;
- solver.solve ( buffer.begin() ) ;
- scatter < simdized_math_type , simdized_target_type >
- ( buffer_base_adress ,
- target_base_adress ,
- target_indexes ,
- mask ,
- target_stride ,
- count ) ;
+
+// The next bit of code, which is commented out, tried to use an 'optimal' type
+// for gather/scatter indices by picking simdized_math_type::IndexType. But it
+// turned out the resulting code was slower on my system and requires testing for
+// overflow. So I leave the code in but I'm not using it:
+
+// bool fits = true ;
+// for ( e = 0 ; e < vsize ; e++ )
+// {
+// source_gs_indexes[e] = source_indexes[e] ;
+// target_gs_indexes[e] = target_indexes[e] ;
+// if ( source_gs_indexes[e] != source_indexes[e]
+// || target_gs_indexes[e] != target_indexes[e] )
+// fits = false ;
+// }
+//
+// if ( fits )
+// {
+// gather
+// ( first_source_adress ,
+// buffer_base_adress ,
+// source_gs_indexes ,
+// mask ,
+// source_stride ,
+// count ) ;
+// solver.solve ( buffer.begin() ) ;
+// scatter
+// ( buffer_base_adress ,
+// first_target_adress ,
+// target_gs_indexes ,
+// mask ,
+// target_stride ,
+// count ) ;
+// }
+// else
+ {
+ gather
+ ( first_source_adress ,
+ buffer_base_adress ,
+ source_indexes ,
+ mask ,
+ source_stride ,
+ count ) ;
+ solver.solve ( buffer.begin() ) ;
+ scatter
+ ( buffer_base_adress ,
+ first_target_adress ,
+ target_indexes ,
+ mask ,
+ target_stride ,
+ count ) ;
+ }
}
}
}
@@ -1369,8 +1501,7 @@ public:
// obtain a partitioning of the data array into subranges. We do this 'manually' here
// because we must instruct shape_splitter not to chop up the current processing axis
- // (by passing the 'forbid' parameter, which is taken as -1 (no effect) in the default
- // partitioning of nD arrays).
+ // (by passing axis as the 'forbid' parameter)
auto partitioning = shape_splitter<dim>::part ( input.shape() , njobs , axis ) ;
--