[SCM] libm4ri: library of Method of the Four Russians Inversion annotated tag, debian/20130416-3, created. debian/20130416-3
Cédric Boutillier
boutil at debian.org
Mon Jun 17 20:05:00 UTC 2013
The annotated tag, debian/20130416-3 has been created
at bfcb10289709309a68323afd514ead2cbea4d058 (tag)
tagging 8388256663b8819c5021ea583fd929bb72fbb0ea (commit)
tagged by Cédric Boutillier
on Mon Jun 17 21:44:36 2013 +0200
- Shortlog ------------------------------------------------------------
libm4ri Debian release 20130416-3
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJRv2ckAAoJENpJWPYR4UnpSesP/0YCI27xbYSRR2CgnAJ5RphU
iR6gYp3m3VvCEc9N5TaQozUS4xA8Btljuk5u1fJMqvJvOzJ/xFkC6VgiJYhFHXzg
/8PKD/GzeCJepEIflEJ5UROo0aPNiQDR2qqZvnJa3ecPof1+fEjXmrKq+ZsHOx5H
FZqHIIGd24603V6VqQEl4a5MBaQAoGevm4bVV6OoYWJsVtVFWqKzkTOOh4ISDl6d
b7DEoQpeSRFcyuqHZfzmTbnI/u4kwhL//kJpJsyJwxVZEiCVy6efynZkLbirrbaG
/LohlFGShmwJjl2haw9L+WDUlxglkCAw0fS4QR6CGdy76wh+t1ATH3XQGAJpMgT9
NYYtlM8wOU0eIx8Onk8ps++qs/cIJEW3OzX19vF7IrpSyMhrwYv5pLoG3R3ncGbr
OHsCz9cxTNhalZuLRndz5XHfIS2On+KXpXoBuvWpFluM55Jn1fFJB0Z+4Q9FZIQ0
7yu+7bfVSHnsenXC1sFsJ49Pf8JI7CSNiJ92LVBfuP7OmdRwWHP6E26MOP8JSqhT
/GM6i1SQa4auMJq37Zbv9nuMkU3kLryI87Mrrc6L+9rT2UfHiUPVMN93bW8VXnZU
07/JMbVrRxD8rVpUpvDAGWaM6iD2LGCMr0RS+DJCxTGM7OS4sMWcIiYZdSShLQ83
JugKkEOsNwzrM+QiqDjX
=QfRk
-----END PGP SIGNATURE-----
Alexander Dreyer (1):
FIX: m4ri called exit in library code
Carlo Wood (105):
Added cwautomacro's 'autogen.sh' to generate auto tool files.
Use the canonical 64-bit type for word.
Get rid of explicit unsigned long long constants.
Fixed a typo.
Bit hack speed ups and documentation fixes for misc.h.
Fix inclusion of misc.h.
More bit hack improvements.
More micro optimizations and a bug fix.
Final micro optimizations in packedmatrix.h.
Compiler warning fixes.
Add back the +1 to the result of log2_floor.
Make sure the correct library is being used at run time for the testsuite.
Introduction of MIDDLE_BITMASK.
Hijack testsuite/Makefile to compile matops.
Rewrite of _mzd_transpose_direct
Code alignment that makes _mzd_mul_naive-64 20% faster (or not).
Remove dependency on cwautomacros.
Introduction of mzd_read_bits_int.
Implement WRAPWORD
Make things work for both, g++ and gcc.
Work in progress
Explicitly convert a word to BIT.
Explicitly use CONVERT_TO_WORD every time an int (or BIT) is converted into word.
Explicitly use CONVERT_TO_UINT64_T when a word is transformed to integer.
testsuite changes
Use uint64_t, not word, when we are dealing with 64-bit integers.
Bug fix for _mzd_transpose_direct
Use sizeof(word) where appropriate.
Out of range bug fix.
Add braces around expressions with & used as truth value.
Introduction of M4RI_WORDWRAP and the C++ class word.
Fake merge of dead head
Reversed the bit order of the internal representation of class word.
Move reversal from word(uint64_t) to CONVERT_TO_WORD.
Export reversal from CONVERT_TO_WORD to code.
Cancel reversals in word::operator-(void) const and WRITE_BIT.
Change shift operators back to their original state.
Remove word::operator-(int).
Added extra asserts to make sure that shifts are within defined range.
Bring word::convert_to_BIT back to its original state.
Bring word::convert_to_int back to its original state.
Remove last traces of reverse.
Reverse ONE.
Minor cleanup of misc.h.
Use int consistently (needed for wordwrapper)
Remove the FIXME from _mzd_transpose_direct_128
Removed FIXME from _mzd_addmul_weird_weird
Type and whitespace clean up.
Type and whitespace clean up (part 2).
Merge with https://bitbucket.org/malb/m4ri changeset 7d7a103dfba3
Benchmark facelift.
Random benchmark improvements.
Use TOPSRCDIR Makefile var instead of PWD. Inverse random bits when needed.
Bug fix in m4ri_random_word.
Duplicated code of m4ri_randomize m4ri_random_word to benchmarketing.c
Take BENCH_RANDOM_REVERSE into account in bench_randomize.
Fixed copyright header in testsuite/test_random.c.
Added general benchmark program for individual packedmatrix functions.
Merge with malb
Improved printed Usage output.
Bug fix in print_complexity1_human and complexity code updates.
Minor changes to mzd_first_zero_row
Packedmatrix benchmark fixes.
Fix constness of packedmatrix mzd_t input pointers.
Create a randomize matrix for each call for mzd_gauss_delayed and mzd_echelonize_naive.
Add LIKELY/UNLIKELY macros for future use
Fix order of calloc function parameters.
Added support for PAPI.
Determine and use LIBPAPI_PATH.
Also search for papi.h by using -I include flags.
Prefix all exported variables, functions and macros.
Move _mmc_ code from misc.h to packedmatrix.c.
Bug fix for crash of bench_* programs.
Bug fix, forgot a few instances of CPU_L2_CACHE.
__M4RI_ENABLE_MMC juggling and support for posix_memalign
Moved mmc functions to their own file.
Doxygen warning fixes.
Merge with malb
Added forgotten m4/ax_func_posix_memalign.m4
Compiler warning fixes.
Allow to only dump a single counter.
Add dependency on m4ri headers to testsuite.
Make it harder for the compiler to put parts of inlined functions outside our loop.
Do not install or include config.h in header files.
Fix constness of trsm* functions.
More constness fixes.
A few more compiler warning fixes and a const thingy.
More constness and some whitespace issues.
Add --enable-debug-dump.
Documentation fix.
Add new elements to mzd_t and keep them consistent.
Add option --debug-mzd.
Move __M4RI_CPU_L1_CACHE and __M4RI_CPU_L2_CACHE to m4ri_config.h.in.
Added mzd_t::offset_vector and made mzd_t::blocks non-zero also for windowed matrices.
Added row_offset and accessor functions for mzd_t using it.
Implement separate cache for mzd_t.
Compiler warning fixes.
Rewrite of _mzd_addmul_even_weird to use rowstride.
Major improvement of transposing.
Bug fix and general fixups. Testsuite for transpose.
Bug fix in mzd_equal.
Speed up of mzd_col_swap with a factor of two.
Also ignore generated maintainer file ltmain.sh
Add support for transposing multi-block matrices.
Copied the improved code of mzd_col_swap to mzd_col_swap_in_rows and added support for start_row/stop_row.
Clement Pernet (10):
work in progress in lqup
* add permutation window
fixing trsm calls to addmul
* new matrix_addmul with any weird dimensions (still need to be tested)
Martin patch:"more experimental permutation code, needs testing"
some more stuff on the weird addmul
Work in progress on the LQUP front: fixed a bunch of bugs, and get LQUP working on full rank matrices.
fix LQUP doctest
Added the 2 remaining trsm and the corresponding testsuite and benchmarks.
Switch PLUQ -> LQUP
Cédric Boutillier (43):
set distribution to UNRELEASED and add a -1 Debian version
Convert to 3.0 (quilt) Debian source format
Add build-dependency on dh-autoreconf
change upstream tag format for git-buildpackage
Do not ship .la file in libm4ri-dev
Set debhelper compatibility level to 9
Build-depend on dpkg-dev >= 1.16.1, add hardening
Add a debian/watch file
Add unapply-patches in debian/source/local-options
Update upstream-versioning-change patch to fix SONAME
Add strict dependency on the binary lib for the -dev package
Pre-depend on multiarch-support, add Multi-Arch: same
Fix *.install to use multiarch paths
update changelog
convert copyright file to copyright-format/1.0
debian/watch: use uversionmangle instead of dversionmangle
Merge commit 'release-20120613'
prepare for new upstream version
disable upstream-version-change patch
update copyright info
use upstream version numbering
remove changelog entries for versions which never made it to the archive
point to /usr/share/common-licenses/GPL-x for GPL-x+ license text
Add myself to Uploaders
disable sse2 flag
override lintian message about the absence of upstream changelog
target experimental
Add VCS-* fields; Bump Standards-Version: to 3.9.4 (no changes needed)
use canonical value for Vcs-Git: field
Build-depend on pkg-config, libpng-dev (Closes: #699071)
add OpenMP support
add debug package
reformat debian/control with cme fix dpkg-control
update changelog
Merge release-20130416
prepare for 20130416-1
add patches to enable sse2 for non Intel cpu and disable sse3
remove upstream-versioning-change patch (not needed anymore)
do not disable sse2 for x86_64 CPUs
add debian/upstream file
update changelog
use DEB_HOST_ARCH_CPU instead of DEB_BUILD_ARCH_CPU
upload to unstable
Felix Salfelder (13):
libm4ri_0.0.20080521.orig.tar.gz
imported debian from libm4ri_0.0.20080521-2.diff.gz
Merge commit 'release-20111004'
debian/0.0.20111004
Merge commit 'release-20111203'
debian/0.0.20111203
update autogenerated files (do we need them?)
debian/0.0.20111203-1.
remove autogenerated files
switch to dh
Merge commit 'release-20120415'
new upstream release, 20120415
removed ltmain.dh (autogenerated)
Jean-Guillaume Dumas (1):
* added is_zero
Martin Albrecht (480):
initial commit
- refactoring (renaming of functions, files)
Strassen seems to work if the matrix dimensions are exactly right
added support for SSE2 instructions (for now these need to be enabled by hand). The speed-up is
Strassen multiplication seems to work now
- added support for SSE2 if available (autodetection)
fix build on PPC
continued refactoring (should be almost done) and fixed bug in naive multiplication
refactoring should be done
simplified combine, don't try to outsmart the compiler
doxygen updates
a potentially more cache-friendly implementation, needs checking
misc cleanups
fix version-info
Doxygen coverage 100%
implemented memory efficient strassen multiplication operation schedule
removed dead test code, added strassen.h to m4ri.h
moved mzd_combine to packedmatrix.[c|h]
SAFECHAR = (1.3 * RADIX) is sufficient
slightly improved clearing of target matrix in _mzd_mul_m4rm_impl
marking more parameters const
some cosmetic changes to packedmatrix.c
declaring more parameters const
docstring updates and API unification
fixed compilation under OSX (32-bit) and under OpenSolaris (32-bit)
remove unnecessary local variables, add explicit casts as picked up by MSVC
added support for Visual Studio 2008 Express
using XOR directly rather than calling mzd_combine gives a significant speed-up so we do that for now. Need to check if this is related to SSE2 and if we can re-introduce it
adapt documentation: We use Strassen-Winograd not Strassen
more documentation for the Opteron vs. Core2Duo performance compromise
reintroducing SSE2 to m4rm multiplication
unify SSE2_CUTOFF
fix SIGSEGV
some minor documentation updates
compile fix for HAVE_SSE2 == False
don't use free on _mm_malloc'd memory
fixing benchmarking/testing code and adding it to revision control
only call _mm_malloc if it is really available
faster naive multiplication but still not as fast as it could be.
added William Hart's Block M4RM implementation which gives a significant speed-up!
document M4RM_BLOCKSIZE
make run_bench return min,median,average and max
nicer parameter names for mzd_combine
re-added SSE2 support to mul_m4rm which gives a quite tiny speed-up
faster transpose
block'ing naive matrix multiplication and using that by default if B->ncols < some threshold
reverting benchmarking code to square matrices
copy window to matrix to improve data locality in strassen multiplication
fix commenting style
removed parameters T and L for M4RM (they weren't used anyway)
new implementation of M4RM multiplication with two Gray code tables. The idea is by Bill Hart
implemented first parallel strassen-winograd multiplication (compile with -fopenmp -DHAVE_OPENMP)
some (style) improvements for SSE2 code by Bill Hart
fixes for the last check-in (all rows are aligned now if no windows are used)
use 8 instead of 2 Graycode tables (implementation and idea by Bill Hart)
allow control over number of Gray code tables via define GRAY8
added support for SSE2 to new _mzd_mul_m4rm_impl this improves performance on C2D considerably,
fix bug in reduction introduced by speeding up make_table
added new testcase, cleanup for valgrind
added more test (corner) cases
fixed bug Bill Hart reported, fix all things Valgrind reported and made code run faster on C2D.
added Bill's cutoff improvement
make OpenMP support configurable
updated MSVC project, added all relevant headers to m4ri.h
fix include order
slightly more clever loop unrolling using a Duff device, doesn't make much of a difference
remove unused variable
slight simplification for process rows and HAVE_SSE2
new M4RI1 routine for matrix reduction, which is still buggy for singular matrices
more small work m4ri1, this is buggy, experimental, play-around code
Michael Brickenstein:
M4RI doesn't fall back to Gaussian elimination so easily anymore. In fact, it never does. This
remove mzd_process_row and changed interface for mzd_process_rows to treat stoprow exclusive (this is more C-ish)
speed improvement for M4RI
more speed improvements for M4RI
implemented using two Gray code tables at the same time, which improves performance.
some slight improvement to mzd_row_add_offset
removing number of parallel processed rows to two.
implement lazy strategy, i.e. attempt to not reduce rows already reduced.
removed old commented-out reduce implementation
removed references to old implementations
renamed GRAY8 macro to M4RM_GRAY8 since it only applies to multiplication
avoid potential memleak in shared library mode where the Gray codes are rebuilt several times.
another attempt at speed improvements
4 Graycode tables seem to be good, need to test on Opteron. For large matrices we hit L2 so
don't reduce a row if it is already reduced, slight overhead for random matrices, huge gain
big check-in (sorry):
added documentation for lacking bounds checks
fix Gaussian reduction for full=FALSE, reported by Wael Said
slightly improved the k parameter for reduction, the M4RM k parameter can be adapted for the Core2
adapted parameter k for top_reduce too
work in progress: mzd_addmul_strassen
fix printing for ncols%RADIX == 0
fix typo in documentation
implemented memory efficient addmul
added mzd_col_swap
fixed dimensions of X0,X1,X2 in addmul_strassen
added a bunch of functions and CHANGED THE API!
macros more robust by adding lots of brackets
first version of col_rotate
2nd attempt at col_rotate, doesn't update permutation yet
M4/autoconf trickery
sane default value for Strassen cutoff
API CHANGE, dropping all _impl's. also improved MP Strassen slightly
merging Clement's patch, everything should work
initial untested code for permutations
checking in all files that automake doesn't autogenerate
commenting stuff out that prevents the build
patch bomb:
int/long -> size_t cleanup courtesy of MSVC
added cached memory management option, which is disabled since it doesn't seem to make a difference
removed -fopenmp
if create/destroy_all_codes is called twice ignore the second call.
renamed combineX_sse2 to combineX
improved and enabled memory manager, also introduced shared library constructors and destructors. These seem to work with GCC, needs
thread safe-ness + refined lib constructor/destructor
added extern "C" safeguard
quick rename of one variable, trivial
__SUNCC__ -> __SUNPRO_C__, untested
added "proximity schedule" from FFLAS, but that doesn't seem to improve performance
removed proximity schedule again
adapted parameters for Opteron
changed strategy for parallel multiplication to block-parallel-then-strassen
updated README and AUTHORS
preparation for next release (targeted: Sunday)
renamed reduction to elimination
new strategy for k in M4RI, seems to work well on Opteron and C2D
define CPU_L2_CACHE in misc.h if it isn't there already
- fix compilation with MSVC
fix docs
merge of Clement Pernet's patch:
slight coding-style clean-up after merging Clement's patch
documentation update
new strategy for k for multiplication, should fit Opteron and C2D
fix a SIGSEGV and sometimes wrong results for matrix multiplication
new release
Added tag release-20080826 for changeset 6b307aa254cb
work on LQUP (or LUP right now)
fix warnings issues by ICC & remove unused watch.c/.h
removed watch.h from m4ri.h
more work on LUP, still not correct
fix memleak in addmul
more scratch code for LQUP
release 20080901
Added tag release-20080901 for changeset bf3d55ccb73b
fix cache size detection handling
fix/unify bit shifting bugs as exposed on Itanium
checking in Arnaud Bergeron's cache detection fix for PPC + my adaptation
... and reverted my changes again since they don't work
added RIGHT_BITMASK equivalent for LEFT_BITMASK and (hopefully) made the code more readable
Added tag release-20080904 for changeset ce71e2c84ad1
suppress redundant output
some work on LQUP basecase, not working yet
checking in fix by anakha
playing around with LQUP
anakha's fix again for configure's cache detection
some more work on LQUP but not working yet
LQUP basecase (slow but seemingly correct for square matrices)
make C++ compiler happy by fixing m4ri_die's signature
fix mzd_add speed regression
fixed bug in trsm routines
better benchmarketing code
improved Makefile.am and added make check
new release
enabling LQUP doctests
fix two MSVC warnings
update MSVC project file
fix doctest failure under OpenSolaris
I'm just playing with MMPF LQUP (not to be taken seriously)
update/correct license statement in source files. M4RI was always GPLv2+
do not use rowswap array to swap rows, always copy:
improved testsuite build process
faster LQUP (use Strassen instead of M4RM only) and more comprehensive test suite
PLUQ work in progress
PLUQ MMPF work in progress
fixed a bug introduced by fixing RIGHT_BITMASK
some minor clean-up after fixing the TRSM tests
remove some assert(M->offset==0)
renamed LQUP -> PLUQ
MMPF: deleting L for now for debugging purposes, once Q is correct, don't kill L
added method for permutation printing
Q seems to be correct now for MMPF
bumped version in Makefile.am to aim for release for end of month
added optimized function for v*A where v is a (1,d) vector and A is a (d,d) matrix. The code is
removed a lot of old functions that were not needed anymore
added (untested) mzd_copy_row function. The function is based on Michael Brickenstein's copy_row.
and added mzd_row_clear_offset again
PLUQ permutations are still wrong, MMPF might be alright
cleaned up some cruft left over from debugging sessions
mzd_submatrix accepts offsets now
remove debug printing
documented/cleanup up MMPF
mzd_col_swap is a bit faster now, fixed memleak in bench_lqup
better crossover and 'better' Q update
doctest should cat LQUP failures for smaller examples
-fixed spelling of naive across the board
a supposedly working PLUQ implementation (doesn't work with MMPF yet)
PLUQ factorisation with MMPF base case seems to be working!
added m4ri_random_word (check randomness of output)
use m4ri_random_word in mzd_randomize (todo: check randomness)
better strategy for column swaps in mzd_pluq_mmpf (still way too slow for matrices with r << n)
factored out PLUQ MMPF and wrote faster MMPF routine
PLUQ is really really slow for e.g. half ranks. Some code to fix this but no luck so far
faster M4RI for sparse matrices
improved (faster) pivot search in MMPF
clarified documentation
allow half rank in bench_elimination
factored out pivot finding to fast subroutine
commented out SSE2 attempt for mzd_col_swap
massive speed-up for sparse matrices
use fast pivot searching code in mzd_reduce_m4ri
slightly faster column swaps?
implemented mzd_echelonize_pluq (mzd_echelonize_FOO is so much better than mzd_reduce_FOO)
fixed MMPF
mzd_row_add_offset not static inline anymore
changed API and updated docs: mzd_reduce_ mzd_echelonize_
mzd_print_matrix -> mzd_print; mzd_mul_m4rm_t removed
merge mzd_row_add_offset move
updated AUTHORS
made some todos more visible
fixed solve for full rank A
fixed warning in test_solve
yet another printf fix
added COPYING file to repository because autotools insist on GPLv3+ while we're GPLv2+
new strategy for dealing with not-full rank submatrices in MMPF
renamed a fullrank -> halfrank in testroutine
better handling of sparse matrices in MMPF
more testcases (mzd_echelonize_m4ri() fails)
fixed some minor bug in TRSM
fixed PLUQ MMPF bug
improved bench_elimination to allow choice of algorithms (mmpf, pluq, naive)
spend less time in mzd_process_rows when in mzd_process_rows2_pluq
added mzd_density()
apply_p_right_trans() more cache friendly
improved cache friendliness of column swaps in LQUP
improvement for sparse matrices in M4RI
fixed memleak in test_solve
preparing for release 20090105
fixed doxygen warning
fixed MSVC compiler warnings
updated MSVC project to include pluq_mmpf and solve
Added tag release-20090105 for changeset 0b25b0a1474a
small clean-up in mzd_cooy()
make bench targets depend on Makefile
make m4ri_coin_flip static inline to remove noise from oprofile run
inlining a couple of often called functions, this should help a bit
renamed Macros' functions _evenb -> _even and the original functions _even -> even_orig to
fixed a SIGSEGV in mzd_echelonize_pluq() when full==0
remove unnecessary if() statements in mzd_pluq_mmpf()
improved documentation (added docs on return values) and removed redundant parameters from mzd_echelonize_m4ri()
some trivial doxygen fixes
fix bench_elimination.c vs. new mzd_echelonize_m4ri() API.
call _mzd_mul_va from mzd_mul_naive if appropriate
refactored packedmatrix to allow more than one malloc call to allocate the matrix
cleaning up the new code
added Macro as an author
fixed a few warnings and one error as reported by MSVC
set max malloc size to 1GB
fixed release soname
implemented mzd_kernel_left_pluq to compute kernels via PLUQ
added test code for mzd_kernel_left_pluq()
fixed testcode for mzd_kernel_left_pluq()
do not prepend zero in cache size detection since that will trigger octal interpretation of the result
yet another fixing attempt for cache size detection
fix L1 detection on OSX x86
fix compilation with --enable-openmp
experiments with OpenMP
fixing OpenMP doctest failures
added low level parallelisation to process_rows2_pluq and added that the parallel sections in mzd_mul_mp_even() should use num_threads(4)
switch back to using threads if any additional thread is available, don't require at least four
fix bug in mzd_is_zero() where small zero matrices wouldn't be reported as such
implemented adding 3 and 4 rows in one step for PLUQ MMPF and adapted constants accordingly
only swap at the end of the base case not during while finding the pivots. This allows a more
made mzd_apply_p_right and mzd_apply_p_right_trans more efficient to decrease the penalty of column swaps.
Added tag release-20090617 for changeset 46b89e01b348
switching MMPF from PLUQ to LQUP and enabling it
improved performance for LQUP factorisation to roughly match that of PLUQ, still work to be done
fixed a bug which escaped me for the last check in because I didn't check with cutoff=64
some performance improvements for sparse-ish matrices
don't apply permutation if todo rows == 0
added _pluq_mmpf back for debugging etc.
copy submatrix to temporary when switching to MMPF
use L2_CACHE_SIZE for PLUQ cutoff (experimental)
improve performance of mzd_transpose using Hacker's Delight bit-fiddling trick (closes: #15)
don't check for the number of CPUs on configure. The macro is not cross platforms and we don't use it anyway (fixes #16)
whoops, forgot to check in configure.in
implemented timing experiment to calculate L1 and L2 cache size. This isn't working perfectly yet and thus it is only optional for now.
moving 'step 1.5' of LQUP MMPF to _mzd_lqup_submatrix because it caused confusion that the postprocessing is outside of that function.
fixed potential segmentation fault in mzd_row_add_offset
merge
changing the soname version to 20091101 in preparation for new release
fix bug which led to wrong results on t2.math.washington.edu
another sizeof(size_t) != sizeof(word) bug
fixing warnings/errors reported by Microsoft Visual Studio
Added tag release-20091101 for changeset 66644740d92d
fixed doxygen warnings
defaulting to '0' instead of 'unkown' in ax_cache_size.m4. This should make things more cross-platform
considerable portability improvement in configure.ac due to David Kirkby
only perform column swaps on non-zero rows in mzd_echelonize_pluq. For some sparse matrices, this gives an advantage
renamed mzd_apply_p_right_tri to mzd_apply_p_right_trans_tri because this is what it does
fixed a bug in permutation which caused segfaults (cf. Sage #8301)
be slightly more clever about selecting 'k' in _mzd_lqup_mmpf() by mirroring M4RI strategy
revert temporary switch to _mzd_lqup_naive in _mzd_lqup (it was just a benchmarketing test)
updated to current Debian version (this file should be removed from revision control, it doesn't belong here)
current OpenMP complains about return from critical blocks, also removed nested critical blocks
tuned OpenMP parameters for M4RI on sage.math
implemented heuristic algorithm which starts with M4RI and switches to PLS based
* renamed LQUP functions and filenames to PLS
fixed docstring for PLS decomposition
updating Visual Studio project
allow the user to disable SSE2 instructions
fix default parameters in configure
Added tag release-20100701 for changeset 8513835b2a92
exporting all mzd_process_rowsX variants
refactoring to allow m4rie to reach into some of our fast routines
wide should be a size_t
Cygwin requires no-undefined, otherwise no shared lib is built
make sure the memory managers match!
more robust cache tuning by increasing the number of trials
improved speed of cache tuning, seems to give good results on t2,bsd,road,prai243,redhawk,eno,iras
new release
Added tag release-20100817 for changeset 6758e6a445c0
fixing solving (for systems which are consistent)
yet another fix for system solving. Inconsistent systems *are* detected despite
fixing segfault in corner case of solve.c
implemented simple TRSM upper left using Greasing
rewrote mzd_make_table in order to support offset!=0 needed for M4RI based TRSM (experimental)
a more comprehensive test suite for TRSM
adding optional randomized tests for TRSM
new TRSM passing all tests now
slight speed improvement for TRSM upper left
package passes make distcheck now
adapt testsuite to new build structure
allow generic ranks in bench_elimination so we can improve rank sensitivity
new function _mzd_compress_l which implements compression of L for PLS
more work on compression of L
optimised compression L in _mzd_pls() (fixes #23)
benchmark(et)ing code for sparse-ish matrices
_mzd_pls_submatrix() only considers the currently needed words instead of whole rows
don't compute the full PLUQ in mzd_echelonize_pluq() if full=0
slightly better _mzd_combine
*** empty log message ***
set the random seed to a fixed value to allow reproducible tests/benchmarks
added benchmarketing "framework" for getting more reliable timings out of bench_ files.
adding swap_bits() function to ease transition for third parties to new matrix layout
removing work arounds for compiler bug (not properly aligned loops) since they are not cross platform
merging with Carlo's random() fixes
adding autogenerated files to .hgignore
improved mzd_add
merge with Carlo's copyright update
I foolishly forgot to add some of the newly added files
correcting a few minor things in bench_packedmatrix
merging in Carlo's benchmarking patches
bench_smallops made obsolete by bench_packedmatrix
matops doesn't exist anymore
add hack to add PAPI include directory.
allow --with-papi=PREFIX when AC_CHECK_LIB cannot find it (i.e. the case it is meant for)
remove ltmain.sh which is autogenerated
install debug_dump.h otherwise programs linking against the library will fail to compile
initialise variables (i.e., take care of Wall reported errors)
follow-up check-in for cache size fix
do not fail if realpath is not installed
only set HAVE_PAPI if we have papi
merging Carlo's swap patches
adapting release version
MS Visual Studio 10 support
fixed typo which prevented compilation
xor is a restricted keyword in C++
updating README and AUTHORS for upcoming 20110601 release
print cycles per bit in bench elimination and multiplication
fixing printing of benchmarketing information
fix compilation and segfaults when OpenMP is enabled
zero out transpose target matrix before writing to it
documentation update on PLE factorisation
disable manual zero-ing out the transpose matrix since our tests indicate it happens on the fly anyway. Added tests though.
adding m4ri_spread_bits and m4ri_shrink_bits + testcases
fix memleak in vector_destruct()
revise PLE decomposition to match new block-iterative algorithm.
flush the buffer in tuning such that the user gets feedback that we are not hanging
use less iterations per experiment in cache tuning but more experiments
added option to pass cache sizes explicitly to configure
adapting soname for upcoming release
Added tag release-20110601 for changeset 75bcfb497a80
Added tag release-20110613 for changeset 68c0b623b59a
Added tag release-20110715 for changeset ab55c3167691
use new-style config.h
handle cflags better
changing version to 20110901 for upcoming release
Added tag release-20110901 for changeset 753358af056e
mzd_cmp() should not compare stuff after ncols
whitespace stuff
bugfix in mzd_is_zero()
renamed SIMD_FLAGS to SIMD_CFLAGS && defined __M4RI_SIMD_CFLAGS and __M4RI_OPENMP_CFLAGS in m4ri_config.h
... and fixed a bug in the last check-in
changing version number for upcoming release
Added tag release-20111004 for changeset 7453821cbd9b
PNG reading/writing & reading of JCF's sparse matrix format
bugfix reading/writing png
invert mono doesn't work for colormap so we do it by hand
removed misleading "this file is broken" comment, it *should* not be broken.
shipping pkg.m4 for systems which don't have pkg-config such as OpenSolaris
use MATHJAX when generating Doxygen HTML docs
fixed png stuff for machines without libpng (i.e., not building png stuff)
changed library release to 20111203
Added tag release-20111203 for changeset 8c2115cc469c
allow -n 1 in benchmarketing code
removed trailing whitespaces
mzd_invert_m4ri replaced by mzd_inv_m4ri() with sane interface, implemented mzd_invert_upper_m4ri for inverting upper triangular matrices
benchmarking & testing code for inversion
bugfixes & asymptotically fast inversion for upper triangular matrices
benchmarking for asymptotically fast trtri for upper triangular matrices
improved benchmarketing code for PLUQ/PLE decomposition (still called PLS in code)
faster & nicer trtri for upper triangular matrices
refactoring: renamed files & functions (probably a bit more clean up to do)
fixed cutting of matrices to fix problems with SSE2 instructions
fixed spelling of Kronrod
Added bench_trsm which unifies bench_trsm_*
removing bench_trsm_* which are replaced by bench_trsm
mzd_extract_u & mzd_extract_l and faster TRSM upper right
corrected guardian ifdef
fixed reference to bench_packedmatrix
define __STDC_LIMIT_MACROS before including <stdint.h> (reported by Jerry James)
fixes #38 (fix suggested anonymously)
allow for really really big matrices (fixes #39)
do not use HAVE_OPENMP, always use __M4RI_HAVE_OPENMP (fixes #41)
limit size of mzd_t cache to 16*64 (fixes #40)
mzd_transpose() on 0x0 matrix should not crash (fixes #31)
open files in binary mode (for Windows) (fixes: #43)
move OMP stuff to lower level (which gives better results on my 4 core i7)
a few small changes to make scan-build shut up
mzd_inv_m4ri makes no guarantees what happens when the input is not invertible
fixed nasty bug in mzd_t_malloc/free
Added tag release-20120415 for changeset cb1d737cb43e
fixed compiler warning: "expected long long but got word"
remove _startblock's from mzd_combine_weird (it ignored them anyway). fixes #47
be a bit more careful about what to add to LIBADD for libpng
next try to fix png detection
library version to 20120615
drop -lm
preparing for upcoming release
changing release management to make fuck-up's less likely
some whitespace changes so stuff is aligned
fixed bug in mzd_col_swap_in_rows which would break if more than one block was used per matrix
Added tag release-20120613 for changeset d68372939136
removed old unused code and trailing whitespaces in strassen.c
remove autogenerated files
ignore autogenerated files
prettier printing for bench_multiplication
added PAPI support to bench_multiplication (still a bit dirty, might need cleaning up)
use slightly less instructions in M4RI by avoiding mzd_read_bits calls
slight improvement to PLE base-case
use SSE2 more in xor.h
use less instructions to read bits in ple_russian
added support for PAPI in bench_elimination
more data locality in multiplication via copying
detect L3 cache and use it instead of L2 cache
simplified + slightly faster M4RM code
use more tables in ple_russian
fixing bugs introduced in last two patches
faster benchmarking code for bench_elimination
some more fine-tuning of the parameter k
enforce kk <= m4ri_radix
clarifies documentation for JCF format reader and fixes bug in indexing (see #49)
fixed autotools scripts for make dist to work (better)
more fiddling around with cache tuning
explicitly prefix m4ri to all includes, i.e., #include <m4ri/foo.h>
deprecating cache tuning
fall back to L2 size if L3 size is not detected/not present
limit k to 6, which seems to give best performance for now
more flexible bench_multiplication parameter parsing
changing version for next release: 20121224
we don't actually need 2.64
choose better values for k if matrices are small
process rows is pretty internal
slapping a bunch of pragma omp parallel for's on the code, no guarantees whatsoever
for small matrices _mzd_ple_to_e was a bottleneck, now it isn't
adding mzd_process_rowsX_ple back as it is used in triangular_russian.c
aligning at 64-byte boundaries as per advice of Richard Parker
adding mzp_copy() for convenience
preparing for upcoming release
Mate Soos (1):
"long long" is not ANSI-compliant. Changing it to 'word'
Michael Brickenstein (1):
don't bail out in mm_malloc if asked for nothing
Minh Van Nguyen (1):
more user friendly documentation in README
Peter Jeremey (1):
this patch solves:
Tobias Hansen (2):
Fix gbp.conf
Fix gbp.conf
bodrato at localhost (1):
New Strassen-like sequence for multiplication, and squaring.
bodrato at mail.dm.unipi.it (1):
Added new functions for addmul and addsqr using new sequences.
-----------------------------------------------------------------------
--
libm4ri: library of Method of the Four Russians Inversion