[mlpack] branch svn-trunk updated (a0600b4 -> 3518384)
Barak A. Pearlmutter
barak+git at pearlmutter.net
Sat May 2 09:11:02 UTC 2015
This is an automated email from the git hooks/post-receive script.
bap pushed a change to branch svn-trunk
in repository mlpack.
from a0600b4 Update HISTORY.txt.
new 68e46ff Handle variance calculation with zero eigenvalues.
new 55c7415 Remove unnecessary include directives.
new ec700f7 Switch int to size_t in order to fix a very large number of warnings.
new bb74e39 Fix some more signed/unsigned comparison warnings that I introduced with the previous revision.
new 7b41f59 New distribution (a combination of LinearRegression and GaussianDistribution) for implementing HMM Regression.
new 46b7cb2 Implementation of HMM Regression
new 6b532cf Added HMM Regression files
new 9745c6a Comparison-type warning sorted out.
new cb19b47 Fix even more warnings that I've introduced.
new 01cd87c I fixed a little bit, but I know this doesn't fix everything.
new aacd011 Comment out XTreeTraverser test for now as per #368.
new c916117 Add comment pointing out that there is a bug.
new c16a9bf Right now, we can't load a vector, so we load a matrix and extract the last column.
new 32818d0 Move MAX_OVERLAP to be a member in the mlpack::tree namespace to fix errors on Visual Studio. Thanks to SinisterMJ for pointing this out. #369.
new f6795d2 Add explicit declarations of template function specializations for linker fixes on Visual Studio.
new ae1c9ff Tabs to spaces.
new d96c202 Spacing and line length fixes.
new c122733 Fix a couple bugs pointed out by Francois Berrier: SGD isn't actually shuffling, and also the final returned objective may not be correct.
new 2116987 Refactor k-means significantly. Remove overclustering since I think nobody is using it (I don't think it's a very interesting technique) and it may be buggy. Speedups for the situation where only cluster centroids are desired.
new f3b12a6 Implement Elkan's algorithm for k-means (it's pretty fast).
new 78a6f40 Remove comment about overclustering.
new ac719ba Remove references to overclustering from tutorial.
new 4af0264 Refactor test to remove overclustering parameter.
new 79d8b85 Update and clarify build tutorial, since DEBUG and PROFILE are OFF by default in releases.
new 7a74d2b Refactor for different KMeans API.
new 118944b Refactor for API change... forgot to check this one in.
new 7c0fb1e Add a warning if the user wants 0 clusters, because the thing is probably going to crash (but maybe for some LloydIterationType that might be what's desired?).
new cbf99bf Explicit std::sqrt() call.
new 43b3a56 Add implementation of Hamerly's algorithm.
new 7d92dbe Refactor input arguments so --algorithm is an accepted parameter, which provides more flexibility as I add more LloydIterationTypes.
new c432485 Refactor ElkanTest and add a test for Hamerly's algorithm.
new c25d88a Fix distance calculations, and fix residual calculation.
new f69583f Fix a bug; now this algorithm is much faster.
new db21670 Add Pelleg-Moore k-means. This implementation is faster and prunes more tightly than my previous attempts (which I didn't check in). (That, of course, simply means that my previous implementations were wrong, but this one isn't.)
new 5516e65 Add test for Pelleg-Moore k-means clustering.
new 1bc6274 Refactor: only track distanceCalculations, not scores and baseCases. Also remove traversalInfo because it's not used, and count distance calculations during cluster domination calculation.
new 9ab1019 Clean up an unnecessary sort, and remove spareBlacklist.
new b934414 Comment the Rules class a little better.
new 74e96fc Better comments for the PellegMooreKMeans class.
new 384a9d0 I suppose we should exercise at least some caution in the destructor.
new c627691 Don't ignore distance calculations during cluster-moving calculations.
new 252af32 Force C++11 support for future versions of mlpack. I wouldn't be surprised if this breaks the build in some places.
new 890a601 Allow std::cout << mlpackObject, as per #319.
new ad81fcb Test std::ostream << mlpackObject.
new 8c58f47 Now we have C++11, but there's no constructor copypasta problem anymore.
new 01bcc07 Oops, include ostream_extra.hpp.
new 2322bd3 A prototype algorithm for k-means clustering, which probably works best when k is very large and so is N.
new 363349e A test for the DTNN k-means algorithm.
new 98f5341 Safer includes, for the situation where the user does something not smart.
new 8fe11f2 Yet another instance of me failing to commit all my changes. Add a BaseCases() and Scores() function to NeighborSearch, so that a user (or DTNNKMeans) can obtain how much work was done after the search.
new c1c5bd5 Fix a bug that meant that centroidsOther was copied only when it shouldn't have been, and was never copied when it should have been (note the iteration++ at the end of the loop).
new 5d9b8ad Properly handle the case where the tree doesn't rearrange points -- like the cover tree. Then create a CoverTreeDTNNKMeans template typedef so that a user can easily use cover tree DTNNKMeans with KMeans<>.
new e9b01c4 Add test for CoverTreeDTNNKMeans.
new 0ae549e Make the Mahalanobis distance a true metric by default.
new a24ac4c Added Smooth and Filter functions
new 440022b Added Smooth and Filter functions
new ec300e1 Added regression_distribution.hpp/cpp, removed hmm_regression
new 827a2e5 Rename hmm_regression.hpp
new 2091ea6 rename hmm_regression.cpp
new 6c9e7b4 now regression_distribution.hpp
new c8f397d Now regression_distribution.cpp
new 45463dd Fix -Wreorder warnings after reordering of data members in class declaration.
new 5eb8c40 Minor formatting changes and streamlining of Armadillo expressions.
new f420226 Minor formatting fixes: tabs->spaces, etc.
new 4388d52 Minor spacing fix.
new dc3195e The const gets ignored (-Wignored-qualifiers).
new c130f50 Fix use of uninitialized value; this should help segfaulting SVDBatch tests.
new 12f3c97 Increase number of samples and give debugging output, in order to try and track down the bug I am seeing in all the Jenkins tests.
new 68784d2 Fix logistic regression tests by enforcing a tighter tolerance for SGD convergence. The changes introduced to SGD in r17196 to cause SGD to shuffle also caused situations where SGD can converge way too early, causing the two tests to fail. Tightening the tolerance to 1e-10 appears to be the solution to this issue.
new 6370c12 This is an experimental method that I am working on. Right now it is not very useful as I have not implemented all of the pruning strategies that I intend to.
new 1081600 Add DualTreeKMeans files to CMakeLists.txt.
new 26ed8e1 Add DualTreeKMeans as an option to the kmeans program (and also DTNN with cover trees).
new bade042 Add a simple test for DualTreeKMeans.
new 01b1057 Be explicit with calls to arma:: functions. Although gcc accepts this as-is, we don't have a guarantee that all compilers will.
new a4750ff Add a semi-hackish breadth-first traverser. The tree abstractions will need to change to support arbitrary traverser types (probably by adding a template parameter) but for now this works to make DualTreeKMeans work.
new 0bbd53a Add breadth first traverser.
new 9bad9dc Here's the file I forgot -- include the BreadthFirstDualTreeTraverser class definition.
new 00157d5 Use FATAL_ERROR instead of FATAL, so that CMake will actually crash when C++11 isn't available.
new 6f0d7b6 Refactor code for better comments and better adherence to coding conventions. No functionality change.
new 068b768 Fix incorrect class name.
new 1687a65 Refactor Elkan-type prune into its own method, for simplicity.
new c254c65 Add Pelleg-Moore type prune. This improves performance -- at least a bit.
new 9b80526 Loosen tolerance until a better solution is devised (currently I am waiting on an email from Nishant).
new 0a5839c Remove debugging random seed.
new e6f0525 strlen() returns the length of the string but you must account for the null terminator yourself. Hence, this code sometimes caused random invalid writes and crashes.
new cca2f52 If gradient2 or gradient1 are zero, then BOOST_REQUIRE_CLOSE will fail, so use BOOST_REQUIRE_SMALL in those situations.
new 4d6dfd8 Not sure how I missed this spelling error...
new 2c13daa Smarter handling of HDF5 dependency search, especially for Debian systems where things are Weird(TM).
new f2134ee The calculation here was actually incorrect.
new 8ae26d8 Refactor CountMostFreq() so it is faster, simpler, and doesn't sometimes return uninitialized values.
new faee609 Better handling of the weird case when includes are needed but the library isn't.
new a8409ff transition now protected, not private
new a10d19b HMM regression method
new cb8ee31 Implementation of HMMRegression class
new c47a0ab hmm_regression.hpp and hmm_regression_impl.hpp added
new 01811d3 Dimensionality() now returns proper values
new 6f20507 Re-ordered initializer lists to fix warnings
new 366efa3 Slightly loosen tolerances for NMF tests.
new ce1ef34 Fix -Wuninitialized, reported by govg.
new b49128a Pedanticism: return a value at the end of main().
new 6552d35 Nope, turns out I am wrong. C++03, C++11, and C++14 all assume a program reaching the end of main() without returning anything will return 0.
new a46fb80 Fix uninitialized memory issue (dsPredictions was never set).
new 8ce16e5 Don't test on Armadillo 4.300.0 through 4.400.x because there is a bug in Mat::load(istream&) which prevents loading from type hdf5_binary. (The bug is simply the omission of the hdf5_binary case from the switch() statement actually.)
new a56f8d4 Issue a runtime error if the user is using Armadillo 4.300.x through 4.400.x and tries to load or save HDF5 files, since that is a bug in Armadillo.
new 7ca6257 Refactor GeneralizedRosenbrockTest to deal with intermittent failures better. Also use 4 trials for RastigrinFunctionTest.
new efd784a Refactor for cleaner code and avoid storing WH explicitly if possible.
new 556c4eb Minor code cleanups.
new 8db8c9a Somehow this never got added to the CMakeLists.txt.
new fa4f785 Remove debugging output.
new 9b614a8 Refactor test with negative elements to decompose the random matrix into its proper low-rank decomposition, then test the reconstructed matrix.
new 09aec03 Disable C4519 errors entirely.
new 2c37111 Include prereqs.hpp for compiler definitions and adjustments.
new 884c029 Disable C4519 in prereqs.hpp not core.hpp.
new 4d83888 Handle setting seed properly for Armadillo RNGs past 3.930.
new 51e2f53 Significantly shrink size of test dataset because this test was taking 10 minutes.
new a2f4a47 Fix memory leak.
new d61a240 Widen tolerance slightly.
new 7b2fd34 Widen tolerance slightly.
new 2fa01ec Widen tolerances slightly. Maybe this test scheme isn't the best?
new 350b97b Check frobenius norm overall instead of just for one element.
new 910e557 Comment out NoCholeskySingularityTest in accordance with #373.
new 7c7a33d Reduce noise slightly and increase dataset size, which will slow down the test but make the results more accurate.
new 3dc9533 Tighten convergence tolerance for RastrigrinFunctionTest, since it doesn't seem to be coming close enough to the desired minimum.
new 9b7753d Slightly loosen tolerance.
new e21a41e Better handling of small gradient values.
new 607e092 Minor tolerance widening.
new 515b520 Accidentally checked in unstable code.
new 8a290c0 Slightly widen tolerance.
new f894bde Handle negative gradient values correctly.
new e541549 Fix convergence criterion according to Nishant's suggestion.
new ba95ecd Loosen tolerance a bit, since it seems to fail once in a while. It's definitely not broken though.
new a8623d4 Tweak SGD parameters a little bit.
new 52754b9 Remove random seed to make test reproducible.
new a8f8e02 Widen tolerance for norm difference, and tweak parameters a little bit.
new 9258c15 Minor style fixes.
new d6f3f23 Add maxIterations parameter to limit the number of iterations used in the Newton method.
new 515c521 Fix ambiguous math reference error, to pass the test clang.
new 2901b95 Use maxIterations for Newton method loop instead of nested Armijo line search.
new d06ea2a Fix memory leak, although I'm not sure it's responsible for the i386 failures.
new 431c684 Adjust tolerances.
new 0aca872 The failure probability is already small, but not small enough it seems.
new cd1d564 Better handling of Armadillo configuration files, since ARMA_USE_HDF5 may appear twice incorrectly (reported by Giampaolo).
new f5fed2e Update with 1.0.11 release notes.
new c521cc3 Update notes; more has since been added to the k-means code.
new 2562c77 Merge changes to mlpack-1.0.11 tag.
new 7f68eb9 Why did I merge that change in? It broke everything. Revert...
new 38f458a Tests for Non-linearly separable dataset fixed.
new 7cee14d Tests for Non-linearly separable dataset fixed.
new 3518384 Tests for Non-linearly separable dataset fixed.
The 149 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "adds" were already present in the repository and have only
been added to this reference.
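Revision e6f0525 above notes that strlen() returns the string length without the null terminator, and that forgetting this caused invalid writes. The affected mlpack code is not shown in this email, so the following is only a minimal sketch of the general pattern. CopyCString is a hypothetical helper, not an mlpack function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (an assumption for illustration, not mlpack code):
// duplicate a C string into a newly allocated buffer.
char* CopyCString(const char* s)
{
  assert(s != nullptr);

  // strlen() counts the characters before the terminating '\0'.
  const std::size_t len = std::strlen(s);

  // Allocate len + 1 bytes so that the terminator fits; allocating only
  // len bytes would make the copy below write one byte past the buffer.
  char* out = new char[len + 1];

  // Copy the characters and the terminator together.
  std::memcpy(out, s, len + 1);
  return out;
}
```

The caller owns the returned buffer and releases it with delete[].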
Summary of changes:
CMake/CXX11.cmake | 45 +
CMake/FindArmadillo.cmake | 108 +-
CMakeLists.txt | 19 +-
HISTORY.txt | 39 +
doc/guide/build.hpp | 19 +-
doc/tutorials/kmeans/kmeans.txt | 78 +-
src/mlpack/core.hpp | 3 +-
src/mlpack/core/data/load_impl.hpp | 16 +
src/mlpack/core/dists/CMakeLists.txt | 2 +
src/mlpack/core/dists/regression_distribution.cpp | 78 ++
src/mlpack/core/dists/regression_distribution.hpp | 102 ++
src/mlpack/core/math/random.hpp | 6 +
src/mlpack/core/metrics/mahalanobis_distance.hpp | 6 +-
.../core/optimizers/lrsdp/lrsdp_function.hpp | 10 +
src/mlpack/core/optimizers/sgd/sgd_impl.hpp | 15 +-
src/mlpack/core/optimizers/sgd/test_function.hpp | 6 +-
src/mlpack/core/tree/CMakeLists.txt | 2 +
src/mlpack/core/tree/binary_space_tree.hpp | 2 +
.../tree/binary_space_tree/binary_space_tree.hpp | 3 +
...r.hpp => breadth_first_dual_tree_traverser.hpp} | 22 +-
.../breadth_first_dual_tree_traverser_impl.hpp | 442 +++++++
src/mlpack/core/tree/cosine_tree/cosine_tree.cpp | 132 ++-
src/mlpack/core/tree/cosine_tree/cosine_tree.hpp | 71 +-
.../rectangle_tree/dual_tree_traverser_impl.hpp | 38 +-
.../r_star_tree_descent_heuristic_impl.hpp | 4 +-
.../tree/rectangle_tree/r_star_tree_split_impl.hpp | 74 +-
.../core/tree/rectangle_tree/r_tree_split_impl.hpp | 70 +-
.../tree/rectangle_tree/rectangle_tree_impl.hpp | 12 +-
.../rectangle_tree/single_tree_traverser_impl.hpp | 4 +-
.../core/tree/rectangle_tree/x_tree_split.hpp | 23 +-
.../core/tree/rectangle_tree/x_tree_split_impl.hpp | 224 ++--
src/mlpack/core/util/CMakeLists.txt | 1 +
src/mlpack/core/util/ostream_extra.hpp | 37 +
src/mlpack/core/util/sfinae_utility.hpp | 1 +
src/mlpack/methods/adaboost/adaboost_impl.hpp | 18 +-
.../complete_incremental_termination.hpp | 92 +-
.../simple_residue_termination.hpp | 68 +-
.../simple_tolerance_termination.hpp | 33 +-
.../amf/update_rules/svd_batch_learning.hpp | 33 +-
.../methods/decision_stump/decision_stump.hpp | 2 +-
.../methods/decision_stump/decision_stump_impl.hpp | 66 +-
src/mlpack/methods/det/dtree.cpp | 8 +-
src/mlpack/methods/gmm/gmm_main.cpp | 2 +-
src/mlpack/methods/hmm/CMakeLists.txt | 2 +
src/mlpack/methods/hmm/hmm.hpp | 46 +-
src/mlpack/methods/hmm/hmm_impl.hpp | 52 +-
src/mlpack/methods/hmm/hmm_regression.hpp | 335 ++++++
src/mlpack/methods/hmm/hmm_regression_impl.hpp | 191 ++++
src/mlpack/methods/kmeans/CMakeLists.txt | 14 +
src/mlpack/methods/kmeans/allow_empty_clusters.hpp | 11 +-
src/mlpack/methods/kmeans/dtnn_kmeans.hpp | 98 ++
src/mlpack/methods/kmeans/dtnn_kmeans_impl.hpp | 164 +++
src/mlpack/methods/kmeans/dual_tree_kmeans.hpp | 71 ++
.../methods/kmeans/dual_tree_kmeans_impl.hpp | 118 ++
.../methods/kmeans/dual_tree_kmeans_rules.hpp | 117 ++
.../methods/kmeans/dual_tree_kmeans_rules_impl.hpp | 319 ++++++
.../methods/kmeans/dual_tree_kmeans_statistic.hpp | 96 ++
src/mlpack/methods/kmeans/elkan_kmeans.hpp | 65 ++
src/mlpack/methods/kmeans/elkan_kmeans_impl.hpp | 186 +++
src/mlpack/methods/kmeans/hamerly_kmeans.hpp | 63 +
src/mlpack/methods/kmeans/hamerly_kmeans_impl.hpp | 170 +++
src/mlpack/methods/kmeans/kmeans.hpp | 69 +-
src/mlpack/methods/kmeans/kmeans_impl.hpp | 272 ++---
src/mlpack/methods/kmeans/kmeans_main.cpp | 274 +++--
src/mlpack/methods/kmeans/naive_kmeans.hpp | 11 +-
src/mlpack/methods/kmeans/naive_kmeans_impl.hpp | 22 +-
src/mlpack/methods/kmeans/pelleg_moore_kmeans.hpp | 93 ++
.../methods/kmeans/pelleg_moore_kmeans_impl.hpp | 92 ++
.../methods/kmeans/pelleg_moore_kmeans_rules.hpp | 107 ++
.../kmeans/pelleg_moore_kmeans_rules_impl.hpp | 178 +++
.../kmeans/pelleg_moore_kmeans_statistic.hpp | 81 ++
.../linear_regression/linear_regression.cpp | 127 ++-
.../methods/neighbor_search/neighbor_search.hpp | 19 +-
.../neighbor_search/neighbor_search_impl.hpp | 32 +-
src/mlpack/methods/pca/pca.cpp | 8 +-
src/mlpack/methods/sparse_coding/sparse_coding.hpp | 5 +-
.../methods/sparse_coding/sparse_coding_impl.hpp | 39 +-
src/mlpack/prereqs.hpp | 7 +
src/mlpack/tests/adaboost_test.cpp | 259 +++--
src/mlpack/tests/allknn_test.cpp | 4 +-
src/mlpack/tests/cli_test.cpp | 4 +-
src/mlpack/tests/cosine_tree_test.cpp | 72 +-
src/mlpack/tests/data/nonlinsepdata.txt | 200 ----
src/mlpack/tests/data/nonlinsepdata_labels.txt | 200 ----
src/mlpack/tests/data/test_labels_nonlinsep.txt | 600 ++++++++++
src/mlpack/tests/data/test_nonlinsep.txt | 600 ++++++++++
src/mlpack/tests/data/train_labels_nonlinsep.txt | 1200 ++++++++++++++++++++
src/mlpack/tests/data/train_nonlinsep.txt | 1200 ++++++++++++++++++++
src/mlpack/tests/data/vc2.txt | 517 ++++-----
src/mlpack/tests/data/vc2_labels.txt | 105 +-
src/mlpack/tests/data/vc2_test.txt | 67 ++
.../{iris_test_labels.csv => vc2_test_labels.txt} | 26 +-
src/mlpack/tests/decision_stump_test.cpp | 10 +-
src/mlpack/tests/distribution_test.cpp | 3 +-
src/mlpack/tests/gmm_test.cpp | 22 +-
src/mlpack/tests/hmm_test.cpp | 14 +-
src/mlpack/tests/kmeans_test.cpp | 247 +++-
src/mlpack/tests/lars_test.cpp | 3 +-
src/mlpack/tests/load_save_test.cpp | 4 +-
src/mlpack/tests/logistic_regression_test.cpp | 15 +-
src/mlpack/tests/nmf_test.cpp | 12 +-
src/mlpack/tests/radical_test.cpp | 2 +-
src/mlpack/tests/rectangle_tree_test.cpp | 2 +
src/mlpack/tests/regularized_svd_test.cpp | 11 +-
src/mlpack/tests/sa_test.cpp | 29 +-
src/mlpack/tests/softmax_regression_test.cpp | 40 +-
src/mlpack/tests/sparse_coding_test.cpp | 4 +-
src/mlpack/tests/svd_batch_test.cpp | 53 +-
src/mlpack/tests/to_string_test.cpp | 48 +-
109 files changed, 9002 insertions(+), 2161 deletions(-)
create mode 100644 CMake/CXX11.cmake
create mode 100644 src/mlpack/core/dists/regression_distribution.cpp
create mode 100644 src/mlpack/core/dists/regression_distribution.hpp
copy src/mlpack/core/tree/binary_space_tree/{dual_tree_traverser.hpp => breadth_first_dual_tree_traverser.hpp} (75%)
create mode 100644 src/mlpack/core/tree/binary_space_tree/breadth_first_dual_tree_traverser_impl.hpp
create mode 100644 src/mlpack/core/util/ostream_extra.hpp
create mode 100644 src/mlpack/methods/hmm/hmm_regression.hpp
create mode 100644 src/mlpack/methods/hmm/hmm_regression_impl.hpp
create mode 100644 src/mlpack/methods/kmeans/dtnn_kmeans.hpp
create mode 100644 src/mlpack/methods/kmeans/dtnn_kmeans_impl.hpp
create mode 100644 src/mlpack/methods/kmeans/dual_tree_kmeans.hpp
create mode 100644 src/mlpack/methods/kmeans/dual_tree_kmeans_impl.hpp
create mode 100644 src/mlpack/methods/kmeans/dual_tree_kmeans_rules.hpp
create mode 100644 src/mlpack/methods/kmeans/dual_tree_kmeans_rules_impl.hpp
create mode 100644 src/mlpack/methods/kmeans/dual_tree_kmeans_statistic.hpp
create mode 100644 src/mlpack/methods/kmeans/elkan_kmeans.hpp
create mode 100644 src/mlpack/methods/kmeans/elkan_kmeans_impl.hpp
create mode 100644 src/mlpack/methods/kmeans/hamerly_kmeans.hpp
create mode 100644 src/mlpack/methods/kmeans/hamerly_kmeans_impl.hpp
create mode 100644 src/mlpack/methods/kmeans/pelleg_moore_kmeans.hpp
create mode 100644 src/mlpack/methods/kmeans/pelleg_moore_kmeans_impl.hpp
create mode 100644 src/mlpack/methods/kmeans/pelleg_moore_kmeans_rules.hpp
create mode 100644 src/mlpack/methods/kmeans/pelleg_moore_kmeans_rules_impl.hpp
create mode 100644 src/mlpack/methods/kmeans/pelleg_moore_kmeans_statistic.hpp
delete mode 100644 src/mlpack/tests/data/nonlinsepdata.txt
delete mode 100644 src/mlpack/tests/data/nonlinsepdata_labels.txt
create mode 100644 src/mlpack/tests/data/test_labels_nonlinsep.txt
create mode 100644 src/mlpack/tests/data/test_nonlinsep.txt
create mode 100644 src/mlpack/tests/data/train_labels_nonlinsep.txt
create mode 100644 src/mlpack/tests/data/train_nonlinsep.txt
create mode 100644 src/mlpack/tests/data/vc2_test.txt
copy src/mlpack/tests/data/{iris_test_labels.csv => vc2_test_labels.txt} (78%)
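Revision cca2f52 in the list above switches from BOOST_REQUIRE_CLOSE to BOOST_REQUIRE_SMALL when a gradient value is zero. The reason is that a percentage-based comparison divides by the operands, so a zero operand makes the relative difference unbounded. The sketch below is a simplified model of that behavior and is not Boost's implementation. The names RelativelyClose, Small, and GradientMatches are assumptions made for illustration.

```cpp
#include <cmath>

// Simplified model of a percentage-based closeness check (NOT Boost's code):
// the difference must be small relative to both operands.
bool RelativelyClose(const double a, const double b, const double tolPercent)
{
  const double diff = std::abs(a - b);
  if (diff == 0.0)
    return true; // Identical values are always close.

  // If either operand is zero, the ratio is infinite and the check fails.
  const double tol = tolPercent / 100.0;
  return (diff / std::abs(a) <= tol) && (diff / std::abs(b) <= tol);
}

// Absolute check, in the spirit of BOOST_REQUIRE_SMALL.
bool Small(const double value, const double absTol)
{
  return std::abs(value) <= absTol;
}

// Hypothetical helper: fall back to an absolute check on the difference
// when either value is exactly zero, otherwise compare relatively.
bool GradientMatches(const double gradient1, const double gradient2,
                     const double tolPercent, const double absTol)
{
  if (gradient1 == 0.0 || gradient2 == 0.0)
    return Small(gradient1 - gradient2, absTol);
  return RelativelyClose(gradient1, gradient2, tolPercent);
}
```

With this model, comparing 1e-12 against 0.0 fails the relative check at any finite percentage but passes an absolute check with a tolerance such as 1e-8.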
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/mlpack.git