[mlpack] 38/58: * added AMF tutorial
Barak A. Pearlmutter
barak+git at cs.nuim.ie
Tue Sep 9 13:19:42 UTC 2014
This is an automated email from the git hooks/post-receive script.
bap pushed a commit to branch svn-trunk
in repository mlpack.
commit c93b769ce968af93a056331d8d818ee7a8584389
Author: sumedhghaisas <sumedhghaisas at 9d5b8971-822b-0410-80eb-d18c1038ef23>
Date: Thu Aug 21 02:04:18 2014 +0000
* added AMF tutorial
git-svn-id: http://svn.cc.gatech.edu/fastlab/mlpack/trunk@17093 9d5b8971-822b-0410-80eb-d18c1038ef23
---
doc/tutorials/amf/amf.txt | 205 +++++++++++++++++++++
doc/tutorials/tutorials.txt | 2 +-
src/mlpack/methods/amf/amf.hpp | 6 +-
.../methods/amf/init_rules/random_acol_init.hpp | 12 +-
.../complete_incremental_termination.hpp | 6 +-
.../incomplete_incremental_termination.hpp | 6 +-
src/mlpack/methods/amf/update_rules/nmf_als.hpp | 19 +-
.../methods/amf/update_rules/nmf_mult_dist.hpp | 16 +-
.../methods/amf/update_rules/nmf_mult_div.hpp | 24 +--
9 files changed, 255 insertions(+), 41 deletions(-)
diff --git a/doc/tutorials/amf/amf.txt b/doc/tutorials/amf/amf.txt
new file mode 100644
index 0000000..6b4bf9a
--- /dev/null
+++ b/doc/tutorials/amf/amf.txt
@@ -0,0 +1,205 @@
+/*!
+
+@file amf.txt
+@author Sumedh Ghaisas
+@brief Tutorial for how to use the AMF class.
+
+@page amftutorial Alternating Matrix Factorization tutorial.
+
+@section intro_amftut Introduction
+
+Alternating Matrix Factorization
+
+Alternating matrix factorization decomposes a matrix V in the form \f$ V \approx WH \f$
+where W is called the basis matrix and H is called the encoding matrix. V is
+taken to be of size n x m and the obtained W is n x r and H is r x m. The size
+r is called the rank of the factorization. Factorization is done by alternately
+calculating W and H respectively while holding the other matrix constant.
+
+\b mlpack provides:
+
+ - a \ref amf_amftut "simple C++ interface" to perform Alternating Matrix Factorization
+
+@section toc_amftut Table of Contents
+
+A list of all the sections this tutorial contains.
+
+ - \ref intro_amftut
+ - \ref toc_amftut
+ - \ref amf_amftut
+ - \ref t_policy_amftut
+ - \ref init_rule_amftut
+ - \ref update_rule_amftut
+ - \ref nmf_amftut
+ - \ref svd_amftut
+ - \ref further_doc_amftut
+
+@section amf_amftut The 'AMF' class
+
+The AMF class is templatized with 3 parameters: the first is the policy
+used to determine when the algorithm has converged; the second is the
+initialization rule for the W and H matrices; the last is the update rule
+to be used during each iteration. This templatization allows the user to try
+various update rules, initialization rules, and termination policies (including
+ones not supplied with mlpack) for factorization.
+
+The class provides the following method that performs factorization:
+@code
+template<typename MatType> double Apply(const MatType& V,
+ const size_t r,
+ arma::mat& W,
+ arma::mat& H);
+@endcode
+
+@subsection t_policy_amftut Using different termination policies
+
+The AMF implementation comes with different termination policies to support many
+implemented algorithms. Every termination policy implements the following method,
+which returns the status of convergence.
+@code
+bool IsConverged(arma::mat& W, arma::mat& H)
+@endcode
+
+List of all the termination policies:
+
+ - \ref mlpack::amf::SimpleResidueTermination
+ - \ref mlpack::amf::SimpleToleranceTermination
+ - \ref mlpack::amf::ValidationRMSETermination
+
+In SimpleResidueTermination, the termination decision depends on two factors:
+the value of the residue and the number of iterations. If the current residue
+drops below the threshold or the number of iterations goes beyond the threshold,
+a positive termination signal is passed to AMF.
+
+In SimpleToleranceTermination, the termination criterion is met when the
+increase in residue value drops below the given tolerance. To accommodate spikes,
+a certain number of successive residue drops are accepted. A secondary criterion
+terminates the algorithm when the iteration count goes beyond the threshold.
+
+ValidationRMSETermination divides the data into 2 sets, a training set and a
+validation set. Entries of the validation set are nullified in the input matrix.
+The termination criterion is met when the increase in validation set RMSE drops
+below the given tolerance. To accommodate spikes, a certain number of successive
+validation RMSE drops are accepted. This upper limit on successive drops can be
+adjusted with reverseStepCount. A secondary criterion terminates the algorithm
+when the iteration count goes above the threshold. Though this termination
+policy is a better measure of convergence than the above 2 termination policies,
+it may cause a performance overhead.
+
+On the other hand \ref mlpack::amf::CompleteIncrementalTermination
+"CompleteIncrementalTermination" and \ref mlpack::amf::IncompleteIncrementalTermination
+are just wrapper classes for other termination policies. These policies are used
+when AMF is applied with \ref mlpack::amf::SVDCompleteIncrementalLearning
+"SVDCompleteIncrementalLearning" and \ref mlpack::amf::SVDIncompleteIncrementalLearning
+"SVDIncompleteIncrementalLearning" respectively.
+
+@subsection init_rule_amftut Using different initialization policies
+
+The AMF class comes with 2 initialization policies:
+ - \ref mlpack::amf::RandomInitialization "RandomInitialization"
+ - \ref mlpack::amf::RandomAcolInitialization "RandomAcolInitialization"
+
+RandomInitialization initializes matrices W and H with values drawn from a
+uniform random distribution, while RandomAcolInitialization initializes the W
+matrix by averaging p randomly chosen columns of V. In the case of
+RandomAcolInitialization, p is a template parameter.
+
+To implement their own initialization policy, users need to define the following
+function in their class.
+@code
+template<typename MatType>
+inline static void Initialize(const MatType& V,
+ const size_t r,
+ arma::mat& W,
+ arma::mat& H)
+@endcode
+
+@subsection update_rule_amftut Using different update rules
+
+AMF supports the following update rules:
+ - \ref mlpack::amf::NMFALSUpdate "NMFALSUpdate"
+ - \ref mlpack::amf::NMFMultiplicativeDistanceUpdate "NMFMultiplicativeDistanceUpdate"
+ - \ref mlpack::amf::NMFMultiplicativeDivergenceUpdate "NMFMultiplicativeDivergenceUpdate"
+ - \ref mlpack::amf::SVDBatchLearning "SVDBatchLearning"
+ - \ref mlpack::amf::SVDIncompleteIncrementalLearning "SVDIncompleteIncrementalLearning"
+ - \ref mlpack::amf::SVDCompleteIncrementalLearning "SVDCompleteIncrementalLearning"
+
+Non-Negative Matrix factorization can be achieved with NMFALSUpdate,
+NMFMultiplicativeDistanceUpdate or NMFMultiplicativeDivergenceUpdate.
+NMFALSUpdate implements simple Alternating Least Squares optimization while
+the other rules implement algorithms given in the paper 'Algorithms for
+Non-negative Matrix Factorization'.
+
+The remaining update rules perform Singular Value Decomposition of matrix V.
+This SVD factorization is optimized for use in Collaborative Filtering. This
+use of SVD factorizers for Collaborative Filtering is described in the paper
+'A Guide to Singular Value Decomposition' by Chih-Chao Ma. For further details
+about the algorithms, refer to the respective class documentation.
+
+@subsection nmf_amftut Using Non-Negative Matrix Factorization with AMF
+
+The use of AMF for Non-Negative Matrix factorization is simple. The AMF module
+defines \ref mlpack::amf::NMFALSFactorizer "NMFALSFactorizer" which can be used
+directly without knowing the internal structure of AMF. For example:
+
+@code
+#include <iostream>
+#include <mlpack/core.hpp>
+#include <mlpack/methods/amf/amf.hpp>
+
+using namespace std;
+using namespace arma;
+using namespace mlpack::amf;
+
+int main()
+{
+ NMFALSFactorizer nmf;
+ mat W, H;
+ mat V = randu<mat>(100, 100);
+ const size_t r = 10; // Rank of the factorization (example value).
+ // Apply() is a member function; the rank is its second argument.
+ double residue = nmf.Apply(V, r, W, H);
+ return 0;
+}
+@endcode
+
+NMFALSFactorizer uses SimpleResidueTermination, which is the preferred policy
+for Non-Negative Matrix factorizers. Initialization of W and H in NMFALSFactorizer
+is random. The Apply function returns the residue obtained by comparing the
+constructed matrix W * H with the original matrix V.
+
+@subsection svd_amftut Using Singular Value Decomposition with AMF
+
+The AMF implementation supports the following SVD factorizers:
+ - \ref mlpack::amf::SVDBatchFactorizer "SVDBatchFactorizer"
+ - \ref mlpack::amf::SparseSVDBatchFactorizer "SparseSVDBatchFactorizer"
+ - \ref mlpack::amf::SVDIncompleteIncrementalFactorizer "SVDIncompleteIncrementalFactorizer"
+ - \ref mlpack::amf::SparseSVDIncompleteIncrementalFactorizer "SparseSVDIncompleteIncrementalFactorizer"
+ - \ref mlpack::amf::SVDCompleteIncrementalFactorizer "SVDCompleteIncrementalFactorizer"
+ - \ref mlpack::amf::SparseSVDCompleteIncrementalFactorizer "SparseSVDCompleteIncrementalFactorizer"
+
+The sparse versions of the factorizers can be used with Armadillo's sparse matrix
+support. These specialized implementations boost runtime performance when the
+matrix to be factorized is relatively sparse.
+
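+@code
+#include <mlpack/core.hpp>
+#include <mlpack/methods/amf/amf.hpp>
+
+using namespace std;
+using namespace arma;
+using namespace mlpack::amf;
+
+int main()
+{
+ // Random sparse matrix; roughly 10% of the entries are non-zero.
+ sp_mat V = sprandu<sp_mat>(100, 100, 0.1);
+ mat W, H;
+ const size_t r = 10; // Rank of the factorization (example value).
+
+ SparseSVDBatchFactorizer svd;
+ double residue = svd.Apply(V, r, W, H);
+ return 0;
+}
+@endcode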
+
+@section further_doc_amftut Further documentation
+
+For further documentation on the AMF class, consult the \ref mlpack::amf::AMF
+"complete API documentation".
+
+*/
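
The SimpleResidueTermination rule described in the tutorial above (stop when the residue drops below a threshold or the iteration count goes beyond a limit) can be sketched as follows. This is a minimal, dependency-free illustration and not the mlpack class: the name ResidueOrIterationPolicy, its constructor arguments, and the choice to pass the residue in directly (rather than computing it from W and H as IsConverged(W, H) would) are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a residue-based termination policy.
// It is NOT mlpack::amf::SimpleResidueTermination; it only mirrors the
// two stopping conditions described in the tutorial text.
class ResidueOrIterationPolicy
{
 public:
  ResidueOrIterationPolicy(const double minResidue,
                           const std::size_t maxIterations) :
      minResidue(minResidue), maxIterations(maxIterations), iteration(0) { }

  // Call once per iteration with the current residue. Returns true when the
  // residue is below the threshold or the iteration count has gone beyond
  // the limit.
  bool IsConverged(const double residue)
  {
    ++iteration;
    return (residue < minResidue) || (iteration > maxIterations);
  }

  // Number of times IsConverged() has been called so far.
  std::size_t Iteration() const { return iteration; }

 private:
  double minResidue;
  std::size_t maxIterations;
  std::size_t iteration;
};
```

A factorization loop would call IsConverged() after each pair of W and H updates and stop as soon as it returns true.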
diff --git a/doc/tutorials/tutorials.txt b/doc/tutorials/tutorials.txt
index 760cb76..3579146 100644
--- a/doc/tutorials/tutorials.txt
+++ b/doc/tutorials/tutorials.txt
@@ -31,5 +31,5 @@ examples and progress to complex, extensible uses.
- \ref kmtutorial
- \ref fmkstutorial
- \ref emst_tutorial
-
+ - \ref amftutorial
*/
diff --git a/src/mlpack/methods/amf/amf.hpp b/src/mlpack/methods/amf/amf.hpp
index c87d76a..d6ac467 100644
--- a/src/mlpack/methods/amf/amf.hpp
+++ b/src/mlpack/methods/amf/amf.hpp
@@ -4,6 +4,8 @@
* @author Mohan Rajendran
* @author Ryan Curtin
*
+ * Alternating Matrix Factorization
+ *
* The AMF (alternating matrix factorization) class, from which more commonly
* known techniques such as incremental SVD, NMF, and batch-learning SVD can be
* derived.
@@ -25,7 +27,7 @@
#include <mlpack/methods/amf/termination_policies/simple_tolerance_termination.hpp>
namespace mlpack {
-namespace amf {
+namespace amf /** Alternating Matrix Factorization **/ {
/**
* This class implements AMF (alternating matrix factorization) on the given
@@ -62,7 +64,7 @@ namespace amf {
* @tparam UpdateRule The update rule for calculating W and H matrix at each
* iteration.
*
- * @see NMF_MultiplicativeDistanceUpdate, SimpleResidueTermination
+ * @see NMFMultiplicativeDistanceUpdate, SimpleResidueTermination
*/
template<typename TerminationPolicyType = SimpleResidueTermination,
typename InitializationRuleType = RandomInitialization,
diff --git a/src/mlpack/methods/amf/init_rules/random_acol_init.hpp b/src/mlpack/methods/amf/init_rules/random_acol_init.hpp
index 722e59a..6d2066d 100644
--- a/src/mlpack/methods/amf/init_rules/random_acol_init.hpp
+++ b/src/mlpack/methods/amf/init_rules/random_acol_init.hpp
@@ -2,11 +2,7 @@
* @file random_acol_init.hpp
* @author Mohan Rajendran
*
- * Intialization rule for Non-Negative Matrix Factorization. This simple
- * initialization is performed by the random Acol initialization introduced in
- * the paper 'Algorithms, Initializations and Convergence' by Langville et al.
- * This method sets each of the columns of W by averaging p randomly chosen
- * columns of V.
+ * Initialization rule for Alternating Matrix Factorization.
*/
#ifndef __MLPACK_METHODS_LMF_RANDOM_ACOL_INIT_HPP
#define __MLPACK_METHODS_LMF_RANDOM_ACOL_INIT_HPP
@@ -17,9 +13,11 @@ namespace mlpack {
namespace amf {
/**
- * This class initializes the W matrix of the NMF algorithm by averaging p
+ * This class initializes the W matrix of the AMF algorithm by averaging p
* randomly chosen columns of V. In this case, p is a template parameter. H is
- * then set randomly.
+ * then set randomly. This simple initialization is performed by the random
+ * Acol initialization introduced in the paper 'Algorithms, Initializations and
+ * Convergence' by Langville et al.
*
* @tparam The number of random columns to average for each column of W.
*/
diff --git a/src/mlpack/methods/amf/termination_policies/complete_incremental_termination.hpp b/src/mlpack/methods/amf/termination_policies/complete_incremental_termination.hpp
index d3d2077..44cf852 100644
--- a/src/mlpack/methods/amf/termination_policies/complete_incremental_termination.hpp
+++ b/src/mlpack/methods/amf/termination_policies/complete_incremental_termination.hpp
@@ -24,7 +24,11 @@ template <class TerminationPolicy>
class CompleteIncrementalTermination
{
public:
- //! empty constructor
+ /**
+ * Empty constructor
+ *
+ * @param t_policy object of wrapped class.
+ */
CompleteIncrementalTermination(TerminationPolicy t_policy = TerminationPolicy())
: t_policy(t_policy) {}
diff --git a/src/mlpack/methods/amf/termination_policies/incomplete_incremental_termination.hpp b/src/mlpack/methods/amf/termination_policies/incomplete_incremental_termination.hpp
index a6cd415..519afd5 100644
--- a/src/mlpack/methods/amf/termination_policies/incomplete_incremental_termination.hpp
+++ b/src/mlpack/methods/amf/termination_policies/incomplete_incremental_termination.hpp
@@ -23,7 +23,11 @@ template <class TerminationPolicy>
class IncompleteIncrementalTermination
{
public:
- //! empty constructor
+ /**
+ * Empty constructor
+ *
+ * @param t_policy object of wrapped class.
+ */
IncompleteIncrementalTermination(TerminationPolicy t_policy = TerminationPolicy())
: t_policy(t_policy) {}
diff --git a/src/mlpack/methods/amf/update_rules/nmf_als.hpp b/src/mlpack/methods/amf/update_rules/nmf_als.hpp
index a3269b4..ea96630 100644
--- a/src/mlpack/methods/amf/update_rules/nmf_als.hpp
+++ b/src/mlpack/methods/amf/update_rules/nmf_als.hpp
@@ -2,13 +2,7 @@
* @file nmf_als.hpp
* @author Mohan Rajendran
*
- * Update rules for the Non-negative Matrix Factorization. This follows a method
- * titled 'Alternating Least Squares' described in the paper 'Positive Matrix
- * Factorization: A Non-negative Factor Model with Optimal Utilization of
- * Error Estimates of Data Values' by P. Paatero and U. Tapper. It uses least
- * squares projection formula to reduce the error value of
- * \f$ \sqrt{\sum_i \sum_j(V-WH)^2} \f$ by alternately calculating W and H
- * respectively while holding the other matrix constant.
+ * Update rules for the Non-negative Matrix Factorization.
*/
#ifndef __MLPACK_METHODS_LMF_UPDATE_RULES_NMF_ALS_HPP
#define __MLPACK_METHODS_LMF_UPDATE_RULES_NMF_ALS_HPP
@@ -19,12 +13,17 @@ namespace mlpack {
namespace amf {
/**
- * The alternating least square update rules of matrices W and H.
+ * This class implements a method titled 'Alternating Least Squares' described
+ * in the paper 'Positive Matrix Factorization: A Non-negative Factor Model with
+ * Optimal Utilization of Error Estimates of Data Values' by P Paatero and
+ * U Tapper. It uses the least squares projection formula to reduce the error
+ * value of \f$ \sqrt{\sum_i \sum_j(V-WH)^2} \f$ by alternately calculating W
+ * and H respectively while holding the other matrix constant.
*/
class NMFALSUpdate
{
public:
- // Empty constructor required for the UpdateRule template.
+ //! Empty constructor required for the UpdateRule template.
NMFALSUpdate() { }
template<typename MatType>
@@ -93,7 +92,7 @@ class NMFALSUpdate
}
}
}
-};
+}; // class NMFALSUpdate
}; // namespace amf
}; // namespace mlpack
diff --git a/src/mlpack/methods/amf/update_rules/nmf_mult_dist.hpp b/src/mlpack/methods/amf/update_rules/nmf_mult_dist.hpp
index f29efe4..7fb9c97 100644
--- a/src/mlpack/methods/amf/update_rules/nmf_mult_dist.hpp
+++ b/src/mlpack/methods/amf/update_rules/nmf_mult_dist.hpp
@@ -2,12 +2,7 @@
* @file nmf_mult_dist.hpp
* @author Mohan Rajendran
*
- * Update rules for the Non-negative Matrix Factorization. This follows a method
- * described in the paper 'Algorithms for Non-negative Matrix Factorization'
- * by D. D. Lee and H. S. Seung. This is a multiplicative rule that ensures
- * that the Frobenius norm \f$ \sqrt{\sum_i \sum_j(V-WH)^2} \f$ is
- * non-increasing between subsequent iterations. Both of the update rules
- * for W and H are defined in this file.
+ * Update rules for the Non-negative Matrix Factorization.
*/
#ifndef __MLPACK_METHODS_LMF_UPDATE_RULES_NMF_MULT_DIST_UPDATE_RULES_HPP
#define __MLPACK_METHODS_LMF_UPDATE_RULES_NMF_MULT_DIST_UPDATE_RULES_HPP
@@ -18,12 +13,17 @@ namespace mlpack {
namespace amf {
/**
- * The multiplicative distance update rules for matrices W and H.
+ * The multiplicative distance update rules for matrices W and H. This follows
+ * a method described in the paper 'Algorithms for Non-negative Matrix Factorization'
+ * by D. D. Lee and H. S. Seung. This is a multiplicative rule that ensures
+ * that the Frobenius norm \f$ \sqrt{\sum_i \sum_j(V-WH)^2} \f$ is
+ * non-increasing between subsequent iterations. Both of the update rules
+ * for W and H are defined in this file.
*/
class NMFMultiplicativeDistanceUpdate
{
public:
- // Empty constructor required for the UpdateRule template.
+ // Empty constructor required for the UpdateRule template.
NMFMultiplicativeDistanceUpdate() { }
template<typename MatType>
diff --git a/src/mlpack/methods/amf/update_rules/nmf_mult_div.hpp b/src/mlpack/methods/amf/update_rules/nmf_mult_div.hpp
index 3b17f13..24600b2 100644
--- a/src/mlpack/methods/amf/update_rules/nmf_mult_div.hpp
+++ b/src/mlpack/methods/amf/update_rules/nmf_mult_div.hpp
@@ -2,17 +2,7 @@
* @file mult_div_update_rules.hpp
* @author Mohan Rajendran
*
- * Update rules for the Non-negative Matrix Factorization. This follows a method
- * described in the paper 'Algorithms for Non-negative Matrix Factorization'
- * by D. D. Lee and H. S. Seung. This is a multiplicative rule that ensures
- * that the Kullback–Leibler divergence
- * \f$ \sum_i \sum_j (V_{ij} log\frac{V_{ij}}{(WH)_{ij}}-V_{ij}+(WH)_{ij}) \f$is
- * non-increasing between subsequent iterations. Both of the update rules
- * for W and H are defined in this file.
- *
- * This set of update rules is not meant to work with sparse matrices. Using
- * sparse matrices often causes NaNs in the output, so other choices of update
- * rules are better in that situation.
+ * Update rules for the Non-negative Matrix Factorization.
*/
#ifndef __MLPACK_METHODS_LMF_UPDATE_RULES_NMF_MULT_DIV_HPP
#define __MLPACK_METHODS_LMF_UPDATE_RULES_NMF_MULT_DIV_HPP
@@ -22,6 +12,18 @@
namespace mlpack {
namespace amf {
+/**
+ * This follows a method described in the paper 'Algorithms for Non-negative
+ * Matrix Factorization' by D. D. Lee and H. S. Seung. This is a multiplicative
+ * rule that ensures that the Kullback–Leibler divergence
+ * \f$ \sum_i \sum_j (V_{ij} \log\frac{V_{ij}}{(WH)_{ij}}-V_{ij}+(WH)_{ij}) \f$
+ * is non-increasing between subsequent iterations. Both of the update rules
+ * for W and H are defined in this file.
+ *
+ * This set of update rules is not meant to work with sparse matrices. Using
+ * sparse matrices often causes NaNs in the output, so other choices of update
+ * rules are better in that situation.
+ */
class NMFMultiplicativeDivergenceUpdate
{
public:
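
For readers of this diff, the multiplicative distance update that the NMFMultiplicativeDistanceUpdate comment attributes to Lee and Seung can be sketched without Armadillo. The code below is an illustration under stated assumptions, not the mlpack implementation: the Matrix alias, the helper names, and the small eps added to each denominator are hypothetical. It applies H <- H .* (W^T V) ./ (W^T W H) followed by W <- W .* (V H^T) ./ (W H H^T).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Dense row-major matrix; a stand-in for arma::mat so the sketch has no
// external dependencies.
using Matrix = std::vector<std::vector<double>>;

// Returns the product A * B (columns of A must equal rows of B).
Matrix Multiply(const Matrix& A, const Matrix& B)
{
  const std::size_t n = A.size(), k = B.size(), m = B[0].size();
  Matrix C(n, std::vector<double>(m, 0.0));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t l = 0; l < k; ++l)
      for (std::size_t j = 0; j < m; ++j)
        C[i][j] += A[i][l] * B[l][j];
  return C;
}

// Returns the transpose of A.
Matrix Transpose(const Matrix& A)
{
  Matrix T(A[0].size(), std::vector<double>(A.size()));
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t j = 0; j < A[0].size(); ++j)
      T[j][i] = A[i][j];
  return T;
}

// Frobenius norm of V - W * H.
double Residue(const Matrix& V, const Matrix& W, const Matrix& H)
{
  const Matrix WH = Multiply(W, H);
  double sum = 0.0;
  for (std::size_t i = 0; i < V.size(); ++i)
    for (std::size_t j = 0; j < V[0].size(); ++j)
      sum += (V[i][j] - WH[i][j]) * (V[i][j] - WH[i][j]);
  return std::sqrt(sum);
}

// Multiplies X element-wise by num ./ (den + eps), in place.
void ScaleInPlace(Matrix& X, const Matrix& num, const Matrix& den)
{
  const double eps = 1e-12; // Guards against division by zero (assumption).
  for (std::size_t i = 0; i < X.size(); ++i)
    for (std::size_t j = 0; j < X[0].size(); ++j)
      X[i][j] *= num[i][j] / (den[i][j] + eps);
}

// One alternating step: update H with W fixed, then W with the new H fixed.
void MultiplicativeDistanceStep(const Matrix& V, Matrix& W, Matrix& H)
{
  // H <- H .* (W^T V) ./ (W^T W H)
  const Matrix Wt = Transpose(W);
  ScaleInPlace(H, Multiply(Wt, V), Multiply(Multiply(Wt, W), H));

  // W <- W .* (V H^T) ./ (W H H^T)
  const Matrix Ht = Transpose(H);
  ScaleInPlace(W, Multiply(V, Ht), Multiply(W, Multiply(H, Ht)));
}
```

Because both updates only multiply by non-negative ratios, W and H stay non-negative when V and the initial factors are non-negative, which is why this rule suits NMF.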
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-science/packages/mlpack.git